CN117976141A - Voice rehabilitation analysis method and system based on acoustic analysis algorithm - Google Patents

Publication number: CN117976141A
Application number: CN202410381430.6A
Authority: CN (China)
Legal status: Pending
Original language: Chinese (zh)
Inventors: 郭姝含 (Guo Shuhan), 刘欢 (Liu Huan), 曹栋楠 (Cao Dongnan)
Applicant and current assignee: West China Hospital of Sichuan University
Classification: Electrically Operated Instructional Devices
Abstract

The invention belongs to the technical field of audio data identification and discloses a voice rehabilitation analysis method and system based on an acoustic analysis algorithm. The method comprises the following steps: constructing a voice rehabilitation training knowledge graph, an acoustic analysis model, a spectral feature extraction model and a voice rehabilitation scoring model; generating a voice rehabilitation training scheme and acquiring a corresponding basic voice rehabilitation training scheme; performing acoustic analysis on the user's real-time voice rehabilitation training audio data with the acoustic analysis model; extracting spectral features from the same real-time audio data with the spectral feature extraction model; scoring voice rehabilitation with the voice rehabilitation scoring model; and correcting the voice rehabilitation training scheme according to the real-time voice rehabilitation scoring result. The invention solves the prior-art problems of poorly adaptable training schemes, rehabilitation analysis that lacks consistency and objectivity, high time cost, and poor accuracy.

Description

Voice rehabilitation analysis method and system based on acoustic analysis algorithm
Technical Field
The invention belongs to the technical field of audio data identification, and particularly relates to a voice rehabilitation analysis method and system based on an acoustic analysis algorithm.
Background
Sound is the carrier of human language communication. The vocal system produces sound, driven by vocal-cord vibration, and transmits it through the channel formed by the throat and the oral cavity. Sound carries information and is an indispensable means of person-to-person communication. The sound emitted by the vocal system can be described by a waveform signal, called voice. When the vocal organs are in a normal state, vocal-cord vibration follows an obvious periodic law and the propagation channel formed by the throat and mouth also changes regularly, so the generated voice likewise varies periodically. Patients with damaged vocal cords need voice rehabilitation training to help them recover or improve voice function.
In the prior art, voice rehabilitation training schemes are usually standardized and cannot fully adapt to each patient's unique voice characteristics and rehabilitation needs. Voice rehabilitation analysis often depends on the manual evaluation and guidance of professional therapists, so analysis results may vary from person to person and lack consistency and objectivity. The process of evaluation and of formulating a training scheme by a professional therapist is very time-consuming and hard to popularize on a large scale, and the accuracy of manual voice rehabilitation analysis is poor.
Disclosure of Invention
The invention aims to solve the prior-art problems of poorly adaptable training schemes, rehabilitation analysis that lacks consistency and objectivity, high time cost and poor accuracy, and provides a voice rehabilitation analysis method and system based on an acoustic analysis algorithm.
The technical scheme adopted by the invention is as follows:
a voice rehabilitation analysis method based on an acoustic analysis algorithm comprises the following steps:
constructing a voice rehabilitation training knowledge graph, an acoustic analysis model, a frequency spectrum feature extraction model and a voice rehabilitation scoring model;
According to basic inquiry information of a user, using a voice rehabilitation training knowledge graph to generate a voice rehabilitation training scheme, and acquiring a corresponding basic voice rehabilitation training scheme;
according to the real-time voice rehabilitation training audio data of the user, performing acoustic analysis by using an acoustic analysis model to obtain corresponding real-time acoustic parameters;
according to the real-time voice rehabilitation training audio data of the user, using a frequency spectrum feature extraction model to extract frequency spectrum features and obtaining corresponding real-time frequency spectrum features;
According to the real-time acoustic parameter characteristics of the real-time acoustic parameters and the corresponding real-time frequency spectrum characteristics, carrying out voice rehabilitation scoring by using a voice rehabilitation scoring model, and obtaining a corresponding real-time voice rehabilitation scoring result;
And according to the real-time voice rehabilitation scoring result, using a voice rehabilitation training knowledge graph to correct the voice rehabilitation training scheme, and obtaining a corresponding correction voice rehabilitation training scheme.
Further, constructing a voice rehabilitation training knowledge graph, an acoustic analysis model, a frequency spectrum feature extraction model and a voice rehabilitation scoring model, wherein the method comprises the following steps of:
Acquiring professional knowledge big data in the voice rehabilitation field, and constructing a corresponding voice rehabilitation training knowledge graph by using a natural language processing algorithm according to the professional knowledge big data;
collecting historical voice rehabilitation training audio data of a plurality of users, and preprocessing the historical voice rehabilitation training audio data to obtain a plurality of historical model sample data provided with real scoring labels;
According to a plurality of preset acoustic parameter indexes, constructing a corresponding acoustic analysis model by using an acoustic analysis algorithm, and constructing a corresponding spectral feature extraction model by using a spectral feature extraction algorithm;
according to the historical model sample data, performing acoustic analysis by using an acoustic analysis model to obtain a plurality of corresponding historical acoustic parameters, and extracting historical acoustic parameter characteristics corresponding to the historical acoustic parameters;
According to the historical model sample data, using a spectrum feature extraction model to extract spectrum features to obtain a plurality of corresponding historical spectrum features;
Carrying out feature fusion on the historical acoustic parameter features and the corresponding historical spectrum features of the same historical model sample data to obtain a plurality of corresponding historical fusion features;
According to a plurality of historical fusion characteristics, performing optimization training by using a deep learning algorithm, constructing an initial voice rehabilitation scoring model, and generating a plurality of corresponding prediction scoring labels;
Obtaining a corresponding model prediction accuracy according to the plurality of prediction scoring labels and the corresponding plurality of real scoring labels;
if the model prediction accuracy is greater than the preset model prediction accuracy threshold, outputting an optimal voice rehabilitation scoring model, otherwise, continuing to perform optimization training.
Further, acquiring professional knowledge big data in the voice rehabilitation field, and constructing a corresponding voice rehabilitation training knowledge graph by using a natural language processing algorithm according to the professional knowledge big data, wherein the method comprises the following steps of:
Acquiring professional knowledge big data in the voice rehabilitation field, and performing knowledge preprocessing on the professional knowledge big data to obtain the preprocessed professional knowledge;
Constructing a corresponding named entity extraction model and an entity relation extraction model by using a natural language processing algorithm;
Extracting a plurality of knowledge named entities in the preprocessed professional knowledge by using a named entity extraction model, and extracting a plurality of knowledge entity relations among the plurality of knowledge named entities by using an entity relation extraction model;
And constructing a corresponding voice rehabilitation training knowledge graph according to the plurality of knowledge named entities and the plurality of knowledge entity relations among them.
Further, the acoustic parameter indexes comprise volume indexes, tone indexes, formant indexes and time length indexes;
The acoustic analysis model consists of a volume calculation sub-model, a tone calculation sub-model, a formant calculation sub-model and a time length calculation sub-model;
The spectral feature extraction model is constructed based on logfBank algorithm.
Further, the voice rehabilitation scoring model is constructed based on a CNN algorithm and comprises an input layer, a feature fusion module, a feature transformation module, a scoring calculation module and an output layer which are sequentially connected.
Further, according to the basic inquiry information, using a voice rehabilitation training knowledge graph to generate a voice rehabilitation training scheme, and obtaining a corresponding basic voice rehabilitation training scheme, wherein the method comprises the following steps:
Collecting basic inquiry information of a user, and extracting a plurality of inquiry named entities of the basic inquiry information by using a named entity extraction model;
Inputting a plurality of inquiry named entities into a voice rehabilitation training knowledge graph, and searching the named entities to obtain a plurality of corresponding matching knowledge named entities;
And generating a voice rehabilitation training scheme according to the plurality of knowledge entity relations among the plurality of matched knowledge named entities, and acquiring a corresponding basic voice rehabilitation training scheme.
Further, according to the real-time voice rehabilitation training audio data of the user, using an acoustic analysis model to perform acoustic analysis, and obtaining corresponding real-time acoustic parameters, the method comprises the following steps:
Performing first preprocessing on the real-time voice rehabilitation training audio data of the user to obtain real-time voice rehabilitation training audio data after the first preprocessing; the first preprocessing comprises denoising, downsampling and filtering which are sequentially carried out;
Acquiring corresponding real-time volume parameters, real-time tone parameters, real-time formant parameters and real-time duration parameters from the first-preprocessed real-time voice rehabilitation training audio data by using the volume calculation sub-model, tone calculation sub-model, formant calculation sub-model and duration calculation sub-model;
and integrating the real-time volume parameter, the real-time tone parameter, the real-time formant parameter and the real-time duration parameter to obtain the corresponding real-time acoustic parameter.
Further, according to the real-time voice rehabilitation training audio data of the user, using a spectrum feature extraction model to extract spectrum features and obtain corresponding real-time spectrum features, the method comprises the following steps:
Performing second preprocessing on the real-time voice rehabilitation training audio data of the user to obtain a plurality of second preprocessed real-time voice rehabilitation training audio data frames; the second preprocessing comprises denoising, pre-emphasis and framing which are sequentially carried out;
Performing STFT processing on the second preprocessed real-time voice rehabilitation training audio data frames to obtain a corresponding spectrogram;
Acquiring a Mel frequency cepstrum coefficient corresponding to the spectrogram, and acquiring a first-order difference and a second-order difference corresponding to the Mel frequency cepstrum coefficient;
And integrating the Mel frequency cepstrum coefficient, the first-order difference and the second-order difference to obtain corresponding real-time frequency spectrum characteristics.
Further, according to the real-time acoustic parameter characteristics of the real-time acoustic parameters and the corresponding real-time spectrum characteristics, using a voice rehabilitation scoring model to score voice rehabilitation, and obtaining a corresponding real-time voice rehabilitation scoring result, comprising the following steps:
extracting real-time acoustic parameter characteristics of the real-time acoustic parameters;
inputting the real-time acoustic parameter characteristics and the corresponding real-time frequency spectrum characteristics into an input layer of a voice rehabilitation scoring model;
a feature fusion module is used for carrying out feature fusion on the real-time acoustic parameter features and the real-time spectrum features to obtain fusion features;
Performing feature transformation on the fusion features by using a feature transformation module to obtain corresponding output features;
according to the output features, using a score calculation module to calculate a clarity score and an accuracy score, obtaining the corresponding clarity score and accuracy score;
and using an output layer to take the clarity score and the accuracy score as the corresponding real-time voice rehabilitation scoring result.
A voice rehabilitation analysis system based on an acoustic analysis algorithm comprises a cloud computing center and a plurality of user terminals, wherein the cloud computing center is in communication connection with each user terminal and comprises a model construction unit, a training scheme generation unit, an acoustic analysis unit, a spectral feature extraction unit, a voice rehabilitation scoring unit and a training scheme correction unit.
The beneficial effects of the invention are as follows:
The invention discloses a voice rehabilitation analysis method and a voice rehabilitation analysis system based on an acoustic analysis algorithm, which are used for carrying out voice rehabilitation analysis according to real-time audio of voice rehabilitation training by constructing an acoustic analysis model, a frequency spectrum feature extraction model and a voice rehabilitation scoring model, so that the consistency and objectivity of voice rehabilitation analysis are improved, and a standardized system of voice rehabilitation analysis is realized; the manual evaluation and the guidance of a professional therapist are avoided, and the labor cost and the time cost of voice rehabilitation analysis are reduced; the voice rehabilitation analysis method has the advantages that automatic and intelligent analysis is carried out based on the audio data characteristics, accuracy and practicality of voice rehabilitation analysis are improved, and large-scale popularization can be carried out; by analyzing the voice rehabilitation scoring result of the voice rehabilitation training, the voice rehabilitation training scheme is subjected to real-time and personalized self-adaptive correction, so that the voice rehabilitation training scheme can be more in line with the actual situation of a user, and the effect of the voice rehabilitation training and the experience of the user are improved.
Other advantageous effects of the present invention will be further described in the detailed description.
Drawings
Fig. 1 is a flow chart of a voice rehabilitation analysis method based on an acoustic analysis algorithm in the invention.
Fig. 2 is a block diagram of a voice rehabilitation analysis system based on an acoustic analysis algorithm in the present invention.
Detailed Description
The invention is further illustrated by the following description of specific embodiments in conjunction with the accompanying drawings.
Example 1:
As shown in fig. 1, the embodiment provides a voice rehabilitation analysis method based on an acoustic analysis algorithm, which includes the following steps:
S1: the method comprises the following steps of constructing a voice rehabilitation training knowledge graph, an acoustic analysis model, a frequency spectrum feature extraction model and a voice rehabilitation scoring model:
S1-1: the method comprises the steps of collecting professional knowledge big data in the voice rehabilitation field, and constructing a corresponding voice rehabilitation training knowledge graph by using a natural language processing algorithm according to the professional knowledge big data, wherein the method comprises the following steps of:
S1-1-1: acquiring professional knowledge big data in the voice rehabilitation field, and carrying out knowledge preprocessing on the professional knowledge big data to obtain professional knowledge after intervention processing;
The knowledge preprocessing comprises data deduplication and data screening, performed in sequence on the professional knowledge big data: deduplication deletes repeated data, reducing data volume, and screening eliminates knowledge data unsuitable for knowledge graph construction, such as erroneous or missing data, improving the degree of data standardization;
S1-1-2: constructing a corresponding named entity extraction model and an entity relation extraction model by using a natural language processing algorithm;
The named entity extraction model is constructed based on the BERT-BiLSTM-CRF algorithm: a pre-trained language sub-model (BERT, Bidirectional Encoder Representations from Transformers) produces vector representations of the preprocessed professional knowledge; a bidirectional long short-term memory network (BiLSTM, Bi-directional Long Short-Term Memory) extracts semantic features from the preprocessed knowledge vectors; and a linear-chain conditional random field module (CRF, Conditional Random Field) labels the knowledge named entities in the preprocessed professional knowledge according to the semantic features;
The entity relation extraction model is constructed based on the BiGRU-Attention algorithm: a bidirectional gated recurrent unit network (BiGRU, Bidirectional Gated Recurrent Unit) encodes the preprocessed professional knowledge vectors; an attention mechanism assigns an attention weight to each channel of the BiGRU, reducing the influence of knowledge with mislabeled relations; and, based on the attention weights, the preprocessed knowledge vectors and the named entities output by the named entity extraction model, the BiGRU network outputs the corresponding entity relations;
s1-1-3: extracting a plurality of knowledge named entities in the preprocessed professional knowledge by using a named entity extraction model, and extracting a plurality of knowledge entity relations among the plurality of knowledge named entities by using an entity relation extraction model;
S1-1-4: constructing a corresponding voice rehabilitation training knowledge graph according to the relationships between the knowledge naming entities and the knowledge entities;
s1-2: collecting historical voice rehabilitation training audio data of a plurality of users, and preprocessing the historical voice rehabilitation training audio data to obtain a plurality of historical model sample data provided with real scoring labels;
S1-3: according to a plurality of preset acoustic parameter indexes, constructing a corresponding acoustic analysis model by using an acoustic analysis algorithm, and constructing a corresponding spectral feature extraction model by using a spectral feature extraction algorithm;
The acoustic parameter indexes comprise volume indexes, tone indexes, formant indexes and time length indexes;
The acoustic analysis model consists of a volume calculation sub-model, a tone calculation sub-model, a formant calculation sub-model and a time length calculation sub-model;
The formula of the volume calculation sub-model is:

$V = F_{\mathrm{vol}}(x) = \frac{1}{N}\sum_{i=1}^{N} \lvert x(i) \rvert$

where $F_{\mathrm{vol}}$ is the volume calculation function; $i$ is the sampling-point index; $N$ is the total number of sampling points; and $x(i)$ is the amplitude of the audio data $x$ at sampling point $i$;
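A minimal sketch of the volume sub-model, assuming the mean-absolute-amplitude form (the original image formula is unavailable, so this exact form is an assumption):

```python
import numpy as np

def volume(x: np.ndarray) -> float:
    # Mean absolute amplitude over the N sampling points.
    return float(np.mean(np.abs(x)))

x = np.array([0.5, -0.5, 0.25, -0.25])
v = volume(x)  # 0.375
```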
The formula of the pitch calculation sub-model is:

$P = F_{\mathrm{pitch}}(x(i)) + \Delta_p$

where $F_{\mathrm{pitch}}$ is the pitch calculation function, generally an automatic pitch-detection algorithm; $x(i)$ is the audio data $x$ at sampling point $i$; and $\Delta_p$ is the pitch compensation value;
The formulas of the formant calculation sub-model are:

$\hat{x}(i) = \sum_{j=1}^{p} a_j\, x(i-j)$, $\quad \hat{X}(f) = \mathrm{FFT}(\hat{x})$, $\quad F = F_{\max}\big(\hat{X}(f)\big)$, $\quad B = F_{\mathrm{bw}}\big(\hat{X}(f)\big)$

where $\hat{x}(i)$ is the predicted data; $a_j$ are the prediction coefficients; $x$ is the audio data; $i$ is the sampling-point index; $j$ indexes the sampling points preceding the current point $i$; $p$ is the prediction order; $\hat{X}(f)$ is the predicted data after fast Fourier transform; $\mathrm{FFT}$ is the fast Fourier transform function; $F$ is the formant frequency; $F_{\max}$ is the maximum-retention (peak-picking) function; $B$ is the formant bandwidth; and $F_{\mathrm{bw}}$ is the bandwidth calculation function;
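The linear-prediction steps of the formant sub-model can be sketched as follows. The autocorrelation method, prediction order, and the 50 Hz low-frequency cutoff are illustrative assumptions, not the patent's stated choices:

```python
import numpy as np

def lpc_coeffs(x: np.ndarray, order: int) -> np.ndarray:
    # Autocorrelation-method LPC: solve the normal equations R a = r
    # for the prediction coefficients a_j in x_hat(i) = sum_j a_j x(i-j).
    r = np.correlate(x, x, mode="full")[len(x) - 1 : len(x) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1 : order + 1])

def formants(x: np.ndarray, fs: float, order: int = 8):
    a = lpc_coeffs(x, order)
    # Roots of A(z) = 1 - sum_j a_j z^{-j}; resonances sit near the unit circle.
    roots = np.roots(np.concatenate(([1.0], -a)))
    roots = roots[np.imag(roots) > 0]            # one root per conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)   # candidate formant frequencies
    bws = -fs / np.pi * np.log(np.abs(roots))    # corresponding bandwidths
    keep = freqs > 50                            # drop near-DC artifacts
    idx = np.argsort(freqs[keep])
    return freqs[keep][idx], bws[keep][idx]

fs = 8000
t = np.arange(0, 0.2, 1 / fs)
rng = np.random.default_rng(0)
# A 500 Hz tone plus light noise to keep the normal equations well-posed.
x = np.sin(2 * np.pi * 500 * t) + 0.01 * rng.standard_normal(t.size)
f, b = formants(x, fs)  # one candidate lies near 500 Hz
```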
The formula of the duration calculation sub-model is:

$D = F_{\mathrm{dur}}(x) = \sum_{i=1}^{N}\big(t_a(i) - t_s(i)\big) + \Delta_d$

where $F_{\mathrm{dur}}$ is the duration calculation function; $t_a(i)$ is the measured audio duration at sampling point $i$; $t_s(i)$ is the silence-interval duration at sampling point $i$; $\Delta_d$ is the duration compensation value; $i$ is the sampling-point index; and $N$ is the total number of sampling points;
The spectral feature extraction model is constructed based on the logfBank algorithm. The logfBank algorithm is similar to the Mel-frequency cepstral coefficient (Mel-Frequency Cepstrum, MFCC) algorithm in that both perform subsequent processing on fBank feature extraction results, but logfBank requires less computation than MFCC and its features retain more correlation, which a deep learning model can exploit, improving recognition accuracy while reducing computation;
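A sketch of log Mel filterbank (fBank) feature extraction for a single frame, the step the logfBank model builds on. The filter count and FFT size are illustrative defaults, not the patent's parameters:

```python
import numpy as np

def mel(f):  # Hz -> Mel
    return 2595.0 * np.log10(1.0 + f / 700.0)

def inv_mel(m):  # Mel -> Hz
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def logfbank_frame(frame, fs, n_filters=26, n_fft=512):
    # Power spectrum of one frame.
    spec = np.abs(np.fft.rfft(frame, n_fft)) ** 2 / n_fft
    # Triangular filters spaced evenly on the Mel scale.
    pts = inv_mel(np.linspace(mel(0), mel(fs / 2), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        lo, c, hi = bins[i - 1], bins[i], bins[i + 1]
        for k in range(lo, c):
            fb[i - 1, k] = (k - lo) / max(c - lo, 1)
        for k in range(c, hi):
            fb[i - 1, k] = (hi - k) / max(hi - c, 1)
    # Log filterbank energies: the fBank step without MFCC's decorrelating DCT.
    return np.log(fb @ spec + 1e-10)

fs = 16000
frame = np.sin(2 * np.pi * 440 * np.arange(400) / fs)
feat = logfbank_frame(frame, fs)  # 26 log-energies; the 440 Hz tone
                                  # excites a low-index Mel filter
```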
s1-4: according to the historical model sample data, performing acoustic analysis by using an acoustic analysis model to obtain a plurality of corresponding historical acoustic parameters, and extracting historical acoustic parameter characteristics corresponding to the historical acoustic parameters;
S1-5: according to the historical model sample data, using a spectrum feature extraction model to extract spectrum features to obtain a plurality of corresponding historical spectrum features;
s1-6: carrying out feature fusion on the historical acoustic parameter features and the corresponding historical spectrum features of the same historical model sample data to obtain a plurality of corresponding historical fusion features;
S1-7: according to a plurality of historical fusion characteristics, performing optimization training by using a deep learning algorithm, constructing an initial voice rehabilitation scoring model, and generating a plurality of corresponding prediction scoring labels;
The voice rehabilitation scoring model is constructed based on a convolutional neural network (Convolutional Neural Networks, CNN) algorithm and comprises an input layer, a feature fusion module, a feature transformation module, a scoring calculation module and an output layer which are sequentially connected; the feature transformation module performs a series of pooling, convolution and feature processing on the input features;
S1-8: obtaining a corresponding model prediction accuracy according to the plurality of prediction scoring labels and the corresponding plurality of real scoring labels;
s1-9: if the model prediction accuracy is greater than a preset model prediction accuracy threshold, outputting an optimal voice rehabilitation scoring model, otherwise, continuing to perform optimization training;
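The accuracy-gated optimization loop of S1-7 to S1-9 can be sketched generically; `train_epoch` and `accuracy` are hypothetical callables standing in for the scoring model's training step and validation accuracy, and the threshold is illustrative:

```python
# Train until model prediction accuracy exceeds a preset threshold (S1-9),
# otherwise continue optimization training. Callables are hypothetical.
def train_until_threshold(train_epoch, accuracy, threshold=0.9, max_epochs=100):
    for epoch in range(max_epochs):
        train_epoch()
        if accuracy() > threshold:
            return epoch + 1  # number of epochs used
    return max_epochs

# Toy stand-in: accuracy starts at 0.5 and improves by 0.07 per epoch.
state = {"acc": 0.5}
def fake_epoch():
    state["acc"] += 0.07

epochs = train_until_threshold(fake_epoch, lambda: state["acc"])
```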
S2: according to basic inquiry information of a user, using a voice rehabilitation training knowledge graph to generate a voice rehabilitation training scheme, and acquiring a corresponding basic voice rehabilitation training scheme, wherein the voice rehabilitation training scheme comprises the following steps of:
S2-1: collecting basic inquiry information of a user, and extracting a plurality of inquiry named entities of the basic inquiry information by using a named entity extraction model;
S2-2: inputting a plurality of inquiry named entities into a voice rehabilitation training knowledge graph, and searching the named entities to obtain a plurality of corresponding matching knowledge named entities;
s2-3: according to a plurality of knowledge entity relations among a plurality of matched knowledge named entities, voice rehabilitation training scheme generation is carried out, and a corresponding basic voice rehabilitation training scheme is obtained;
S3: according to the real-time voice rehabilitation training audio data of the user, using an acoustic analysis model to carry out acoustic analysis and obtaining corresponding real-time acoustic parameters, comprising the following steps:
s3-1: performing first preprocessing on the real-time voice rehabilitation training audio data of the user to obtain real-time voice rehabilitation training audio data after the first preprocessing; the first preprocessing comprises denoising, downsampling and filtering which are sequentially carried out;
S3-2: acquiring corresponding real-time volume parameters, real-time tone parameters, real-time formant parameters and real-time duration parameters by using a sound volume operator model, a tone calculation sub-model, a formant calculation sub-model and a time duration calculation sub-model according to the first preprocessed real-time voice rehabilitation training audio data;
s3-3: integrating the real-time volume parameter, the real-time tone parameter, the real-time formant parameter and the real-time duration parameter to obtain a corresponding real-time acoustic parameter;
S4: according to the real-time voice rehabilitation training audio data of the user, a spectrum feature extraction model is used for extracting spectrum features, and corresponding real-time spectrum features are obtained, and the method comprises the following steps:
s4-1: performing second preprocessing on the real-time voice rehabilitation training audio data of the user to obtain a plurality of second preprocessed real-time voice rehabilitation training audio data frames; the second preprocessing comprises denoising, pre-emphasis and framing which are sequentially carried out;
The pre-emphasis formula is:

$y(t) = x(t) - \alpha\, x(t-1)$

where $y(t)$ is the pre-emphasized real-time voice rehabilitation training audio data; $x(t)$ is the real-time voice rehabilitation training audio data; $t$ is the time index; and $\alpha$ is the pre-emphasis coefficient, set according to the pre-emphasis frequency, typically 80 Hz;
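A sketch of the pre-emphasis step as a first-order high-pass filter; the coefficient 0.97 is a common default, standing in for the value derived from the patent's 80 Hz pre-emphasis frequency:

```python
import numpy as np

def pre_emphasis(x: np.ndarray, a: float = 0.97) -> np.ndarray:
    # y(t) = x(t) - a * x(t-1); the first sample is passed through unchanged.
    return np.append(x[0], x[1:] - a * x[:-1])

x = np.array([1.0, 1.0, 1.0, 1.0])
y = pre_emphasis(x)  # DC content is strongly attenuated after the first sample
```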
S4-2: performing Short-time Fourier transform (STFT) processing on the plurality of second preprocessed real-time voice rehabilitation training audio data frames to obtain a corresponding spectrogram;
The formula of the spectrogram is:

$S(t,f) = \sum_{\tau} y(\tau)\, w(\tau - t)\, e^{-j 2\pi f \tau}$

where $S(t,f)$ is the spectrogram; $y(\tau)$ is the pre-emphasized real-time voice rehabilitation training audio data at time $\tau$; $w$ is the window function; $t$ and $\tau$ are time indices; and $f$ is the frequency index;
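The STFT of S4-2 can be sketched with a Hann window; the frame length and hop size are illustrative choices, not values from the patent:

```python
import numpy as np

def stft(y: np.ndarray, frame_len: int = 256, hop: int = 128) -> np.ndarray:
    # Hann-windowed short-time Fourier transform of signal y.
    w = np.hanning(frame_len)
    n_frames = 1 + (len(y) - frame_len) // hop
    frames = np.stack([y[i * hop : i * hop + frame_len] * w
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)  # (n_frames, frame_len // 2 + 1)

fs = 8000
y = np.sin(2 * np.pi * 1000 * np.arange(fs) / fs)  # 1 s of a 1 kHz tone
S = stft(y)
spectrogram = np.abs(S)  # magnitude spectrogram; peak at the 1 kHz bin
```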
s4-3: acquiring a Mel frequency cepstrum coefficient corresponding to the spectrogram, and acquiring a first-order difference and a second-order difference corresponding to the Mel frequency cepstrum coefficient;
The obtaining formula of the Mel-frequency cepstral coefficients is:

$c = \mathrm{DCT}\big(\log \lvert S(t,f) \rvert\big)$

where $c$ are the Mel-frequency cepstral coefficients corresponding to the spectrogram $S(t,f)$; $\mathrm{DCT}$ is the discrete cosine transform function; $\log$ is the logarithm function; and $\lvert S(t,f) \rvert$ is the modulus of the spectrogram, its phase being discarded;
The formulas of the first-order and second-order differences are:

$\Delta c(n) = c(n+1) - c(n-1)$, $\quad \Delta^2 c(n) = \Delta c(n+1) - \Delta c(n-1)$

where $c(n)$ are the Mel-frequency cepstral coefficients corresponding to the spectrogram; $\Delta c(n)$ is the first-order difference; $\Delta^2 c(n)$ is the second-order difference; and $n$ is the difference index;
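The first- and second-order differences of S4-3 can be sketched as a central difference over the coefficient sequence; the one-sided edge handling is an illustrative choice:

```python
import numpy as np

def delta(c: np.ndarray) -> np.ndarray:
    # Central difference c(n+1) - c(n-1), one-sided at the sequence edges.
    d = np.zeros_like(c)
    d[1:-1] = c[2:] - c[:-2]
    d[0], d[-1] = c[1] - c[0], c[-1] - c[-2]
    return d

c = np.array([1.0, 2.0, 4.0, 7.0, 11.0])
d1 = delta(c)    # first-order difference
d2 = delta(d1)   # second-order difference (difference of the first-order)
```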
S4-4: integrating the Mel frequency cepstrum coefficient, the first-order difference and the second-order difference to obtain corresponding real-time frequency spectrum characteristics
S5: according to the real-time acoustic parameter characteristics of the real-time acoustic parameters and the corresponding real-time frequency spectrum characteristics, using a voice rehabilitation scoring model to score voice rehabilitation and obtain a corresponding real-time voice rehabilitation scoring result, comprising the following steps:
S5-1: extracting the real-time acoustic parameter features $F_A = f_{\mathrm{ext}}(A)$ of the real-time acoustic parameters $A$, where $f_{\mathrm{ext}}$ is the feature extraction function;
S5-2: inputting the real-time acoustic parameter characteristics and the corresponding real-time frequency spectrum characteristics into an input layer of a voice rehabilitation scoring model;
S5-3: using the feature fusion module to fuse the real-time acoustic parameter features $F_A$ and the real-time spectral features $F_S$ into the fusion feature $F = f_{\mathrm{fuse}}(F_A, F_S)$, where $f_{\mathrm{fuse}}$ is the feature fusion function;
S5-4: using the feature transformation module to transform the fusion feature into the corresponding output feature $F_{\mathrm{out}} = f_{\mathrm{CNN}}(F)$, where $f_{\mathrm{CNN}}$ is the feature transformation function of the CNN network;
S5-5: according to the output feature, using the score calculation module to calculate the clarity score and the accuracy score:

$s_c = \mathrm{softmax}(z_c)$, $\quad s_a = \mathrm{softmax}(z_a)$

where $F_{\mathrm{out}}$ is the output feature; $z_c$ and $z_a$ are the predicted clarity and accuracy activations produced from $F_{\mathrm{out}}$, corresponding to the prediction scoring labels; $s_c$ is the clarity score; $s_a$ is the accuracy score; and $\mathrm{softmax}$ is the classification function, which converts the activations of the CNN network output into a probability distribution;
S5-6: using the output layer to take the clarity score and the accuracy score as the corresponding real-time voice rehabilitation scoring result;
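The scoring pipeline of steps S5-2 through S5-6 can be illustrated with a minimal NumPy sketch; the feature dimensions, random weights, the 1–5 score classes and the single dense layer standing in for the CNN transformation are all assumptions made purely for the example, not part of the disclosure.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    """Convert network activations into a probability distribution."""
    e = np.exp(z - z.max())
    return e / e.sum()

# hypothetical inputs: 4 acoustic-parameter features, 6 spectral features
acoustic_feat = rng.normal(size=4)
spectral_feat = rng.normal(size=6)

# S5-3: feature fusion (plain concatenation as one simple fusion function)
fused = np.concatenate([acoustic_feat, spectral_feat])

# S5-4: feature transformation (one dense layer stands in for the CNN)
W, b = rng.normal(size=(8, 10)), rng.normal(size=8)
hidden = np.tanh(W @ fused + b)

# S5-5: two scoring heads over 5 score classes (1..5); softmax yields a
# probability distribution, whose expectation serves as the score
W_c, W_a = rng.normal(size=(5, 8)), rng.normal(size=(5, 8))
p_clarity, p_accuracy = softmax(W_c @ hidden), softmax(W_a @ hidden)
classes = np.arange(1, 6)
clarity_score = float(p_clarity @ classes)
accuracy_score = float(p_accuracy @ classes)

# S5-6: the pair (clarity_score, accuracy_score) is the real-time result
print(round(clarity_score, 2), round(accuracy_score, 2))
```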
S6: according to the real-time voice rehabilitation scoring result, using the voice rehabilitation training knowledge graph to correct the voice rehabilitation training scheme, and obtaining the corresponding corrected voice rehabilitation training scheme.
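The knowledge-graph-driven correction of step S6 can be illustrated with a toy graph; the entities, relations, exercise names and score threshold below are invented for the sketch and do not come from the disclosure.

```python
# hypothetical mini knowledge graph: finding -> (relation, training content)
GRAPH = {
    "low clarity":  [("treated_by", "articulation drills"),
                     ("adjusts", "slower speech rate")],
    "low accuracy": [("treated_by", "minimal-pair practice")],
}

def correct_scheme(scheme, clarity, accuracy, threshold=3.0):
    """Append graph-derived exercises for every score below the threshold."""
    corrected = list(scheme)
    findings = (["low clarity"] if clarity < threshold else []) + \
               (["low accuracy"] if accuracy < threshold else [])
    for finding in findings:
        for _relation, target in GRAPH.get(finding, []):
            if target not in corrected:
                corrected.append(target)
    return corrected

base = ["sustained vowel practice"]
corrected = correct_scheme(base, clarity=2.4, accuracy=3.6)
print(corrected)
# ['sustained vowel practice', 'articulation drills', 'slower speech rate']
```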
Example 2:
As shown in fig. 2, this embodiment provides a voice rehabilitation analysis system based on an acoustic analysis algorithm for implementing the voice rehabilitation analysis method. The system comprises a cloud computing center and a plurality of user terminals; the cloud computing center is in communication connection with each of the user terminals and comprises a model construction unit, a training scheme generation unit, an acoustic analysis unit, a frequency spectrum feature extraction unit, a voice rehabilitation scoring unit and a training scheme correction unit;
The user terminal is used for collecting basic inquiry information and real-time voice rehabilitation training audio data of a user and sending the basic inquiry information and the real-time voice rehabilitation training audio data to the cloud computing center; receiving a basic voice rehabilitation training scheme and a correction voice rehabilitation training scheme which are sent by a cloud computing center;
the model building unit is used for building a voice rehabilitation training knowledge graph, an acoustic analysis model, a frequency spectrum feature extraction model and a voice rehabilitation scoring model;
the training scheme generating unit is used for generating a voice rehabilitation training scheme by using a voice rehabilitation training knowledge graph according to basic inquiry information of a user, acquiring a corresponding basic voice rehabilitation training scheme and transmitting the basic voice rehabilitation training scheme to a corresponding user terminal;
the acoustic analysis unit is used for carrying out acoustic analysis by using an acoustic analysis model according to the real-time voice rehabilitation training audio data of the user to obtain corresponding real-time acoustic parameters;
The spectrum feature extraction unit is used for extracting spectrum features by using a spectrum feature extraction model according to the real-time voice rehabilitation training audio data of the user to obtain corresponding real-time spectrum features;
The voice rehabilitation scoring unit is used for scoring voice rehabilitation according to the real-time acoustic parameter characteristics of the real-time acoustic parameters and the corresponding real-time frequency spectrum characteristics by using a voice rehabilitation scoring model, and obtaining a corresponding real-time voice rehabilitation scoring result;
and the training scheme correction unit is used for correcting the voice rehabilitation training scheme by using the voice rehabilitation training knowledge graph according to the real-time voice rehabilitation scoring result to obtain a corresponding corrected voice rehabilitation training scheme.
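For illustration, the division of labour among the cloud computing center's units can be sketched as a plain pipeline; the stub analyzer, extractor, scorer and corrector below are placeholders standing in for the trained models, and every name, value and threshold is an assumption of the example.

```python
# hypothetical cloud-center pipeline mirroring the units described above
class CloudComputingCenter:
    def __init__(self, analyzer, extractor, scorer, corrector):
        self.analyzer = analyzer    # acoustic analysis unit
        self.extractor = extractor  # frequency spectrum feature extraction unit
        self.scorer = scorer        # voice rehabilitation scoring unit
        self.corrector = corrector  # training scheme correction unit

    def handle_audio(self, scheme, audio):
        params = self.analyzer(audio)          # real-time acoustic parameters
        spectral = self.extractor(audio)       # real-time spectral features
        score = self.scorer(params, spectral)  # real-time rehabilitation score
        return self.corrector(scheme, score)   # corrected training scheme

# stub units standing in for the trained models of the disclosure
center = CloudComputingCenter(
    analyzer=lambda a: {"volume": sum(abs(x) for x in a) / len(a)},
    extractor=lambda a: [x * x for x in a],
    scorer=lambda p, s: 2.0 if p["volume"] < 0.5 else 4.0,
    corrector=lambda sch, sc: sch + (["louder-voice drills"] if sc < 3 else []),
)
corrected = center.handle_audio(["vowel practice"], [0.1, -0.2, 0.3, 0.1])
print(corrected)  # ['vowel practice', 'louder-voice drills']
```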
The invention discloses a voice rehabilitation analysis method and system based on an acoustic analysis algorithm. By constructing an acoustic analysis model, a frequency spectrum feature extraction model and a voice rehabilitation scoring model, voice rehabilitation analysis is carried out on real-time audio from voice rehabilitation training, which improves the consistency and objectivity of the analysis and realizes a standardized system for voice rehabilitation analysis; manual evaluation and guidance by a professional therapist are avoided, reducing the labor and time cost of voice rehabilitation analysis; automatic, intelligent analysis based on audio data features improves the accuracy and practicality of voice rehabilitation analysis and permits large-scale popularization; and by analyzing the scoring results of the voice rehabilitation training, the training scheme is adaptively corrected in real time and in a personalized manner, so that it better matches the actual situation of the user and improves both the training effect and the user experience.
The invention is not limited to the alternative embodiments described above; any person may, in light of the present invention, derive products in various other forms. The above detailed description should not be construed as limiting the scope of the invention, which is defined by the claims; the description may be used to interpret the claims.

Claims (10)

1. A voice rehabilitation analysis method based on an acoustic analysis algorithm is characterized in that: the method comprises the following steps:
constructing a voice rehabilitation training knowledge graph, an acoustic analysis model, a frequency spectrum feature extraction model and a voice rehabilitation scoring model;
According to basic inquiry information of a user, using a voice rehabilitation training knowledge graph to generate a voice rehabilitation training scheme, and acquiring a corresponding basic voice rehabilitation training scheme;
according to the real-time voice rehabilitation training audio data of the user, performing acoustic analysis by using an acoustic analysis model to obtain corresponding real-time acoustic parameters;
according to the real-time voice rehabilitation training audio data of the user, using a frequency spectrum feature extraction model to extract frequency spectrum features and obtaining corresponding real-time frequency spectrum features;
According to the real-time acoustic parameter characteristics of the real-time acoustic parameters and the corresponding real-time frequency spectrum characteristics, carrying out voice rehabilitation scoring by using a voice rehabilitation scoring model, and obtaining a corresponding real-time voice rehabilitation scoring result;
And according to the real-time voice rehabilitation scoring result, using a voice rehabilitation training knowledge graph to correct the voice rehabilitation training scheme, and obtaining a corresponding correction voice rehabilitation training scheme.
2. The voice rehabilitation analysis method based on the acoustic analysis algorithm according to claim 1, wherein: the method comprises the following steps of constructing a voice rehabilitation training knowledge graph, an acoustic analysis model, a frequency spectrum feature extraction model and a voice rehabilitation scoring model:
Acquiring professional knowledge big data in the voice rehabilitation field, and constructing a corresponding voice rehabilitation training knowledge graph by using a natural language processing algorithm according to the professional knowledge big data;
collecting historical voice rehabilitation training audio data of a plurality of users, and preprocessing the historical voice rehabilitation training audio data to obtain a plurality of historical model sample data provided with real scoring labels;
According to a plurality of preset acoustic parameter indexes, constructing a corresponding acoustic analysis model by using an acoustic analysis algorithm, and constructing a corresponding spectral feature extraction model by using a spectral feature extraction algorithm;
according to the historical model sample data, performing acoustic analysis by using an acoustic analysis model to obtain a plurality of corresponding historical acoustic parameters, and extracting historical acoustic parameter characteristics corresponding to the historical acoustic parameters;
According to the historical model sample data, using a spectrum feature extraction model to extract spectrum features to obtain a plurality of corresponding historical spectrum features;
Carrying out feature fusion on the historical acoustic parameter features and the corresponding historical spectrum features of the same historical model sample data to obtain a plurality of corresponding historical fusion features;
According to a plurality of historical fusion characteristics, performing optimization training by using a deep learning algorithm, constructing an initial voice rehabilitation scoring model, and generating a plurality of corresponding prediction scoring labels;
Obtaining a corresponding model prediction accuracy according to the plurality of prediction scoring labels and the corresponding plurality of real scoring labels;
if the model prediction accuracy is greater than the preset model prediction accuracy threshold, outputting an optimal voice rehabilitation scoring model, otherwise, continuing to perform optimization training.
3. The voice rehabilitation analysis method based on the acoustic analysis algorithm according to claim 2, wherein: the method comprises the steps of collecting professional knowledge big data in the voice rehabilitation field, and constructing a corresponding voice rehabilitation training knowledge graph by using a natural language processing algorithm according to the professional knowledge big data, wherein the method comprises the following steps of:
Acquiring professional knowledge big data in the voice rehabilitation field, and carrying out knowledge preprocessing on the professional knowledge big data to obtain professional knowledge after intervention processing;
Constructing a corresponding named entity extraction model and an entity relation extraction model by using a natural language processing algorithm;
Extracting a plurality of knowledge named entities in the preprocessed professional knowledge by using a named entity extraction model, and extracting a plurality of knowledge entity relations among the plurality of knowledge named entities by using an entity relation extraction model;
And constructing a corresponding voice rehabilitation training knowledge graph according to the relationships between the knowledge naming entities and the knowledge entities.
4. The voice rehabilitation analysis method based on the acoustic analysis algorithm according to claim 2, wherein: the acoustic parameter indexes comprise volume indexes, tone indexes, formant indexes and time length indexes;
The acoustic analysis model consists of a volume calculation sub-model, a tone calculation sub-model, a formant calculation sub-model and a time length calculation sub-model;
The spectrum characteristic extraction model is constructed based on logfBank algorithm.
5. The voice rehabilitation analysis method based on the acoustic analysis algorithm according to claim 4, wherein: the voice rehabilitation scoring model is constructed based on a CNN algorithm and comprises an input layer, a feature fusion module, a feature transformation module, a scoring calculation module and an output layer which are sequentially connected.
6. A method for voice rehabilitation analysis based on an acoustic analysis algorithm according to claim 3, wherein: according to basic inquiry information, using a voice rehabilitation training knowledge graph to generate a voice rehabilitation training scheme, and acquiring a corresponding basic voice rehabilitation training scheme, wherein the voice rehabilitation training scheme comprises the following steps of:
Collecting basic inquiry information of a user, and extracting a plurality of inquiry named entities of the basic inquiry information by using a named entity extraction model;
Inputting a plurality of inquiry named entities into a voice rehabilitation training knowledge graph, and searching the named entities to obtain a plurality of corresponding matching knowledge named entities;
And generating a voice rehabilitation training scheme according to the plurality of knowledge entity relations among the plurality of matched knowledge named entities, and acquiring a corresponding basic voice rehabilitation training scheme.
7. The voice rehabilitation analysis method based on the acoustic analysis algorithm according to claim 4, wherein: according to the real-time voice rehabilitation training audio data of the user, using an acoustic analysis model to carry out acoustic analysis and obtaining corresponding real-time acoustic parameters, comprising the following steps:
Performing first preprocessing on the real-time voice rehabilitation training audio data of the user to obtain real-time voice rehabilitation training audio data after the first preprocessing; the first preprocessing comprises denoising, downsampling and filtering which are sequentially carried out;
Acquiring corresponding real-time volume parameters, real-time tone parameters, real-time formant parameters and real-time duration parameters by using the volume calculation sub-model, the tone calculation sub-model, the formant calculation sub-model and the time length calculation sub-model according to the first preprocessed real-time voice rehabilitation training audio data;
and integrating the real-time volume parameter, the real-time tone parameter, the real-time formant parameter and the real-time duration parameter to obtain the corresponding real-time acoustic parameter.
8. The voice rehabilitation analysis method based on the acoustic analysis algorithm according to claim 4, wherein: according to the real-time voice rehabilitation training audio data of the user, a spectrum feature extraction model is used for extracting spectrum features, and corresponding real-time spectrum features are obtained, and the method comprises the following steps:
Performing second preprocessing on the real-time voice rehabilitation training audio data of the user to obtain a plurality of second preprocessed real-time voice rehabilitation training audio data frames; the second preprocessing comprises denoising, pre-emphasis and framing which are sequentially carried out;
Performing STFT processing on the second preprocessed real-time voice rehabilitation training audio data frames to obtain a corresponding spectrogram;
Acquiring a Mel frequency cepstrum coefficient corresponding to the spectrogram, and acquiring a first-order difference and a second-order difference corresponding to the Mel frequency cepstrum coefficient;
And integrating the Mel frequency cepstrum coefficient, the first-order difference and the second-order difference to obtain corresponding real-time frequency spectrum characteristics.
9. The voice rehabilitation analysis method based on the acoustic analysis algorithm according to claim 5, wherein the voice rehabilitation analysis method comprises the following steps: according to the real-time acoustic parameter characteristics of the real-time acoustic parameters and the corresponding real-time frequency spectrum characteristics, using a voice rehabilitation scoring model to score voice rehabilitation and obtain a corresponding real-time voice rehabilitation scoring result, comprising the following steps:
extracting real-time acoustic parameter characteristics of the real-time acoustic parameters;
inputting the real-time acoustic parameter characteristics and the corresponding real-time frequency spectrum characteristics into an input layer of a voice rehabilitation scoring model;
a feature fusion module is used for carrying out feature fusion on the real-time acoustic parameter features and the real-time spectrum features to obtain fusion features;
Performing feature transformation on the fusion features by using a feature transformation module to obtain corresponding output features;
according to the output characteristics, using a score calculation module to calculate the corresponding clarity score and accuracy score;
and using an output layer to take the clarity score and the accuracy score as the corresponding real-time voice rehabilitation scoring results.
10. A voice rehabilitation analysis system based on an acoustic analysis algorithm, for implementing the voice rehabilitation analysis method according to any one of claims 1 to 9, characterized in that: the system comprises a cloud computing center and a plurality of user terminals, wherein the cloud computing center is respectively in communication connection with the plurality of user terminals, and comprises a model building unit, a training scheme generating unit, an acoustic analysis unit, a frequency spectrum characteristic extracting unit, a voice rehabilitation scoring unit and a training scheme correcting unit.
CN202410381430.6A 2024-04-01 2024-04-01 Voice rehabilitation analysis method and system based on acoustic analysis algorithm Pending CN117976141A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410381430.6A CN117976141A (en) 2024-04-01 2024-04-01 Voice rehabilitation analysis method and system based on acoustic analysis algorithm


Publications (1)

Publication Number Publication Date
CN117976141A true CN117976141A (en) 2024-05-03

Family

ID=90859869

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410381430.6A Pending CN117976141A (en) 2024-04-01 2024-04-01 Voice rehabilitation analysis method and system based on acoustic analysis algorithm

Country Status (1)

Country Link
CN (1) CN117976141A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103405217A (en) * 2013-07-08 2013-11-27 上海昭鸣投资管理有限责任公司 System and method for multi-dimensional measurement of dysarthria based on real-time articulation modeling technology
TWI622980B (en) * 2017-09-05 2018-05-01 醫療財團法人徐元智先生醫藥基金會亞東紀念醫院 Disease detecting and classifying system of voice
CN109727608A (en) * 2017-10-25 2019-05-07 香港中文大学深圳研究院 A kind of ill voice appraisal procedure based on Chinese speech
CN114373452A (en) * 2020-10-15 2022-04-19 华东师范大学 Voice abnormity identification and evaluation method and system based on deep learning
CN116312469A (en) * 2023-05-17 2023-06-23 天津大学 Pathological voice restoration method based on voice conversion
CN116831533A (en) * 2023-08-03 2023-10-03 上海慧敏医疗器械有限公司 Intelligent voice and sound quality disorder rehabilitation system based on ICF-RFT framework
CN117198340A (en) * 2023-09-20 2023-12-08 南京优道言语康复研究院 Dysarthria correction effect analysis method based on optimized acoustic parameters
CN117409819A (en) * 2023-12-15 2024-01-16 北京大学第三医院(北京大学第三临床医学院) Human voice detection and analysis method based on artificial intelligence


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LIU ZHIYONG (ed.): "Health Informatics Tutorial (《卫生信息学教程》)", vol. 01, 31 December 2021, Huazhong University of Science and Technology Press, page 39 *
WANG CHANGHUI, XIE XIANG, ZHAO SHENGHUI: "A Chinese Pronunciation Teaching System Based on Speech Recognition", Application Research of Computers, no. 11, 28 November 2005 (2005-11-28), pages 11-13 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination