CN115641839A - Intelligent voice recognition method and system - Google Patents

Intelligent voice recognition method and system

Info

Publication number
CN115641839A
CN115641839A CN202211093905.9A
Authority
CN
China
Prior art keywords
voice
teacher
classroom
content
students
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202211093905.9A
Other languages
Chinese (zh)
Inventor
魏亚峰
魏子晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xuzhou Taiyu Network Technology Co ltd
Original Assignee
Xuzhou Taiyu Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xuzhou Taiyu Network Technology Co ltd filed Critical Xuzhou Taiyu Network Technology Co ltd
Priority to CN202211093905.9A priority Critical patent/CN115641839A/en
Publication of CN115641839A publication Critical patent/CN115641839A/en
Withdrawn legal-status Critical Current

Abstract

The invention discloses an intelligent voice recognition method and system. The method recognizes and classifies voices in a classroom to obtain the voice content of the teacher and of the students separately; judges the correlation degree between the teacher's lesson content and the lesson; calculates the students' attention concentration degree based on the students' voice content; calculates the students' course understanding degree based on classroom-related speech; and calculates the correlation between the teacher's lesson content and the students' attention concentration degree. The teacher is then prompted according to this correlation and the students' course understanding degree. The method thereby realizes quantitative evaluation of the teacher's classroom teaching quality and the students' attention concentration, provides real-time feedback on the teacher's instruction, and improves classroom teaching quality.

Description

Intelligent voice recognition method and system
Technical Field
The invention relates to the field of voice recognition, in particular to an intelligent voice recognition method and system.
Background
At present, artificial intelligence has been incorporated into many aspects of our lives, and speech recognition is widely used in daily life. The rapid development and application of artificial intelligence have improved the effectiveness of speech recognition technology; by extracting text data from speech signals, it meets people's needs in many areas.
In traditional teaching quality assessment, the quality of a teacher's class is usually evaluated through examinations or questionnaires, but neither the teacher's classroom quality nor the students' reactions can be supervised in real time during class. Moreover, such evaluation results are subjectively influenced and cannot be objective. Meanwhile, the students' classroom performance and its influencing factors cannot be quantitatively analyzed.
Disclosure of Invention
Technical problem to be solved
In order to solve the above technical problems, the invention provides an intelligent voice recognition method and system. The method recognizes and classifies voices in a classroom to obtain the voice content of the teacher and of the students separately; judges the correlation degree between the teacher's lesson content and the lesson; calculates the students' attention concentration degree based on the students' voice content; calculates the students' course understanding degree based on classroom-related speech; and calculates the correlation between the teacher's lesson content and the students' attention concentration degree. The teacher is then prompted according to this correlation and the students' course understanding degree.
(II) technical scheme
In order to solve the technical problems and achieve the purpose of the invention, the invention is realized by the following technical scheme:
an intelligent speech recognition method, comprising the steps of:
S1: Acquiring classroom voice signals through a plurality of microphones; specifically, a plurality of microphones are arranged on the front, rear, left and right walls and the ceiling of a classroom to collect the classroom voice signals;
S2: Performing signal preprocessing on the voice signals, the preprocessing comprising pre-emphasis, endpoint detection, denoising, framing and windowing;
S3: Distinguishing the age of each speaker in the voice signals through voice-age recognition, and judging the role according to age to determine the speaker's identity as a teacher or a student;
S4: Respectively recognizing the voice content of the teacher and the students, and converting the voice information into text information;
S5: Judging the correlation degree between the teacher's lesson content and the lesson;
S6: Classifying the students' voice content into classroom-irrelevant speech and classroom-relevant speech, calculating the students' attention concentration degree based on the classroom-irrelevant speech, and calculating the students' course understanding degree based on the classroom-relevant speech;
S7: Calculating the correlation between the teacher's lesson content and the students' attention concentration degree, and prompting the teacher according to the correlation and the students' course understanding degree.
Further, the endpoint detection adopts an LTSD algorithm and is classified based on the signal-to-noise ratio of the voice signals.
Further, the endpoint detection includes threshold discrimination, and based on the adaptive threshold value, the calculation method is as follows:
T(k) = E(k) + f(SNR) · σ(k)

where E(k) is the noise estimation value, σ(k) is the noise mean-square-error estimation value, and f(SNR) is a signal-to-noise-ratio correlation function.
The signal-to-noise-ratio correlation function is obtained by fitting historical data.
Further, the step S3 includes:
S31: Generating a spectrogram, wherein the generation process comprises power-spectrum calculation and obtaining the Mel spectrogram output through a Mel filter bank;
S32: Feature extraction, wherein the MFCC feature parameters of the voice signal are used as the feature parameters for voice recognition;
S33: Model building and training, wherein the extracted features are trained based on an LSTM neural network model, and after training the model is used for voice-age recognition. The speaker is determined to be a teacher or a student from the recognition result.
Further, the Mel filter bank is constructed by measuring the background noise of the classroom environment, extracting the fundamental frequency f_n of the background noise, and adding it to the triangular filter bank.
Further, the background noise can be divided into noise outside the classroom and noise inside the classroom (the latter including fan rotation, computers and other machines), so the two fundamental-frequency signals inside and outside the classroom are extracted separately and integrated as follows:

f_add = a·f1 + b·f2

where a and b are weight coefficients, and f1 and f2 are the fundamental frequencies of the noise inside and outside the classroom, respectively.
Further, the step S4 also includes correcting the role recognition result based on the recognized voice content: for sentences in the recognized content that relate to the speaker's own role, it is judged whether the speaker's role can be determined; if so, the role identified from the content is compared with the role determined in step S3, and if they are inconsistent, the original judgment result is corrected.
Further, the step S7 further includes:
the correlation of the content of the teacher's lesson and the attention concentration of the students is expressed as follows:
r = c1 / H

where H is the topic concentration of the students' classroom-irrelevant speech and c1 is an adjustment coefficient.
The invention also provides an intelligent voice recognition system, which specifically comprises:
the microphone array is used for acquiring classroom voice signals through a plurality of microphones, and optionally, a plurality of microphones are arranged on the front, rear, left and right walls and the ceiling of a classroom to acquire the classroom voice signals.
And the voice signal preprocessing module is used for preprocessing the collected voice signals, and the voice signal preprocessing comprises the processes of pre-emphasis, endpoint detection, denoising, framing and windowing.
And the voice age identification module is used for distinguishing the age of the speaker in the voice signal by identifying the voice age, judging the role according to the age and determining the identity of the speaker as a teacher or a student.
The voice recognition module is used for respectively recognizing the voice contents of the teacher and the students and converting the voice information into text information through recognition;
and the correlation degree calculation module is used for acquiring the electronic teaching plan of the teaching contents, extracting keywords from the electronic teaching plan, comparing the identified teaching contents of the teacher with the keywords in the teaching plan and acquiring the correlation degree between the teaching contents and the keywords in the teaching plan.
The student learning state acquisition module is used for classifying the voice content of the student, dividing the voice content into class-independent voice and class-related voice, calculating the attention concentration degree of the student based on the class-independent voice, and calculating the course understanding degree of the student based on the class-related voice.
The prompting module is used for calculating the correlation between the content of the lessons taken by the teacher and the attention concentration degree of the students; and prompting the teacher according to the correlation and the understanding degree of the student course.
(III) advantageous effects
The invention has the beneficial effects that:
(1) Quantitative evaluation of the teacher's classroom teaching quality and the students' attention concentration degree is realized based on intelligent voice recognition.
(2) Calculating the correlation between the content of the teacher in class and the attention concentration degree of the students through voice recognition; and the teacher is prompted according to the correlation and the class understanding degree of the students, so that real-time feedback of teaching of the teacher is realized, and the class teaching quality is improved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a schematic flow diagram of an intelligent speech recognition method according to an embodiment of the present application;
fig. 2 is a schematic diagram of a speech signal preprocessing process according to an embodiment of the present application.
Detailed Description
The embodiments of the present disclosure are described in detail below with reference to the accompanying drawings.
The embodiments of the present disclosure are described below with specific examples, and other advantages and effects of the present disclosure will be readily apparent to those skilled in the art from the disclosure in the specification. It is to be understood that the described embodiments are merely illustrative of some, and not restrictive, of the embodiments of the disclosure. The disclosure may be embodied or carried out in various other specific embodiments, and various modifications and changes may be made in the details within the description without departing from the spirit of the disclosure. It should be noted that the features in the following embodiments and examples may be combined with each other without conflict. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
It should be further noted that the drawings provided in the following embodiments are only schematic illustrations of the basic concepts of the present disclosure, and the drawings only show the components related to the present disclosure rather than the numbers, shapes and dimensions of the components in actual implementation, and the types, the numbers and the proportions of the components in actual implementation may be arbitrarily changed, and the layout of the components may be more complicated.
Referring to fig. 1, an intelligent speech recognition method includes:
s1: obtaining classroom speech signals through multiple microphones
A plurality of microphones are arranged on the front, rear, left and right walls and the ceiling of a classroom to collect voice signals in the classroom.
S2: signal preprocessing for speech signals
The voice signal is collected by microphones, and interference such as noise is introduced during collection; therefore, in order to highlight the acoustic features of the voice signal and reduce the influence of noise on the recognition result, the voice signal must be preprocessed.
As shown in fig. 2, the speech signal preprocessing includes pre-emphasis, endpoint detection, de-noising, framing, and windowing processes.
S21: pre-emphasis
In order to eliminate the influence of the vocal cords and lips on speech clarity during phonation, increase the high-frequency resolution, and highlight the high-frequency formants, a pre-emphasis operation is performed on the voice signal. This is implemented with a high-pass filter, expressed as:

H(z) = 1 − ρ·z^(−1)

where ρ ∈ [0.9, 1].
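As an illustrative sketch (not part of the patent text), the high-pass pre-emphasis filter corresponds to the difference equation y[n] = x[n] − ρ·x[n−1]; the function name and list-based signal representation below are assumptions:

```python
def pre_emphasis(signal, rho=0.97):
    """Apply H(z) = 1 - rho*z^(-1), i.e. y[n] = x[n] - rho*x[n-1]."""
    if not signal:
        return []
    # The first sample has no predecessor and is passed through unchanged.
    return [signal[0]] + [signal[n] - rho * signal[n - 1]
                          for n in range(1, len(signal))]
```

The default ρ = 0.97 is a common illustrative choice within the patent's stated range [0.9, 1].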
S22: endpoint detection
Voice signals are intermittent, so recognizing the speech segments and non-speech segments of a voice signal is important for subsequent recognition. The method performs endpoint detection based on the LTSD algorithm: the signal-to-noise ratio is calculated within each window using a sliding-window approach, and an adaptive-threshold method judges whether it meets the threshold.
The traditional LTSD algorithm in the prior art processes the signal energy of the full frequency band; its adaptability is poor and it cannot accurately identify noise characteristics. By analyzing the noise characteristics of classroom scenes, the invention finds that in the collected voice signals the noise energy is mostly concentrated in the low frequency band, while the unvoiced sounds that are easily misidentified lie in the high frequency band, so noise interference affects different frequency bands differently. On this basis, the invention improves the traditional LTSD algorithm by classifying based on the signal-to-noise ratio of the voice signal, achieving a better detection result. The method comprises the following steps:
(1) Calculate the long-time spectral envelope (LTSE) in the following way:
LTSE_N(k, l) = max{ X(k, l + j) },  j = −N, …, N

where X(k, l) is the amplitude spectrum of the l-th frame of the voice signal at frequency k. The LTSE applies a long-time analysis window on top of the short-time amplitude spectrum, using the long-time principle. The order-N LTSD is calculated using the following equation:
LTSD_N(l) = 10·log10( (1/NFFT) · Σ_{k=0}^{NFFT−1} LTSE_N²(k, l) / N²(k) )

where N(k) is the amplitude spectrum of the background noise at frequency k, and k = 0, 1, …, NFFT − 1.
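Although the patent gives no code, the two quantities above can be sketched as follows (illustrative Python; magnitude spectra are stored as a list of per-frame lists, the function and variable names are assumptions, and frame indices beyond the ends of the recording are simply clipped):

```python
import math

def ltse(mag, l, k, order):
    # Long-Time Spectral Envelope: max of X(k, l+j) over j in [-order, order],
    # clipped at the ends of the recording.
    lo, hi = max(0, l - order), min(len(mag) - 1, l + order)
    return max(mag[j][k] for j in range(lo, hi + 1))

def ltsd(mag, noise_mag, l, order):
    # Long-Time Spectral Divergence of frame l against the noise spectrum N(k),
    # averaged over the NFFT frequency bins and expressed in dB.
    nfft = len(noise_mag)
    total = sum((ltse(mag, l, k, order) ** 2) / (noise_mag[k] ** 2)
                for k in range(nfft))
    return 10.0 * math.log10(total / nfft)
```

A frame would then be judged speech or non-speech by comparing its LTSD value against the adaptive threshold.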
(2) Noise estimation
When calculating the LTSD features and updating the discrimination threshold, the noise amplitude spectrum needs to be estimated. The few frames before the start of a speech segment are assumed to be non-speech; the noise is initialized from this data and subsequently updated according to statistical information. When the number of consecutive frames judged to be non-speech reaches p frames, updating of the noise information begins.
(3) Threshold discrimination
On the basis of traditional threshold discrimination, the invention improves the selection of the threshold by setting an adaptive threshold, calculated as follows:
T(k) = E(k) + f(SNR) · σ(k)

where E(k) is the noise estimation value, σ(k) is the noise mean-square-error estimation value, and f(SNR) is a signal-to-noise-ratio correlation function.
The signal-to-noise-ratio correlation function is obtained by fitting historical data.
S23: framing and windowing
The invention adopts the Hamming window as the window function for the framing operation: because the Hamming window has a wider main lobe and lower side lobes, using it as the framing window function reduces spectral leakage.
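The framing-plus-Hamming-window step can be sketched as follows (illustrative; the 0.54/0.46 coefficients are the standard Hamming definition, and the frame length and hop size are assumed parameters):

```python
import math

def frame_and_window(signal, frame_len, hop):
    # Split the signal into overlapping frames and apply a Hamming window:
    # w[n] = 0.54 - 0.46 * cos(2*pi*n / (N-1)), N = frame_len.
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    return [[s * w for s, w in zip(signal[start:start + frame_len], window)]
            for start in range(0, len(signal) - frame_len + 1, hop)]
```

A trailing partial frame is simply dropped here; padding it instead would be an equally valid design choice.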
S3: the age of the speaker in the voice signal is distinguished by identifying the age of the voice, and the identity of the speaker is determined to be a teacher or a student according to the role discrimination of the age.
The classroom application scenario of the invention is primary and middle school classrooms, where the ages of students and teachers differ greatly, so the speaker in the voice signal can be distinguished as a teacher or a student by recognizing the voice age.
The invention realizes the identification of the sound age based on a deep learning algorithm, and specifically comprises the following steps:
s31: generating a spectrogram
The process of generating the spectrogram comprises the steps of calculating a power spectrum, and obtaining output of the Mel spectrogram through a Mel filter bank.
(1) Power spectrum calculation
Performing fast Fourier transform on the voice signal, and calculating to obtain a power spectrum of a spectrogram:
P(k) = |X(k)|² / N

where X(k) is the fast Fourier transform of a frame of length N.
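As an illustrative sketch of this step, the power spectrum |X(k)|²/N of one frame can be computed with a direct DFT (a real FFT would be used in practice; the direct form below is only for clarity, and the function name is an assumption):

```python
import cmath

def power_spectrum(frame):
    # P(k) = |X(k)|^2 / N, with X(k) the DFT of the N-sample frame.
    N = len(frame)
    return [abs(sum(frame[n] * cmath.exp(-2j * cmath.pi * k * n / N)
                    for n in range(N))) ** 2 / N
            for k in range(N)]
```

For a constant frame all the energy lands in bin 0, which makes a convenient sanity check.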
(2) Passing through a mel filter bank
The mel filter bank of the present invention is based on a triangular filter bank, wherein the frequency response of each triangular filter is represented as:
H_m(k) = 0,  k < f(m−1)
H_m(k) = (k − f(m−1)) / (f(m) − f(m−1)),  f(m−1) ≤ k ≤ f(m)
H_m(k) = (f(m+1) − k) / (f(m+1) − f(m)),  f(m) < k ≤ f(m+1)
H_m(k) = 0,  k > f(m+1)

where f(m) is the center frequency of the m-th triangular filter.
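A sketch of one triangular filter's response under the definition above (the edge frequencies f(m−1), f(m), f(m+1) are passed in explicitly; names are illustrative):

```python
def tri_filter_response(k, f_lo, f_c, f_hi):
    # Triangular Mel filter: rises linearly from 0 at f_lo to 1 at the
    # center frequency f_c, then falls linearly back to 0 at f_hi.
    if f_lo <= k <= f_c:
        return (k - f_lo) / (f_c - f_lo)
    if f_c < k <= f_hi:
        return (f_hi - k) / (f_hi - f_c)
    return 0.0
```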
Because classrooms at different locations suffer different noise interference (for example, the background noise of a classroom near a road differs from that of a classroom in a quieter part of the campus), the invention, in order to increase the filtering accuracy, measures the background noise of the classroom environment, extracts its fundamental frequency f_n, and adds it to the traditional triangular filter bank to form a new triangular filter bank.
Further, the background noise can be divided into noise outside the classroom and noise inside the classroom (the latter including fan rotation, computers and other machines), so the two fundamental-frequency signals inside and outside the classroom are extracted separately and integrated as follows:

f_add = a·f1 + b·f2

where a and b are weight coefficients, and f1 and f2 are the fundamental frequencies of the noise inside and outside the classroom, respectively.
The Mel filter bank is then obtained using the integrated fundamental frequency, yielding higher recognition accuracy.
(3) Logarithmic discrete cosine transform (DCT cepstrum)
The DCT cepstrum is calculated by the following formula:
X(k) = C(k) · Σ_{n=0}^{N−1} x(n) · cos( π·k·(2n + 1) / (2N) )

where N is the frame length of the speech signal x(n) and C(k) is the orthogonality factor.
S32: feature extraction
The age of the speaker is a variation factor that affects the acoustic characteristics of human speech, and therefore, features that vary significantly in the age dimension are extracted as the feature parameters for recognition.
The MFCC feature parameters of the obtained voice signal are used as the feature parameters for voice recognition:
MFCC(i, n) = Σ_{m=1}^{M} S(i, m) · cos( π·n·(m − 0.5) / M )

where S(i, m) is the Mel energy; m indexes the M Mel filters, i indexes the frames of the voice signal, and n is the number of spectral lines after the DCT cepstrum.
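The DCT step that produces the cepstral coefficients from the per-frame Mel energies can be sketched as follows (a plain type-II DCT without the orthogonality factor C(k), which only rescales the coefficients; names are assumptions):

```python
import math

def dct_cepstrum(mel_energies, num_ceps):
    # Type-II DCT of one frame's (log) Mel energies; row n corresponds to
    # coefficient n in the patent's MFCC formula.
    M = len(mel_energies)
    return [sum(mel_energies[m] * math.cos(math.pi * n * (m + 0.5) / M)
                for m in range(M))
            for n in range(num_ceps)]
```

For a flat Mel spectrum, every coefficient above the zeroth vanishes, which is a useful correctness check.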
S33: model building and training
And training the extracted features based on an LSTM neural network model, and using the trained model for speech age recognition after training. And determining the speaker as a teacher or a student through the recognition result.
S4: respectively identifying the voice contents of a teacher and students;
and converting the voice information into text information through recognition, and optionally, performing voice recognition based on a convolutional neural network.
Further, the role recognition result can be corrected based on the recognized voice content: for sentences in the recognized content that relate to the speaker's own role, it is judged whether the speaker's role can be determined; if so, the role identified from the content is compared with the role determined in step S3, and if they are inconsistent, the original judgment result is corrected.
S5: judging the correlation degree of the lesson taking content of the teacher and the lessons;
obtaining an electronic teaching plan of the content in class, and extracting a keyword K = { K = from the electronic teaching plan 1 ,k 2 ,k 3 ,…,k n };
Comparing the identified lesson content of the teacher with the keywords in the teaching plan to obtain the correlation degree between the lesson content and the keywords, wherein the specific calculation mode is as follows:
carrying out sentence division marking on the lesson contents of the teacher, adding 1 to the mark p when the sentence contains the keyword k, and counting the relation between the number m of all sentences and the mark p to obtain the correlation degree between the lesson contents of the teacher and the lessons:
Cor = p / m
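A minimal sketch of this sentence-marking computation (substring matching stands in for whatever keyword matching the patent intends; the function name is illustrative):

```python
def lesson_relevance(sentences, keywords):
    # Cor = p / m: p sentences containing at least one teaching-plan
    # keyword, out of m sentences in total.
    p = sum(1 for s in sentences if any(k in s for k in keywords))
    return p / len(sentences)
```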
s6: classifying voice contents of students, dividing the voice contents into class-irrelevant voices and class-relevant voices, calculating to obtain the attention concentration degree of the students based on the class-irrelevant voices, and calculating to obtain the course understanding degree of the students based on the class-relevant voices;
s61: classifying the voice contents of students, and classifying the voice contents into class-independent voices and class-related voices;
(1) Feature extraction
The invention extracts features of the students' voice content based on the TF-IDF (term frequency - inverse document frequency) algorithm. The calculation method is as follows:
D = (N / n) · log(W / w)
wherein N is the number of times of occurrence of a specific keyword in the text, N is the total number of terms in the speech, W is the total number of documents in the corpus, and W is the number of documents containing the specific keyword. The larger the value of D, the greater the importance of the word to the text.
The algorithm ranks the extracted keywords by weight; the higher a keyword's weight, the more important it is for analyzing the whole speech.
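The TF-IDF weight can be sketched directly from the definitions above (documents are token lists; a term absent from the whole corpus is given weight 0 here, an assumption the patent does not specify):

```python
import math

def tf_idf(term, doc, corpus):
    # D = (N/n) * log(W/w): term frequency of the term in this document
    # times the inverse document frequency over the corpus.
    tf = doc.count(term) / len(doc)
    w = sum(1 for d in corpus if term in d)
    return tf * math.log(len(corpus) / w) if w else 0.0
```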
(2) Classification model building
Because the output layer performs a two-class classification, the vocabulary related to classroom learning has obvious distinguishing features, and speech recognition requires real-time performance, classification is implemented with an SVM algorithm. The SVM is a linear classifier with high accuracy and fast operation on two-class problems, so the invention classifies the students' voice content based on the SVM.
The classification model returns two results, a prediction label and a confidence. The confidence is calculated as follows:
conf = (1/k) · Σ_{i=1}^{k} s_i
where k is the number of supporting discriminants, n is the number of classes, and s_i are the scores of the supporting discriminant classes.
The basic probability distribution function of the model is as follows:
[formula not reproduced in the source]
wherein m is 1 、m 2 、m 0 Basic probability distribution functions, w, of positive, negative and complete sets, respectively m (k i ) Is the weight of the mth base classifier, k i Is the true class of the ith sample data, P i Is the posterior probability of the ith sample data.
In order to improve the convergence rate and classification accuracy of the classification algorithm, the kernel function is improved as follows:

K(x, y) = β · exp( −‖x − y‖² / (2σ²) ) + (1 − β) · tanh( v·(x·y) + g )

where β is a weight coefficient, σ is the kernel radius of the RBF kernel function, tanh is the hyperbolic tangent function, v is the coefficient of the multilayer-perceptron kernel function, and g is an offset.
The improved kernel function has the advantages of a plurality of kernel functions, and the classification accuracy is improved.
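While the patent does not give code, the combined kernel (a weighted sum of an RBF term and a sigmoid/multilayer-perceptron term) could be written as the following kernel function; all default parameter values are arbitrary illustrative choices:

```python
import math

def mixed_kernel(x, y, beta=0.7, sigma=1.0, v=0.01, g=0.0):
    # K = beta * RBF + (1 - beta) * sigmoid; parameter names follow the
    # patent's description (beta weight, sigma RBF radius, v MLP
    # coefficient, g offset).
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    rbf = math.exp(-sq_dist / (2.0 * sigma ** 2))
    sig = math.tanh(v * sum(a * b for a, b in zip(x, y)) + g)
    return beta * rbf + (1.0 - beta) * sig
```

Such a callable could be passed to an SVM implementation that accepts custom kernels.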
(3) Result output
The number and duration of the classroom-irrelevant and classroom-relevant speeches in the students' voice content are obtained.
S62: calculating to obtain the attention concentration degree Col of the student based on the voice irrelevant to the classroom;
Col = 1 − a·(T1 / T0) − b·n

where a and b are adjustment coefficients, T1 is the total duration of classroom-irrelevant speech within the sampling period, T0 is the sampling period, and n is the number of classroom-irrelevant speech segments within the sampling period.
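The source renders the Col formula only as an image. Assuming one plausible reading, Col = 1 − a·(T1/T0) − b·n clamped to [0, 1], a sketch would be (the reading, the names, and the default coefficients are all assumptions, not the patent's confirmed formula):

```python
def attention_concentration(t_irrelevant, t_period, n_segments, a=0.5, b=0.02):
    # Hypothetical reading of the patent's Col formula:
    # Col = 1 - a*(T1/T0) - b*n, clamped to the [0, 1] range.
    col = 1.0 - a * (t_irrelevant / t_period) - b * n_segments
    return max(0.0, min(1.0, col))
```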
S63: calculating to obtain the student course understanding degree based on the classroom related voice;
the student can obtain the understanding degree of the student on the course in the classroom communication, the invention judges whether the voice relates to the question of the content taught by the teacher based on the classroom communication voice of the student, and obtains the understanding degree of the student course according to the number and the density of the voice of the question of the student.
Further, the speech density of the question may be the sum of the durations of the relevant speech in a unit time.
S7: calculating the correlation between the content of the teacher in class and the attention concentration degree of the students; and prompting the teacher according to the correlation and the understanding degree of the student courses.
The factors changing the students' attention concentration include the teacher's lesson content and external influences. When the topic concentration of the students' classroom-irrelevant speech is low, it is concluded that the students' attention is dispersed because the teacher's lesson content is relatively boring, leading students to chat about matters unrelated to classroom learning during class time; when the topic concentration of the classroom-irrelevant speech is high, it is concluded that the students' attention is distracted by some external factor. The correlation between the teacher's lesson content and the students' attention concentration is expressed as follows:
r = c1 / H

where H is the topic concentration of the students' classroom-irrelevant speech and c1 is an adjustment coefficient.
In this embodiment, the voices in class are recognized and classified to obtain the voice content of the teacher and of the students separately; the correlation degree between the teacher's lesson content and the lesson and the students' attention concentration degree are judged; the students' course understanding degree is obtained based on the classroom-related speech; the correlation between the teacher's lesson content and the students' attention concentration degree is calculated; and the teacher is prompted according to the correlation and the students' course understanding degree. Real-time feedback on the classroom learning situation is realized, and the scientific quality of teaching is improved.
The embodiment of the present invention further provides an intelligent speech recognition system, which specifically includes:
the microphone array is used for acquiring classroom voice signals through a plurality of microphones, and optionally, a plurality of microphones are arranged on the front, rear, left and right walls and the ceiling of a classroom to acquire the classroom voice signals.
And the voice signal preprocessing module is used for preprocessing the collected voice signals, and the voice signal preprocessing comprises the processes of pre-emphasis, endpoint detection, denoising, framing and windowing.
And the voice age identification module is used for distinguishing the age of the speaker in the voice signal by identifying the voice age, carrying out role discrimination according to the age and determining the identity of the speaker as a teacher or a student.
The voice recognition module is used for respectively recognizing the voice contents of the teacher and the students and converting the voice information into text information through recognition;
and the correlation degree calculation module of the lesson contents and the lessons is used for acquiring the electronic teaching plan of the lesson contents, extracting keywords from the electronic teaching plan, comparing the recognized lesson contents of the teacher with the keywords in the electronic teaching plan and acquiring the correlation degree between the lesson contents and the keywords in the electronic teaching plan.
The student learning state acquisition module is used for classifying the voice content of the student, dividing the voice content into class-independent voice and class-related voice, calculating the attention concentration degree of the student based on the class-independent voice, and calculating the course understanding degree of the student based on the class-related voice.
The prompting module is used for calculating the correlation between the content of the lessons taken by the teacher and the attention concentration degree of the students; and prompting the teacher according to the correlation and the understanding degree of the student course.
The above-mentioned embodiments are merely illustrative of the preferred embodiments of the present invention, and do not limit the scope of the present invention, and various modifications and improvements of the technical solution of the present invention made by those skilled in the art without departing from the spirit of the present invention should fall within the protection scope defined by the claims of the present invention.

Claims (10)

1. An intelligent speech recognition method, comprising the steps of:
S1: acquiring classroom voice signals through a plurality of microphones; specifically, a plurality of microphones are arranged on the front, rear, left and right walls and on the ceiling of a classroom to acquire the classroom voice signals;
S2: preprocessing the voice signals, the preprocessing comprising pre-emphasis, endpoint detection, denoising, framing and windowing;
S3: estimating the age of each speaker by voice age recognition and, according to the age, discriminating the speaker's role and determining the identity of the speaker as a teacher or a student;
S4: separately recognizing the voice content of the teacher and of the students, and converting the voice information into text information;
S5: judging the degree of correlation between the teacher's teaching content and the course;
S6: classifying the students' voice content into classroom-irrelevant and classroom-relevant speech, calculating the students' attention concentration from the classroom-irrelevant speech, and calculating their course understanding from the classroom-relevant speech;
S7: calculating the correlation between the teacher's teaching content and the students' attention concentration, and prompting the teacher according to the correlation and the students' course understanding.
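The preprocessing of step S2 (pre-emphasis, framing, windowing) can be sketched as follows; the filter coefficient, frame length and hop size are conventional values, not parameters specified by the patent:

```python
import numpy as np

def preprocess(signal, sr=16000, alpha=0.97, frame_ms=25, hop_ms=10):
    """Pre-emphasis, framing and Hamming windowing of a speech signal.
    alpha, frame_ms and hop_ms are common defaults, not values from the patent."""
    # Pre-emphasis boosts high frequencies: y[n] = x[n] - alpha * x[n-1]
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    frame_len = int(sr * frame_ms / 1000)   # samples per frame
    hop_len = int(sr * hop_ms / 1000)       # samples between frame starts
    n_frames = 1 + max(0, (len(emphasized) - frame_len) // hop_len)
    frames = np.stack([emphasized[i * hop_len:i * hop_len + frame_len]
                       for i in range(n_frames)])
    # Hamming window reduces spectral leakage at frame edges
    return frames * np.hamming(frame_len)

frames = preprocess(np.random.randn(16000))  # one second of audio
```

Endpoint detection and denoising (also part of S2) would operate on these windowed frames before feature extraction.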
2. The intelligent speech recognition method of claim 1, wherein the endpoint detection employs the LTSD (long-term spectral divergence) algorithm, classifying frames according to the signal-to-noise ratio of the speech signal.
3. The intelligent speech recognition method of claim 2, wherein the endpoint detection comprises threshold discrimination based on an adaptive threshold calculated as follows:
[adaptive threshold formula, published as image FDA0003838147730000011, defined in terms of E(k), σ(k) and f(SNR)]
wherein E(k) is the noise estimate, σ(k) is the noise mean-square-error estimate, and f(SNR) is a signal-to-noise-ratio correlation function;
the function f(SNR) is obtained by fitting to historical data.
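The exact threshold formula is published only as an image; a minimal sketch of one plausible reading, in which the threshold is the noise estimate offset by the SNR-dependent factor times the noise deviation, with f(SNR) assumed linear (the coefficients a and b are hypothetical stand-ins for the fitted function):

```python
import numpy as np

def adaptive_threshold(noise_frames, snr, a=0.5, b=2.0):
    """Adaptive endpoint-detection threshold (hypothetical form:
    T = E + f(SNR) * sigma, with f(SNR) = a*SNR + b fitted on
    historical data; a and b here are illustrative, not from the patent)."""
    E = float(np.mean(noise_frames))      # noise estimate E(k)
    sigma = float(np.std(noise_frames))   # noise deviation estimate sigma(k)
    return E + (a * snr + b) * sigma

# A frame whose energy exceeds the threshold would be classified as speech.
t = adaptive_threshold(np.array([1.0, 3.0]), snr=0.0)
```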
4. The intelligent speech recognition method of claim 1, wherein the step S3 comprises:
S31: generating a spectrogram, the generation comprising calculating the power spectrum and obtaining the Mel spectrogram output through a Mel filter bank;
S32: extracting features, the MFCC feature parameters obtained from the voice signal serving as the feature parameters for recognition;
S33: building and training a model: the extracted features are used to train an LSTM neural network model; after training, the model is used for voice age recognition, and the speaker is determined to be a teacher or a student according to the recognition result.
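The triangular Mel filter bank of step S31 can be constructed as follows; this is a standard textbook construction, with filter count and FFT size as conventional defaults rather than values from the patent:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters=26, n_fft=512, sr=16000):
    """Triangular Mel filter bank applied to the power spectrum.
    Filters are spaced uniformly on the Mel scale between 0 and sr/2."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):          # rising slope of triangle
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling slope of triangle
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank

fb = mel_filterbank()
```

MFCCs (step S32) would then be the discrete cosine transform of the log of the filter-bank energies.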
5. The intelligent speech recognition method of claim 4, wherein the Mel filter bank passes the background noise of the classroom environment, and the fundamental frequency f_n of the background noise is extracted and added to the triangular filter bank.
6. The intelligent speech recognition method of claim 5, wherein the background noise is divided into noise outside the classroom and noise inside the classroom, the noise inside the classroom including the sound of fans, computers and other machines; the two fundamental frequency signals inside and outside the classroom are therefore extracted separately and combined as follows:
f_add = a·f_1 + b·f_2
wherein a and b are weight coefficients, and f_1, f_2 are the fundamental frequencies of the noise inside and outside the classroom, respectively.
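The weighted combination above is straightforward; the weight values below are illustrative only, since the patent does not specify a and b:

```python
def combined_fundamental(f_inside, f_outside, a=0.6, b=0.4):
    """Weighted combination of the fundamental frequencies of noise
    inside (f_1) and outside (f_2) the classroom: f_add = a*f_1 + b*f_2.
    The weights a and b are hypothetical, not values from the patent."""
    return a * f_inside + b * f_outside

f_add = combined_fundamental(100.0, 50.0, a=0.5, b=0.5)
```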
7. The intelligent speech recognition method according to claim 1, wherein the step S4 further comprises correcting the role recognition result from the recognized voice content: it is judged whether the speaker's role can be determined from a sentence in the recognized speech that refers to the speaker's own role; if so, the role so determined is compared with the role from step S3, and if the two are inconsistent, the original judgment result is corrected.
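A minimal sketch of this self-reference check; the cue phrases are illustrative assumptions, not lists from the patent:

```python
def correct_role(recognized_text, acoustic_role):
    """If the transcript contains a phrase revealing the speaker's role,
    it overrides the acoustic age-based judgment from step S3.
    The cue-phrase lists below are hypothetical examples."""
    teacher_cues = ["as your teacher", "open your textbooks", "today's homework"]
    student_cues = ["teacher, may i", "i don't understand, teacher"]
    t = recognized_text.lower()
    if any(c in t for c in teacher_cues):
        text_role = "teacher"
    elif any(c in t for c in student_cues):
        text_role = "student"
    else:
        return acoustic_role   # no self-reference: keep the original judgment
    return text_role           # content-based role wins on disagreement

role = correct_role("Open your textbooks to page 5", "student")
```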
8. The intelligent speech recognition method of claim 1, wherein the step S5 further comprises:
comparing the recognized teaching content of the teacher with the keywords in the teaching plan to obtain the degree of correlation between them, calculated as follows:
the teacher's teaching content is divided into sentences; the counter p is incremented by 1 for each sentence containing a keyword k, and the correlation between the teaching content and the course is the ratio of p to the total number of sentences m:
correlation = p / m
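The sentence-marking scheme of claim 8 can be sketched directly; the example sentences and keywords are illustrative:

```python
def lesson_relevance(sentences, keywords):
    """Relevance of the teacher's speech to the lesson plan:
    p = number of sentences containing at least one keyword,
    m = total number of sentences, relevance = p / m."""
    m = len(sentences)
    p = sum(1 for s in sentences
            if any(k.lower() in s.lower() for k in keywords))
    return p / m if m else 0.0

sents = ["Today we study Newton's second law",
         "Please be quiet",
         "Force equals mass times acceleration"]
rel = lesson_relevance(sents, ["Newton", "force"])
```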
9. The intelligent speech recognition method of claim 1, wherein the step S7 further comprises:
expressing the correlation between the teacher's teaching content and the students' attention concentration as follows:
[correlation formula, published as image FDA0003838147730000022, expressed in terms of H and c_1]
wherein H is the topic concentration of the students' classroom-irrelevant speech, and c_1 is an adjustment coefficient.
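The patent's correlation formula is published only as an image; the sketch below is one hypothetical reading consistent with the stated symbols, in which the correlation grows with lesson relevance and shrinks as the off-topic concentration H grows, scaled by c_1. It is an assumption, not the patent's formula:

```python
def teacher_student_correlation(relevance, h, c1=1.0):
    """Hypothetical correlation between the teacher's teaching content
    and student attention: scaled lesson relevance, damped by the
    topic concentration h of classroom-irrelevant student speech."""
    return c1 * relevance / (1.0 + h)

corr = teacher_student_correlation(relevance=0.8, h=0.0)
```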
10. An intelligent speech recognition system implementing the method according to any one of claims 1-9, wherein the system comprises:
a microphone array for acquiring classroom voice signals through a plurality of microphones; optionally, the microphones are arranged on the front, rear, left and right walls and on the ceiling of the classroom;
a voice signal preprocessing module for preprocessing the collected voice signals, the preprocessing comprising pre-emphasis, endpoint detection, denoising, framing and windowing;
a voice age identification module for estimating the speaker's age from the voice signal and, according to the age, discriminating the speaker's role and determining the identity of the speaker as a teacher or a student;
a voice recognition module for separately recognizing the voice content of the teacher and of the students, and converting the voice information into text information;
a lesson-course correlation calculation module for acquiring the electronic teaching plan of the lesson, extracting keywords from it, and comparing the recognized teaching content of the teacher with those keywords to obtain the degree of correlation between the teaching content and the teaching plan;
a student learning state acquisition module for classifying the students' voice content into classroom-irrelevant and classroom-relevant speech, calculating the students' attention concentration from the classroom-irrelevant speech, and calculating their course understanding from the classroom-relevant speech;
a prompting module for calculating the correlation between the teacher's teaching content and the students' attention concentration, and for prompting the teacher according to that correlation and the students' course understanding.
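How the claimed modules might fit together can be sketched as a small orchestration class; the class name, keyword matching and attention score are illustrative assumptions operating on the (role, text) pairs the recognition modules would produce:

```python
class ClassroomMonitor:
    """Hypothetical orchestration of the lesson-correlation and
    learning-state modules; names and scoring are illustrative."""

    def __init__(self, keywords):
        self.keywords = keywords  # keywords extracted from the teaching plan

    def process(self, utterances):
        """utterances: list of (role, text) pairs already produced by the
        age-identification and speech-recognition modules."""
        teacher_sents = [t for r, t in utterances if r == "teacher"]
        student_sents = [t for r, t in utterances if r == "student"]

        def on_topic(s):
            return any(k.lower() in s.lower() for k in self.keywords)

        # Claim 8: relevance = keyword-bearing sentences / all sentences
        p = sum(1 for s in teacher_sents if on_topic(s))
        relevance = p / len(teacher_sents) if teacher_sents else 0.0
        # Simple attention proxy: share of student speech that is on topic
        off_topic = sum(1 for s in student_sents if not on_topic(s))
        attention = (1.0 - off_topic / len(student_sents)) if student_sents else 1.0
        return {"relevance": relevance, "attention": attention}

monitor = ClassroomMonitor(["force"])
result = monitor.process([
    ("teacher", "Force is mass times acceleration"),
    ("student", "What is force?"),
    ("student", "Let's play at lunch"),
])
```

The prompting module would then compare these two scores and alert the teacher when attention drops while relevance is high or low.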
CN202211093905.9A 2022-09-08 2022-09-08 Intelligent voice recognition method and system Withdrawn CN115641839A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211093905.9A CN115641839A (en) 2022-09-08 2022-09-08 Intelligent voice recognition method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211093905.9A CN115641839A (en) 2022-09-08 2022-09-08 Intelligent voice recognition method and system

Publications (1)

Publication Number Publication Date
CN115641839A true CN115641839A (en) 2023-01-24

Family

ID=84941576

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211093905.9A Withdrawn CN115641839A (en) 2022-09-08 2022-09-08 Intelligent voice recognition method and system

Country Status (1)

Country Link
CN (1) CN115641839A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117316187A (en) * 2023-11-30 2023-12-29 山东同其万疆科技创新有限公司 English teaching management system
CN117316187B (en) * 2023-11-30 2024-02-06 山东同其万疆科技创新有限公司 English teaching management system

Similar Documents

Publication Publication Date Title
Deshwal et al. Feature extraction methods in language identification: a survey
CN108962229B (en) Single-channel and unsupervised target speaker voice extraction method
Samantaray et al. A novel approach of speech emotion recognition with prosody, quality and derived features using SVM classifier for a class of North-Eastern Languages
CN106531174A (en) Animal sound recognition method based on wavelet packet decomposition and spectrogram features
Dahmani et al. Vocal folds pathologies classification using Naïve Bayes Networks
Borsky et al. Modal and nonmodal voice quality classification using acoustic and electroglottographic features
Venkatesan et al. Binaural classification-based speech segregation and robust speaker recognition system
CN115641839A (en) Intelligent voice recognition method and system
Moritz et al. Integration of optimized modulation filter sets into deep neural networks for automatic speech recognition
Liu et al. AI recognition method of pronunciation errors in oral English speech with the help of big data for personalized learning
Xu et al. Voiceprint recognition of Parkinson patients based on deep learning
Hammami et al. Recognition of Arabic speech sound error in children
Ghonem et al. Classification of stuttering events using i-vector
Yousfi et al. Isolated Iqlab checking rules based on speech recognition system
Gomathy et al. Gender clustering and classification algorithms in speech processing: a comprehensive performance analysis
Sahoo et al. Detection of speech-based physical load using transfer learning approach
Zouhir et al. Robust speaker recognition based on biologically inspired features
Kammee et al. Sound Identification using MFCC with Machine Learning
İLERİ et al. Comparison of Different Normalization Techniques on Speakers’ Gender Detection
Srinivas LFBNN: robust and hybrid training algorithm to neural network for hybrid features-enabled speaker recognition system
Chao et al. Vocal Effort Detection Based on Spectral Information Entropy Feature and Model Fusion.
Dong Data-Driven Non-Intrusive Speech Quality and Intelligibility Assessment
Mittal et al. Age approximation from speech using Gaussian mixture models
Speights Atkins et al. Towards automated detection of similarities and differences in bilingual speakers
Hautamäki Fundamental Frequency Estimation and Modeling for Speaker Recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20230124