CN116189671A - Data mining method and system for language teaching - Google Patents

Data mining method and system for language teaching

Info

Publication number
CN116189671A
Authority
CN
China
Prior art keywords
data
voice
language
learner
voice data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310467728.4A
Other languages
Chinese (zh)
Other versions
CN116189671B (en)
Inventor
张宝英
刘燕霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lingyu International Culture And Art Communication Co ltd
Original Assignee
Lingyu International Culture And Art Communication Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lingyu International Culture And Art Communication Co ltd filed Critical Lingyu International Culture And Art Communication Co ltd
Priority to CN202310467728.4A priority Critical patent/CN116189671B/en
Publication of CN116189671A publication Critical patent/CN116189671A/en
Application granted granted Critical
Publication of CN116189671B publication Critical patent/CN116189671B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/65 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/683 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/685 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using automatically derived transcript of audio data, e.g. lyrics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/20 Education
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention relates to the technical field of voice data processing, in particular to a data mining method and system for language teaching. The method comprises the following steps: obtaining learner language voice data; optimizing the learner language voice data to obtain learner effective voice data; recognizing the learner effective voice data with a preset language speech recognition model to generate learner language text data; carrying out word segmentation, part-of-speech tagging, and deep word use evaluation on the learner language text data to obtain learner level text data; and extracting vocabulary semantic features and grammar expression capability features from the learner level text data to obtain vocabulary language features and language grammar expression capability features respectively. By deeply mining the learner's language voice data, the invention accurately assesses the learner's language level and provides more accurate feedback on it.

Description

Data mining method and system for language teaching
Technical Field
The invention relates to the technical field of voice data processing, in particular to a data mining method and system for language teaching.
Background
Besides reading, listening, speaking, and memorizing, learning a language also requires step-by-step mastery through a great number of repeated exercises on different types of questions. Exercises of different difficulty levels and question types guide the user from shallow to deep in mastering the related knowledge and methods on the one hand, and on the other hand help the user or a tutor discover how well the user has mastered them, so that learning and tutoring can be reinforced in a more targeted manner. In particular, targeted practice on the parts the user has not understood or mastered well effectively consolidates the related learning results and improves the user's learning efficiency. Artificial intelligence (AI) refers to technologies and applications that give machines human-like intelligence. It spans many fields that use large amounts of data and algorithms to help computers simulate human mental activities and achieve autonomous decisions and actions. Artificial intelligence has very broad prospects; with the continuous development of technology and the continuous expansion of application scenarios, it is being widely applied in many fields. How to combine artificial intelligence with language teaching so as to improve the quality and effectiveness of language teaching remains a problem.
Disclosure of Invention
The invention provides a data mining method and a system for language teaching to solve at least one of the technical problems.
The application provides a data mining method for language teaching, which comprises the following steps:
step S1: obtaining language voice data of a learner;
step S2: optimizing according to the language voice data of the learner, thereby obtaining effective voice data of the learner;
step S3: recognizing the effective speech data of the learner by using a preset language speech recognition model, thereby generating language text data of the learner;
step S4: word segmentation processing is carried out according to the learner language text data, part-of-speech tagging is carried out, deep word use evaluation is carried out, and thereby learner level text data is obtained;
step S5: extracting vocabulary semantic features and grammar expression capability features of the learner horizontal text data to obtain vocabulary language features and language grammar expression capability features respectively;
step S6: performing vocabulary evaluation according to the vocabulary language features to obtain vocabulary level data, and recognizing the language expression capability features through a language expression capability recognition model to obtain language expression capability data;
Step S7: and comprehensively evaluating the language level of the learner according to the vocabulary level data and the language expression capability data, thereby obtaining the language level evaluation data of the learner.
According to the invention, language level assessment is performed on the learner language text data along the dimensions of vocabulary and expression capability, providing accurate feedback on the learner's language level, and natural language processing techniques (such as word segmentation and part-of-speech tagging) are applied to the processing of the learner's language data, making the assessment of the learner's language level intelligent.
Preferably, step S2 is specifically:
step S21: noise reduction processing is carried out on the learner language voice data, so that noise reduction voice data are obtained;
step S22: enhancing the noise-reduced voice data to obtain enhanced voice data;
step S23: optimizing the framing processing of the enhanced voice data so as to obtain framing voice data;
step S24: and windowing the framing voice data so as to obtain the effective voice data of the learner.
According to the invention, noise reduction processing, enhancement processing, framing processing, and windowing processing effectively reduce the influence of environmental noise and other non-speech factors on speech recognition, improving recognition accuracy. Optimizing the original voice data retains the key information in the learner's speech while removing noise and other interference factors, so the learner's language capability can be evaluated more accurately. Preprocessing and optimizing the learner's voice data also reduces the computational complexity and resource consumption of the subsequent recognition model and improves processing efficiency.
Preferably, the optimizing framing in step S23 is specifically:
step S231: carrying out framing processing according to the enhanced voice data so as to obtain preliminary framing voice data;
step S232: carrying out clustering calculation on the current frame voice data of the preliminary framing voice data, so as to obtain current frame clustering feature data;
step S233: identifying the current frame clustering feature data through an effective frame clustering feature identifier, so as to obtain validity voice tag data, wherein the validity voice tag data comprises qualified voice tag data, in-doubt voice tag data and invalid voice tag data;
step S234: when the validity voice tag data is determined to be qualified voice tag data, determining the current frame voice data to be framing voice data;
step S235: when the validity voice tag data is determined to be in-doubt voice tag data, performing an energy spectrum validity confirmation operation on the current frame voice data;
step S236: deleting the current frame voice data when the validity voice tag data is determined to be invalid voice tag data.
According to the method, effective voice data can be screened out reliably through clustering calculation and identification by the effective frame clustering feature identifier, avoiding misjudgments and false alarms caused by noise, interference, and other factors and improving the accuracy and reliability of the voice data. Framing based on the enhanced voice data fits actual conditions better than the traditional framing approach and is more flexible, so it adapts better to different voice environments and scenario requirements. When the validity voice tag data is in-doubt voice tag data, an energy spectrum validity confirmation operation is performed on the current frame voice data, further improving the accuracy of the in-doubt voice data. When the validity voice tag data is invalid voice tag data, the current frame voice data is deleted automatically, avoiding the storage and waste of invalid data and improving the utilization of the voice data.
Preferably, the step of constructing the effective frame clustering feature identifier comprises the following steps:
step S237: acquiring historical effective frame data;
step S238: clustering calculation is carried out according to the historical effective frame data, so that the historical effective frame clustering characteristic data is obtained;
step S239: extracting a central point according to the historical effective frame clustering feature data, so as to obtain a historical effective frame clustering feature central set;
step S240: extracting the optimized edge distance according to the historical effective frame clustering feature center set, so as to obtain an optimized clustering feature center set and a corresponding optimized clustering feature center distance value set;
step S241: and constructing the optimized cluster feature centers and the corresponding optimized cluster feature center distance values so as to construct an effective frame cluster sub-identifier, and carrying out coupling association on all the effective frame cluster sub-identifiers so as to obtain the effective frame cluster feature identifier.
According to the invention, clustering calculation over the historical effective frames and optimized extraction of their clustering feature centers allow voice frames to be classified more accurately, improving classification precision. Using a clustering algorithm to aggregate the historical effective frame feature data also reduces the influence of noise and improves the robustness of the system. In long utterances, the features of voice frames often change considerably between time periods; using historical effective frames as input mitigates this situation and improves recognition accuracy.
Preferably, the step of optimizing edge cluster extraction is specifically:
step S201: performing distance calculation on the historical effective frame clustering feature centers in the historical effective frame clustering feature center set and the rest historical effective frame clustering feature centers, so as to obtain a feature center distance set;
step S202: extracting a minimum distance value and a next-smallest distance value according to the characteristic center distance set, so as to obtain a characteristic center distance minimum distance set and a characteristic center next-smallest distance value set;
step S203: performing relative distance calculation on the historical effective frame clustering feature centers in the historical effective frame clustering feature center set and the historical effective frame clustering feature centers of the last two so as to obtain an optimized edge distance set;
step S204: sequencing according to the optimized edge distance set, so as to obtain an ordered edge distance set;
step S205: determining a historical effective frame cluster feature center corresponding to the minimum value in the ordered edge distance set as an optimized cluster feature center, adding the optimized cluster feature center set, determining a feature center sub-small distance value corresponding to the optimized cluster feature center as an optimized cluster feature center distance value, and deleting the ordered edge distance corresponding to the optimized cluster feature center in the ordered edge distance set;
Step S206: performing relative distance calculation on the historical effective frame distance feature centers in the rest historical effective frame distance feature center set and the optimized cluster feature centers in the optimized cluster feature center set, extracting a maximum value, obtaining a relative maximum distance value, determining the historical effective frame distance feature center as the optimized cluster feature center when the relative maximum distance value is larger than a feature center sub-minimum distance value corresponding to the historical effective frame distance feature center, adding an optimized cluster feature center set, determining a feature center sub-minimum distance value corresponding to the optimized cluster feature center as the optimized cluster feature center distance value, and deleting ordered edge distances corresponding to the optimized cluster feature centers in the ordered edge distance set;
step S207: step S206 is repeated until the ordered set of edge distances is empty.
In this embodiment, the optimized edge clustering feature extraction step improves the clustering precision of the historical effective frame clustering feature centers by extracting the minimum distance value, the second-smallest distance value, and related quantities, so whether a voice frame contains effective speech can be judged more accurately. Previous algorithms may misjudge non-speech frames as effective speech, leading to a high misjudgment rate in noisy environments or when the speech is very quiet. By thresholding on the relative maximum distance value, this method reduces misjudgment of non-speech frames while improving clustering precision. Because it determines more accurately whether a voice frame contains effective speech, it improves the accuracy and quality of subsequent speech recognition, making the whole speech recognition system more accurate and stable.
Preferably, the energy spectrum validity confirmation operation performs threshold confirmation on a voice energy value calculated through a voice frame energy spectrum calculation formula, wherein the voice frame energy spectrum calculation formula is specifically:
G = u · [ (m/N) · ( α · Σ_{i=1..N} w_i^2 + β · Σ_{i=1..N-1} |sgn(w_i) - sgn(w_{i+1})| + γ · h ) + t + o + r ]
g is a voice energy value, alpha is a first weight coefficient, w i For the ith sampling point in the voice frame, beta is a second weight coefficient, w i+1 For the (i+1) th sample point in the speech frame, sgn (w i ) Is w i Gamma is a third weight coefficient, h is a historical voice data correction term, t is a current frame voice data adjustment term, m is a voice frame energy spectrum scaling coefficient, r is a constant term, o is a smooth adjustment term generated according to the current frame voice data, N is the sampling point number of a voice frame, and u is a correction coefficient of a voice energy value.
The voice frame energy spectrum calculation formula provided by the invention fully considers the first weight coefficient α, the i-th sampling point w_i in the voice frame, the second weight coefficient β, the (i+1)-th sampling point w_{i+1}, the sign function sgn(w_i) of w_i, and the third weight coefficient γ. The energy of the voice frame, its zero-crossing rate, and the historical data are computed according to the formula and then weighted by these coefficients and the various adjustment factors to obtain an energy spectrum value, which can be used to judge whether the voice frame contains effective speech. The first, second, and third weight coefficients α, β, and γ weight the frame energy, the zero-crossing rate, and the historical data respectively, and are tuned to the actual application scenario to obtain a more accurate value. The voice frame energy spectrum scaling coefficient m adjusts the magnitude of the energy spectrum, the current frame voice data adjustment term t adjusts the energy calculation result of the current frame, and the correction coefficient u of the voice energy value applies a final correction. This realizes an accurate calculation of the voice energy value, and comparison with a preset threshold determines whether the current voice frame contains effective speech.
Preferably, the step of constructing the language speech recognition model in step S3 specifically includes:
step S31: standard voice data and corresponding voice tag data are acquired;
step S32: the same format conversion is carried out according to the standard voice data, so that the standard format voice data are obtained;
step S33: performing noise reduction calculation according to the standard format voice data so as to generate standard noise reduction voice data;
step S34: removing silence segments from the standard noise reduction voice data to generate standard voice data;
step S35: carrying out framing treatment on the standard voice data so as to obtain standard framing voice data;
step S36: windowing is carried out according to the standard framing voice data, so that standard windowing voice data are obtained;
step S37: extracting language phoneme features and language phoneme combination features according to the standard windowed voice data, so as to obtain voice phoneme features and language phoneme combination features;
step S38: optimizing convolutional neural network mapping according to the phonetic phoneme characteristics, so as to initially construct a primary language voice recognition model;
step S39: and carrying out correction error iteration on the primary language voice recognition model according to the language phoneme combination characteristics so as to obtain the language voice recognition model.
According to the invention, the accuracy of speech recognition is improved: preprocessing operations such as noise reduction on the standard format voice data raise recognition accuracy and reduce the influence of noise and non-speech interference; training the model and iterating on its errors with the voice phoneme features and language phoneme combination features reduces model error and adapts the model to different speech inputs, improving its reliability; and applying multiple preprocessing techniques and feature extraction methods to the voice data reduces the influence of noise and interference on the model, improving its reliability and robustness.
Preferably, the step of optimizing the convolutional neural network mapping is specifically:
step S381: performing bidirectional cyclic neural network construction according to the voice phoneme characteristics, thereby obtaining a voice neural network model;
step S382: decoding the voice neural network model to obtain a connection time sequence phoneme sequence model;
step S383: and carrying out reasoning search according to the connection time sequence phoneme sequence model and the corresponding voice tag data, thereby obtaining a primary language voice recognition model.
The invention replaces the traditional unidirectional recurrent neural network with a bidirectional recurrent neural network when modelling the voice phoneme features, decodes the resulting network into a connectionist temporal phoneme sequence model, and performs inference search with the corresponding voice tag data to obtain the primary language speech recognition model. This way of optimizing the network mapping improves the accuracy and stability of speech recognition, with particularly good results in complex acoustic environments. The bidirectional recurrent construction captures more context information, making recognition more accurate, and the improved decoder reduces the influence of noise and other distortion factors, improving the stability of speech recognition.
Preferably, the learner level text data includes word part-of-speech use degree data and word part-of-speech familiarity degree data, and step S4 is specifically:
step S41: word segmentation is carried out according to the learner language text data, so that word data are obtained;
step S42: part-of-speech tagging is performed on the word data, so that word part-of-speech data are obtained;
Step S43: labeling word part-of-speech data based on a context rule, so as to obtain word semantic data;
step S44: carrying out vocabulary type recognition according to the word semantic data so as to obtain word part-of-speech use degree data;
step S45: and carrying out statistical calculation on the vocabulary distribution condition according to the word semantic data, so as to obtain the word part-of-speech familiarity degree data.
The word part-of-speech use degree data and the word part-of-speech familiarity degree data can be used to evaluate the learner's language level, including vocabulary size and proficiency. Statistical analysis of these data serves as pre-analysis for the deeper mining of the learner's language teaching data that follows, providing more in-depth data on the learner's language ability level.
The present application provides a data mining system for language teaching, the system comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform a data mining method for language teaching as described above.
The invention has the advantage that, by obtaining effective learner voice data and applying the preset language speech recognition model, the learner's voice data can be converted into text data more efficiently and the accuracy of speech recognition is improved. Through word segmentation, part-of-speech tagging, and extraction of vocabulary and grammar expression capability features, the learner's language level, including vocabulary and grammar abilities, can be understood more fully. The comprehensive evaluation of the learner's language level based on the vocabulary level data and the language expression capability data provides educators with targeted suggestions and feedback, enabling personalized teaching. Deep mining of learners' language voice data yields more targeted evaluation results, so practitioners and professionals can better allocate language education resources, improving the utilization efficiency of those resources.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the detailed description of non-limiting implementations made with reference to the following drawings in which:
FIG. 1 illustrates a flow chart of steps of a data mining method for language teaching of an embodiment;
FIG. 2 is a flow chart showing the steps of a learner-effective voice data acquisition method according to one embodiment;
FIG. 3 is a flow chart illustrating steps of an optimized framing processing method of an embodiment;
FIG. 4 is a flow chart illustrating steps of a method for efficient frame cluster feature identifier construction in accordance with one embodiment;
FIG. 5 is a flowchart illustrating steps of a method for optimizing edge cluster extraction, according to one embodiment;
FIG. 6 is a flow chart illustrating steps of a method for constructing a language speech recognition model of an embodiment;
FIG. 7 is a flowchart illustrating steps of a method for optimizing convolutional neural network mapping in accordance with one embodiment;
FIG. 8 is a flow chart illustrating the steps of a learner-level text data acquisition method according to one embodiment.
Detailed Description
The following is a clear and complete description of the technical method of the present patent in conjunction with the accompanying drawings, and it is evident that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, are intended to fall within the scope of the present invention.
Furthermore, the drawings are merely schematic illustrations of the present invention and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and repeated description of them is omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. The functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different network and/or processor devices and/or microcontroller devices.
It will be understood that, although the terms "first," "second," etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another element. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element, without departing from the scope of example embodiments. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
Referring to fig. 1 to 8, the present application provides a data mining method for language teaching, including the following steps:
step S1: obtaining language voice data of a learner;
specifically, for example, in a network online class, speech input may be performed through a microphone device, thereby obtaining learner language speech data.
Specifically, for example, a recording device (such as a mobile phone, a recording pen, etc.) is used to record the voice data of the learner, and then the recording file is imported into the analysis system for processing.
Step S2: optimizing according to the language voice data of the learner, thereby obtaining effective voice data of the learner;
Specifically, for example, noise reduction processing, enhancement processing, framing processing, and windowing processing are performed on the learner language voice data, thereby obtaining the learner effective voice data.
Step S3: recognizing the effective speech data of the learner by using a preset language speech recognition model, thereby generating language text data of the learner;
specifically, speech recognition is performed, for example, using a pre-trained deep neural network (e.g., convolutional neural network, recurrent neural network, transducer, etc.).
Step S4: word segmentation processing is carried out according to the learner language text data, part-of-speech tagging is carried out, deep word use evaluation is carried out, and thereby learner level text data is obtained;
specifically, the learner language text data is word-segmented, for example, using an existing open source word segmentation tool (e.g., jieba, NLTK, spaCy, etc.). Part-of-speech tagging can be performed on the segmented text data, and a part-of-speech tagging tool (such as Stanford POS Tagger, spaCy and the like) or a pre-trained deep learning model (such as BERT, GPT and the like) can be used; according to the word segmentation and part-of-speech labeling results, the level of the learner can be further evaluated, for example, high-frequency vocabulary, low-frequency vocabulary and professional vocabulary used by the learner in the expression are counted, and the vocabulary quantity and vocabulary use preference of the learner are analyzed; analyzing the learner's performance in terms of grammar structure and sentence complexity to evaluate its grammar ability, etc.; and generating learner horizontal text data according to the depth word evaluation result. Such data may include information such as vocabulary scores, grammar ability scores, vocabulary usage distribution, and the like.
Step S5: extracting vocabulary semantic features and grammar expression capability features of the learner horizontal text data to obtain vocabulary language features and language grammar expression capability features respectively;
specifically, for example, the learner level text data includes vocabulary semantic features, such as vocabulary diversity: calculating vocabulary diversity index (e.g., vocabulary richness) in learner text, sentence complexity analysis: sentence complexity indicators (e.g., average sentence length, clause usage frequency, etc.) in the learner text are calculated reflecting the learner's complexity in language expression.
Step S6: performing vocabulary evaluation according to the vocabulary language features to obtain vocabulary level data, and recognizing the language expression capability features through a language expression capability recognition model to obtain language expression capability data;
specifically, thresholds for the vocabulary level, such as a basic vocabulary, a medium-level vocabulary, and a high-level vocabulary, are set, for example, according to teaching targets and actual situations of learners. And classifying the vocabulary, comparing the vocabulary language features of the students with a set threshold value, dividing the students into different vocabulary levels, and scoring the vocabulary levels of the students based on expert rules or preset scoring rules. Training a language expression ability recognition model, such as a classification model (such as a support vector machine, a decision tree and the like) or a deep learning model (such as a neural network), by using the labeled learner text data, inputting the language grammar expression ability characteristics into the model for reasoning, and outputting the language expression ability assessment results of the learner, such as different levels of a primary level, a middle level, a high level and the like by the model.
Step S7: and comprehensively evaluating the language level of the learner according to the vocabulary level data and the language expression capability data, thereby obtaining the language level evaluation data of the learner.
Specifically, for example, the vocabulary level data and the language expression capability data are assigned weights, for example 0.6 for the vocabulary level data and 0.4 for the language expression capability data. The two are then combined according to these weights to obtain the comprehensive evaluation data of the learner's language level.
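The weighted combination is straightforward; a sketch with the example weights above (the score scale is an assumption):

def overall_language_level(vocab_score, expression_score,
                           w_vocab=0.6, w_expr=0.4):
    # Weighted sum of the two evaluation dimensions
    return w_vocab * vocab_score + w_expr * expression_score

# A learner scoring 82 on vocabulary and 70 on expression:
# 0.6 * 82 + 0.4 * 70 = 77.2
print(overall_language_level(82, 70))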
According to the invention, language level assessment is performed on the learner language text data along the dimensions of vocabulary and expression capability, providing accurate feedback on the learner's language level, and natural language processing techniques (such as word segmentation and part-of-speech tagging) are applied to the processing of the learner's language data, making the assessment of the learner's language level intelligent.
Preferably, step S2 is specifically:
step S21: noise reduction processing is carried out on the learner language voice data, so that noise reduction voice data are obtained;
specifically, the noise reduction processing is performed using, for example, spectral subtraction (Spectral Subtraction). Spectral subtraction is by estimating the power spectrum of noise and subtracting the noise power spectrum from the input speech signal.
Step S22: enhancing the noise-reduced voice data to obtain enhanced voice data;
Specifically, for example, a speech enhancement method based on an additive factor is used. The speech signal is enhanced by adding additive factors such as random harmonic signals or white noise to the speech signal.
Step S23: optimizing the framing processing of the enhanced voice data so as to obtain framing voice data;
specifically, for example, an autocorrelation function is obtained by performing an autocorrelation analysis on each speech frame, such as def autocorrelation (signal) by calculating an autocorrelation function, a param signal, a one-dimensional array representing a signal, a return signal, a one-dimensional array of length n, representing an autocorrelation function, n=len (signal), corr= [ ], for in range (n): r_k=0, for i in range (n-k): r_k+ = signal [ i ] ×signal [ i+k ], corr.application (r_k), return corr, which receives as input a one-dimensional array, wherein each value represents an autocorrelation value at the delay. The function traverses all possible delay values and calculates the corresponding autocorrelation values. The period of the speech signal is obtained using an autocorrelation function. The framing point is determined by judging whether the period is stable. And cutting according to the framing point to obtain framing voice data.
Step S24: and windowing the framing voice data so as to obtain the effective voice data of the learner.
Specifically, for example, each voice frame is windowed; a Hamming window may be selected, whose windowing formula is w(n) = 0.54 - 0.46 × cos(2πn/(N - 1)), where n is the sample index and N is the number of sampling points in a frame of speech.
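A direct sketch of applying this window to every frame:

import numpy as np

def apply_hamming(frames):
    # Multiply each frame by w(n) = 0.54 - 0.46*cos(2*pi*n/(N - 1))
    windowed = []
    for frame in frames:
        N = len(frame)
        n = np.arange(N)
        w = 0.54 - 0.46 * np.cos(2 * np.pi * n / (N - 1))
        windowed.append(frame * w)
    return windowed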
According to the invention, noise reduction processing, enhancement processing, framing processing, and windowing processing effectively reduce the influence of environmental noise and other non-speech factors on speech recognition, improving recognition accuracy. Optimizing the original voice data retains the key information in the learner's speech while removing noise and other interference factors, so the learner's language capability can be evaluated more accurately. Preprocessing and optimizing the learner's voice data also reduces the computational complexity and resource consumption of the subsequent recognition model and improves processing efficiency.
Preferably, the optimizing framing in step S23 is specifically:
step S231: carrying out framing treatment according to the enhanced voice data so as to obtain preliminary framing voice data;
specifically, the voice data is subjected to the segmentation processing by, for example, a preset number of frames, such as 25 ms.
Step S232: clustering calculation is carried out on the current frame voice data of the preliminary framing voice data, so that current frame clustering feature data is obtained;
Specifically, for example, features of the framed speech data, such as MFCC or PLP, are clustered across all frames using the k-means clustering algorithm. In practice, an appropriate value of k may be selected so that frames with similar characteristics are classified into one class. The distance between the current frame and each cluster center is then calculated, the current frame is assigned to the class of the nearest cluster center, and the current frame clustering feature data is obtained.
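A sketch of this per-frame clustering (MFCC extraction via librosa and k-means via scikit-learn are illustrative tool choices; the patent does not name libraries):

import numpy as np
import librosa
from sklearn.cluster import KMeans

def frame_cluster_features(y, sr, k=4):
    # One 13-dimensional MFCC vector per frame, frames along the first axis
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T
    # Group frames with similar characteristics into k classes
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(mfcc)
    # Distance of every frame to its own cluster center
    dists = np.linalg.norm(mfcc - km.cluster_centers_[km.labels_], axis=1)
    return km.labels_, dists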
Step S233: identifying the current frame clustering feature data through an effective frame clustering feature identifier, so as to obtain effective voice tag data, wherein the effective voice tag data comprises qualified voice tag data, doubtful voice tag data and invalid voice tag data;
specifically, for example, threshold ranges of qualified voices, in-doubt voices, and invalid voices may be set in advance for each cluster feature. And comparing the cluster characteristic data of the current frame with a threshold value, and distributing corresponding effective voice tags for the current frame according to a comparison result.
Step S234: when the validity voice tag data is determined to be qualified voice tag data, the current frame voice data is determined to be framing voice data;
step S235: when the validity voice tag data is determined to be the suspicious voice tag data, performing energy spectrum validity confirmation operation on the voice data of the current frame;
Step S236: and deleting the voice data of the current frame when the valid voice tag data is determined to be invalid voice tag data.
Specifically, for example, a speech frame labelled "qualified" is taken as framing voice data, while a frame labelled "in doubt" requires energy spectrum validity confirmation. For instance, an energy threshold may be applied to each sampling point in the speech frame: a point whose energy exceeds the threshold is judged valid, otherwise invalid. This yields a series of validity marks, according to which the speech frame is divided into several validity subsections, each of which is then framed. A speech frame marked "invalid" is deleted directly without further processing.
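A minimal sketch of the per-sample energy test (the majority-ratio acceptance criterion is an assumption; the patent only describes the validity marks):

import numpy as np

def confirm_in_doubt_frame(frame, energy_threshold, min_valid_ratio=0.5):
    # A sampling point is valid when its squared amplitude exceeds the threshold
    valid = np.asarray(frame, dtype=float) ** 2 > energy_threshold
    # Keep the frame when enough of its samples are valid
    return valid.mean() >= min_valid_ratio, valid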
According to the method, effective voice data can be screened out reliably through clustering calculation and identification by the effective frame clustering feature identifier, avoiding misjudgments and false alarms caused by noise, interference, and other factors and improving the accuracy and reliability of the voice data. Framing based on the enhanced voice data fits actual conditions better than the traditional framing approach and is more flexible, so it adapts better to different voice environments and scenario requirements. When the validity voice tag data is in-doubt voice tag data, an energy spectrum validity confirmation operation is performed on the current frame voice data, further improving the accuracy of the in-doubt voice data. When the validity voice tag data is invalid voice tag data, the current frame voice data is deleted automatically, avoiding the storage and waste of invalid data and improving the utilization of the voice data.
Preferably, the step of constructing the effective frame clustering feature identifier comprises the following steps:
step S237: acquiring historical effective frame data;
specifically, for example, when processing the current frame data, a fixed-length time window may be set. And tracing back from the current frame to obtain the effective frame data in the time window.
Specifically, the index information for each active frame is stored in a list or other data structure, for example, during processing of voice data. When the historical effective frame data is required to be acquired, the corresponding effective frame can be quickly positioned according to the index information, and the required data can be extracted.
Step S238: clustering calculation is carried out according to the historical effective frame data, so that the historical effective frame clustering characteristic data is obtained;
specifically, the historical effective frame data may be clustered using, for example, a K-means clustering algorithm. Firstly, a proper number of cluster centers (such as K) are selected, then the cluster centers are classified according to the distance between each effective frame data and the cluster centers, and the cluster centers are iteratively updated until convergence conditions are met. The finally obtained clustering result can be used for extracting the clustering characteristic data of the historical effective frames.
Step S239: extracting a central point according to the historical effective frame clustering feature data, so as to obtain a historical effective frame clustering feature central set;
specifically, for example, a speech sample has 100000 frames, and valid frames therein are clustered into several groups after feature extraction, preprocessing, and denoising. And extracting the central point of each cluster group as a central set of the cluster characteristics of the historical effective frames according to the cluster results. The K-Means algorithm is used to cluster the valid frames and the number of clusters is set to 10. After each training, the center points of 10 clustering groups are extracted from the obtained clustering results, and are stored in a central set of historical effective frame clustering features.
Step S240: extracting the optimized edge distance according to the historical effective frame clustering feature center set, so as to obtain an optimized clustering feature center set and a corresponding optimized clustering feature center distance value set;
specifically, for example, for each cluster center, the distances between the cluster center and other cluster centers in the central set of the cluster features of the historical effective frame are calculated, and the distance values are sorted from small to large to obtain a distance value set. Optimizing a cluster feature center set: and selecting k cluster centers with the smallest distance value according to the distance value set, and taking the k cluster centers as new cluster feature centers to obtain an optimized cluster feature center set. The corresponding optimized cluster feature center distance value set: and for each optimized cluster feature center, calculating the distance between the optimized cluster feature center and other cluster centers in the historical effective frame cluster feature center set, and sequencing the distance values from small to large to obtain a corresponding optimized cluster feature center distance value set.
Step S241: and constructing the optimized cluster feature centers and the corresponding optimized cluster feature center distance values so as to construct an effective frame cluster sub-identifier, and carrying out coupling association on all the effective frame cluster sub-identifiers so as to obtain the effective frame cluster feature identifier.
Specifically, for example, the class of a sample to be classified is determined by its distance to the samples in the training sample set. For the effective frame clustering feature identifier, the frame to be identified is compared with the optimized clustering feature centers through a K-nearest-neighbour classifier: the K optimized clustering feature centers closest to the frame are selected, a vote is taken over the classes they belong to, and the class with the most votes is taken as the class of the frame.
Each sound feature signal is segmented, and all segmented effective frames are passed to the clustering sub-identifiers so that they can be classified using the optimized cluster centers. The distance between each feature vector and all cluster centers is calculated using the Euclidean distance, cosine distance, or a similar measure, and the nearest cluster center is selected as the class label of that feature vector. The effective frame clustering feature identifier is then constructed by coupling and associating all the effective frame clustering sub-identifiers. For a new sound signal, the signal is first segmented, all segmented effective frames are passed to the clustering sub-identifiers for classification, and the classification results are aggregated to complete the feature identification of the sound signal.
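A nearest-center classification sketch of the kind described (the center coordinates and labels are made-up illustration values):

import numpy as np

def classify_frame(feature, centers, labels):
    # Euclidean distance from the frame's feature vector to every center;
    # the nearest center's label is assigned to the frame
    d = np.linalg.norm(np.asarray(centers) - np.asarray(feature), axis=1)
    return labels[int(np.argmin(d))]

print(classify_frame([0.9, 0.1],
                     centers=[[1.0, 0.0], [0.0, 1.0]],
                     labels=["speech", "non-speech"]))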
According to the invention, clustering calculation over the historical effective frames and optimized extraction of their clustering feature centers allow voice frames to be classified more accurately, improving classification precision. Using a clustering algorithm to aggregate the historical effective frame feature data also reduces the influence of noise and improves the robustness of the system. In long utterances, the features of voice frames often change considerably between time periods; using historical effective frames as input mitigates this situation and improves recognition accuracy.
Preferably, the step of optimizing edge cluster extraction is specifically:
step S201: performing distance calculation on the historical effective frame clustering feature centers in the historical effective frame clustering feature center set and the rest historical effective frame clustering feature centers, so as to obtain a feature center distance set;
specifically, for example, cosine similarity is used to calculate the cosine value of the included angle between two feature center vectors, and the formula is as follows: cos (θ) = (AB)/(|a×|b|), where a and B represent two feature center vectors, respectively, AB represents the dot product of the vectors, and|a| and|b| represent the modular length of the vectors.
Step S202: extracting a minimum distance value and a next-smallest distance value according to the characteristic center distance set, so as to obtain a characteristic center distance minimum distance set and a characteristic center next-smallest distance value set;
Specifically, for example, the feature center distance values of all the samples calculated are sorted, and the smallest distance value and the next smallest distance value are selected.
Step S203: performing relative distance calculation on the historical effective frame clustering feature centers in the historical effective frame clustering feature center set and the historical effective frame clustering feature centers of the last two so as to obtain an optimized edge distance set;
specifically, for example, calculation is performed using the euclidean distance, and the sum of squares of the difference values of the two points in each dimension is calculated. And calculating the Euclidean distance between the historical effective frame clustering feature center and the latest two historical effective frame clustering feature centers in the feature space.
Step S204: sequencing according to the optimized edge distance set, so as to obtain an ordered edge distance set;
specifically, the ordering is performed, for example, by an ordering function, such as the sort () function in a function library employing python.
Step S205: determining a historical effective frame cluster feature center corresponding to the minimum value in the ordered edge distance set as an optimized cluster feature center, adding the optimized cluster feature center set, determining a feature center sub-small distance value corresponding to the optimized cluster feature center as an optimized cluster feature center distance value, and deleting the ordered edge distance corresponding to the optimized cluster feature center in the ordered edge distance set;
Step S206: performing relative distance calculation on the historical effective frame distance feature centers in the rest historical effective frame distance feature center set and the optimized cluster feature centers in the optimized cluster feature center set, extracting a maximum value, obtaining a relative maximum distance value, determining the historical effective frame distance feature center as the optimized cluster feature center when the relative maximum distance value is larger than a feature center sub-minimum distance value corresponding to the historical effective frame distance feature center, adding an optimized cluster feature center set, determining a feature center sub-minimum distance value corresponding to the optimized cluster feature center as the optimized cluster feature center distance value, and deleting ordered edge distances corresponding to the optimized cluster feature centers in the ordered edge distance set;
step S207: step S206 is repeated until the ordered set of edge distances is empty.
Specifically, for example, the cluster center with the smallest distance is selected from the distance list in turn, taken as a valid cluster center, and deleted from the distance list. Then, for each remaining cluster center, if its distance to the selected cluster centers is greater than its second-smallest distance, it is also taken as a valid cluster center and deleted from the distance list. This process repeats until the distance list is empty, i.e., all valid cluster centers have been extracted. A sketch of one possible reading of this loop is given below.
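The following sketch implements one possible reading of steps S204 to S207 (the patent's edge distance definition is ambiguous, so the second-smallest neighbour distance is used here as a stand-in for it):

import numpy as np

def extract_optimized_centers(centers):
    centers = [np.asarray(c, dtype=float) for c in centers]

    def second_min_dist(i):
        # Second-smallest distance from center i to the other centers
        d = sorted(np.linalg.norm(centers[i] - centers[j])
                   for j in range(len(centers)) if j != i)
        return d[1] if len(d) > 1 else d[0]

    # Order the candidate centers by their edge distance (step S204)
    remaining = sorted(range(len(centers)), key=second_min_dist)
    # Promote the center with the smallest edge distance (step S205)
    selected = [remaining.pop(0)]
    changed = True
    while changed and remaining:
        changed = False
        for i in list(remaining):
            # Relative maximum distance to the already selected centers
            d_max = max(np.linalg.norm(centers[i] - centers[s])
                        for s in selected)
            if d_max > second_min_dist(i):  # acceptance test of step S206
                selected.append(i)
                remaining.remove(i)
                changed = True
    return [centers[i] for i in selected]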
In this embodiment, the optimized edge clustering feature extraction step improves the clustering precision of the historical effective frame clustering feature centers by extracting the minimum distance value, the second-smallest distance value, and related quantities, so whether a voice frame contains effective speech can be judged more accurately. Previous algorithms may misjudge non-speech frames as effective speech, leading to a high misjudgment rate in noisy environments or when the speech is very quiet. By thresholding on the relative maximum distance value, this method reduces misjudgment of non-speech frames while improving clustering precision. Because it determines more accurately whether a voice frame contains effective speech, it improves the accuracy and quality of subsequent speech recognition, making the whole speech recognition system more accurate and stable.
Preferably, the energy spectrum validity confirmation operation performs threshold confirmation on a voice energy value calculated through a voice frame energy spectrum calculation formula, wherein the voice frame energy spectrum calculation formula is specifically:
G = u · [ (m/N) · ( α · Σ_{i=1..N} w_i^2 + β · Σ_{i=1..N-1} |sgn(w_i) - sgn(w_{i+1})| + γ · h ) + t + o + r ]
g is a voice energy value, alpha is a first weight coefficient, w i For the ith sampling point in the voice frame, beta is a second weight coefficient, w i+1 For the (i+1) th sample point in the speech frame, sgn (w i ) Is w i Gamma is a third weight coefficient, h is a historical voice data correction term, t is a current frame voice data adjustment term, m is a voice frame energy spectrum scaling coefficient, r is a constant term, o is a smooth adjustment term generated according to the current frame voice data, N is the sampling point number of a voice frame, and u is a correction coefficient of a voice energy value.
The invention provides a voice frame energy spectrum calculation formula which fully considers the first weight coefficient α, the i-th sampling point w_i in the voice frame, the second weight coefficient β, the (i+1)-th sampling point w_{i+1}, the sign function sgn(w_i), and their interactions. The formula first computes the in-frame energy, the zero-crossing rate, and the historical data term, then combines them according to the weight coefficients and the various adjustment factors to obtain an energy spectrum value. This energy spectrum value is used to judge whether the voice frame contains effective voice. The first weight coefficient α, the second weight coefficient β, and the third weight coefficient γ weight the voice frame energy, the zero-crossing rate, and the historical data respectively, and are adjusted according to the actual application scenario to obtain a more accurate value. The energy spectrum is rescaled by the voice frame energy spectrum scaling coefficient m, the current-frame voice data adjustment term t adjusts the energy calculation result of the current voice frame, and the correction coefficient u corrects the voice energy value, realizing accurate calculation of the voice energy value; whether the current voice frame contains effective voice is then judged by comparison with a preset threshold.
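The following Python sketch mirrors this calculation, assuming the reconstructed combination of terms shown above; all weight coefficients, adjustment terms, and the decision threshold are placeholder values rather than values prescribed by the patent.

```python
import numpy as np

def frame_energy_value(w, alpha=1.0, beta=0.5, gamma=0.2,
                       h=0.0, t=0.0, m=1.0, r=0.0, o=0.0, u=1.0):
    w = np.asarray(w, dtype=float)
    N = len(w)
    energy = alpha * np.sum(w ** 2) / N                       # weighted short-time energy
    zcr = beta * np.sum(np.abs(np.sign(w[1:]) - np.sign(w[:-1]))) / (2 * (N - 1))  # weighted zero-crossing rate
    return u * (m * (energy + zcr + gamma * h + t) + r + o)   # scale, adjust, and correct

frame = np.random.randn(320)                         # one 20 ms frame at 16 kHz
contains_speech = frame_energy_value(frame) > 0.5    # threshold confirmation (placeholder threshold)
```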
Preferably, the step of constructing the language speech recognition model in step S3 specifically includes:
step S31: standard voice data and corresponding voice tag data are acquired;
Specifically, for example, samples containing speech and the corresponding tag data are obtained from a database or an open-source dataset.
Step S32: the same format conversion is carried out according to the standard voice data, so that the standard format voice data are obtained;
specifically, for example, voice files with different sampling rates, bit rates, formats, etc. are uniformly converted into the same standard format, such as a WAV format with a 16 kHz sampling rate and 16-bit PCM encoding.
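For example, the conversion can be sketched with the librosa and soundfile libraries; the 16 kHz, 16-bit PCM WAV target follows the example above, while the file names are hypothetical.

```python
import librosa
import soundfile as sf

def to_standard_format(in_path, out_path, sr=16000):
    audio, _ = librosa.load(in_path, sr=sr, mono=True)  # decode and resample in one step
    sf.write(out_path, audio, sr, subtype="PCM_16")     # write 16-bit PCM WAV

to_standard_format("learner_utterance.mp3", "learner_utterance.wav")
```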
Step S33: performing noise reduction calculation according to the standard format voice data so as to generate standard noise reduction voice data;
specifically, for example, various noise reduction algorithms such as spectral subtraction, wavelet soft threshold denoising, and the like are used to denoise the voice, so as to obtain denoised voice data.
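A bare-bones spectral subtraction sketch is given below; it assumes the first quarter second of each recording contains only background noise, which is a common convention but is not stated in the patent.

```python
import numpy as np
import librosa

def spectral_subtraction(y, sr, noise_secs=0.25, floor=0.02):
    stft = librosa.stft(y)                              # short-time Fourier transform
    mag, phase = np.abs(stft), np.angle(stft)
    n_noise = max(1, int(noise_secs * sr / 512))        # librosa's default hop length is 512
    noise_mag = mag[:, :n_noise].mean(axis=1, keepdims=True)
    clean = np.maximum(mag - noise_mag, floor * mag)    # subtract the noise estimate, keep a spectral floor
    return librosa.istft(clean * np.exp(1j * phase))
```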
Step S34: removing silence segments from the standard noise reduction voice data to generate standard voice data;
specifically, for example, energy judgment is performed on the voice, and portions that are too quiet (mute segments) are removed, leaving the voice data that contains human voice.
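Step S34 can be sketched as a simple per-frame energy gate; the frame length and energy threshold below are placeholders to be tuned on real data.

```python
import numpy as np

def remove_silence(y, sr, frame_ms=20, thresh=1e-3):
    flen = int(sr * frame_ms / 1000)
    frames = [y[i:i + flen] for i in range(0, len(y) - flen + 1, flen)]
    voiced = [f for f in frames if np.mean(f ** 2) > thresh]  # keep frames above the energy gate
    return np.concatenate(voiced) if voiced else np.zeros(0)
```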
Step S35: carrying out framing treatment on the standard voice data so as to obtain standard framing voice data;
Specifically, for example, the voice data is subjected to framing processing according to a fixed time length (typically 20 ms), and a piece of voice frame data is obtained.
Step S36: windowing is carried out according to the standard framing voice data, so that standard windowing voice data are obtained;
specifically, for example, a window function weighting process such as a Hamming window is applied to each frame of voice data to obtain windowed voice data.
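Steps S35 and S36 together can be sketched as follows, using a 20 ms frame with a 10 ms hop and a Hamming window; the hop length is an assumption, since the text fixes only the frame length.

```python
import numpy as np

def frame_and_window(y, sr, frame_ms=20, hop_ms=10):
    # Assumes y is at least one frame long.
    flen, hop = int(sr * frame_ms / 1000), int(sr * hop_ms / 1000)
    n_frames = 1 + max(0, (len(y) - flen) // hop)
    idx = np.arange(flen)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = y[idx]                          # shape: (n_frames, frame_len)
    return frames * np.hamming(flen)         # Hamming-weight each frame
```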
Step S37: extracting language phoneme features and language phoneme combination features according to the standard windowed voice data, thereby obtaining the language phoneme features and the language phoneme combination features;
specifically, for example, each frame of speech data is converted into a corresponding speech feature vector using various feature extraction methods such as MFCC, FBANK, etc., and phoneme combination features such as phoneme states, etc., are further extracted to generate feature vectors for training the classifier.
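As an example of step S37, MFCC features can be extracted per frame with librosa; the window and hop sizes mirror the 20 ms framing at a 16 kHz sampling rate, and the number of coefficients is a common but arbitrary choice.

```python
import librosa

def mfcc_features(y, sr=16000, n_mfcc=13):
    feats = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                 n_fft=320, hop_length=160)  # 20 ms windows, 10 ms hop
    return feats.T                                           # one feature vector per frame
```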
Step S38: optimizing convolutional neural network mapping according to the language phoneme features, thereby initially constructing a primary language voice recognition model;
specifically, for example, feature extraction and classification are performed using a convolutional neural network or another deep learning model; the primary speech recognition model is trained, and its structure and parameters are optimized.
Step S39: and carrying out error correction iteration on the primary language voice recognition model according to the language phoneme combination features, thereby obtaining the language voice recognition model.
Specifically, the model is iteratively optimized, for example, by an algorithm such as error back propagation, to obtain a speech recognition model with higher accuracy.
According to the invention, the accuracy of speech recognition is improved. Preprocessing operations such as noise reduction and silence removal on standard-format speech data raise recognition accuracy and reduce the influence of noise and non-human interference. Training and error correction with the language phoneme features and language phoneme combination features reduce model error and adapt the model better to different speech inputs, improving its reliability. Applying multiple preprocessing techniques and feature extraction methods to the speech data further reduces the influence of noise and interference on the model, improving its reliability and robustness.
Preferably, the step of optimizing the convolutional neural network mapping is specifically:
step S381: performing bidirectional recurrent neural network construction according to the language phoneme features, thereby obtaining a voice neural network model;
Specifically, for example, a Long Short-Term Memory (LSTM) network or a Gated Recurrent Unit (GRU) is used as the basic unit of the BiRNN to capture timing information.
Step S382: decoding the voice neural network model to obtain a connection time sequence phoneme sequence model;
specifically, for example, a Connectionist Temporal Classification (CTC) decoding algorithm is used to convert the output of the neural network model into a phoneme sequence.
Step S383: and carrying out reasoning search according to the connection time sequence phoneme sequence model and the corresponding voice tag data, thereby obtaining a primary language voice recognition model.
Specifically, for example, a beam search algorithm is used to search over the connection time sequence phoneme sequence model; the best phoneme sequence is selected from the candidate sequences, and training and optimization are performed in combination with the voice tag data, thereby obtaining a primary language voice recognition model.
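The PyTorch sketch below shows the overall shape of steps S381 to S383: a bidirectional LSTM acoustic model, CTC training, and a greedy decode standing in for the beam search; all layer sizes and the phoneme inventory size are placeholders.

```python
import torch
import torch.nn as nn

class BiRNNAcousticModel(nn.Module):
    def __init__(self, n_feats=13, n_hidden=128, n_phonemes=60):
        super().__init__()
        # Bidirectional LSTM captures context in both time directions (step S381)
        self.rnn = nn.LSTM(n_feats, n_hidden, num_layers=2,
                           bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * n_hidden, n_phonemes + 1)  # +1 class for the CTC blank

    def forward(self, x):                          # x: (batch, time, n_feats)
        out, _ = self.rnn(x)
        return self.fc(out).log_softmax(dim=-1)

model = BiRNNAcousticModel()
ctc_loss = nn.CTCLoss(blank=60)                    # blank index matches the extra class
feats = torch.randn(2, 100, 13)                    # two utterances of 100 frames
log_probs = model(feats).transpose(0, 1)           # CTC expects (time, batch, classes)
targets = torch.randint(0, 60, (2, 20))            # phoneme label sequences (step S382)
loss = ctc_loss(log_probs, targets,
                torch.full((2,), 100, dtype=torch.long),
                torch.full((2,), 20, dtype=torch.long))
loss.backward()                                    # error back-propagation training step

# Greedy decode as a stand-in for beam search (step S383): collapse repeated
# symbols and drop blanks to obtain the final phoneme sequence.
best_path = log_probs.argmax(dim=-1)               # (time, batch)
```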
The invention replaces the traditional unidirectional recurrent neural network with a bidirectional recurrent neural network to model the language phoneme features, decodes the resulting model to obtain a connection time sequence phoneme sequence model, and performs inference search with the corresponding voice tag data, thereby obtaining the primary language voice recognition model. This way of optimizing the convolutional neural network mapping improves the accuracy and stability of voice recognition, with particularly good results in complex voice environments. The bidirectional recurrent network captures more context information, making voice recognition more accurate, while the improved decoder reduces the influence of noise and other distortion factors, improving the stability of voice recognition.
Preferably, the learner level text data includes word part-of-speech usage degree data and word part-of-speech familiarity degree data, and step S4 is specifically:
step S41: word segmentation is carried out according to the learner language text data, so that word data are obtained;
specifically, for example, the word segmentation operation is performed using natural language processing libraries such as the jieba word segmentation library or NLTK.
Step S42: part-of-speech tagging is performed on the word data, so that word part-of-speech data are obtained;
specifically, for example, the parts of speech are tagged using tools such as the Harbin Institute of Technology (HIT) part-of-speech tagging tools or the Stanford POS Tagger.
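Steps S41 and S42 can be sketched in a single pass with jieba's part-of-speech mode; the sample sentence is illustrative only.

```python
import jieba.posseg as pseg

text = "学习者正在努力提高口语表达能力"   # an illustrative learner sentence
tagged = [(pair.word, pair.flag) for pair in pseg.cut(text)]  # (word, POS tag) pairs
```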
Step S43: labeling word part-of-speech data based on a context rule, so as to obtain word semantic data;
specifically, for example, a dependency parsing tool, such as Stanford Parser, spaCy, etc., is used to capture the relationship and context information between words.
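For step S43, a dependency parse via spaCy might look like the sketch below; the pipeline name and the English example sentence are assumptions, since the patent does not fix a language or toolchain.

```python
import spacy

nlp = spacy.load("en_core_web_sm")        # assumes the small English pipeline is installed
doc = nlp("The learner writes short but correct sentences.")
relations = [(tok.text, tok.dep_, tok.head.text) for tok in doc]  # word, relation, head
```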
Step S44: carrying out vocabulary type recognition according to the word semantic data, thereby obtaining the word part-of-speech usage degree data;
specifically, for example, words are divided into different vocabulary types according to semantic categories, and data such as the learner's usage frequency and accuracy for each vocabulary type are recorded, so as to evaluate the learner's part-of-speech usage degree.
Step S45: and carrying out statistical calculation on the vocabulary distribution according to the word semantic data, thereby obtaining the word part-of-speech familiarity degree data.
Specifically, for example, the number of words of each vocabulary type and the learner's degree of mastery of that type are counted, and the learner's familiarity with different parts of speech is calculated.
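Steps S44 and S45 reduce to counting over the tagged words; the particular definitions of usage degree and familiarity in this sketch are assumptions for illustration.

```python
from collections import Counter

def pos_level_stats(tagged):
    """tagged: list of (word, pos) pairs from the previous steps."""
    counts = Counter(pos for _, pos in tagged)
    total = sum(counts.values())
    usage = {pos: c / total for pos, c in counts.items()}        # share of each part of speech
    distinct = Counter(pos for _, pos in set(tagged))
    familiarity = {pos: counts[pos] / distinct[pos] for pos in counts}  # mean reuse per distinct word
    return usage, familiarity
```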
The word part-of-speech usage degree data and word part-of-speech familiarity degree data can be used to evaluate the learner's language level, including vocabulary size, proficiency, and so on. Through statistics and analysis of these data, a pre-statistical analysis is carried out for the subsequent deeper mining of the learner's language teaching data, providing more in-depth data on the learner's language ability level.
The present application provides a data mining system for language teaching, the system comprising:
at least one processor; and,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform a data mining method for language teaching as described above.
The invention has the advantages that effective speech data of the learner are obtained and a preset language speech recognition model is applied, so that the learner's speech data can be converted into text data more efficiently and the accuracy of speech recognition is improved. Through word segmentation, part-of-speech tagging, and extraction of vocabulary and grammar expression features, the learner's language level, including vocabulary and grammar capabilities, can be understood more fully. Comprehensively evaluating the learner's language level from the vocabulary level data and the language expression capability data provides educators with targeted suggestions and feedback, enabling personalized teaching. Deep mining of students' language voice data yields more targeted evaluation results, so that practitioners and professionals can allocate language education resources better, improving the utilization efficiency of education resources.
The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.
The foregoing is only a specific embodiment of the invention to enable those skilled in the art to understand or practice the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A data mining method for language teaching, comprising the steps of:
step S1: obtaining language voice data of a learner;
step S2: optimizing according to the language voice data of the learner, thereby obtaining effective voice data of the learner;
step S3: recognizing the effective speech data of the learner by using a preset language speech recognition model, thereby generating language text data of the learner;
step S4: word segmentation processing is carried out according to the learner language text data, part-of-speech tagging is carried out, deep word use evaluation is carried out, and thereby learner level text data is obtained;
step S5: extracting vocabulary semantic features and grammar expression capability features of the learner level text data to obtain vocabulary language features and language grammar expression capability features respectively;
Step S6: performing vocabulary evaluation according to the vocabulary language features to obtain vocabulary level data, and recognizing the language expression capability features through a language expression capability recognition model to obtain language expression capability data;
step S7: and comprehensively evaluating the language level of the learner according to the vocabulary level data and the language expression capability data, thereby obtaining the language level evaluation data of the learner.
2. The method according to claim 1, wherein step S2 is specifically:
step S21: noise reduction processing is carried out on the learner language voice data, so that noise reduction voice data are obtained;
step S22: enhancing the noise-reduced voice data to obtain enhanced voice data;
step S23: optimizing the framing processing of the enhanced voice data so as to obtain framing voice data;
step S24: and windowing the framing voice data so as to obtain the effective voice data of the learner.
3. The method according to claim 2, wherein the optimizing framing process in step S23 is specifically:
carrying out framing processing according to the enhanced voice data so as to obtain preliminary framing voice data;
clustering calculation is carried out on the current frame voice data of the preliminary framing voice data, so that current frame clustering feature data is obtained;
Identifying the current frame clustering feature data through an effective frame clustering feature identifier, so as to obtain validity voice tag data, wherein the validity voice tag data comprises qualified voice tag data, doubtful voice tag data and invalid voice tag data;
when the validity voice tag data is determined to be qualified voice tag data, the current frame voice data is determined to be framing voice data;
when the validity voice tag data is determined to be doubtful voice tag data, performing an energy spectrum validity confirmation operation on the voice data of the current frame;
and deleting the voice data of the current frame when the validity voice tag data is determined to be invalid voice tag data.
4. A method according to claim 3, wherein the step of constructing a valid frame cluster feature identifier comprises the steps of:
acquiring historical effective frame data;
clustering calculation is carried out according to the historical effective frame data, so that the historical effective frame clustering characteristic data is obtained;
extracting a central point according to the historical effective frame clustering feature data, so as to obtain a historical effective frame clustering feature central set;
extracting the optimized edge distance according to the historical effective frame clustering feature center set, so as to obtain an optimized clustering feature center set and a corresponding optimized clustering feature center distance value set;
And constructing an effective frame cluster sub-identifier from each optimized cluster feature center and its corresponding optimized cluster feature center distance value, and coupling and associating all the effective frame cluster sub-identifiers, thereby obtaining the effective frame cluster feature identifier.
5. The method according to claim 4, wherein the step of optimizing edge cluster extraction is specifically:
step S201: performing distance calculation between each historical effective frame clustering feature center in the historical effective frame clustering feature center set and the remaining historical effective frame clustering feature centers, thereby obtaining a feature center distance set;
step S202: extracting a minimum distance value and a next-smallest distance value from the feature center distance set, thereby obtaining a feature center minimum distance value set and a feature center next-smallest distance value set;
step S203: performing relative distance calculation between each historical effective frame clustering feature center in the historical effective frame clustering feature center set and its two nearest historical effective frame clustering feature centers, thereby obtaining an optimized edge distance set;
step S204: sorting the optimized edge distance set, thereby obtaining an ordered edge distance set;
step S205: determining the historical effective frame clustering feature center corresponding to the minimum value in the ordered edge distance set as an optimized cluster feature center, adding it to the optimized cluster feature center set, determining the feature center next-smallest distance value corresponding to the optimized cluster feature center as its optimized cluster feature center distance value, and deleting the ordered edge distance corresponding to the optimized cluster feature center from the ordered edge distance set;
Step S206: performing relative distance calculation between the historical effective frame clustering feature centers remaining in the historical effective frame clustering feature center set and the optimized cluster feature centers in the optimized cluster feature center set, and extracting the maximum value, thereby obtaining a relative maximum distance value; when the relative maximum distance value is larger than the feature center next-smallest distance value corresponding to that historical effective frame clustering feature center, determining the historical effective frame clustering feature center as an optimized cluster feature center, adding it to the optimized cluster feature center set, determining the feature center next-smallest distance value corresponding to the optimized cluster feature center as its optimized cluster feature center distance value, and deleting the ordered edge distance corresponding to the optimized cluster feature center from the ordered edge distance set;
step S207: step S206 is repeated until the ordered edge distance set is empty.
6. A method according to claim 3, wherein the energy spectrum validity confirmation operation performs threshold confirmation on the voice energy value calculated through a voice frame energy spectrum calculation formula, wherein the voice frame energy spectrum calculation formula is specifically:
$$G = u\left[m\left(\frac{\alpha}{N}\sum_{i=1}^{N} w_i^{2} + \frac{\beta}{2(N-1)}\sum_{i=1}^{N-1}\left|\operatorname{sgn}(w_{i+1})-\operatorname{sgn}(w_i)\right| + \gamma h + t\right) + r + o\right]$$

where G is the voice energy value, α is a first weight coefficient, w_i is the i-th sampling point in the voice frame, β is a second weight coefficient, w_{i+1} is the (i+1)-th sampling point in the voice frame, sgn(w_i) is the sign function of w_i, γ is a third weight coefficient, h is a historical voice data correction term, t is a current-frame voice data adjustment term, m is a voice frame energy spectrum scaling coefficient, r is a constant term, o is a smoothing adjustment term generated from the current-frame voice data, N is the number of sampling points in a voice frame, and u is a correction coefficient for the voice energy value.
7. The method according to claim 1, wherein the step of constructing the language speech recognition model in step S3 is specifically:
standard voice data and corresponding voice tag data are acquired;
unified format conversion is carried out on the standard voice data, so that standard-format voice data are obtained;
performing noise reduction calculation according to the standard format voice data so as to generate standard noise reduction voice data;
removing silence segments from the standard noise reduction voice data to generate standard voice data;
carrying out framing processing on the standard voice data so as to obtain standard framing voice data;
windowing is carried out according to the standard framing voice data, so that standard windowing voice data are obtained;
extracting language phoneme features and language phoneme combination features according to the standard windowed voice data, thereby obtaining the language phoneme features and the language phoneme combination features;
Optimizing convolutional neural network mapping according to the language phoneme features, thereby initially constructing a primary language voice recognition model;
and carrying out error correction iteration on the primary language voice recognition model according to the language phoneme combination features, thereby obtaining the language voice recognition model.
8. The method according to claim 7, wherein the step of optimizing the convolutional neural network mapping is specifically:
performing bidirectional recurrent neural network construction according to the language phoneme features, thereby obtaining a voice neural network model;
decoding the voice neural network model to obtain a connection time sequence phoneme sequence model;
and carrying out reasoning search according to the connection time sequence phoneme sequence model and the corresponding voice tag data, thereby obtaining a primary language voice recognition model.
9. The method according to claim 1, wherein the learner-level text data includes word part-of-speech usage degree data and word part-of-speech familiarity degree data, and step S4 is specifically:
word segmentation is carried out according to the learner language text data, so that word data are obtained;
part-of-speech tagging is performed on the word data, so that word part-of-speech data are obtained;
labeling word part-of-speech data based on a context rule, so as to obtain word semantic data;
Carrying out vocabulary type recognition according to the word semantic data so as to obtain the word part-of-speech usage degree data;
and carrying out statistical calculation on the vocabulary distribution according to the word semantic data so as to obtain the word part-of-speech familiarity degree data.
10. A data mining system for language teaching, the system comprising:
at least one processor; and,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform a data mining method for language teaching according to any one of claims 1 to 9.
CN202310467728.4A 2023-04-27 2023-04-27 Data mining method and system for language teaching Active CN116189671B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310467728.4A CN116189671B (en) 2023-04-27 2023-04-27 Data mining method and system for language teaching

Publications (2)

Publication Number Publication Date
CN116189671A true CN116189671A (en) 2023-05-30
CN116189671B CN116189671B (en) 2023-07-07

Family

ID=86434879

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310467728.4A Active CN116189671B (en) 2023-04-27 2023-04-27 Data mining method and system for language teaching

Country Status (1)

Country Link
CN (1) CN116189671B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117391902A (en) * 2023-12-13 2024-01-12 北京师范大学珠海校区 Evaluation method and device for Chinese core literacy education based on large language model
CN117391902B (en) * 2023-12-13 2024-04-26 北京师范大学珠海校区 Evaluation method and device for Chinese core literacy education based on large language model

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102789779A (en) * 2012-07-12 2012-11-21 广东外语外贸大学 Speech recognition system and recognition method thereof
US20170193987A1 (en) * 2015-12-30 2017-07-06 Le Holdings (Beijing) Co., Ltd. Speech recognition method and device
CN106328156A (en) * 2016-08-22 2017-01-11 华南理工大学 Microphone array voice reinforcing system and microphone array voice reinforcing method with combination of audio information and video information
CN108062954A (en) * 2016-11-08 2018-05-22 科大讯飞股份有限公司 Audio recognition method and device
CN110675292A (en) * 2019-09-23 2020-01-10 浙江优学智能科技有限公司 Child language ability evaluation method based on artificial intelligence
CN110853627A (en) * 2019-11-07 2020-02-28 证通股份有限公司 Method and system for voice annotation
US11158302B1 (en) * 2020-05-11 2021-10-26 New Oriental Education & Technology Group Inc. Accent detection method and accent detection device, and non-transitory storage medium
WO2022227037A1 (en) * 2021-04-30 2022-11-03 深圳市大疆创新科技有限公司 Audio processing method and apparatus, video processing method and apparatus, device, and storage medium
CN113203987A (en) * 2021-07-05 2021-08-03 成都启英泰伦科技有限公司 Multi-sound-source direction estimation method based on K-means clustering

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张士玲, 杨林楠, 孙向前, 刘燕霞: "A Brief Discussion of Data Mining Technology" (浅论数据挖掘技术), Fujian Computer (福建电脑), no. 08 *

Also Published As

Publication number Publication date
CN116189671B (en) 2023-07-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant