CN117591944B

CN117591944B - Learning early warning method and system for big data analysis

Info

Publication number: CN117591944B
Application number: CN202410081396.0A
Authority: CN
Inventors: 洪永霖; 冯逸华; 许丽清; 宋玮
Original assignee: Guangdong University of Technology
Current assignee: Guangdong University of Technology
Priority date: 2024-01-19
Filing date: 2024-01-19
Publication date: 2024-03-19
Anticipated expiration: 2044-01-19
Also published as: CN117591944A

Abstract

The invention relates to the technical field of education, in particular to a learning early warning method and system for big data analysis. The method comprises the following steps: based on student online learning activity data, a long short-term memory (LSTM) network algorithm analyzes the time series of students' learning behavior data, identifies pattern changes and trends, and performs a preliminary classification of student behavior patterns to generate a student behavior time pattern analysis result. According to the invention, the LSTM network algorithm accurately captures the temporal characteristics of student behavior and improves the accuracy of behavior pattern recognition; the Apriori association rule learning algorithm mines the intrinsic relationship between student behavior and scores; the C4.5 decision tree algorithm strengthens the prediction of students' learning difficulties and provides a basis for early intervention; combining a Bayesian network with a community detection network analysis algorithm improves the understanding of how learning risk propagates; and the graph convolutional network algorithm supports the design of individual learning paths when optimizing online learning strategies, improving the effectiveness of teaching content and interaction strategies.

Description

Learning early warning method and system for big data analysis

Technical Field

The invention relates to the technical field of education, in particular to a learning early warning method and system for big data analysis.

Background

The technical field of education involves using big data technology to analyze learning behavior and to evaluate students' learning progress and success. Data science and machine learning methods are widely applied to identify the challenges students encounter during learning and to improve the effectiveness and personalization of educational resources. In an online learning environment in particular, these techniques help teachers and educational institutions better understand students' learning patterns and thereby provide more targeted support and intervention.

A learning early warning method for big data analysis uses big data technology to monitor and predict the problems students encounter in online learning. It is mainly used to identify students facing academic risk at an early stage by analyzing data such as their online learning behavior, degree of participation, and grade trends. The aim is to improve students' learning outcomes and reduce the dropout rate through timely intervention and support. For example, if a student's engagement in an online course drops suddenly, or the student performs poorly on several consecutive tests, the early warning system can issue a reminder prompting the teacher or school to take action to support the student. Such methods are typically implemented by collecting student behavior data on an online learning platform, including login frequency, online learning duration, course interaction, assignment submission, and test performance, and analyzing these data with data mining and machine learning algorithms to identify potential learning difficulties and risks.

Traditional learning early warning methods for big data analysis cannot effectively capture long-term dependencies, which makes behavior pattern recognition inaccurate, and they lack effective tools for revealing complex association rules, which makes it difficult to understand the intrinsic link between behavior and scores in depth. For risk prediction and management, traditional methods lack real-time data processing capability and cannot identify and respond to potential learning risks in time, so opportunities for timely intervention are missed. For risk propagation, traditional social network analysis is limited to the surface structure of the network and neglects the deeper mechanisms by which risk propagates. For online learning strategy optimization, traditional methods lack personalization and dynamic adaptability and struggle to meet diverse learning needs.

Disclosure of Invention

The invention aims to overcome the shortcomings of the prior art by providing a learning early warning method and system for big data analysis.

In order to achieve the above purpose, the present invention adopts the following technical scheme: a learning early warning method for big data analysis comprises the following steps:

S1: based on student online learning activity data, a long short-term memory (LSTM) network algorithm is adopted to analyze the time series of students' learning behavior data, identify pattern changes and trends, and perform a preliminary classification of student behavior patterns, generating a student behavior time pattern analysis result;

S2: based on the student behavior time pattern analysis result, an Apriori association rule learning algorithm is adopted to mine frequent itemsets and strong rules between student behavior patterns and learning scores and to optimize the analysis of the relationship between behavior and scores, generating a behavior and score association analysis result;

S3: based on the behavior and score association analysis result, a C4.5 decision tree algorithm is adopted to classify students into risk levels according to the behavior and score association rules and to predict potential learning difficulties, generating a final risk level prediction model;

S4: based on the final risk level prediction model, a Bayesian network algorithm is adopted to calculate the probability distribution of learning risk and to monitor students' learning risk in real time, generating a real-time risk monitoring result;

S5: based on the real-time risk monitoring result, a community detection network analysis algorithm is adopted to construct a social network model of the student group and identify the key nodes and patterns of risk propagation, generating a risk propagation pattern analysis result;

S6: based on the risk propagation pattern analysis result, a graph convolutional network (GCN) algorithm is adopted to analyze the relationship graph between student interactions and learning content and to optimize course content and interaction strategies in the online learning environment, generating an optimized online learning strategy.

As a further aspect of the invention, the student behavior time pattern analysis result is a time distribution chart of a student's online learning activities; the behavior and score association analysis result is a measure of the association between a student's learning behavior and learning performance; the final risk level prediction model comprises risk features, prediction accuracy, and potential intervention plans; the real-time risk monitoring result comprises a student learning risk assessment, the risk trend, and a list of students; the risk propagation pattern analysis result comprises a network structure diagram of risk propagation within the student group, key nodes, risk propagation paths, and the risk diffusion trend; and the optimized online learning strategy comprises course content and learning activities adjusted according to risk patterns and student interaction characteristics, an improved student interaction plan, and a course design optimized to improve learning outcomes.

As a further aspect of the invention, the step of adopting an LSTM network algorithm on student online activity data, analyzing the time series of students' learning behavior data to identify pattern changes and trends, performing a preliminary classification of student behavior patterns, and generating the student behavior time pattern analysis result specifically comprises:

S101: based on student online learning activity data, Z-score normalization is applied to convert the data to a standard distribution, sliding-window segmentation divides the time series into fixed-size windows, and the data is cleaned, generating a standardized student behavior data set;

S102: based on the standardized student behavior data set, an LSTM network algorithm processes the time series through its forget gate, input gate, and output gate mechanisms to capture temporal dependencies, generating a time series feature analysis result;

S103: based on the time series feature analysis result, a support vector machine (SVM) algorithm identifies changes and trends in student behavior patterns through kernel function mapping and hyperplane optimization, generating a behavior pattern recognition result;

S104: based on the behavior pattern recognition result, a K-means clustering algorithm classifies student behavior patterns by selecting initial centroids and iteratively minimizing the distance from each data point to its nearest centroid, generating a student behavior pattern classification result.
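The K-means classification in S104 can be illustrated with the minimal Python sketch below. It assumes each student has already been reduced to a numeric feature vector (here a hypothetical pair: weekly logins and average session hours) and, for reproducibility, uses the first k points as initial centroids; the text does not specify an initialization, and k-means++ is a common alternative.

```python
import math

def kmeans(points, k, max_iter=100):
    """Assign each point to its nearest centroid, then move each centroid to its cluster mean."""
    centroids = [list(p) for p in points[:k]]  # deterministic initialization (assumption)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: mean of the assigned points (keep the old centroid if a cluster is empty).
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids

# Hypothetical students: (weekly logins, average session hours)
students = [(1, 0.5), (9, 2.0), (2, 0.4), (8, 2.2), (1, 0.6), (10, 1.9)]
labels, centroids = kmeans(students, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

In this toy data the two clusters separate low-activity from high-activity students; interpreting a cluster as a behavior pattern would still be a separate, domain-driven step.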

As a further aspect of the invention, the step of adopting an Apriori association rule learning algorithm on the student behavior time pattern analysis result, mining frequent itemsets and strong rules between student behavior patterns and learning scores to optimize the analysis of the relationship between behavior and scores, and generating the behavior and score association analysis result specifically comprises:

S201: based on the student behavior pattern classification result, association rule mining preprocessing is applied: a transaction database is constructed, itemsets are encoded, and the data set is organized, generating a processed behavior and score data set;

S202: based on the processed behavior and score data set, the Apriori algorithm mines frequent itemsets through candidate itemset generation and support calculation, generating a frequent itemset analysis result;

S203: based on the frequent itemset analysis result, the FP-Growth algorithm analyzes strong association rules between behavior and scores by constructing a frequent pattern tree (FP-tree) and mining conditional pattern bases, generating a strong behavior and score association rule result;

S204: based on the strong behavior and score association rule result, principal component analysis (PCA) optimizes the analysis of the relationship between student behavior and scores through eigenvalue decomposition and dimensionality reduction, generating a behavior and score association optimization analysis result.
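The FP-tree mining in S203 can be sketched as a compact FP-Growth implementation in Python. This is an illustration under assumptions: transactions are sets of hypothetical behavior and score tokens (for example `frequent_login`, `score_high`), the support threshold is an absolute count, and neither rule generation nor the PCA step of S204 is included.

```python
from collections import defaultdict

class FPNode:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_fp_tree(weighted_transactions, min_count):
    """Insert each transaction, items ordered by descending frequency, into a prefix tree."""
    counts = defaultdict(int)
    for items, weight in weighted_transactions:
        for item in items:
            counts[item] += weight
    frequent = {item: c for item, c in counts.items() if c >= min_count}
    root, header = FPNode(None, None), defaultdict(list)
    for items, weight in weighted_transactions:
        ordered = sorted((i for i in set(items) if i in frequent), key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = FPNode(item, node)
                header[item].append(node.children[item])  # header table: every node per item
            node = node.children[item]
            node.count += weight
    return frequent, header

def fp_growth(weighted_transactions, min_count, suffix=frozenset()):
    """Return {itemset: count} for all itemsets with count >= min_count."""
    frequent, header = build_fp_tree(weighted_transactions, min_count)
    result = {}
    for item, support in frequent.items():
        itemset = suffix | {item}
        result[itemset] = support
        # Conditional pattern base: the prefix path above every node holding `item`.
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        result.update(fp_growth(base, min_count, itemset))  # mine the conditional tree
    return result

transactions = [
    {"frequent_login", "score_high"},
    {"frequent_login", "forum_active"},
    {"frequent_login", "score_high", "forum_active"},
    {"score_high"},
]
patterns = fp_growth([(t, 1) for t in transactions], min_count=2)
for itemset, count in sorted(patterns.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), count)
```

Unlike Apriori, FP-Growth never enumerates candidate itemsets explicitly; it recurses on ever smaller conditional trees, which is why it tends to scale better on large transaction databases.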

As a further aspect of the invention, the step of adopting a C4.5 decision tree algorithm on the behavior and score association analysis result, classifying students by risk level according to the behavior and score association rules, predicting potential learning difficulties, and generating the final risk level prediction model specifically comprises:

S301: based on the behavior and score association optimization analysis result, data encoding converts the data into the format required by the decision tree algorithm through one-hot encoding and label encoding, generating an encoded risk analysis data set;

S302: based on the encoded risk analysis data set, the C4.5 decision tree algorithm builds a prediction model through information gain ratio calculation and tree pruning, generating a preliminary risk level prediction model;

S303: based on the preliminary risk level prediction model, cross-validation evaluates the model's accuracy through data partitioning and multiple rounds of validation, generating an optimized risk level prediction model;

S304: based on the optimized risk level prediction model, a Bayesian network algorithm performs the final risk level classification through probabilistic inference and causal relationship modeling, generating the final risk level prediction model.
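The cross-validation in S303 can be illustrated with a small k-fold routine in Python. To keep the sketch self-contained, the model being validated is a hypothetical one-nearest-neighbour classifier on a single feature, standing in for the C4.5 tree; folds are assigned by interleaving indices, whereas in practice the data would usually be shuffled and stratified by risk level first.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k interleaved folds."""
    return [list(range(start, n, k)) for start in range(k)]

def cross_validate(X, y, k, fit, predict):
    """Train on k-1 folds, test on the held-out fold; return mean and per-fold accuracy."""
    scores = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores), scores

# Hypothetical stand-in model: 1-nearest-neighbour on one numeric feature.
def fit_1nn(X, y):
    return list(zip(X, y))

def predict_1nn(model, x):
    return min(model, key=lambda pair: abs(pair[0] - x))[1]

weekly_hours = [1, 2, 3, 10, 11, 12]
risk = ["high", "high", "high", "low", "low", "low"]
mean_acc, per_fold = cross_validate(weekly_hours, risk, k=3, fit=fit_1nn, predict=predict_1nn)
print(mean_acc, per_fold)  # 1.0 [1.0, 1.0, 1.0]
```

Passing `fit` and `predict` as functions keeps the validation loop independent of the model, so the same routine could wrap a decision tree without changes.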

As a further aspect of the invention, the step of adopting a Bayesian network algorithm on the final risk level prediction model, calculating the probability distribution of learning risk, monitoring students' learning risk in real time, and generating the real-time risk monitoring result specifically comprises:

S401: based on the final risk level prediction model, a Bayesian network algorithm represents the different risk factors and their correlations as nodes and edges, expresses the dependencies between factors as conditional probabilities, and calculates the probability distribution of each risk state, generating a probability distribution calculation result;

S402: based on the probability distribution calculation result, a stream data processing algorithm captures student behavior data and analyzes the data stream in real time to monitor changes in student behavior and identify potential risk states, generating real-time risk monitoring data;

S403: based on the real-time risk monitoring data, the isolation forest algorithm isolates observations by randomly selecting features and random split values, identifying abnormal behavior and generating an abnormal behavior identification result;

S404: based on the abnormal behavior identification result, data visualization tools display the dynamic changes of risk levels, using charts and heat maps to show how risk levels trend over time, generating the real-time risk monitoring result.
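The isolation forest in S403 can be sketched in plain Python as below; for production use, scikit-learn's `sklearn.ensemble.IsolationForest` implements the same idea. The sketch follows the standard formulation (random feature, random split value, anomaly score s(x) = 2^(-E[h(x)]/c(ψ)), where scores close to 1 indicate anomalies); the data, tree count, and seed are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649

def avg_path(n):
    """c(n): average path length of an unsuccessful BST search, used to normalize depths."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2 * (math.log(n - 1) + EULER_GAMMA) - 2 * (n - 1) / n

def grow(points, depth, max_depth, rng):
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))          # random feature
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)                  # random split value
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            grow(left, depth + 1, max_depth, rng), grow(right, depth + 1, max_depth, rng))

def path_length(tree, x, depth=0):
    if tree[0] == "leaf":
        return depth + avg_path(tree[1])         # expected depth of the unbuilt subtree
    _, dim, split, left, right = tree
    return path_length(left if x[dim] < split else right, x, depth + 1)

def isolation_scores(points, n_trees=100, sample_size=256, seed=0):
    rng = random.Random(seed)
    psi = min(sample_size, len(points))          # subsample size (needs at least 2 points)
    max_depth = math.ceil(math.log2(psi))
    forest = [grow(rng.sample(points, psi), 0, max_depth, rng) for _ in range(n_trees)]
    norm = avg_path(psi)
    return [2 ** (-(sum(path_length(t, x) for t in forest) / n_trees) / norm) for x in points]

# Hypothetical daily online minutes; the last student is an obvious outlier.
daily_minutes = [(m,) for m in (30, 35, 40, 32, 38, 36, 33, 31, 39, 37, 300)]
scores = isolation_scores(daily_minutes)
print(round(scores[-1], 2), round(max(scores[:-1]), 2))
```

Anomalies are isolated after few random splits, so their average path length is short and their score is high; no distance or density computation is needed.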

As a further aspect of the invention, the step of adopting a community detection network analysis algorithm on the real-time risk monitoring result, constructing a social network model of the student group to identify the key nodes and patterns of risk propagation, and generating the risk propagation pattern analysis result specifically comprises:

S501: based on the real-time risk monitoring result, social network analysis constructs an interaction network graph of the students, representing students as nodes and their interactions as edges, and identifies the main connection points of the social network, generating a social network graph construction result;

S502: based on the social network graph construction result, the Louvain algorithm identifies distinct communities in the network by optimizing modularity, which favors dense connections within communities and sparse connections between them, generating a community detection result;

S503: based on the community detection result, an SIR model, which simulates the susceptible, infected, and removed populations of an infectious disease outbreak, is used to analyze the paths and speed of risk propagation within the student group, generating a risk propagation path analysis result;

S504: based on the risk propagation path analysis result, a betweenness centrality algorithm calculates the centrality of each node in the network and evaluates the importance of individual nodes in the risk propagation process, generating the risk propagation pattern analysis result.
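The betweenness centrality computation in S504 can be sketched with Brandes' algorithm for an unweighted, undirected interaction graph. The adjacency-dict representation and the student names are assumptions for illustration; NetworkX provides `betweenness_centrality`, which normalizes by default, whereas the values below are unnormalized pair counts.

```python
from collections import deque

def betweenness(adjacency):
    """Brandes' algorithm; unnormalized betweenness for an undirected, unweighted graph."""
    centrality = {v: 0.0 for v in adjacency}
    for source in adjacency:
        # Breadth-first search: count shortest paths (sigma) and record predecessors.
        sigma = {v: 0 for v in adjacency}
        dist = {v: -1 for v in adjacency}
        preds = {v: [] for v in adjacency}
        sigma[source], dist[source] = 1, 0
        order, queue = [], deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in reverse BFS order.
        delta = {v: 0.0 for v in adjacency}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                centrality[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: c / 2 for v, c in centrality.items()}

# Hypothetical interaction graph: two study groups bridged by s3 and s4.
graph = {
    "s1": {"s2", "s3"}, "s2": {"s1", "s3"}, "s3": {"s1", "s2", "s4"},
    "s4": {"s3", "s5", "s6"}, "s5": {"s4", "s6"}, "s6": {"s4", "s5"},
}
scores = betweenness(graph)
print(sorted(scores.items(), key=lambda kv: -kv[1])[:2])
```

Here s3 and s4 lie on every shortest path between the two groups, so they would be the natural candidates for "key nodes" through which risk could spread from one group to the other.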

As a further aspect of the invention, the step of adopting a graph convolutional network algorithm on the risk propagation pattern analysis result, analyzing the relationship graph between student interactions and learning content to optimize course content and interaction strategies in the online learning environment, and generating the optimized online learning strategy specifically comprises:

S601: based on the risk propagation pattern analysis result, a graph convolutional network algorithm represents the relationship between student interactions and learning content as a graph structure and analyzes interaction patterns by applying convolution operations over the graph to extract features, generating a student interaction pattern analysis result;

S602: based on the student interaction pattern analysis result, the Apriori algorithm analyzes association rules between course content and student interaction and identifies the key association points between them, generating a content interaction relationship analysis result;

S603: based on the content interaction relationship analysis result, natural language processing techniques analyze student feedback through sentiment analysis, and course content is adjusted to better fit student needs, generating a course content adjustment scheme;

S604: based on the course content adjustment scheme, an adaptive learning model adjusts teaching strategies and interaction modes according to student feedback and interaction, optimizing the online learning effect and generating the optimized online learning strategy.
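The sentiment analysis in S603 is not specified further in the text; as one simple possibility, a lexicon-based scorer with basic negation handling is sketched below. The word list and weights are hypothetical, and a deployed system would more likely use a trained sentiment model.

```python
import re

# Hypothetical polarity lexicon for course feedback.
LEXICON = {"clear": 1, "helpful": 1, "engaging": 1, "enjoy": 1,
           "confusing": -1, "boring": -1, "difficult": -1, "rushed": -1}
NEGATORS = {"not", "never", "no"}

def sentiment_score(text):
    """Sum word polarities, flipping a word's polarity when it directly follows a negator."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        polarity = LEXICON.get(token, 0)
        if polarity and i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    return score

feedback = ["The examples were clear and helpful",
            "Not clear, and the pace felt rushed"]
for text in feedback:
    print(sentiment_score(text), text)
```

Aggregating such scores per course unit would give a rough signal of which content students find problematic, which is the kind of input the course content adjustment in S603 relies on.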

The learning early warning system for big data analysis is used to execute the learning early warning method for big data analysis described above and comprises a behavior pattern analysis module, a score association mining module, a risk level classification module, a risk probability calculation module, a real-time monitoring module, a social network analysis module, and a content strategy optimization module.

As a further aspect of the invention, the behavior pattern analysis module, based on student online learning activity data, uses an LSTM network algorithm to analyze the time series of students' login frequency and learning duration; by identifying pattern changes and trends in the time series, the LSTM algorithm discovers periodicity and anomalies in student behavior and generates a behavior pattern recognition result;

the score association mining module, based on the behavior pattern recognition result, uses the Apriori algorithm to analyze the relationship between student behavior and scores, revealing direct links between behavior patterns and scores by mining frequent itemsets and strong association rules, and generates a score influence factor mapping;

the risk level classification module, based on the score influence factor mapping, uses the C4.5 decision tree algorithm to classify students by risk level, building a decision tree from the relationship between behavior and scores to predict students' learning risk, and generates a risk classification model;

the risk probability calculation module, based on the risk classification model, uses a Bayesian network algorithm to calculate the probability distribution of risks, assigning a probability to each type of risk through probabilistic inference, and generates a risk probability analysis result;

the real-time monitoring module, based on the risk probability analysis result, uses a dynamic Bayesian network to monitor students' learning risk in real time, predicting risk trends by updating risk data as it arrives, and generates real-time risk monitoring data;

the social network analysis module, based on the real-time risk monitoring data, uses a community detection network analysis algorithm to analyze the social network of the student group, identifying key social nodes and risk propagation paths, and generates a social risk propagation analysis result;

the content strategy optimization module, based on the social risk propagation analysis result, uses a graph convolutional network algorithm to optimize course content and interaction strategies; the graph convolutional network adjusts teaching strategies by analyzing the relationship graph between student interactions and learning content, and generates an optimized teaching strategy scheme.

Compared with the prior art, the invention has the advantages and positive effects that:

By applying the LSTM network algorithm to student behavior time pattern analysis, the invention captures the temporal characteristics of student behavior more accurately and improves the accuracy of behavior pattern recognition. In mining the relationship between student behavior and scores, the Apriori association rule learning algorithm optimizes the association analysis between behavior patterns and learning scores, revealing the intrinsic relationship between them more effectively. Applying the C4.5 decision tree algorithm to risk level prediction strengthens the prediction of students' potential learning difficulties and provides a reliable basis for early intervention. Combining the Bayesian network algorithm and the community detection network analysis algorithm with real-time monitoring and social network analysis improves the understanding of how learning risk propagates and gives a more comprehensive view for risk management. Applying the graph convolutional network algorithm to online learning strategy optimization provides data support for personalized learning path design and makes online teaching content and interaction strategies more targeted and effective.

Drawings

FIG. 1 is a schematic workflow diagram of the present invention;

FIG. 2 is a detailed flowchart of step S1 of the present invention;

FIG. 3 is a detailed flowchart of step S2 of the present invention;

FIG. 4 is a detailed flowchart of step S3 of the present invention;

FIG. 5 is a detailed flowchart of step S4 of the present invention;

FIG. 6 is a detailed flowchart of step S5 of the present invention;

FIG. 7 is a detailed flowchart of step S6 of the present invention;

FIG. 8 is a system flow diagram of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

In the description of the present invention, it should be understood that the terms "length," "width," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like indicate orientations or positional relationships based on the orientation or positional relationships shown in the drawings, merely to facilitate describing the present invention and simplify the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and therefore should not be construed as limiting the present invention. Furthermore, in the description of the present invention, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.

Embodiment 1: referring to FIG. 1, the present invention provides the following technical solution: a learning early warning method for big data analysis, comprising the following steps:

S1: based on student online learning activity data, a long short-term memory network algorithm is adopted to analyze the time series of students' learning behavior data, identify pattern changes and trends, and perform a preliminary classification of student behavior patterns, generating a student behavior time pattern analysis result;

S2: based on the student behavior time pattern analysis result, an Apriori association rule learning algorithm is adopted to mine frequent itemsets and strong rules between student behavior patterns and learning scores and to optimize the analysis of the relationship between behavior and scores, generating a behavior and score association analysis result;

S3: based on the behavior and score association analysis result, a C4.5 decision tree algorithm is adopted to classify students into risk levels according to the behavior and score association rules and to predict potential learning difficulties, generating a risk level prediction model;

S4: based on the risk level prediction model, a Bayesian network algorithm is adopted to calculate the probability distribution of learning risk and to monitor students' learning risk in real time, generating a real-time risk monitoring result;

S5: based on the real-time risk monitoring result, a community detection network analysis algorithm is adopted to construct a social network model of the student group and identify the key nodes and patterns of risk propagation, generating a risk propagation pattern analysis result;

S6: based on the risk propagation pattern analysis result, a graph convolutional network algorithm is adopted to analyze the relationship graph between student interactions and learning content and to optimize course content and interaction strategies in the online learning environment, generating an optimized online learning strategy.

The student behavior time pattern analysis result is a time distribution chart of a student's online learning activities; the behavior and score association analysis result is a measure of the association between a student's learning behavior and learning performance; the risk level prediction model comprises risk features, prediction accuracy, and potential intervention plans; the real-time risk monitoring result comprises a student learning risk assessment, the risk trend, and a list of students; the risk propagation pattern analysis result comprises a network structure diagram of risk propagation within the student group, key nodes, risk propagation paths, and the risk diffusion trend; and the optimized online learning strategy comprises course content and learning activities adjusted according to risk patterns and student interaction characteristics, an improved student interaction plan, and a course design optimized to improve learning outcomes.

In step S1, students' online learning activity data is processed by a long short-term memory (LSTM) network algorithm, which is particularly suitable for time series data and can capture long-term dependencies in learning behavior data. The network comprises an input layer and cells with a forget gate, an input gate, and an output gate. The input layer receives standardized learning activity data, including login duration, number of pages browsed, and number of discussions participated in. The forget gate decides which information to discard from the cell state, the input gate updates the cell state, and the output gate produces an output based on the current cell state. By progressively learning and recognizing changes and trends in student behavior, this process achieves a preliminary classification of student behavior time patterns and finally generates the student behavior time pattern analysis result: a time distribution chart that reveals dynamic changes in student behavior patterns and provides the base data for subsequent steps.

In step S2, the student behavior time pattern analysis result is processed by the Apriori association rule learning algorithm. Apriori discovers frequent itemsets iteratively: it first generates the frequent single-item sets, then progressively increases the number of items, pruning infrequent itemsets along the way. Executed on the student behavior time pattern data, the algorithm identifies association rules between learning behavior and learning scores by calculating the support and confidence of itemsets. The process finally generates the behavior and score association analysis result, which contains the frequent itemsets and strong association rules, reveals the relationship between student behavior and scores, and provides a basis for optimizing learning strategies.
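The Apriori procedure and rule scoring described for step S2 can be sketched in Python as follows. Support is the fraction of transactions containing an itemset, and the confidence of a rule A → B is support(A ∪ B) / support(A). The transaction tokens and thresholds are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset: support} for every itemset whose support >= min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)
    candidates = [frozenset([item]) for item in sorted({i for b in baskets for i in b})]
    frequent, k = {}, 1
    while candidates:
        level = {}
        for c in candidates:
            support = sum(c <= b for b in baskets) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets into k-itemsets, then prune any candidate
        # that has an infrequent (k-1)-subset (the Apriori property).
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = [c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent

def association_rules(frequent, min_confidence):
    """Split each frequent itemset into lhs -> rhs and keep rules above min_confidence."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                confidence = support / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((set(lhs), set(itemset - lhs), support, confidence))
    return rules

transactions = [
    {"frequent_login", "score_high"},
    {"frequent_login", "forum_active"},
    {"frequent_login", "score_high", "forum_active"},
    {"score_high"},
]
frequent = apriori(transactions, min_support=0.5)
for lhs, rhs, support, confidence in association_rules(frequent, min_confidence=0.9):
    print(lhs, "->", rhs, f"support={support:.2f} confidence={confidence:.2f}")
```

With these thresholds only `forum_active -> frequent_login` survives (support 0.50, confidence 1.00); `frequent_login -> score_high` has confidence 0.5 / 0.75, about 0.67, and is filtered out.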

In step S3, the behavior and score association analysis result is processed by the C4.5 decision tree algorithm. The data set is first divided into a training set and a test set, and a decision tree is built by selecting the best splitting attribute at each node using the information gain ratio. During tree construction, the algorithm takes into account the association rules between different behaviors and scores and assigns students to different risk levels; once the tree is built, its classification accuracy is evaluated on the test set. The resulting risk level prediction model comprises risk features, prediction accuracy, and potential intervention plans, and can predict students' potential learning difficulties.
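The splitting criterion that distinguishes C4.5 from ID3 is the information gain ratio: the information gain of an attribute divided by its split information, which penalizes attributes with many distinct values. A minimal Python sketch for categorical attributes follows; the attributes and labels are hypothetical, and continuous-attribute thresholds and pruning are omitted.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, labels, attribute):
    """Information gain of splitting on `attribute`, divided by the split information."""
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0

rows = [{"login": "low", "forum": "yes"}, {"login": "low", "forum": "no"},
        {"login": "high", "forum": "yes"}, {"login": "high", "forum": "no"}]
labels = ["at_risk", "at_risk", "ok", "ok"]
best = max(["login", "forum"], key=lambda a: gain_ratio(rows, labels, a))
print(best)  # login
```

Here `login` separates the two risk classes perfectly (gain ratio 1.0) while `forum` carries no information (gain ratio 0.0), so C4.5 would split on `login` first.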

In step S4, the output of the risk level prediction model is processed by a Bayesian network algorithm, which builds a probabilistic graphical model in which nodes represent risk factors and edges represent probabilistic dependencies between them. The algorithm evaluates the risk factors a student faces by calculating conditional probability distributions. By monitoring these risk factors in real time, it generates the real-time risk monitoring result, including the student learning risk assessment, risk trend, and student list, which provides real-time data support for educational intervention.
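As an illustration of step S4, the sketch below encodes a hypothetical three-node Bayesian network in Python (low engagement and declining grades as parents of high risk) and answers queries by exact enumeration of the joint distribution. The network structure and all probabilities are invented for illustration; enumeration is only practical for small networks, and larger ones would use variable elimination or sampling.

```python
from itertools import product

# Hypothetical parameters: priors for two parent factors and a CPT for the risk node.
P_LOW_ENGAGEMENT = 0.3
P_DECLINING_GRADES = 0.2
P_HIGH_RISK = {  # P(high_risk = True | low_engagement, declining_grades)
    (True, True): 0.9, (True, False): 0.6,
    (False, True): 0.5, (False, False): 0.1,
}
VARIABLES = ("low_engagement", "declining_grades", "high_risk")

def joint_probability(low_eng, declining, high_risk):
    """Factorization implied by the graph: P(E) * P(G) * P(R | E, G)."""
    p = P_LOW_ENGAGEMENT if low_eng else 1 - P_LOW_ENGAGEMENT
    p *= P_DECLINING_GRADES if declining else 1 - P_DECLINING_GRADES
    p_risk = P_HIGH_RISK[(low_eng, declining)]
    return p * (p_risk if high_risk else 1 - p_risk)

def query(target, evidence):
    """P(target = True | evidence), by summing the joint distribution over all assignments."""
    numerator = denominator = 0.0
    for values in product((True, False), repeat=len(VARIABLES)):
        assignment = dict(zip(VARIABLES, values))
        if any(assignment[name] != value for name, value in evidence.items()):
            continue
        p = joint_probability(*values)
        denominator += p
        if assignment[target]:
            numerator += p
    return numerator / denominator

print(round(query("high_risk", {}), 3))                        # 0.324
print(round(query("low_engagement", {"high_risk": True}), 3))  # 0.611
```

The second query shows the diagnostic direction: once a student is flagged as high risk, the probability that low engagement is involved rises from the prior 0.3 to about 0.61, which is the kind of reasoning that points an intervention at a specific factor.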

In step S5, the real-time risk monitoring result is processed by a community detection network analysis algorithm. The algorithm builds a social network model of the student group, partitions the network into communities based on the connection density between nodes, and distinguishes central nodes from peripheral nodes in order to identify the key nodes and patterns of risk propagation. By analyzing these nodes and their relationships, it generates the risk propagation pattern analysis result, including the network structure diagram, key nodes, risk propagation paths, and risk diffusion trend, which provides a basis for controlling and preventing risk propagation.
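Community detection methods such as the Louvain algorithm search for a partition that maximizes modularity, Q = Σ_c [L_c / m − (d_c / 2m)^2], where m is the number of edges, L_c the number of edges inside community c, and d_c the total degree of its nodes. The Python sketch below only evaluates Q for given partitions of a hypothetical interaction graph; the full Louvain search (local node moves followed by community aggregation) is available, for example, as `louvain_communities` in NetworkX.

```python
def modularity(edges, communities):
    """Newman modularity of a partition of an undirected, unweighted graph."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    q = 0.0
    for group in communities:
        members = set(group)
        internal = sum(1 for u, v in edges if u in members and v in members)
        total_degree = sum(degree[node] for node in members)
        # Observed fraction of internal edges minus the fraction expected at random.
        q += internal / m - (total_degree / (2 * m)) ** 2
    return q

# Two tightly knit study groups joined by a single interaction (hypothetical).
edges = [("s1", "s2"), ("s2", "s3"), ("s1", "s3"),
         ("s4", "s5"), ("s5", "s6"), ("s4", "s6"), ("s3", "s4")]
split = [{"s1", "s2", "s3"}, {"s4", "s5", "s6"}]
merged = [{"s1", "s2", "s3", "s4", "s5", "s6"}]
print(round(modularity(edges, split), 3), round(modularity(edges, merged), 3))  # 0.357 0.0
```

The two-group partition scores higher than putting everyone in one community, which is exactly the comparison a modularity-maximizing search repeats node by node.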

In step S6, the risk propagation pattern analysis result is processed by a graph convolutional network (GCN) algorithm. A GCN handles graph-structured data effectively: by applying convolution operations over the nodes of a graph, it learns a feature representation for each node. The algorithm analyzes the relationship graph between student interactions and learning content and optimizes course content and interaction strategies in the online learning environment. In this way it generates the optimized online learning strategy, including course content adjusted according to risk patterns and student interaction characteristics and an improved student interaction plan, which supports better learning outcomes and course design.
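To show what "applying convolution operations over the graph" computes, the sketch below implements a single graph convolution layer in the common Kipf and Welling form, H' = ReLU(D̂^(-1/2) (A + I) D̂^(-1/2) H W), in plain Python. The graph, node features, and weights are hypothetical; a real model would stack several layers and learn W by gradient descent.

```python
import math

def gcn_layer(adjacency, features, weights):
    """H' = ReLU(D^-1/2 (A + I) D^-1/2 H W) for dense list-of-lists matrices."""
    n = len(adjacency)
    a_hat = [[adjacency[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
            for i in range(n)]
    in_dim, out_dim = len(weights), len(weights[0])
    # Aggregate: each node mixes its own features with its neighbours' (symmetrically normalized).
    aggregated = [[sum(norm[i][k] * features[k][f] for k in range(n)) for f in range(in_dim)]
                  for i in range(n)]
    # Transform with W, then apply ReLU.
    return [[max(0.0, sum(aggregated[i][f] * weights[f][o] for f in range(in_dim)))
             for o in range(out_dim)] for i in range(n)]

# Hypothetical graph: student A (0) and student B (2) both interact with one course unit (1).
adjacency = [[0, 1, 0],
             [1, 0, 1],
             [0, 1, 0]]
features = [[1.0, 0.0],   # student
            [0.0, 1.0],   # course unit
            [1.0, 0.0]]   # student
weights = [[1.0, -1.0],
           [0.5, 0.5]]
for row in gcn_layer(adjacency, features, weights):
    print([round(v, 3) for v in row])
```

Because students A and B have identical features and identical neighbourhoods, the layer gives them identical embeddings; differences in interaction patterns would show up as differences in these vectors.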

Referring to fig. 2, based on online activity data of a student, a long-short-term memory network algorithm is adopted, mode change and trend are identified by analyzing a learning behavior data time sequence of the student, preliminary classification of the student behavior mode is performed, and a student behavior time mode analysis result is generated specifically by the steps of;

S102: based on the standardized student behavior data set, adopting a long short-term memory network algorithm, processing the time-series data through forget gate, input gate and output gate mechanisms, capturing time dependencies, and generating a time-series feature analysis result;

In the S101 substep, the students' online learning activity data are preprocessed and converted to a standard scale by Z-score normalization: the mean of the data is subtracted from each data point and the result is divided by the standard deviation. Each processed value, the standard score, expresses how many standard deviations the point lies from the mean, which makes data sets with different scales or units directly comparable. The normalized data are then segmented with a sliding window; for example, with a window size of 30 minutes and a step size of 5 minutes, each window contains all data points within a 30-minute span and consecutive windows overlap by 25 minutes. After segmentation, the data are cleaned to remove noise and outliers, improving the quality and reliability of the data set. The resulting standardized student behavior data set provides an accurate and efficient data basis for the subsequent steps.
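A minimal sketch of the S101 normalization and segmentation, assuming the activity data arrive as one numeric value per minute. The input format and the function names `z_score_normalize` and `sliding_windows` are illustrative, not taken from the patent, and the noise and outlier removal step is not shown.

```python
import statistics


def z_score_normalize(values):
    """Return (x - mean) / std for each value, using the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant series has no spread: every point sits exactly at the mean.
        return [0.0 for _ in values]
    return [(x - mean) / std for x in values]


def sliding_windows(samples, window=30, step=5):
    """Cut a per-minute series into overlapping windows (default: 30-minute window, 5-minute step)."""
    return [samples[start:start + window]
            for start in range(0, len(samples) - window + 1, step)]


# Usage: one hour of hypothetical per-minute activity counts.
activity = [3, 4, 6, 5, 2, 0, 1, 7, 8, 6] * 6
windows = sliding_windows(z_score_normalize(activity))
print(len(windows), len(windows[0]))  # 7 windows of 30 points each
```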

In step S102, based on the standardized student behavior data set, a long short-term memory (LSTM) network algorithm is used to process the time-series data. An LSTM is a special recurrent neural network (RNN) that can learn long-term dependencies in time-series data. The core of an LSTM network is a cell controlled by a forget gate, an input gate and an output gate: the forget gate decides which information to discard from the cell state, the input gate decides which new information to add to the cell state, and the output gate controls how much of the cell state is passed to the output. Each gate typically uses a sigmoid activation, producing a value between 0 and 1 that determines how much information the gate lets through. With this structure, the LSTM can capture temporal dependencies in the time-series data and generate a time-series feature analysis result. This result not only reveals the dynamic changes in a student's learning behavior but also provides key temporal features for behavior pattern recognition.
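The gate mechanics of a single LSTM cell can be sketched in plain Python as follows. This is an illustrative forward pass only (no training), assuming the standard LSTM formulation with a tanh candidate and tanh output squashing; the parameter layout and helper names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step. params maps 'f', 'i', 'g', 'o' to (W, b); W rows act on [h_prev, x]."""
    z = h_prev + x  # concatenation of previous hidden state and current input (lists)
    pre = {k: [a + b for a, b in zip(matvec(W, z), bias)] for k, (W, bias) in params.items()}
    f = [sigmoid(v) for v in pre["f"]]    # forget gate: share of c_prev to keep
    i = [sigmoid(v) for v in pre["i"]]    # input gate: share of new content to write
    g = [math.tanh(v) for v in pre["g"]]  # candidate cell content
    o = [sigmoid(v) for v in pre["o"]]    # output gate: share of the cell to expose
    c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c_prev, i, g)]
    h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
    return h, c


def run_lstm(sequence, params, hidden_size):
    """Feed a sequence of input vectors through the cell, starting from zero states."""
    h, c = [0.0] * hidden_size, [0.0] * hidden_size
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
    return h, c


# Usage: gates forced open by large biases, so the cell state accumulates tanh(x) across steps,
# which illustrates how an open forget gate preserves information over time.
params = {"f": ([[0.0, 0.0]], [20.0]), "i": ([[0.0, 0.0]], [20.0]),
          "g": ([[0.0, 1.0]], [0.0]), "o": ([[0.0, 0.0]], [20.0])}
h, c = run_lstm([[0.5], [0.5]], params, hidden_size=1)
print(h, c)
```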

In step S103, based on the time-series feature analysis result, a support vector machine (SVM) algorithm is used to identify changes and trends in student behavior patterns. An SVM is a supervised learning algorithm used mainly for classification and regression. It maps the original feature space to a higher-dimensional space through a kernel function and searches that space for the optimal separating hyperplane, namely the one that maximizes the margin to the nearest training points (the support vectors). Maximizing this margin helps improve classification accuracy and generalization. For student behavior pattern recognition, the kernel transformation handles complex nonlinear relationships, an optimization algorithm finds the optimal hyperplane, and different behavior patterns are thereby distinguished, so that the resulting behavior pattern recognition result reflects the diversity and complexity of student behavior.
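A minimal sketch of a kernel SVM, assuming an RBF kernel and the kernelized Pegasos solver (stochastic sub-gradient descent on the hinge-loss objective, without a bias term). The patent names neither a kernel nor a solver, so both are stand-ins, and the XOR-like toy data is a nonlinear problem that no straight line can separate.

```python
import math
import random


def rbf_kernel(a, b, gamma=5.0):
    """Gaussian (RBF) kernel: similarity decays with squared Euclidean distance."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def train_kernel_svm(X, y, lam=0.01, iterations=200, gamma=5.0, seed=0):
    """Kernelized Pegasos on the hinge-loss SVM objective; labels must be +1/-1.

    Returns alpha, where alpha[j] counts how often sample j violated the margin;
    samples with alpha[j] > 0 act as support vectors.
    """
    rng = random.Random(seed)
    alpha = [0] * len(X)
    for t in range(1, iterations + 1):
        i = rng.randrange(len(X))
        decision = sum(a * yj * rbf_kernel(xj, X[i], gamma)
                       for a, xj, yj in zip(alpha, X, y) if a)
        # Margin check y_i * <w_t, phi(x_i)> < 1, with w_t = (1 / (lam * t)) * sum_j alpha_j y_j phi(x_j).
        if y[i] * decision / (lam * t) < 1:
            alpha[i] += 1
    return alpha


def predict(x, X, y, alpha, gamma=5.0):
    """Sign of the kernel expansion; the positive factor 1 / (lam * T) does not change the sign."""
    score = sum(a * yj * rbf_kernel(xj, x, gamma) for a, xj, yj in zip(alpha, X, y) if a)
    return 1 if score >= 0 else -1


# Usage: XOR-like clusters (opposite corners share a label), not linearly separable.
X = [(0.0, 0.0), (0.05, 0.05), (1.0, 1.0), (0.95, 0.95),
     (0.0, 1.0), (0.05, 0.95), (1.0, 0.0), (0.95, 0.05)]
y = [-1, -1, -1, -1, 1, 1, 1, 1]
alpha = train_kernel_svm(X, y)
print([predict(p, X, y, alpha) for p in [(0.02, 0.02), (0.98, 0.03)]])
```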

In step S104, based on the behavior pattern recognition result, a K-means clustering algorithm is used to classify student behavior patterns. K-means is an unsupervised learning algorithm that divides the data into K clusters iteratively. It first randomly selects K data points as the initial centroids, then computes the distance from each data point to each centroid and assigns the point to the cluster of the nearest centroid. The centroid of each cluster is then recomputed, and the process repeats until the centroid positions stabilize or a preset number of iterations is reached. In this way, K-means groups student behavior patterns with similar characteristics into the same class, and the resulting behavior pattern classification helps reveal the overall structure and categories of student behavior, providing a basis for personalized learning intervention and support.
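The iteration described above can be sketched as follows; the seed, the iteration cap, and the choice to keep an empty cluster's centroid in place are illustrative assumptions.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-means: random initial centroids, then alternate assignment and update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # K distinct data points as initial centroids
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # empty cluster: keep its old centroid
        if new_centroids == centroids:
            break  # centroids stable: converged
        centroids = new_centroids
    return labels, centroids


# Usage: hypothetical two-feature behavior summaries forming two obvious groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```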

Referring to fig. 3, the step of generating the behavior and performance association analysis result, in which an Apriori association rule learning algorithm is applied to the student behavior time pattern analysis result to mine frequent itemsets and strong rules between student behavior patterns and learning performance and to optimize the analysis of the relationship between them, specifically comprises the following steps:

S201: based on the student behavior pattern classification result, adopting an association rule mining preprocessing technique, constructing a transaction database, encoding itemsets and organizing the data set, and generating a processed behavior and performance data set;

S202: based on the processed behavior and performance data set, adopting an Apriori algorithm, mining frequent itemsets through itemset generation and support calculation, and generating a frequent itemset analysis result;

S204: based on the behavior and performance strong association rule result, adopting principal component analysis, optimizing the relationship between student behavior and performance through eigenvalue decomposition and dimensionality reduction, and generating a behavior and performance association optimization analysis result.

In the sub-step S201, the student behavior pattern classification result is prepared for association rule mining. A transaction database is constructed that organizes the data into a series of transactions, each containing a set of items; for example, one transaction contains a particular student's behavior patterns and the corresponding performance. The items are then encoded, converting each behavior pattern and performance level into a unique identifier for computer processing, and the data set is organized into the form required for association rule mining. The resulting processed behavior and performance data set records the correspondence between behavior patterns and performance and provides the basis for mining potential associations between them.

In step S202, frequent itemsets are mined from the processed behavior and performance data set with the Apriori algorithm. Apriori generates itemsets level by level, starting from single items and extending to larger itemsets, and computes the support of each itemset. Support is the fraction of all transactions in which an itemset occurs; for example, if an itemset is the combination of a particular behavior pattern and excellent performance, its support indicates how often that combination occurs across all transactions. Only itemsets whose support meets or exceeds a preset threshold are considered frequent. The core of Apriori is its pruning rule: if an itemset is not frequent, every larger itemset containing it must also be infrequent. This greatly reduces the number of itemsets that have to be checked. The resulting frequent itemset analysis result shows which combinations of behavior patterns and performance levels are common in the student population and lays the foundation for in-depth analysis of the relationship between behavior and performance.
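A compact sketch of the Apriori level-wise search with its subset-pruning rule, assuming each transaction is a set of already-encoded item labels; the item names in the usage example are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset whose support >= min_support."""
    tx = [frozenset(t) for t in transactions]
    n = len(tx)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(1 for t in tx if itemset <= t) / n

    items = sorted({item for t in tx for item in t})
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Join: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: drop any candidate with an infrequent (k-1)-subset (the Apriori property).
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


# Usage with hypothetical encoded behavior/grade items, one transaction per student.
transactions = [
    {"long_time", "high_engage", "grade_A"},
    {"long_time", "high_engage", "grade_A"},
    {"long_time", "grade_B"},
    {"short_time", "low_engage", "grade_C"},
    {"long_time", "high_engage", "grade_A"},
]
for itemset, sup in sorted(apriori(transactions, 0.6).items(), key=lambda kv: -len(kv[0])):
    print(sorted(itemset), sup)
```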

In the sub-step S203, based on the frequent itemset analysis result, strong association rules between behavior and performance are analyzed with the FP-Growth algorithm. Unlike Apriori, FP-Growth does not generate candidate sets, which avoids repeated scans of the data set. The algorithm first builds a frequent pattern tree (FP-tree), a compressed tree representation of the data set that contains the information about all frequent itemsets. It then mines frequent itemsets by recursively building, for each frequent item, a conditional pattern base (the set of prefix paths in the FP-tree that lead to that item, i.e. the sub-database of transactions containing it) and the corresponding conditional FP-tree. This is repeated for every frequent item until all have been considered. From the frequent itemsets found in this way, rules that satisfy both a minimum support and a minimum confidence threshold are kept as strong association rules between behavior and performance. The resulting behavior and performance strong association rule result provides important insight for understanding and optimizing student learning behavior.
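A compact sketch of FP-Growth followed by strong-rule generation, assuming transactions are sets of item labels, supports are absolute counts, and a rule is kept when its confidence (support count of the whole itemset divided by that of the left-hand side) meets a chosen threshold. Function names and thresholds are illustrative.

```python
from collections import defaultdict
from itertools import combinations


class FPNode:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}


def build_fp_tree(weighted_transactions, min_count):
    """Build an FP-tree from (items, count) pairs; return frequent item counts and header table."""
    counts = defaultdict(int)
    for items, cnt in weighted_transactions:
        for item in items:
            counts[item] += cnt
    counts = {i: c for i, c in counts.items() if c >= min_count}
    root, header = FPNode(None, None), defaultdict(list)
    for items, cnt in weighted_transactions:
        # Insert frequent items in descending-frequency order so common prefixes share nodes.
        node = root
        for item in sorted((i for i in items if i in counts), key=lambda i: (-counts[i], i)):
            if item not in node.children:
                node.children[item] = FPNode(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += cnt
    return counts, header


def fp_growth(weighted_transactions, min_count, suffix=frozenset()):
    """Return {frozenset(itemset): absolute support count} for all frequent itemsets."""
    counts, header = build_fp_tree(weighted_transactions, min_count)
    result = {}
    for item, count in counts.items():
        itemset = suffix | {item}
        result[itemset] = count
        # Conditional pattern base: the prefix path above every node holding this item.
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        result.update(fp_growth(base, min_count, itemset))
    return result


def strong_rules(counts, n_transactions, min_conf):
    """Split each frequent itemset into lhs -> rhs; keep rules with confidence >= min_conf."""
    rules = []
    for itemset, count in counts.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(sorted(itemset), r)):
                confidence = count / counts[lhs]
                if confidence >= min_conf:
                    rules.append((lhs, itemset - lhs, count / n_transactions, confidence))
    return rules


# Usage with hypothetical encoded behavior/grade transactions.
transactions = [
    {"long_time", "high_engage", "grade_A"},
    {"long_time", "high_engage", "grade_A"},
    {"long_time", "grade_B"},
    {"short_time", "low_engage", "grade_C"},
    {"long_time", "high_engage", "grade_A"},
]
counts = fp_growth([(t, 1) for t in transactions], min_count=3)
for lhs, rhs, sup, conf in strong_rules(counts, len(transactions), min_conf=0.8):
    print(sorted(lhs), "->", sorted(rhs), f"support={sup:.2f}", f"confidence={conf:.2f}")
```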

In the sub-step S204, principal component analysis (PCA) is applied to the behavior and performance strong association rule result to optimize the analysis of the relationship between student behavior and performance. PCA is a statistical technique for dimensionality reduction and feature extraction. Through eigenvalue decomposition of the data's covariance matrix, PCA finds the main directions of variation in the data (the principal components); projecting the data onto the leading components reduces its dimensionality while retaining most of the variance. In the association analysis of student behavior and performance, PCA can identify the behavior pattern features that most influence performance, and combining these main features sharpens the understanding of the relationship between behavior and performance. The resulting behavior and performance association optimization analysis result provides a clearer, simplified view that helps educators and students better understand and improve learning behavior.
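As an illustration of the eigenvalue-decomposition step, the sketch below finds only the first principal component, using power iteration on the sample covariance matrix; a full PCA would also extract the remaining components. The two behavior features in the usage example are hypothetical.

```python
import math


def first_principal_component(data, iterations=200):
    """Leading eigenvector of the sample covariance matrix, found by power iteration.

    Returns (component, share of total variance it explains, 1-D projection of each row).
    """
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)] for a in range(d)]
    v = [1.0] + [0.0] * (d - 1)  # start vector; assumed not orthogonal to the leading eigenvector
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
    explained = eigenvalue / sum(cov[j][j] for j in range(d))  # trace = total variance
    scores = [sum(r[j] * v[j] for j in range(d)) for r in centered]
    return v, explained, scores


# Usage: two strongly correlated hypothetical features (hours studied, assignments completed).
data = [(1, 1.1), (2, 1.9), (3, 3.05), (4, 4.0), (5, 4.95)]
component, explained, scores = first_principal_component(data)
print([round(c, 3) for c in component], round(explained, 4))
```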

The processed behavior and performance data set contains various student behavior patterns and the corresponding performance levels; for example, one data item is "long study time, high participation, grade A". Association rule mining over such data identifies frequently occurring combinations of behavior patterns and grades, and the Apriori and FP-Growth algorithms then mine strong association rules, for example that long study time and high participation are strongly associated with grade A. Principal component analysis further refines these relationships by extracting the behavior pattern features with the greatest influence on performance. The final behavior and performance association optimization analysis result shows that certain learning behavior patterns are significantly correlated with high grades, providing a basis for effective teaching strategies and learning interventions.

Referring to fig. 4, the step of generating the final risk level prediction model, in which a C4.5 decision tree algorithm is applied to the behavior and performance association analysis result to classify students' risk levels according to the behavior and performance association rules and to predict potential learning difficulties, specifically comprises the following steps:

S301: based on the behavior and performance association optimization analysis result, adopting data encoding techniques, converting the data into the format required by the decision tree algorithm through one-hot encoding and label encoding, and generating an encoded risk analysis data set;

S304: based on the optimized risk level prediction model, adopting a Bayesian network algorithm, performing the final classification of risk levels through probabilistic inference and causal relationship modeling, and generating the final risk level prediction model.

In step S301, the behavior and performance association optimization analysis result is converted into the format required by the decision tree algorithm through data encoding, whose key task is to convert categorical data into numeric form that the algorithm can process. The two main encoding techniques are one-hot encoding and label encoding. One-hot encoding converts each category into a vector of 0s and 1s in which exactly one position is 1, which suits categorical variables with no inherent order. Label encoding assigns each category a unique integer, which suits categories with a natural order. Student behavior patterns and performance levels are converted into numeric form in this way to produce the encoded risk analysis data set, which is used to build the risk level prediction model; each data point records the encoded behavior patterns and performance relationships.
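The two encodings can be sketched as follows; the category names are hypothetical, and the grade order passed to `label_encode` is an assumption supplied by the caller.

```python
def one_hot(values, categories):
    """One 0/1 vector per value, with a single 1 at the value's category position."""
    index = {c: i for i, c in enumerate(categories)}
    return [[1 if index[v] == i else 0 for i in range(len(categories))] for v in values]


def label_encode(values, ordered_categories):
    """One integer per value, following the given order (suited to ordinal categories)."""
    rank = {c: i for i, c in enumerate(ordered_categories)}
    return [rank[v] for v in values]


# Usage: an unordered behavior category and an ordered grade scale (lowest first).
print(one_hot(["forum", "video", "forum"], ["forum", "quiz", "video"]))
print(label_encode(["C", "A", "B"], ["C", "B", "A"]))
```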

In step S302, a prediction model is built from the encoded risk analysis data set with the C4.5 decision tree algorithm. C4.5 selects the feature on which to split the data by the information gain ratio, an entropy-based measure of how effectively a feature divides the data set: the information gain (the reduction in label entropy achieved by the split) divided by the split information (the entropy of the partition sizes themselves), which penalizes features with many distinct values. The algorithm computes the gain ratio of each feature, splits the data set on the feature with the highest gain ratio, and repeats the process on each resulting subset until a stopping condition is met. C4.5 also prunes the tree, deleting nodes or branches that contribute little to the final decision, which simplifies the tree and reduces overfitting. The result is a preliminary risk level prediction model that predicts the risk level a student may face from his or her behavior patterns and performance relationships.
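A minimal sketch of the gain-ratio split criterion and recursive tree growth, restricted to categorical features. Numeric thresholds, missing-value handling and the pruning step of full C4.5 are omitted, and the feature and label names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows, labels, feature):
    """Information gain of splitting on `feature`, divided by the split information."""
    n = len(rows)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[feature], []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    if split_info == 0:
        return 0.0  # a single branch carries no information
    return (entropy(labels) - conditional) / split_info


def build_tree(rows, labels, features):
    """Nested dict {feature: {value: subtree}}; leaves are the majority label."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority
    best = max(features, key=lambda f: gain_ratio(rows, labels, f))
    if gain_ratio(rows, labels, best) <= 0:
        return majority
    remaining = [f for f in features if f != best]
    tree = {best: {}}
    for value in sorted({r[best] for r in rows}):
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        tree[best][value] = build_tree([rows[i] for i in idx], [labels[i] for i in idx], remaining)
    return tree


# Usage with hypothetical encoded records.
rows = [{"engagement": "high", "study_time": "long"}, {"engagement": "high", "study_time": "short"},
        {"engagement": "low", "study_time": "long"}, {"engagement": "low", "study_time": "short"}]
labels = ["low_risk", "low_risk", "high_risk", "high_risk"]
print(build_tree(rows, labels, ["engagement", "study_time"]))
```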

In the sub-step S303, the accuracy of the preliminary risk level prediction model is evaluated with cross-validation, a model evaluation method that divides the data into several parts and performs multiple rounds of validation. In each round, one part serves as the test set and the rest as the training set: the model is trained on the training set and tested on the test set, and a different part is used as the test set in each round. Averaging over the rounds reduces bias and variance in the evaluation and improves the robustness and reliability of the estimate. Using the cross-validation results to verify and tune the model yields an optimized risk level prediction model with improved accuracy and generalization.
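A sketch of k-fold cross-validation, assuming the caller supplies `fit` and `predict` functions for whatever model is being evaluated; both are hypothetical hooks, not an existing API.

```python
import random


def k_fold_splits(n, k, seed=0):
    """Shuffle indices 0..n-1 once, cut them into k folds, and yield (train, test) index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def cross_val_accuracy(X, y, fit, predict, k=5, seed=0):
    """Mean held-out accuracy over k rounds; each sample is tested exactly once."""
    scores = []
    for train, test in k_fold_splits(len(X), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


# Usage: a majority-class baseline on hypothetical risk labels.
fit = lambda X, y: max(set(y), key=y.count)
predict = lambda model, x: model
X = list(range(10))
y = ["low"] * 7 + ["high"] * 3
print(cross_val_accuracy(X, y, fit, predict, k=5))
```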

In step S304, the optimized risk level prediction model is refined with a Bayesian network algorithm to perform the final risk level classification. A Bayesian network is a graphical model based on probabilistic inference that represents causal relationships among variables, and it is used here to classify risk levels more finely. The algorithm constructs a network over the relevant variables and then performs inference based on probability theory. Because it accounts not only for direct relationships among variables but also for indirect associations, it provides a more comprehensive and accurate way of predicting risk levels. The resulting risk level prediction model can classify different risk levels and provides decision support for student risk management and intervention.

Suppose the encoded risk analysis data set contains combinations of behavior patterns and performance such as "high engagement, high performance" and "low engagement, low performance". In step S302, the C4.5 algorithm first computes the information gain ratio of these features, then splits the data on the most informative feature and builds a decision tree. In step S303, the model is evaluated and optimized by cross-validation to ensure its accuracy and generalization ability. Finally, in step S304, the Bayesian network algorithm refines the risk level classification and produces the final risk level prediction model. The model shows how different combinations of behavior patterns and performance affect risk level, helping educators and students recognize and manage potential learning risks.

Referring to fig. 5, the step of generating the real-time risk monitoring result, in which a Bayesian network algorithm is applied to the risk level prediction model to compute the probability distribution of learning risks and monitor students' learning risks in real time, specifically comprises the following steps:

S401: based on the risk level prediction model, adopting a Bayesian network algorithm, representing the different risk factors and their correlations as nodes and edges, expressing the dependencies among factors as conditional probabilities, calculating the probability distribution of each risk state, and generating a probability distribution calculation result;

S402: based on the probability distribution calculation result, adopting a stream data processing algorithm, capturing student behavior data and analyzing the data stream in real time to monitor changes in student behavior, identifying potential risk states, and generating real-time risk monitoring data;

S403: based on the real-time risk monitoring data, adopting an isolation forest algorithm, isolating observation points by randomly selecting features and random split values, identifying abnormal behavior, and generating an abnormal behavior identification result;

S404: based on the abnormal behavior identification result, using data visualization tools to display the dynamic changes of risk levels, reflecting the trend of risk levels through charts and heat maps, and generating a real-time risk monitoring result.

In the S401 substep, the risk level prediction model is processed with a Bayesian network algorithm. A Bayesian network is a probabilistic graphical model that represents variables (here, the different risk factors) and their interrelationships as nodes and edges. Nodes represent risk factors, including student learning behavior and performance, and edges represent probabilistic relationships between these factors. The core of the algorithm is the conditional probability table (CPT): each entry gives the probability of a value of one factor (the child node) given the values of the factors it depends on (its parent nodes). For example, a CPT may contain the probability of a given performance level under a particular learning behavior. The algorithm computes the probability distribution of each risk state by probabilistic inference over the network, using the chain rule to build the joint probability distribution from the CPTs and Bayes' theorem to obtain probability estimates for the different risk states. The resulting probability distribution calculation result provides a quantitative basis for understanding and predicting students' risk status.
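A small worked example of CPT-based inference, assuming a hypothetical three-node chain Engagement (E) → Performance (P) → Risk (R) with made-up probabilities. It builds the joint distribution with the chain rule and answers queries by exact enumeration, which is only practical for small networks.

```python
from itertools import product

# Hypothetical network structure and CPTs (all numbers are illustrative).
parents = {"E": [], "P": ["E"], "R": ["P"]}
cpt = {
    "E": {(): {"low": 0.3, "high": 0.7}},
    "P": {("low",): {"poor": 0.7, "good": 0.3}, ("high",): {"poor": 0.2, "good": 0.8}},
    "R": {("poor",): {"high": 0.8, "low": 0.2}, ("good",): {"high": 0.1, "low": 0.9}},
}
domains = {var: list(next(iter(table.values())).keys()) for var, table in cpt.items()}


def joint(assignment):
    """Chain rule for a Bayesian network: product of P(node | parents) over all nodes."""
    p = 1.0
    for var, pars in parents.items():
        key = tuple(assignment[q] for q in pars)
        p *= cpt[var][key][assignment[var]]
    return p


def query(var, value, evidence=None):
    """P(var = value | evidence) by summing the joint over all consistent assignments."""
    evidence = evidence or {}
    names = list(parents)
    num = den = 0.0
    for combo in product(*(domains[n] for n in names)):
        a = dict(zip(names, combo))
        if any(a[k] != v for k, v in evidence.items()):
            continue
        p = joint(a)
        den += p
        if a[var] == value:
            num += p
    return num / den


print(query("R", "high"))                     # prior probability of high risk
print(query("E", "low", evidence={"R": "high"}))  # Bayes: low engagement given high risk
```

With these made-up numbers, P(R = high) = 0.35 × 0.8 + 0.65 × 0.1 = 0.345, and the posterior P(E = low | R = high) = 0.177 / 0.345 ≈ 0.513.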

In the sub-step S402, changes in student behavior are monitored in real time with a stream data processing algorithm, based on the probability distribution calculation result. Stream data processing algorithms process continuously arriving data streams, such as student behavior data captured in real time, and analyze them as they arrive to detect abnormal changes in behavior and identify potential risk states. The algorithm uses window functions that define the time range over which data are observed; for example, a time window may be set to analyze daily or weekly student behavior patterns. The data within each window are compared with the previously computed risk state probabilities to identify possible risks. The resulting real-time risk monitoring data support real-time intervention and risk management.
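A minimal sketch of a sliding-window monitor for one student's login events. The fixed `min_events` threshold stands in for the comparison with the S401 risk probabilities, which the patent does not specify; the class name and parameters are hypothetical.

```python
from collections import deque


class LoginRateMonitor:
    """Sliding-window event counter for one student's activity stream (hypothetical helper)."""

    def __init__(self, window, min_events):
        self.window = window          # window length, in the same unit as the timestamps
        self.min_events = min_events  # fewer events than this inside the window raises a flag
        self.events = deque()

    def observe(self, timestamp):
        """Record one event (timestamps must arrive in non-decreasing order)."""
        self.events.append(timestamp)
        return self.check(timestamp)

    def check(self, now):
        """Evict events outside (now - window, now]; return (count, flagged)."""
        while self.events and self.events[0] <= now - self.window:
            self.events.popleft()
        count = len(self.events)
        return count, count < self.min_events


# Usage: daily logins, 7-day window, flag fewer than 3 logins in the past week.
monitor = LoginRateMonitor(window=7, min_events=3)
for day in [0, 1, 2, 3, 4]:
    monitor.observe(day)
print(monitor.check(12))  # logins stopped after day 4
```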

In the sub-step S403, abnormal behavior is identified from the real-time risk monitoring data with an isolation forest algorithm, an effective anomaly detection method that isolates observation points by randomly selecting features and random split values. The algorithm builds many isolation trees. Each tree is grown by randomly selecting a feature and a random split point within that feature's value range, and the splitting is repeated until each point is isolated or a preset depth limit is reached. The number of splits needed to isolate a point (its path length) is the basis of the anomaly score: points that are isolated after fewer splits are considered more anomalous. In this way, the isolation forest identifies behavior that differs markedly from the majority of the data, and the resulting abnormal behavior identification result supports timely identification of and intervention in abnormal student behavior.
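A compact sketch of an isolation forest following its standard formulation: random feature, random split value, path length normalized by c(n), and score 2^(-E[h]/c(n)). The defaults of 100 trees and a 256-point subsample are commonly used values assumed here, not parameters from the patent.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def avg_path_length(n):
    """c(n): average path length of an unsuccessful binary-search-tree lookup among n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_itree(points, depth, max_depth, rng):
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))        # random feature
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)                # random split value inside the feature's range
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_itree(left, depth + 1, max_depth, rng),
            build_itree(right, depth + 1, max_depth, rng))


def path_length(x, tree, depth=0):
    if tree[0] == "leaf":
        return depth + avg_path_length(tree[1])  # unresolved leaf: add expected remaining depth
    _, dim, split, left, right = tree
    return path_length(x, left if x[dim] < split else right, depth + 1)


def isolation_forest_scores(points, n_trees=100, sample_size=256, seed=0):
    """Anomaly score per point; values near 1 mean the point is isolated quickly."""
    rng = random.Random(seed)
    m = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(m))
    trees = [build_itree(rng.sample(points, m), 0, max_depth, rng) for _ in range(n_trees)]
    c = avg_path_length(m)
    return [2 ** (-sum(path_length(p, t) for t in trees) / n_trees / c) for p in points]


# Usage: a tight cluster of hypothetical behavior vectors plus one extreme point.
inliers = [(i * 0.1, j * 0.1) for i in range(5) for j in range(4)]
points = inliers + [(10.0, 10.0)]
scores = isolation_forest_scores(points)
print(max(range(len(points)), key=scores.__getitem__))  # index of the most anomalous point
```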

In step S404, the dynamic changes of risk levels are displayed with data visualization tools, based on the abnormal behavior identification result. Data visualization converts complex data sets into graphical representations that are easier to understand, and charts, heat maps and similar displays reveal how risk levels change. Charts can show risk levels over time, while heat maps use color to encode risk level, showing the distribution and density of risk. These visualizations make risk monitoring more intuitive and help educators and students quickly identify and respond to potential risks, enabling more effective risk management and intervention.

Consider a scenario in which a risk level prediction model has been built from students' online learning behavior and performance. In step S401, the Bayesian network algorithm calculates risk state probabilities from these behaviors and performance. In step S402, the stream data processing algorithm captures each student's online behavior in real time, such as a sudden drop in the number of logins, and analyzes it within a set time window. In step S403, the isolation forest algorithm flags such abrupt behavior changes as anomalies, and in step S404 this information is visualized, for example as a heat map of risk levels across the student population, so that educators can quickly identify and focus on higher-risk groups of students. Together, these steps provide a complete solution for monitoring and responding to student learning risks in real time.

Referring to fig. 6, the step of generating the risk propagation pattern analysis result, in which a community detection network analysis algorithm is applied to the real-time risk monitoring result and a social network model of the student group is built to identify the key nodes and patterns of risk propagation, specifically comprises the following steps:

S501: based on the real-time risk monitoring result, adopting a social network analysis method, constructing an interaction network graph among students in which nodes and edges represent students and their interactions, identifying the main connection points in the social network, and generating a social network graph construction result;

S502: based on the social network graph construction result, adopting the Louvain algorithm, identifying distinct communities in the network by maximizing the density of connections within communities and reducing connections between communities, and generating a community detection result;

S503: based on the community detection result, adopting an SIR model that simulates the susceptible, infected and removed populations of an infectious disease process, analyzing the propagation paths and speed of risk in the student population, and generating a risk propagation path analysis result;

In step S501, the real-time risk monitoring result is processed with a social network analysis method to build an interaction network graph of students from their social interaction data: nodes represent students and edges represent interactions between them. The graph is typically stored as an edge list or an adjacency matrix. An edge list records each pair of interacting students, while an adjacency matrix records the connection state between every pair of nodes in matrix form. Building the network graph makes the interaction patterns among students visible and identifies the main connection points in the social network, for example core students who interact with many others. The resulting social network graph construction result reveals the social structure among students and provides a basis for identifying key individuals who affect risk propagation.
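The two storage formats can be related with a small sketch, assuming undirected, unweighted interactions between hypothetical student IDs. Degree (the number of distinct interaction partners) is used here as a simple proxy for a "main connection point"; the patent does not say which measure it uses at this step.

```python
def adjacency_matrix(students, edges):
    """Undirected, unweighted adjacency matrix built from an edge list of interacting pairs."""
    index = {s: i for i, s in enumerate(students)}
    n = len(students)
    A = [[0] * n for _ in range(n)]
    for a, b in edges:
        A[index[a]][index[b]] = 1
        A[index[b]][index[a]] = 1  # interaction is symmetric
    return A


def most_connected(students, A):
    """Students with the highest degree (number of distinct interaction partners)."""
    degrees = [sum(row) for row in A]
    top = max(degrees)
    return [s for s, d in zip(students, degrees) if d == top]


# Usage with hypothetical forum interactions.
students = ["ana", "ben", "cai", "dee"]
edges = [("ana", "ben"), ("ana", "cai"), ("ana", "dee"), ("ben", "cai")]
A = adjacency_matrix(students, edges)
print(A, most_connected(students, A))
```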

In step S502, community detection is performed on the social network graph construction result with the Louvain algorithm, a community detection method based on modularity optimization that identifies community structure by increasing the density of connections within communities and reducing connections between communities. The algorithm first assigns each node to its own community and then repeatedly moves individual nodes into neighboring communities whenever the move increases the modularity of the network. When no single move improves modularity further, the communities found are merged into single nodes and the procedure is repeated on the resulting smaller network, until modularity reaches a local maximum. Community detection divides the network into densely connected communities that reflect natural groupings of students based on shared interests or behaviors. The resulting community detection result helps explain the internal structure of the student social network and provides key information for further analysis of risk propagation.
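A naive sketch of the Louvain local-moving phase only, assuming an unweighted, undirected graph without self-loops. For clarity, modularity is recomputed from scratch for every candidate move (the actual algorithm uses an incremental gain formula), and the aggregation phase that merges communities into super-nodes is omitted.

```python
from collections import defaultdict


def modularity(adj, labels):
    """Q = sum over communities of (internal edges / m) - (total degree / 2m)^2."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    inside = defaultdict(float)   # edges with both ends in the community
    degree = defaultdict(float)   # total degree of the community
    for u, nbrs in adj.items():
        degree[labels[u]] += len(nbrs)
        for v in nbrs:
            if labels[u] == labels[v]:
                inside[labels[u]] += 0.5  # each internal edge is visited from both ends
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def louvain_local_moving(adj):
    """Move single nodes into neighboring communities while modularity strictly increases."""
    labels = {u: u for u in adj}  # every node starts in its own community
    improved = True
    while improved:
        improved = False
        for u in sorted(adj):
            best_label, best_q = labels[u], modularity(adj, labels)
            for cand in sorted({labels[v] for v in adj[u]} - {labels[u]}):
                old = labels[u]
                labels[u] = cand
                q = modularity(adj, labels)
                labels[u] = old
                if q > best_q + 1e-12:
                    best_label, best_q = cand, q
            if best_label != labels[u]:
                labels[u] = best_label
                improved = True
    return labels


# Usage: two triangles of students joined by a single interaction (2-3).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
labels = louvain_local_moving(adj)
print(labels, round(modularity(adj, labels), 3))
```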

In the sub-step S503, the propagation path and speed of risk in the student population are analyzed with the SIR (susceptible-infected-removed) model, based on the community detection result. SIR is a classical epidemic model that simulates how individuals change state during an infection. In this application, a "susceptible" student has not yet been affected by the risk, an "infected" student is currently affected by it, and a "removed" student is no longer at risk. The model simulates the spread of risk through an infection rate, which determines the probability that a susceptible student becomes infected through contact with an infected one, and a recovery rate, which determines how quickly infected students are removed. Simulating the SIR model on different social network structures shows the resulting propagation paths and speeds and produces the risk propagation path analysis result. This clarifies the propagation dynamics of risk in student populations and provides a scientific basis for designing intervention strategies.
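A minimal discrete-time SIR simulation on a contact network, assuming a per-contact infection probability `beta` and a per-step recovery probability `gamma`. The patent does not say whether its SIR model runs on the network or at population level, so this network-based variant is an assumption.

```python
import random


def simulate_sir(adj, initial_infected, beta, gamma, steps, seed=0):
    """Discrete-time SIR on a contact network.

    Each step, every infected node infects each susceptible neighbor with probability
    beta, then moves to 'removed' with probability gamma.
    Returns the final states and the list of infected nodes after each step.
    """
    rng = random.Random(seed)
    state = {node: "S" for node in adj}
    for node in initial_infected:
        state[node] = "I"
    history = [sorted(n for n, s in state.items() if s == "I")]
    for _ in range(steps):
        infected = [n for n, s in state.items() if s == "I"]
        if not infected:
            break  # propagation over: nobody left to transmit
        newly_infected = set()
        for u in infected:
            for v in adj[u]:
                if state[v] == "S" and v not in newly_infected and rng.random() < beta:
                    newly_infected.add(v)
        for u in infected:
            if rng.random() < gamma:
                state[u] = "R"
        for v in newly_infected:
            state[v] = "I"  # applied after recoveries so new cases cannot recover this step
        history.append(sorted(n for n, s in state.items() if s == "I"))
    return state, history


# Usage: hypothetical rates on the two-community network of the previous step.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
state, history = simulate_sir(adj, initial_infected=[0], beta=0.4, gamma=0.3, steps=20)
print([len(h) for h in history])  # number of at-risk students after each step
```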

In the sub-step S504, the betweenness centrality of each node in the network is calculated based on the risk propagation path analysis result. Betweenness centrality measures a node's importance in a network by summing, over all pairs of other nodes, the fraction of shortest paths between them that pass through it. A node with high betweenness plays a key role in connecting different parts of the network; in risk propagation analysis, such nodes are critical intermediaries of risk spread. Computing the betweenness centrality of every node shows how important each individual is in the propagation process and produces the risk propagation pattern analysis result. This result identifies the students who play a critical role in spreading risk, so that interventions can be targeted.
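Betweenness centrality can be computed exactly with Brandes' algorithm; the sketch below assumes an unweighted, undirected graph stored as an adjacency dictionary and returns unnormalized scores.

```python
from collections import deque


def betweenness_centrality(adj):
    """Brandes' algorithm for an unweighted, undirected graph given as {node: [neighbors]}."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma) and recording predecessors.
        stack, pred = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        # Accumulate dependencies in order of non-increasing distance from s.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: score / 2 for v, score in bc.items()}


# Usage: the bridging students 2 and 3 should score highest in the two-triangle network.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
print(betweenness_centrality(adj))
```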

Suppose students at a school learn and interact through an online learning platform. In step S501, a social network graph is built by analyzing how often and in what patterns students interact in the discussion forum. In step S502, the Louvain algorithm identifies communities of students formed around shared interests or courses. In step S503, the SIR model simulates how learning risks propagate within these communities, and in step S504, the student nodes that play a key role in risk propagation are identified by computing betweenness centrality. This series of analyses helps school administrators identify and understand how risk propagates through the student population and design effective interventions.

Referring to fig. 7, the step of generating the optimized online learning strategy, in which a graph convolutional network algorithm is applied to the risk propagation pattern analysis result and the graph of relationships between student interactions and learning content is analyzed to optimize course content and interaction strategies in the online learning environment, specifically comprises the following steps:

S602: based on the student interaction pattern analysis result, adopting an Apriori algorithm, analyzing association rules between course content and student interaction, identifying key association points between student interaction and learning content, and generating a content interaction relationship analysis result;

S603: based on the content interaction relationship analysis result, adopting natural language processing, analyzing student feedback through sentiment analysis, adjusting course content to better match student needs, and generating a course content adjustment scheme;

In the sub-step S601, the risk propagation pattern analysis result is analyzed with a graph convolutional network (GCN) algorithm. A GCN is a neural network that operates on graph-structured data and can efficiently extract features from the graph structure. First, the relationships between student interactions and learning content are represented as a graph whose nodes are students or learning content items; for example, if a student participates in a particular discussion forum or group, an edge is formed between the student and that forum or group. The GCN then applies convolution operations on the graph to extract features: at each node it aggregates information from neighboring nodes and learns the node's representation through a nonlinear transformation. This captures complex dependencies among nodes, enabling analysis of how students interact. The resulting student interaction pattern analysis result reveals the structure and characteristics of student interaction and gives a detailed view of how students interact in the online environment.
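The neighbor-aggregation step can be sketched as a single GCN layer, assuming the widely used symmetric-normalization propagation rule of Kipf and Welling, ReLU(D^-1/2 (A + I) D^-1/2 H W). The patent does not state which GCN variant it uses, and the weights here are fixed for illustration rather than learned.

```python
import math


def gcn_layer(adj, H, W):
    """One graph-convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adj: n x n 0/1 adjacency matrix, H: n x f node features, W: f x o weight matrix.
    """
    n = len(adj)
    a_hat = [[1.0 if i == j else float(adj[i][j]) for j in range(n)] for i in range(n)]  # self-loops
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Aggregate each node's own and neighbors' features with symmetric degree normalization.
    agg = [[sum(norm[i][k] * H[k][f] for k in range(n)) for f in range(len(H[0]))]
           for i in range(n)]
    # Linear transform with W, then the ReLU nonlinearity.
    out = [[sum(agg[i][f] * W[f][o] for f in range(len(W))) for o in range(len(W[0]))]
           for i in range(n)]
    return [[max(0.0, v) for v in row] for row in out]


# Usage: hypothetical graph student A - forum thread - student B, one input feature per node.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
H = [[1.0], [0.0], [0.0]]
W = [[2.0]]
print(gcn_layer(adj, H, W))
```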

In step S602, based on the student interaction mode analysis result, association rules between course content and student interactions are analyzed with the Apriori algorithm. Apriori is a classical association rule mining algorithm that discovers frequent patterns and association rules among items in large datasets. The algorithm first identifies frequently occurring itemsets and then generates association rules from these frequent itemsets. The rules help identify key association points between student interactions and learning content; for example, a particular discussion thread may consistently attract participation from a particular group of students. The resulting content interaction relation analysis result provides a key entry point for optimizing course content and increasing student participation.
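The frequent-itemset phase of Apriori can be sketched in plain Python as follows. This is an illustrative implementation of the textbook join-and-prune procedure, not the patent's code; the interaction records (one set of forum threads and content items per student) are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support."""
    n = len(transactions)
    sets = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in sets if itemset <= t) / n

    # Level 1: frequent single items
    items = {item for t in sets for item in t}
    current = {c for c in (frozenset([i]) for i in items) if support(c) >= min_support}
    frequent = {c: support(c) for c in current}
    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into k-itemset candidates
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent

# Hypothetical per-student interaction records
transactions = [
    {"thread:exam_tips", "content:ch3_video"},
    {"thread:exam_tips", "content:ch3_video", "content:quiz3"},
    {"thread:exam_tips", "content:quiz3"},
    {"content:ch3_video"},
]
result = apriori(transactions, min_support=0.5)
```

Here the pair {thread:exam_tips, content:ch3_video} reaches support 0.5, whereas the triple is pruned because {content:ch3_video, content:quiz3} is not frequent.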

In step S603, based on the content interaction relation analysis result, student feedback is analyzed with natural language processing techniques, including sentiment analysis of the text students write in forums, assignment feedback, or course evaluations. Sentiment analysis identifies and extracts the sentiment polarity of a text: positive, negative, or neutral. The process typically involves text preprocessing, feature extraction, and sentiment classification; it reveals how students feel about the course content and what they need, so that the content can be adjusted to match their needs more closely. The resulting course content adjustment scheme aims to enhance the students' learning experience and improve learning outcomes.
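The patent does not name a sentiment model. To make the preprocessing, feature extraction and classification stages concrete, the sketch below swaps in the simplest possible lexicon-based classifier; the word lists, feedback texts and the "mostly negative" threshold are all hypothetical.

```python
import re

# Hypothetical, deliberately tiny sentiment lexicon; a real system would use a
# trained classifier or a curated lexicon.
POSITIVE = {"clear", "helpful", "great", "enjoyed", "useful"}
NEGATIVE = {"confusing", "boring", "hard", "fast", "unclear"}

def preprocess(text: str) -> list[str]:
    # Lower-case and keep alphabetic tokens only
    return re.findall(r"[a-z]+", text.lower())

def classify(text: str) -> str:
    tokens = preprocess(text)
    # Feature extraction: net count of lexicon hits
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

feedback = {
    "ch3_video": ["The video was confusing and too fast", "Unclear examples"],
    "quiz3": ["Helpful quiz, clear questions", "I enjoyed it"],
}
to_adjust = []
for item, comments in feedback.items():
    labels = [classify(c) for c in comments]
    # Flag content whose feedback is mostly negative as a candidate for adjustment
    if labels.count("negative") / len(labels) > 0.5:
        to_adjust.append(item)
```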

In step S604, based on the course content adjustment scheme, the online learning strategy is optimized with an adaptive learning model, i.e., a method that dynamically adjusts the teaching strategy according to student feedback and interactions. The process analyzes students' learning behavior, performance, and feedback, and then adjusts the teaching content, difficulty, and resources provided on the basis of these data. For example, if the data indicate that most students are having difficulty with a certain course topic, the teaching strategy is adjusted to provide more resources and exercises on that topic. In this way, the optimized online learning strategy meets students' individual learning needs more effectively and improves the overall effectiveness of teaching.
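The example rule in the paragraph above ("if most students struggle on a topic, add resources for it") can be sketched as follows. The pass mark of 0.6, the majority threshold of 0.5 and the score records are hypothetical choices for illustration; the patent does not define how "difficulty" is measured.

```python
from collections import defaultdict

def topics_needing_support(results, pass_mark=0.6, majority=0.5):
    """Return {topic: share_of_struggling_students} for topics where most students struggle.

    results: iterable of (student_id, topic, score) with scores in [0, 1].
    A score below `pass_mark` counts as struggling (hypothetical rule).
    """
    attempts = defaultdict(int)
    struggling = defaultdict(int)
    for _student, topic, score in results:
        attempts[topic] += 1
        if score < pass_mark:
            struggling[topic] += 1
    shares = {t: struggling[t] / attempts[t] for t in attempts}
    return {t: s for t, s in shares.items() if s > majority}

results = [
    ("s1", "recursion", 0.4), ("s2", "recursion", 0.5), ("s3", "recursion", 0.9),
    ("s1", "loops", 0.8), ("s2", "loops", 0.7),
]
plan = topics_needing_support(results)
for topic, share in plan.items():
    # Hypothetical adjustment: schedule extra exercises and resources for flagged topics
    print(f"{topic}: {share:.0%} struggling -> add exercises and resources")
```

A real adaptive learning model would also personalize per student; this sketch only shows the course-level adjustment described in the text.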

As an example, consider an online learning platform that hosts multiple courses and interactive forums. In step S601, a graph convolutional network analyzes students' interaction patterns across the different courses and forums. In step S602, the Apriori algorithm reveals association rules between students' participation in particular forum activities and the course content. In step S603, natural language processing of student feedback on the courses indicates the areas that need improvement, and in step S604 the adaptive learning model adjusts the teaching strategy based on these analysis results, for example by adding customized content and interaction opportunities that address student needs. Together, these steps improve both the teaching quality of the platform and the students' learning experience.

Referring to fig. 8, a learning early warning system for big data analysis is used to execute the above learning early warning method for big data analysis. The system comprises a behavior pattern analysis module, a score association mining module, a risk level classification module, a risk probability calculation module, a real-time monitoring module, a social network analysis module, and a content policy optimization module.

The behavior pattern analysis module, based on student online learning activity data, analyzes time series of student login frequency and learning duration with a long short-term memory (LSTM) network algorithm; by identifying pattern changes and trends in the time series, the LSTM discovers periodicity and outliers in student behavior and generates a behavior pattern recognition result;

the risk probability calculation module, based on the risk classification model, calculates the probability distribution of risks with a Bayesian network algorithm and assigns a probability value to each type of risk through probabilistic inference, generating a risk probability analysis result;

the real-time monitoring module, based on the risk probability analysis result, monitors student learning risks in real time with a dynamic Bayesian network and predicts risk trends by updating the risk data in real time, generating real-time risk monitoring data;

In the behavior pattern analysis module, student online learning activity data are processed with a long short-term memory (LSTM) network algorithm. LSTM is particularly suited to time series data because it can learn and retain long-term dependencies. The data are typically time series recording information such as each student's login frequency and learning duration. By analyzing these series, the LSTM identifies pattern changes and trends, in particular periodicity and outliers; for example, it can detect significant changes in study time before and after the end of a term. The resulting behavior pattern recognition result not only reveals students' learning habits and behavior patterns but can also expose potential learning difficulties or risks, providing a basis for subsequent intervention.
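The gate mechanism that lets an LSTM retain long-term dependencies can be shown with a single forward step in NumPy. This is a minimal sketch of the standard LSTM cell with randomly initialized, untrained weights; the gate ordering, layer sizes and the 14-day [logins, study hours] sequence are hypothetical. A deployed system would train the weights, for example with a deep learning framework.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W: (4H, D), U: (4H, H), b: (4H,). Gate order here: f, i, o, g."""
    hidden = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    f = sigmoid(z[0:hidden])               # forget gate: how much old memory to keep
    i = sigmoid(z[hidden:2 * hidden])      # input gate: how much new information to write
    o = sigmoid(z[2 * hidden:3 * hidden])  # output gate: how much memory to expose
    g = np.tanh(z[3 * hidden:])            # candidate memory content
    c = f * c_prev + i * g                 # updated cell state (long-term memory)
    h = o * np.tanh(c)                     # new hidden state
    return h, c

def lstm_encode(sequence, W, U, b, hidden):
    h, c = np.zeros(hidden), np.zeros(hidden)
    for x in sequence:
        h, c = lstm_step(x, h, c, W, U, b)
    return h  # final hidden state summarizes the whole sequence

rng = np.random.default_rng(42)
D, H = 2, 8                                  # hypothetical: 2 features per day, 8 hidden units
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
sequence = rng.normal(size=(14, D))          # hypothetical normalized 14-day activity log
summary = lstm_encode(sequence, W, U, b, H)
```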

In the score association mining module, the Apriori algorithm is used to analyze the relationship between student behaviors and scores. The module takes the behavior pattern recognition result as input and combines it with the students' score data, so that each record contains a student's behavior patterns and scores. By mining frequent itemsets and strong association rules, Apriori reveals direct links between behavior patterns and scores; for example, students who frequently study late at night tend to have lower scores. The resulting performance impact factor map shows educators which specific behavior patterns affect academic performance.
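Whether a rule such as "late-night study → low score" counts as strong is usually decided by its support and confidence, with lift indicating whether the association exceeds chance. The sketch below computes these three standard measures; the student records and the thresholds (support ≥ 0.3, confidence ≥ 0.6) are hypothetical.

```python
def rule_metrics(records, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    n = len(records)
    a = sum(1 for r in records if antecedent <= r)
    c = sum(1 for r in records if consequent <= r)
    both = sum(1 for r in records if (antecedent | consequent) <= r)
    support = both / n
    confidence = both / a if a else 0.0
    lift = confidence / (c / n) if c else 0.0  # > 1 means positively associated
    return support, confidence, lift

# Hypothetical records: each student's behavior pattern plus score band
records = [
    {"late_night_study", "score:low"},
    {"late_night_study", "score:low"},
    {"late_night_study", "score:mid"},
    {"regular_schedule", "score:high"},
    {"regular_schedule", "score:mid"},
]
s, conf, lift = rule_metrics(records, {"late_night_study"}, {"score:low"})
is_strong = s >= 0.3 and conf >= 0.6  # hypothetical minimum support / confidence
```

Here the rule holds for 2 of the 3 late-night students (confidence 2/3), and low scores are 5/3 times more likely in that group than overall.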

The risk level classification module classifies students' risk levels with the C4.5 decision tree algorithm. Its input is the performance impact factor map, a structured dataset containing the students' behavior patterns and scores. The C4.5 algorithm builds a decision tree that predicts each student's learning risk from the relationship between behavior and scores, selecting the best splitting attribute at each node by information gain ratio, and thereby generates a risk classification model. The model classifies students' risk levels and supports targeted teaching strategies and interventions.
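The splitting criterion named above, the information gain ratio, can be computed as follows for categorical attributes. This sketch covers only attribute selection at one node; full C4.5 also handles continuous attributes, missing values and pruning. The attributes and labels are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((k / n) * log2(k / n) for k in Counter(labels).values())

def gain_ratio(rows, labels, attr):
    """C4.5 gain ratio of the categorical attribute at index `attr`."""
    n = len(rows)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalizes attributes with many distinct values
    split_info = -sum(len(g) / n * log2(len(g) / n) for g in groups.values())
    return info_gain / split_info if split_info > 0 else 0.0

def best_split(rows, labels):
    return max(range(len(rows[0])), key=lambda a: gain_ratio(rows, labels, a))

# Hypothetical attributes: (frequent_late_night_study, low_login_frequency)
rows = [("yes", "yes"), ("yes", "no"), ("no", "no"), ("no", "yes"), ("yes", "yes"), ("no", "no")]
labels = ["high", "high", "low", "low", "high", "low"]
chosen = best_split(rows, labels)  # attribute 0 separates the risk levels perfectly
```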

The risk probability calculation module calculates the probability distribution of risks with a Bayesian network algorithm based on the risk classification model. Its input is the output of the risk classification model, i.e., the students' risk levels. The Bayesian network represents risk states with nodes (different risk levels) and edges (risk transition probabilities), and computes a probability value for each type of risk through probabilistic inference, generating a risk probability analysis result. This result allows the risks faced by students to be assessed and quantified more accurately and provides data support for formulating corresponding preventive and intervention measures.
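Probabilistic inference in a Bayesian network can be illustrated with a three-node network in which a risk level node has two children, course failure and inactivity. The structure and all conditional probability tables below are hypothetical; the sketch shows exact inference by enumeration (marginalization and Bayes' rule), not the patent's specific network.

```python
# Hypothetical CPTs; a real network would estimate them from historical records.
P_LEVEL = {"low": 0.6, "medium": 0.3, "high": 0.1}    # P(Level), e.g. from the C4.5 output
P_FAIL = {"low": 0.05, "medium": 0.2, "high": 0.6}     # P(Fail=yes | Level)
P_INACTIVE = {"low": 0.1, "medium": 0.4, "high": 0.8}  # P(Inactive=yes | Level)

def p_fail():
    """Marginal P(Fail=yes) = sum_l P(Fail=yes | l) P(l)."""
    return sum(P_LEVEL[l] * P_FAIL[l] for l in P_LEVEL)

def posterior_level(inactive=True):
    """P(Level | Inactive) by Bayes' rule, enumerating the parent's values."""
    joint = {l: P_LEVEL[l] * (P_INACTIVE[l] if inactive else 1 - P_INACTIVE[l]) for l in P_LEVEL}
    z = sum(joint.values())
    return {l: p / z for l, p in joint.items()}

def p_fail_given_inactive(inactive=True):
    """P(Fail=yes | Inactive); Fail and Inactive are independent given Level."""
    post = posterior_level(inactive)
    return sum(post[l] * P_FAIL[l] for l in post)

print(round(p_fail(), 3))                 # 0.15
print(round(p_fail_given_inactive(), 3))  # 0.288
```

Observing inactivity shifts probability mass toward the higher risk levels and nearly doubles the estimated failure probability, which is the kind of per-risk probability value the module is described as producing.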

The real-time monitoring module monitors students' learning risks in real time with a dynamic Bayesian network. It takes the risk probability analysis result, in the form of probability distributions and time series data, as input. A dynamic Bayesian network handles time series data and reflects changes in the risk state in real time as the data stream is continuously updated. The module can predict how risks will develop and generates real-time risk monitoring data, providing a tool for responding to and handling learning risks promptly.
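The simplest dynamic Bayesian network has one hidden risk variable per time slice (equivalent to a hidden Markov model), and real-time updating then reduces to forward filtering: predict with the transition model, then weight by the likelihood of the new observation. The transition and observation probabilities and the weekly activity stream below are hypothetical.

```python
import numpy as np

STATES = ("low", "medium", "high")
# Hypothetical transition model P(risk_t | risk_{t-1}); row = previous state
TRANSITION = np.array([[0.80, 0.15, 0.05],
                       [0.20, 0.60, 0.20],
                       [0.05, 0.25, 0.70]])
# Hypothetical observation model P(obs | risk); column 0 = active week, 1 = inactive week
EMISSION = np.array([[0.9, 0.1],
                     [0.6, 0.4],
                     [0.2, 0.8]])

def filter_step(belief, observation):
    predicted = belief @ TRANSITION                 # predict: roll the belief forward one slice
    updated = predicted * EMISSION[:, observation]  # update: weight by the new evidence
    return updated / updated.sum()                  # renormalize to a probability distribution

belief = np.array([0.6, 0.3, 0.1])  # e.g. the static Bayesian network's output at t = 0
history = []
for obs in [0, 1, 1, 1]:            # hypothetical weekly activity stream
    belief = filter_step(belief, obs)
    history.append(dict(zip(STATES, belief.round(3))))
forecast = belief @ TRANSITION      # one-step-ahead risk trend before new evidence arrives
```

After three consecutive inactive weeks, most of the probability mass has moved to the "high" state, which is the kind of trend the monitoring module is meant to surface.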

The social network analysis module analyzes the social network of the student group with a community detection network analysis algorithm. Its input is the real-time risk monitoring data, in the form of social interaction records. By analyzing social interactions among students, the community detection algorithm identifies key social nodes and the paths along which risk propagates, generating a social risk propagation analysis result. This result is essential for understanding how risk propagates within a student community and helps in formulating effective social intervention strategies.
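Community detection methods such as the Louvain algorithm (named in claim 7) search for a partition that maximizes modularity. The sketch below only computes that objective; it does not implement the Louvain search itself. It shows that splitting a hypothetical interaction graph of two study groups joined by a single tie scores higher than treating everyone as one community.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    Q = sum_c [ L_c / m - (d_c / 2m)^2 ], where L_c is the number of edges inside
    community c and d_c is the total degree of its nodes.
    """
    m = len(edges)
    internal = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)

# Hypothetical interaction graph: two study groups {0,1,2} and {3,4,5} linked by 2-3
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_group = {n: "A" for n in range(6)}
q_split = modularity(edges, two_groups)  # 5/14, about 0.357
q_single = modularity(edges, one_group)  # 0.0
```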

The content strategy optimization module optimizes course content and interaction strategies with a graph convolutional network algorithm. It takes the social risk propagation analysis result as input, in the form of a graph representing the relationships between student interactions and learning content. By analyzing these relationship graphs in depth, the graph convolutional network identifies the underlying connections between interaction patterns and content strategies so that the teaching strategy can be adjusted. The resulting optimized teaching strategy scheme aims to improve teaching quality and the students' learning experience and supports continuous improvement of online education platforms.

The present invention is not limited to the above embodiments. Equivalent embodiments obtained by changing or modifying the technical content disclosed above may be applied to other fields, and any simple modification, equivalent change, or variation made to the above embodiments according to the technical essence of the present invention still falls within the scope of the technical disclosure.

Claims

1. A learning early warning method for big data analysis, characterized by comprising the following steps:

based on student online learning activity data, adopting a long short-term memory (LSTM) network algorithm to analyze the time series of students' learning behavior data, identify pattern changes and trends, and perform a preliminary classification of student behavior patterns, generating a student behavior time pattern analysis result;

based on the student behavior time pattern analysis result, adopting an Apriori association rule learning algorithm to analyze and mine frequent itemsets and strong rules between student behavior patterns and learning scores and optimize the relationship between student behavior and scores, generating a behavior and score association analysis result;

based on the behavior and score association analysis result, adopting a C4.5 decision tree algorithm to classify students' risk levels according to the association rules between behavior and scores and predict potential learning difficulties, generating a final risk level prediction model;

based on the final risk level prediction model, adopting a Bayesian network algorithm to calculate the probability distribution of learning risk and monitor student learning risk in real time, generating a real-time risk monitoring result;

based on the real-time risk monitoring result, adopting a community detection network analysis algorithm to construct a social network model of the student group and identify the key nodes and patterns of risk propagation, generating a risk propagation mode analysis result;

based on the risk propagation mode analysis result, adopting a graph convolutional network algorithm to analyze a relationship graph of student interactions and learning content and optimize course content and interaction strategies in the online learning environment, generating an optimized online learning strategy.

2. The learning early warning method for big data analysis according to claim 1, wherein the student behavior time pattern analysis result is specifically a time distribution diagram of students' online learning activities; the behavior and score association analysis result is specifically a correlation measure between students' learning behavior and their learning scores; the final risk level prediction model comprises risk characteristics, prediction accuracy, and potential intervention schemes; the real-time risk monitoring result comprises student learning risk assessments, risk trends, and a student list; the risk propagation mode analysis result comprises a network structure diagram of risk propagation in the student group, key nodes, risk propagation paths, and risk propagation trends; and the optimized online learning strategy comprises course content and learning activities adjusted to risk patterns and student interaction characteristics, an improved student interaction scheme, and a course design optimized to improve learning outcomes.

3. The learning early warning method for big data analysis according to claim 1, wherein the step of adopting a long short-term memory network algorithm, based on student online learning activity data, to analyze the time series of students' learning behavior data, identify pattern changes and trends, perform a preliminary classification of student behavior patterns, and generate the student behavior time pattern analysis result specifically comprises:

based on student online learning activity data, adopting data processing methods, namely Z-score normalization to transform the data to a standard distribution and sliding-window segmentation to divide the time series into fixed-size windows, to optimize the data and generate a standardized behavior data set;

based on the standardized behavior data set, adopting a long short-term memory network algorithm that processes the time series data through forget gate, input gate, and output gate mechanisms to capture temporal dependencies, generating a time series feature analysis result;

based on the time series feature analysis result, adopting a support vector machine algorithm that identifies changes and trends in student behavior patterns through kernel function transformation and hyperplane optimization, generating a behavior pattern recognition result;

based on the behavior pattern recognition result, adopting a K-means clustering algorithm that classifies student behavior patterns by selecting initial centroids and iteratively minimizing the distance from each data point to its nearest centroid, generating a student behavior pattern classification result.
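Three of the steps in claim 3 (Z-score normalization, sliding-window segmentation and K-means clustering) can be sketched with NumPy. The LSTM and support vector machine steps are not reproduced here. The activity log and per-student feature vectors are hypothetical stand-ins; in the claimed method the clustering input would come from the preceding recognition step.

```python
import numpy as np

def zscore(x):
    # Column-wise Z-score normalization: (x - mean) / std
    std = x.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return (x - x.mean(axis=0)) / std

def sliding_windows(x, size, step=1):
    # Fixed-size windows over the time axis: shape (n_windows, size, n_features)
    return np.stack([x[i:i + size] for i in range(0, len(x) - size + 1, step)])

def kmeans(points, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]  # initial centroids
    for _ in range(iters):
        # Assign each point to its nearest centroid (Euclidean distance)
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it if the cluster is empty)
        new = np.array([points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

rng = np.random.default_rng(0)
daily = rng.poisson(lam=[3, 2], size=(30, 2)).astype(float)  # hypothetical [logins, hours], 30 days
windows = sliding_windows(zscore(daily), size=7)             # 24 overlapping 7-day windows
students = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])  # hypothetical features
labels, centroids = kmeans(students, k=2)
```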

4. The learning early warning method for big data analysis according to claim 1, wherein the step of adopting the Apriori association rule learning algorithm, based on the student behavior time pattern analysis result, to analyze and mine frequent itemsets and strong rules between student behavior patterns and learning scores, optimize the relationship between student behavior and scores, and generate the behavior and score association analysis result specifically comprises:

based on the student behavior pattern classification result, adopting association rule mining preprocessing, namely constructing a transaction database, encoding itemsets, and organizing the data set, to generate a processed behavior and score data set;

based on the processed behavior and score data set, adopting the Apriori algorithm to mine frequent itemsets through itemset generation and support calculation, generating a frequent itemset analysis result;

based on the frequent itemset analysis result, adopting the FP-Growth algorithm to analyze strong association rules between behavior and scores by constructing a frequent pattern tree and mining conditional pattern bases, generating a behavior and score strong association rule result;

based on the behavior and score strong association rule result, adopting principal component analysis, through eigenvalue decomposition and dimensionality reduction, to optimize the relationship between student behavior and scores, generating a behavior and score association optimization analysis result.
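The final step of claim 4, principal component analysis by eigenvalue decomposition, can be sketched as follows. The input matrix (one row per student, one column per rule-derived indicator) is hypothetical, and the FP-Growth step is not shown.

```python
import numpy as np

def pca(x, n_components):
    """PCA via eigendecomposition of the covariance matrix.

    Returns the projected data and the explained-variance ratio of each kept component.
    """
    centered = x - x.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: symmetric input, ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # sort components by explained variance, descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()
    return centered @ eigvecs[:, :n_components], explained[:n_components]

# Hypothetical indicators: column 1 nearly duplicates column 0, columns 2-3 are independent
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 1))
features = np.hstack([base, 2 * base + 0.1 * rng.normal(size=(50, 1)), rng.normal(size=(50, 2))])
projected, ratio = pca(features, n_components=2)
```

Because the first two indicators are almost collinear, the first principal component absorbs most of their shared variance, which is the redundancy reduction the claim refers to.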

5. The learning early warning method for big data analysis according to claim 1, wherein the step of adopting the C4.5 decision tree algorithm, based on the behavior and score association analysis result, to classify students' risk levels according to the association rules between behavior and scores, predict potential learning difficulties, and generate the final risk level prediction model specifically comprises:

based on the behavior and score association optimization analysis result, adopting data encoding techniques, namely one-hot encoding and label encoding, to convert the data into the format required by the decision tree algorithm, generating an encoded risk analysis data set;

based on the encoded risk analysis data set, adopting the C4.5 decision tree algorithm to construct a prediction model through information gain ratio calculation and tree pruning, generating a preliminary risk level prediction model;

based on the preliminary risk level prediction model, adopting cross-validation to evaluate the accuracy of the model through data partitioning and multiple rounds of validation, generating an optimized risk level prediction model;

based on the optimized risk level prediction model, adopting a Bayesian network algorithm to perform the final risk level classification through probabilistic inference and causal relationship modeling, generating the final risk level prediction model.
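The encoding and cross-validation steps in claim 5 are standard data-preparation routines. The plain-Python sketch below shows one-hot encoding for categorical features, an explicit label encoding for the ordinal risk level, and the index bookkeeping of k-fold cross-validation. The category and level names are hypothetical.

```python
def one_hot(values):
    """Return (categories, rows), where each row is a 0/1 indicator vector."""
    categories = sorted(set(values))
    return categories, [[int(v == c) for c in categories] for v in values]

# Label encoding for the ordinal target; the level names are hypothetical
RISK_CODES = {"low": 0, "medium": 1, "high": 2}

def k_fold_indices(n, k):
    """Return (train_idx, test_idx) pairs for k-fold cross-validation (no shuffling)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        folds.append((train, test))
        start += size
    return folds

study_time = ["night", "morning", "night", "evening"]
categories, encoded = one_hot(study_time)
targets = [RISK_CODES[level] for level in ["high", "low", "medium", "low"]]
folds = k_fold_indices(10, 3)  # every sample is used for testing exactly once
```

In each round, the model would be fit on `train` and scored on `test`, and the accuracy would be averaged over the folds.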

6. The learning early warning method for big data analysis according to claim 1, wherein the step of adopting the Bayesian network algorithm, based on the final risk level prediction model, to calculate the probability distribution of learning risk and monitor student learning risk in real time, generating the real-time risk monitoring result, specifically comprises:

based on the final risk level prediction model, adopting a Bayesian network algorithm that represents the different risk factors and their correlations as nodes and edges, expresses the dependencies among factors as conditional probabilities, and calculates the probability distribution of each risk state, generating a probability distribution calculation result;

based on the probability distribution calculation result, adopting a stream data processing algorithm that captures student behavior data and analyzes the data stream in real time to monitor changes in student behavior and identify potential risk states, generating real-time risk monitoring data;

based on the real-time risk monitoring data, adopting an isolation forest algorithm that isolates observations by randomly selecting features and random split values, identifying abnormal behavior and generating an abnormal behavior identification result;

based on the abnormal behavior identification result, using a data visualization tool to display the dynamic changes of risk levels intuitively in the form of charts and heat maps, reflecting the risk level trends and generating the real-time risk monitoring result.
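The isolation forest principle (anomalies are isolated by fewer random splits) can be sketched from scratch in NumPy. This is an illustrative reimplementation of the standard algorithm, not the patent's code; a production system might use an existing library implementation such as scikit-learn's `IsolationForest` instead. The activity data and the injected abnormal record are hypothetical.

```python
import math
import numpy as np

EULER_GAMMA = 0.5772156649

def c_factor(n):
    """Average path length of an unsuccessful search in a binary search tree of n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_tree(x, depth, max_depth, rng):
    if depth >= max_depth or len(x) <= 1:
        return {"size": len(x)}
    feature = int(rng.integers(x.shape[1]))  # randomly selected feature
    lo, hi = x[:, feature].min(), x[:, feature].max()
    if lo == hi:
        return {"size": len(x)}
    split = rng.uniform(lo, hi)              # random split value between min and max
    mask = x[:, feature] < split
    return {"feature": feature, "split": split,
            "left": build_tree(x[mask], depth + 1, max_depth, rng),
            "right": build_tree(x[~mask], depth + 1, max_depth, rng)}

def path_length(point, node, depth=0):
    if "size" in node:
        # Leaf: add the expected depth of the subtree that was not grown
        return depth + c_factor(node["size"])
    child = node["left"] if point[node["feature"]] < node["split"] else node["right"]
    return path_length(point, child, depth + 1)

def anomaly_scores(x, n_trees=100, sample_size=64, seed=0):
    rng = np.random.default_rng(seed)
    sample_size = min(sample_size, len(x))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [build_tree(x[rng.choice(len(x), size=sample_size, replace=False)], 0, max_depth, rng)
             for _ in range(n_trees)]
    mean_depth = np.array([np.mean([path_length(p, t) for t in trees]) for p in x])
    # Scores near 1 suggest anomalies; scores well below 0.5 suggest normal points
    return 2.0 ** (-mean_depth / c_factor(sample_size))

rng = np.random.default_rng(1)
activity = rng.normal(size=(200, 2))            # hypothetical normalized behavior features
activity = np.vstack([activity, [[8.0, 8.0]]])  # one injected abnormal record
scores = anomaly_scores(activity)
```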

7. The learning early warning method for big data analysis according to claim 1, wherein the step of adopting the community detection network analysis algorithm, based on the real-time risk monitoring result, to construct a social network model of the student group, identify the key nodes and patterns of risk propagation, and generate the risk propagation mode analysis result specifically comprises:

based on the real-time risk monitoring result, adopting a social network analysis method to construct an interaction network graph among students, in which nodes and edges represent students and their interaction relationships, identifying the main connection points in the social network, and generating a social network graph construction result;

based on the social network graph construction result, adopting the Louvain algorithm, which identifies distinct communities in the network by maximizing modularity, i.e., increasing the density of connections within communities relative to the connections between them, generating a community detection result;

based on the community detection result, adopting an SIR model, which simulates the susceptible, infected, and removed individuals of an epidemic process, to analyze the paths and speed of risk propagation in the student group, generating a risk propagation path analysis result;

based on the risk propagation path analysis result, calculating the centrality of each node in the network with a betweenness centrality algorithm to evaluate the importance of individual nodes in the risk propagation process, generating the risk propagation mode analysis result.
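Betweenness centrality, used in the last step of claim 7, counts how often a student lies on shortest paths between other students. The sketch below is an illustrative implementation of Brandes' algorithm for unweighted, undirected graphs. Graph libraries such as NetworkX provide an equivalent `betweenness_centrality` function. The two-study-group graph is hypothetical.

```python
from collections import deque

def betweenness(adjacency):
    """Brandes' algorithm for an undirected, unweighted graph {node: set(neighbors)}."""
    bc = {v: 0.0 for v in adjacency}
    for s in adjacency:
        stack = []
        preds = {v: [] for v in adjacency}
        sigma = {v: 0 for v in adjacency}  # number of shortest paths from s
        dist = {v: -1 for v in adjacency}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                       # BFS counts shortest paths from s
            v = queue.popleft()
            stack.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adjacency}
        while stack:                       # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected pair was counted once from each endpoint
    return {v: c / 2.0 for v, c in bc.items()}

# Hypothetical graph: study groups {0,1,2} and {3,4,5} connected only through 2-3
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
adjacency = {n: set() for n in range(6)}
for u, v in edges:
    adjacency[u].add(v)
    adjacency[v].add(u)
centrality = betweenness(adjacency)  # the bridge students 2 and 3 score highest
```

Students 2 and 3 each lie on all six shortest paths between their own group and the other one, which marks them as the likely channels through which risk could spread between groups.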

8. The learning early warning method for big data analysis according to claim 1, wherein the step of adopting the graph convolutional network algorithm, based on the risk propagation mode analysis result, to analyze the relationship graph of student interactions and learning content, optimize course content and interaction strategies in the online learning environment, and generate the optimized online learning strategy specifically comprises:

based on the risk propagation mode analysis result, adopting a graph convolutional network algorithm to represent the relationship between student interactions and learning content as a graph structure and applying convolution operations on the graph to extract features and analyze interaction patterns, generating a student interaction mode analysis result;

based on the student interaction mode analysis result, adopting the Apriori algorithm to analyze association rules between course content and student interactions and identify key association points between student interactions and learning content, generating a content interaction relation analysis result;

based on the content interaction relation analysis result, adopting natural language processing to analyze student feedback by sentiment analysis and adjust course content to better fit student needs, generating a course content adjustment scheme;

based on the course content adjustment scheme, adopting an adaptive learning model to adjust the teaching strategy and interaction mode according to student feedback and interactions and optimize the online learning effect, generating the optimized online learning strategy.

9. A learning early warning system for big data analysis, for executing the learning early warning method for big data analysis according to any one of claims 1 to 8, wherein the system comprises a behavior pattern analysis module, a score association mining module, a risk level classification module, a risk probability calculation module, a real-time monitoring module, a social network analysis module, and a content policy optimization module.

10. The learning early warning system for big data analysis according to claim 9, wherein the behavior pattern analysis module, based on student online learning activity data, analyzes time series of student login frequency and learning duration with a long short-term memory (LSTM) network algorithm; by identifying pattern changes and trends in the time series, the LSTM discovers periodicity and outliers in student behavior and generates a behavior pattern recognition result;