CN113962424A - Performance prediction method based on PCANet-BiGRU, processor, readable storage medium and computer equipment - Google Patents
- Publication number
- CN113962424A (application number CN202110800902.3A)
- Authority
- CN
- China
- Prior art keywords
- data
- pcanet
- bigru
- training
- gru
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/20—Education
- G06Q50/205—Education administration or guidance
Abstract
The invention provides a PCANet-BiGRU-based score prediction method. Students' basic statistical data and score data are collected from an online learning platform; the raw score data are divided into three independent parts, a training set, a verification set and a test set, and the training set is cleaned and preprocessed to construct a data matrix; the matrix data are input into a principal component analysis network (PCANet) to extract features of the score data; the PCANet-processed data are then input into a bidirectional gated recurrent unit (Bi-GRU) neural network layer to predict the students' coursework scores.
Description
Technical Field
The invention relates to a score prediction method, in particular to a PCANet-BiGRU-based student score prediction method, and belongs to the field of educational technology.
Background
With the wide adoption of online teaching platforms such as the MOOCs of Chinese universities and Rain Classroom, education has become more efficient, convenient, autonomous and humanized. The key to ensuring and improving the quality of distance education is to strengthen management of the whole process of students' online learning. Online learning is a long-term undertaking: the effort and engagement of everyday study largely determine the quality of the learning outcome.
However, owing to data sparsity and methodological limitations, quantitative, scientific study of the learning process, learning interventions and learning outcomes remains insufficient.
The problems in the prior art are as follows:
at present, most score prediction approaches fall into three schools: statistical, traditional machine learning, and deep learning. Statistical methods are computationally cheap but usually cannot reach high accuracy, while machine learning and deep learning achieve high accuracy at high computational cost.
Disclosure of Invention
The invention aims to provide a PCANet-BiGRU-based score prediction method that addresses both problems: the low computational cost but limited accuracy of statistical methods, and the high accuracy but excessive computational complexity of machine learning and deep learning methods.
The purpose of the invention is realized as follows: a PCANet-BiGRU-based score prediction method comprises the following steps:
step 1) collecting students' basic statistical data and score data from an online learning platform, specifically: acquiring the basic statistical data (e.g. student number, course name and academic year) and score data (e.g. number of assignments set, number completed, and coursework scores) of the corresponding students from a specific online learning platform (e.g. the MOOCs of Chinese universities);
step 2) dividing the raw score data collected in step 1) into three independent parts, a training set, a verification set and a test set, performing data cleaning and normalization on the training set (because the full mark of each piece of coursework differs, normalization is needed to eliminate the influence of scale on data prediction), and constructing a data matrix. The data cleaning is specifically: based on the training-set data acquired in the previous stage, if a missing value exists in a column that is used in training, the whole row of data is deleted; if a column is not used in training, the row is retained even if that column has missing values (a student may miss an assignment, leaving a missing coursework score; such abnormal data would bias model training heavily, so missing data must be cleaned);
step 3) inputting the matrix data into the principal component analysis network PCANet to extract features of the score data, wherein PCANet consists of a PCA convolution layer, a nonlinear processing layer and a feature pooling layer;
step 4) inputting the PCANet-processed data into a bidirectional gated recurrent unit (Bi-GRU) neural network layer to predict the students' coursework scores.
As a further limitation of the present invention, the training set in step 2) is used to train the model: initial parameters are found by fitting, i.e. the weight and bias parameters of the model are determined. The verification set is used to tune hyperparameters such as the network structure and to control model complexity, and the test set checks how well the finally selected model performs. A common split ratio of training, verification and test sets is 6:2:2; with a small data set, 20% of the data are typically drawn at random as the test set and a cross-validation algorithm is applied to the rest. The cross-validation algorithm comprises the following steps:
a. randomly divide the training data into k equal parts;
b. in turn, select k−1 parts for training, validate on the remaining part, and compute the sum of squared prediction errors;
c. finally, average the k sums of squared errors as the basis for selecting the optimal model structure.
As a further limitation of the present invention, the bidirectional gated recurrent unit network Bi-GRU in step 4) extends the structure of the gated recurrent unit GRU with simultaneous forward and backward passes in its hidden layer; this bidirectional operation captures the long-term dependence of coursework scores across different periods, yielding more accurate score prediction.
As a further limitation of the invention, in the PCA convolution layer, for each data element j of input layer l, a convolution kernel P_j samples a surrounding window; the kernel is then slid and all sampled blocks are concatenated as the sample representation X_i = [x_{i,1}, x_{i,2}, ..., x_{i,n}], from which the mean is removed. Performing this operation on the N data sets yields a new feature matrix X, on which principal component analysis (PCA) is then performed. PCA is a common method for data analysis and modeling: it retains the most important characteristics of high-dimensional data, removes noise and unimportant features, and reduces dimensionality, greatly lowering processing cost and raising speed. The specific algorithm steps are as follows:
a. record the matrix X with n rows and m columns;
b. zero-center (normalize) each row of X;
c. compute the covariance matrix C of X;
d. solve the eigenvalues E and eigenvectors D of C:
[E, D] = eig(C)
where eig denotes the eigen-decomposition returning the eigenvalues and eigenvectors;
e. order the eigenvectors D by the magnitude of their corresponding eigenvalues and take the first k columns to form a new matrix, which contains the feature vectors of the dimension-reduced data;
f. take these k feature vectors as PCA filters, use them as convolution kernels, and convolve the data set with them to complete the feature-extracting convolution.
As a further limitation of the present invention, the nonlinear part is used to enhance the expressive power of the data features, specifically by nonlinear processing of the data after the two PCA convolution layers: each convolution result is binarized with the Heaviside step function H(x), where H(x) = 1 if x > 0 and H(x) = 0 otherwise; the binarized results are then weighted by powers of two to obtain, for the i-th sample, the integer map of the l-th layer output features T_i^l = Σ_{j=1}^{L} 2^{j−1} H(O_i^j), where O_i^j denotes the j-th convolution output and L the number of filters.
As a further limitation of the present invention, the feature pooling layer uses local histograms for PCANet's feature pooling (since the nonlinear processing layer outputs values in the range [0, 2^L − 1], the max pooling and mean pooling common in CNNs do not apply to PCANet). The integer map T_i^l is divided into local blocks; each block's histogram is computed and vectorized, denoted Bhist(T_i^l), and the vectors generated by the k integer maps are concatenated into the feature vector f_i = [Bhist(T_i^1), ..., Bhist(T_i^k)]^T.
As a further limitation of the present invention, the GRU in the Bi-GRU neural network has two gate functions, a "reset gate" and an "update gate". The reset gate r_t controls how much the previous state h_{t−1} influences the candidate state h̃_t, and the update gate z_t determines how much of the information in h_{t−1} is carried into h_t. The GRU unit is updated as:
r_t = σ(W_ir i_t + W_hr h_{t−1} + b_r)
z_t = σ(W_iz i_t + W_hz h_{t−1} + b_z)
h̃_t = tanh(W_ih i_t + W_hh (r_t ⊙ h_{t−1}) + b_h)
h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ h̃_t
y_t = W_o h_t
where tanh is the hyperbolic tangent, tanh(x) = (e^x − e^{−x})/(e^x + e^{−x}), whose output always lies in (−1, 1); σ is the sigmoid function, whose output always lies in (0, 1) and thus measures the importance of information, helping decide whether data are updated or discarded; i_t is the input and y_t the output at time t; W_ir, W_iz and W_ih are the weight matrices from the input to the reset gate, the update gate and the candidate state; W_hr, W_hz, W_hh and W_o are the weight matrices from the previous state to the reset gate, the update gate, the candidate state and the output; and b_r, b_z and b_h are the biases of the reset gate, the update gate and the candidate state.
A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method when executing the program.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method.
A processor for running a program, wherein the program performs the above method when running.
Compared with the prior art, the invention adopting the technical scheme has the following technical effects:
the PCANet-BiGRU result prediction model provided by the invention considers the accurate prediction results of low consumption and deep learning of the statistical PCA classification model, so that the student results are predicted, students possibly failing to meet the examination are early warned in time, the students who learn later can timely check for missing and fill in the deficiency, the learning results of the students are improved, and the PCANet-BiGRU result prediction model has important teaching guidance and practice significance.
Drawings
FIG. 1 is a network structure of a PCANet and Bi-GRU based achievement early warning model.
Fig. 2 shows a network structure of PCANet.
FIG. 3 is a schematic diagram of the internal structure of the GRU.
Detailed Description
The technical scheme of the invention is further explained in detail by combining the attached drawings:
learning is a long-term sequence behavior, in which there is a good or bad state, and different sequence performances lead to different outcome outcomes; aiming at the serial defects of overlong parameter training time, special parameter adjusting skills and the like of the structure, a mixed prediction model based on a principal component analysis network (PCANet) and a bidirectional gated cyclic unit (Bi-GRU) neural network shown in figure 1 is provided and applied to courses of C language, C + +, Python and the like on an online learning platform; on the basis of the structure of the GRU, the Bi-GRU captures the long-term dependence of the ordinary learning achievement in different periods through the forward and backward bidirectional operation in the hidden layer of the GRU, so as to obtain more accurate achievement prediction; the experimental result shows that the PCANet-BiGRU result prediction model effectively predicts the results and improves the accuracy and efficiency of the result prediction.
The PCANet-BiGRU score prediction model comprises two components: PCANet and BiGRU.
As shown in fig. 2, PCANet largely follows the structure and concepts of a CNN, but its convolution kernels are principal component analysis (PCA) kernels, its nonlinear layer is a binary hashing step, and its features are generated by histogram statistics. PCANet consists of a PCA convolution layer, a nonlinear processing layer and a feature pooling layer, and is used to extract the spatial features of the score data.
PCA convolutional layer
For each data element of the input layer, a convolution kernel P_j samples a surrounding window; the kernel is then slid and all sampled blocks are concatenated as the sample representation X_i = [x_{i,1}, x_{i,2}, ..., x_{i,n}], from which the mean is removed. Performing this operation on the N data sets yields a new feature matrix X, on which principal component analysis (PCA) is then performed. PCA is a common method for data analysis and modeling: it retains the most important characteristics of high-dimensional data, removes noise and unimportant features, and reduces dimensionality, greatly lowering processing cost and raising speed. The specific algorithm steps are as follows:
a. record the matrix X with n rows and m columns;
b. zero-center (normalize) each row of X;
c. compute the covariance matrix C of X;
d. solve the eigenvalues E and eigenvectors D of C: [E, D] = eig(C), where eig denotes the eigen-decomposition returning the eigenvalues and eigenvectors;
e. order the eigenvectors D by the magnitude of their corresponding eigenvalues and take the first k columns to form a new matrix, which contains the feature vectors of the dimension-reduced data;
f. take these k feature vectors as PCA filters, use them as convolution kernels, and convolve the data set with them to complete the feature-extracting convolution.
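Steps a–f can be sketched in NumPy as follows; this is a minimal illustration rather than the patent's implementation, and the function names, the 1-D score series and the patch sizes are illustrative assumptions:

```python
import numpy as np

def pca_filters(X, k):
    """Steps a-e: given an n-by-m patch matrix X, return the top-k
    principal directions, used as the PCA convolution kernels."""
    Xc = X - X.mean(axis=1, keepdims=True)      # b. center each row
    C = Xc @ Xc.T / Xc.shape[1]                 # c. covariance matrix of X
    E, D = np.linalg.eigh(C)                    # d. eigenvalues E, eigenvectors D
    order = np.argsort(E)[::-1]                 # e. sort by eigenvalue, descending
    return D[:, order[:k]]                      # first k columns form the filter bank

def pca_convolve(series, filters, width):
    """Step f: slide a window of `width` over a 1-D score series and
    project each patch onto the k PCA filters (a 'valid' convolution)."""
    patches = np.stack([series[i:i + width]
                        for i in range(len(series) - width + 1)])
    return patches @ filters                    # (n_windows, k) feature maps

# toy data: 50 random 3-dim patches to learn 2 filters of width 3
rng = np.random.default_rng(0)
X = rng.random((3, 50))
W = pca_filters(X, k=2)
feats = pca_convolve(rng.random(8), W, width=3)
print(feats.shape)                              # (6, 2)
```

Note that `np.linalg.eigh` returns eigenvalues in ascending order for a symmetric matrix, hence the explicit descending re-sort in step e.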
PCANet uses two PCA convolution layers. With only one convolution layer the feature extraction effect is unsatisfactory, while with three or more layers the added dimensionality makes the computation grow dramatically.
Non-linear processing layer
To enhance the expressive power of the data features, the data after the two PCA convolution layers undergo nonlinear processing: each convolution result is binarized using the Heaviside step function H(x), where H(x) = 1 if the PCA-convolved value is greater than 0 and H(x) = 0 otherwise. The binarized results are then weighted by powers of two to obtain, for the i-th sample, the integer map of the l-th layer output features T_i^l = Σ_{j=1}^{L} 2^{j−1} H(O_i^j), where O_i^j is the j-th convolution output and L the number of filters.
Feature pooling layer
Because the nonlinear processing layer outputs values in the range [0, 2^L − 1], PCANet does not use the max pooling and mean pooling common in CNNs, but instead performs its feature pooling with local histograms. The integer map T_i^l is divided into local blocks; each block's histogram is computed and vectorized, denoted Bhist(T_i^l). Concatenating the vectors generated by the k integer maps gives the feature vector f_i = [Bhist(T_i^1), ..., Bhist(T_i^k)]^T.
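The binarization, power-of-two weighting and block-histogram pooling can be sketched as below; the toy feature maps and block size are illustrative, not from the patent:

```python
import numpy as np

def heaviside(x):
    # H(x) = 1 if x > 0 else 0: binarizes each PCA convolution output
    return (x > 0).astype(int)

def integer_map(feature_maps):
    """Weight the L binarized maps by powers of two, giving one
    'integer map' whose values lie in [0, 2^L - 1]."""
    L = len(feature_maps)
    return sum(2 ** (L - 1 - i) * heaviside(fm)
               for i, fm in enumerate(feature_maps))

def block_histograms(T, L, block):
    """Split the integer map into blocks, histogram each block over
    the 2^L possible values, and concatenate (the Bhist operation)."""
    hists = [np.bincount(T[s:s + block], minlength=2 ** L)
             for s in range(0, len(T), block)]
    return np.concatenate(hists)

# two toy convolution outputs (L = 2 filters) over 4 positions
maps = [np.array([0.3, -0.2, 0.7, -0.1]), np.array([-0.5, 0.4, 0.1, 0.2])]
T = integer_map(maps)                 # values in 0..3
f = block_histograms(T, L=2, block=2) # pooled feature vector
print(T, f)
```

Here `T` becomes `[2, 1, 3, 1]` (binary codes 10, 01, 11, 01 read as integers), and each block of two positions contributes a 4-bin histogram.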
The GRU neural network in the Bi-GRU network is a type of recurrent neural network (RNN) and is well suited to time-series data. However, when an RNN faces long-range dependence it struggles to carry early information forward and may miss important information; RNNs also suffer from vanishing and exploding gradients. The LSTM neural network introduces an input gate, a forget gate and an output gate to regulate the information flow and mitigate these problems.
The GRU is a variant of the long short-term memory network LSTM: the input gate and the forget gate of the LSTM are merged into an update gate. The GRU retains the LSTM's ability to memorize over long and short horizons, yet has fewer parameters and a simpler structure, is easier to compute, and is less prone to problems such as overfitting.
As shown in fig. 3, each GRU unit has two gate functions, a "reset gate" and an "update gate". The reset gate r_t controls how much the previous state h_{t−1} influences the candidate state h̃_t, and the update gate z_t determines how much of the information in h_{t−1} is carried into h_t:
r_t = σ(W_ir i_t + W_hr h_{t−1} + b_r)
z_t = σ(W_iz i_t + W_hz h_{t−1} + b_z)
h̃_t = tanh(W_ih i_t + W_hh (r_t ⊙ h_{t−1}) + b_h)
h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ h̃_t
y_t = W_o h_t
where tanh is the hyperbolic tangent, tanh(x) = (e^x − e^{−x})/(e^x + e^{−x}), whose output always lies in (−1, 1); σ is the sigmoid function, whose output always lies in (0, 1) and thus measures the importance of information, helping decide whether data are updated or discarded; i_t is the input and y_t the output at time t; W_ir, W_iz and W_ih are the weight matrices from the input to the reset gate, the update gate and the candidate state; W_hr, W_hz, W_hh and W_o are the weight matrices from the previous state to the reset gate, the update gate, the candidate state and the output; and b_r, b_z and b_h are the biases of the reset gate, the update gate and the candidate state.
In the GRU structure above, information flows in one direction, from front to back. Score data, however, are influenced not only by earlier periods but also by later ones, so a bidirectional gated recurrent network, Bi-GRU, is used. On the basis of the GRU structure, the Bi-GRU adds forward and backward passes in its hidden layer; this bidirectional operation captures the long-term dependence of coursework scores across different periods, yielding more accurate score prediction.
PCANet-BiGRU
PCANet has the properties of local perception and weight sharing, so it is used to extract the spatial, position-related features of the score data. Combining PCANet with BiGRU mines both the temporal and the spatial characteristics of the score data, through PCANet's spatial perception and BiGRU's bidirectional memory, to realize score prediction. The PCANet-BiGRU prediction process is as follows:
Step 1, cleaning and normalizing the score data and constructing the data matrix;
Step 2, inputting the score data into PCANet to extract the spatial features of the data;
Step 3, inputting the PCANet-processed data into the Bi-GRU layer to predict the coursework scores; the model structure is shown in the figure.
A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method when executing the program.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method.
A processor for running a program, wherein the program performs the above method when running.
An example of a specific application of the present invention is given below:
data acquisition
The experimental data of the invention come from the per-course records stored in a MOOC online learning system. The courses were offered in the system in 2019, with student score information stored as Excel tables. The evaluation standard in the data set is based on each piece of coursework, essentially one homework submission per week. The system preprocesses the evaluated coursework scores into vectors according to its internal standard, to serve as the input of the neural network.
Data set partitioning
Randomly extracting 20% of the data set as a test set, and then adopting a cross validation algorithm on the rest data:
a. randomly divide the training data into k equal parts;
b. in turn, select k−1 parts for training, validate on the remaining part, and compute the sum of squared prediction errors;
c. finally, average the k sums of squared errors as the basis for selecting the optimal model structure.
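The cross-validation steps a–c above can be sketched as follows; the mean-predictor toy model, the function names and the fixed random seed are illustrative assumptions:

```python
import numpy as np

def kfold_sse(X, y, k, fit, predict):
    """Steps a-c: split the training data into k folds, train on k-1,
    validate on the held-out fold, and average the k sums of squared
    prediction errors."""
    idx = np.random.default_rng(0).permutation(len(X))   # a. random split
    folds = np.array_split(idx, k)
    sses = []
    for i in range(k):                                   # b. rotate the held-out fold
        val = folds[i]
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = fit(X[train], y[train])
        err = y[val] - predict(model, X[val])
        sses.append(np.sum(err ** 2))
    return np.mean(sses)                                 # c. average SSE

# toy check with a model that simply predicts the training mean
X = np.arange(20, dtype=float).reshape(-1, 1)
y = 2 * X[:, 0]
score = kfold_sse(X, y, k=5,
                  fit=lambda Xt, yt: yt.mean(),
                  predict=lambda m, Xv: np.full(len(Xv), m))
print(score)
```

The structure with the smallest averaged SSE would then be retained.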
Data cleansing
The training-set data are cleaned as follows. In the coursework data acquired in the previous stage, a student may have missed an assignment, leaving a missing score; such abnormal data would bias model training heavily, so missing data must be cleaned. Based on the training-set data acquired in the previous stage, if a missing value exists in a column that is used in training, the whole row of data is deleted; if a column is not used in training, the row is retained even if that column has missing values.
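This row-dropping rule can be sketched in plain Python; the column names and the use of `None` for a missing value are illustrative assumptions:

```python
def clean_rows(rows, used_cols):
    """Drop a row only when a column actually used in training has a
    missing value (None); missing values in unused columns are kept."""
    return [r for r in rows if all(r.get(c) is not None for c in used_cols)]

rows = [
    {"hw1": 80, "hw2": 75, "note": None},   # missing value in an unused column: keep
    {"hw1": None, "hw2": 90, "note": "x"},  # missing score in a used column: drop
]
cleaned = clean_rows(rows, used_cols=["hw1", "hw2"])
print(len(cleaned))   # 1
```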
Data normalization
Because the full mark of each piece of coursework differs, the scores are normalized to eliminate the influence of scale on data prediction.
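One way to realize this, sketched below under the assumption that each assignment's full mark is known, is to divide every column by its full mark so all scores land in [0, 1] (min-max normalization would be an alternative):

```python
import numpy as np

def normalize_scores(scores, full_marks):
    """Scale each assignment column by its full mark so every score
    lies in [0, 1], removing the effect of differing totals."""
    return scores / np.asarray(full_marks, dtype=float)

raw = np.array([[8.0, 45.0],    # two students, two assignments
                [10.0, 30.0]])
norm = normalize_scores(raw, full_marks=[10, 50])
print(norm)   # [[0.8 0.9] [1.  0.6]]
```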
Construction of models using training sets
Constructing a PCANet-BiGRU network structure, constructing a network model by using a training set, and determining the weight and the bias parameters of the model through fitting;
to prevent overfitting, DropOut technique is used in the training process to enhance the generalization ability of the deep neural network with feature map perturbation.
Determining hyper-parameters of an optimized tuning model using a validation set
The learning rate significantly affects model performance: too large or too small a value causes the network to oscillate and fail to converge to an optimal solution, so an optimization method that adapts the learning rate from gradient information is needed to speed up model training. Too few PCA layers extract incomplete features, while too many bring enormous computational complexity. Based on tuning against the verification set, the number of PCA convolution layers is set to 2, the number of Bi-GRU layers to 4, the number of neurons per Bi-GRU layer to 30, the optimization algorithm to Adam, and the learning rate to 0.1.
Evaluating models using test sets
Evaluation index
In order to measure the performance of the prediction model, the root mean square error (RMSE) and the mean absolute percentage error (MAPE) are used as evaluation indexes:

RMSE = sqrt( (1/N) · Σ_i (y_i − ŷ_i)² )

MAPE = (100%/N) · Σ_i |(y_i − ŷ_i) / y_i|

wherein N represents the number of samples, y_i represents the actual score of a student, and ŷ_i represents the student's predicted score. MAPE reflects the overall deviation of the predicted values and measures the prediction accuracy of the model; RMSE reflects the error between predicted and true values and measures the precision of the prediction.
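The two indexes can be computed directly from their standard definitions; a small Python sketch with toy values:

```python
import math

def rmse(actual, predicted):
    """Root mean square error over N samples."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n

print(rmse([2.0, 4.0], [1.0, 5.0]))  # sqrt((1 + 1) / 2) = 1.0
print(mape([2.0, 4.0], [1.0, 5.0]))  # 100 * (0.5 + 0.25) / 2 = 37.5
```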
Analysis of Experimental results
The experiment predicts everyday scores from the everyday scores of the C language, C++ and Python courses on a MOOC platform. The results are shown in the following table.
TABLE I: Achievement prediction results

Subject of normal score | C language | C++ language | Python language |
---|---|---|---|
RMSE | 0.0252 | 0.0256 | 0.0263 |
MAPE/% | 2.6139 | 2.6732 | 2.7141 |
As the table shows, the RMSE values of the predictions are all below 0.03 and the MAPE values are all below 2.8%, which indicates that the model can effectively predict the scores of different MOOC courses, that the predictions are accurate, and that the model meets the demands of practical MOOC score prediction. Compared with conventional score-prediction models based on GM(1,1) and PSO-SVM, this model is markedly better in prediction accuracy.
The PCANet-BiGRU model established in the method has obvious advantages in running time.
TABLE II training time comparison table
As the above table shows, PCANet-BiGRU has a significant advantage in operating efficiency over CNN-GRU, and the advantage grows with the number of training runs.
Score prediction lets teachers better understand students' online learning and promotes effective improvement of their results. The invention provides a deep neural network prediction model combining PCANet and Bi-GRU: PCANet extracts the hidden features of the data and reduces its size, while Bi-GRU extracts the internal dynamics to provide early warning of scores. Compared with previous models, this model integrates data in time, gives early warning of students' scores, and markedly improves the utilization of data and time, ultimately promoting the informatization of education; it has high practical value.
The above description is only one embodiment of the present invention, but the scope of the present invention is not limited thereto; any modification or substitution that a person skilled in the art can readily conceive within the technical scope disclosed herein falls within the scope of the present invention, which should therefore be defined by the protection scope of the claims.
Claims (10)
1. A performance prediction method based on PCANet-BiGRU, characterized by comprising the following steps:
step 1) collecting basic statistical data and score data of students from an online learning platform;
step 2) dividing the original score data collected in step 1) into three independent parts, a training set, a validation set and a test set, and performing data cleaning and normalization on the training set to construct a data matrix, wherein the data cleaning specifically comprises: based on the training set data acquired in the previous stage, if a row has a missing value in a column that is used during training, deleting the entire row; if the missing value lies in a column that is not used during training, retaining the row;
step 3) inputting the matrix data into the principal component analysis network PCANet and extracting features of the score data, wherein PCANet consists of a PCA convolution layer, a nonlinear processing layer and a feature pooling layer;
step 4) inputting the data processed by PCANet into the bidirectional gated recurrent unit neural network (Bi-GRU) layer and predicting the students' everyday scores.
2. The PCANet-BiGRU-based performance prediction method of claim 1, wherein the training set in step 2) is used to train the model: the model is built by fitting to find its initial parameters, i.e. to determine the weight and bias parameters of the model; the validation set is used to tune hyperparameters such as the network structure and to control model complexity; and the test set checks how well the finally selected model performs. The usual division ratio of training, validation and test set is 6:2:2; when the data set is small, 20% of the data is generally drawn at random as the test set and a cross-validation algorithm is applied to the remaining data; the cross-validation algorithm comprises the following specific steps:
a. randomly divide the training data into k equal parts;
b. in turn select k−1 parts for training and the remaining part for validation, and compute the sum of squared prediction errors;
c. finally average the k sums of squared prediction errors as the basis for selecting the optimal model structure.
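Steps a-c can be sketched as follows (the toy model, the training mean, and the residual error function are illustrative assumptions, not part of the claim):

```python
import random

def k_fold_cv(data, k, train_fn, error_fn):
    """Steps a-c above: random k-way split, train on k-1 folds,
    validate on the held-out fold, average the k sums of squared errors."""
    data = list(data)
    random.shuffle(data)                                  # a. random partition
    folds = [data[i::k] for i in range(k)]
    sse = []
    for i in range(k):
        held_out = folds[i]
        train = [x for j, f in enumerate(folds) if j != i for x in f]
        model = train_fn(train)                           # b. fit on k-1 folds
        sse.append(sum(error_fn(model, x) ** 2 for x in held_out))
    return sum(sse) / k                                   # c. mean of k SSEs

# Toy usage: the "model" is the training mean, the error is the residual.
score = k_fold_cv([1.0, 2.0, 3.0, 4.0, 5.0], k=5,
                  train_fn=lambda xs: sum(xs) / len(xs),
                  error_fn=lambda m, x: x - m)
print(score)  # -> 3.125 (each point vs. the mean of the other four)
```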
3. The PCANet-BiGRU-based performance prediction method of claim 1, wherein the bidirectional gated recurrent unit neural network Bi-GRU of step 4) adds forward and backward propagation in the hidden layer on the basis of the gated recurrent unit GRU structure, and captures the long-term dependence of learning scores at different periods through the forward-backward bidirectional operation, obtaining more accurate score prediction.
4. The PCANet-BiGRU-based performance prediction method of claim 1, wherein in the PCA convolution layer, for each datum j of the input layer l, a window of size P_j around it is sampled; the convolution kernel then slides, all sampled blocks are concatenated as the representation X_i = [x_{i,1}, x_{i,2}, ..., x_{i,n}] of the sample, and the mean is removed; performing this operation on the N data sets yields a new feature matrix X; principal component analysis (PCA) is then performed on the matrix; PCA is a common method for data analysis and modeling that retains the most important features of high-dimensional data, removes noise and unimportant features, and reduces the dimension, greatly reducing the cost and time of data processing; the specific algorithm steps are as follows:
a. recording a matrix X with n rows and m columns;
b. normalizing each row of X;
c. compute the covariance matrix C of X, C = (1/m)·X·Xᵀ;
d. solve for the eigenvalues E and the eigenvectors D corresponding to C:
[E,D]=eig(C)
wherein eig is the eigendecomposition function that returns the eigenvalues E and the eigenvectors D;
e. arrange the eigenvectors in D in descending order of their corresponding eigenvalues and select the first k columns to form a new matrix, which is the feature matrix of the dimension-reduced data;
f. take the k groups of eigenvectors as the PCA filters, use them as the convolution kernel K, and convolve the data set with K to complete the feature-extraction convolution operation.
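Steps a-f can be sketched with NumPy (np.linalg.eigh is used in place of the eig of step d because the covariance matrix is symmetric; the data shape is illustrative):

```python
import numpy as np

def pca_filters(X, k):
    """Steps a-f above: zero-mean each row of X, compute the covariance
    matrix, eigendecompose it, sort eigenvectors by eigenvalue, and keep
    the leading k as the PCA convolution filters."""
    Xc = X - X.mean(axis=1, keepdims=True)       # b. normalize each row
    C = np.cov(Xc)                               # c. covariance matrix of X
    E, D = np.linalg.eigh(C)                     # d. eigenvalues E, vectors D
    order = np.argsort(E)[::-1]                  # e. descending eigenvalues
    return D[:, order[:k]]                       # f. k filters, one per column

X = np.random.default_rng(0).normal(size=(6, 40))
W = pca_filters(X, k=2)
print(W.shape)  # -> (6, 2)
```

The returned columns are orthonormal, as expected of eigenvectors of a symmetric matrix.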
5. The PCANet-BiGRU-based performance prediction method of claim 4, wherein the nonlinear part is used to enhance the feature expressiveness of the data, specifically: the data convolved by the two PCA layers are processed nonlinearly: each convolution result is binarized using the Heaviside step function H(x), which outputs 1 for positive input and 0 otherwise; the binarized results are then weighted by powers of two and summed to obtain the integer map T_i^l of the ith datum on the output features of the lth layer:

T_i^l = Σ_{j=1..k} 2^(j−1) · H(O_i^j)

wherein O_i^j denotes the jth convolution output of the ith datum.
6. The PCANet-BiGRU-based performance prediction method of claim 5, wherein the feature pooling layer performs the PCANet feature pooling operation using local histograms: each integer map T_i^l is divided into blocks, the histogram of each block is computed and vectorized, denoted Bhist(T_i^l), and the feature vector obtained by concatenating the vectors generated by the k integer maps is expressed as: f_i = [Bhist(T_i^1), ..., Bhist(T_i^k)]^T.
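A compact sketch of the nonlinear and pooling stages of claims 5 and 6: Heaviside binarization, power-of-two weighting into an integer map, and a single-block histogram (the toy inputs are illustrative).

```python
import numpy as np

def heaviside(x):
    """Heaviside step: 1 where x > 0, else 0 (the binarization of claim 5)."""
    return (x > 0).astype(int)

def integer_map(conv_outputs):
    """Weight the k binarized convolution outputs by powers of two and sum
    them into a single integer-valued map T_i."""
    return sum((2 ** j) * heaviside(o) for j, o in enumerate(conv_outputs))

def block_histogram(T, k):
    """Histogram over the 2**k possible integer values: one block of the
    local-histogram pooling of claim 6."""
    hist, _ = np.histogram(T, bins=np.arange(2 ** k + 1))
    return hist

outs = [np.array([[1.0, -1.0], [2.0, -2.0]]),
        np.array([[-1.0, 1.0], [1.0, -1.0]])]
T = integer_map(outs)
print(T.tolist())                  # -> [[1, 2], [3, 0]]
print(block_histogram(T, k=2).tolist())  # -> [1, 1, 1, 1]
```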
7. The PCANet-BiGRU-based performance prediction method of claim 5, wherein the GRU in the Bi-GRU neural network has two gate functions, a "reset gate" and an "update gate"; the reset gate r_t controls the degree to which the previous state h_{t-1} influences the candidate state, and the update gate z_t determines how much of the information in h_{t-1} is carried into h_t; the GRU neural network unit is updated as

r_t = σ(W_ir·i_t + W_hr·h_{t-1} + b_r)
z_t = σ(W_iz·i_t + W_hz·h_{t-1} + b_z)
h~_t = tanh(W_ih·i_t + W_hh·(r_t ⊙ h_{t-1}) + b_h)
h_t = (1 − z_t) ⊙ h_{t-1} + z_t ⊙ h~_t
y_t = W_o·h_t

wherein tanh is the hyperbolic tangent function, tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x)), whose output always lies in the interval (−1, 1); σ is the sigmoid function, whose output always lies in the interval (0, 1), so it quantifies the importance of information and helps decide whether data are kept or discarded; i_t is the input at time t and y_t is the output at time t; W_ir, W_iz, W_ih are the weight matrices from the input to the reset gate, the update gate and the candidate state, respectively; W_hr, W_hz, W_hh are the weight matrices from the previous state to the reset gate, the update gate and the candidate state, and W_o is the weight matrix from the state to the output; b_r, b_z, b_h are the biases of the reset gate, the update gate and the candidate state, respectively.
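A single GRU step following the gate equations of this claim can be sketched in NumPy (the parameter dictionary and the random initialization are illustrative assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(i_t, h_prev, P):
    """One GRU update per the equations above; P holds weights W_* and biases b_*."""
    r = sigmoid(P["W_ir"] @ i_t + P["W_hr"] @ h_prev + P["b_r"])   # reset gate
    z = sigmoid(P["W_iz"] @ i_t + P["W_hz"] @ h_prev + P["b_z"])   # update gate
    h_cand = np.tanh(P["W_ih"] @ i_t + P["W_hh"] @ (r * h_prev) + P["b_h"])
    return (1 - z) * h_prev + z * h_cand                           # new state h_t

rng = np.random.default_rng(1)
d_in, d_h = 3, 4
P = {name: rng.normal(size=(d_h, d_in)) for name in ("W_ir", "W_iz", "W_ih")}
P.update({name: rng.normal(size=(d_h, d_h)) for name in ("W_hr", "W_hz", "W_hh")})
P.update({name: np.zeros(d_h) for name in ("b_r", "b_z", "b_h")})
h = gru_step(rng.normal(size=d_in), np.zeros(d_h), P)  # state vector of size d_h
```

Starting from a zero state, the new state is bounded by the tanh candidate, so every component stays inside (−1, 1).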
8. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of any of claims 1 to 7 are implemented when the program is executed by the processor.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
10. A processor, characterized in that the processor is configured to run a program, wherein the program when running performs the method of any of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110800902.3A CN113962424A (en) | 2021-07-15 | 2021-07-15 | Performance prediction method based on PCANet-BiGRU, processor, readable storage medium and computer equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113962424A true CN113962424A (en) | 2022-01-21 |
Family
ID=79460379
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110800902.3A Pending CN113962424A (en) | 2021-07-15 | 2021-07-15 | Performance prediction method based on PCANet-BiGRU, processor, readable storage medium and computer equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113962424A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115656840A (en) * | 2022-12-27 | 2023-01-31 | 武汉工程大学 | Method, device, system and storage medium for predicting battery charging remaining time |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109492822B (en) | Air pollutant concentration time-space domain correlation prediction method | |
CN111915059B (en) | Attention mechanism-based Seq2Seq berth occupancy prediction method | |
CN111563706A (en) | Multivariable logistics freight volume prediction method based on LSTM network | |
CN110648014B (en) | Regional wind power prediction method and system based on space-time quantile regression | |
CN111626785A (en) | CNN-LSTM network fund price prediction method based on attention combination | |
CN111310965A (en) | Aircraft track prediction method based on LSTM network | |
CN112465199A (en) | Airspace situation evaluation system | |
CN116187835A (en) | Data-driven-based method and system for estimating theoretical line loss interval of transformer area | |
CN113205698A (en) | Navigation reminding method based on IGWO-LSTM short-time traffic flow prediction | |
CN114580545A (en) | Wind turbine generator gearbox fault early warning method based on fusion model | |
CN115310782A (en) | Power consumer demand response potential evaluation method and device based on neural turing machine | |
CN114548591A (en) | Time sequence data prediction method and system based on hybrid deep learning model and Stacking | |
CN116542701A (en) | Carbon price prediction method and system based on CNN-LSTM combination model | |
CN114973665A (en) | Short-term traffic flow prediction method combining data decomposition and deep learning | |
CN115096357A (en) | Indoor environment quality prediction method based on CEEMDAN-PCA-LSTM | |
CN113962424A (en) | Performance prediction method based on PCANet-BiGRU, processor, readable storage medium and computer equipment | |
CN114580262A (en) | Lithium ion battery health state estimation method | |
CN107704944B (en) | Construction method of stock market fluctuation interval prediction model based on information theory learning | |
CN114596726A (en) | Parking position prediction method based on interpretable space-time attention mechanism | |
Xu et al. | Residual autoencoder-LSTM for city region vehicle emission pollution prediction | |
CN116579408A (en) | Model pruning method and system based on redundancy of model structure | |
CN109978138A (en) | The structural reliability methods of sampling based on deeply study | |
Siraj et al. | Data mining and neural networks: the impact of data representation | |
Zhong et al. | Handwritten digit recognition based on corner detection and convolutional neural network | |
CN112651168B (en) | Construction land area prediction method based on improved neural network algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||