CN112801362A - Academic early warning method based on artificial neural network and LSTM network - Google Patents
- Publication number: CN112801362A (application CN202110101091.8A)
- Authority: CN (China)
- Prior art keywords: information, student, weight, early warning, vector
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06Q10/04 — Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
- G06F18/2415 — Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio
- G06N3/045 — Combinations of networks
- G06N3/049 — Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
- G06N3/084 — Backpropagation, e.g. using gradient descent
- G06Q50/205 — Education administration or guidance
Abstract
The invention discloses an academic early warning method based on an artificial neural network and an LSTM network, comprising the following steps: 1) processing missing data based on the RBF kernel; 2) adaptive feature extraction based on the multi-dimensional normal distribution; 3) artificial neural network training based on a refined network; 4) training an LSTM network with an adaptive excitation function; 5) integration with a software platform. The method has the advantages of good universality, a low false detection rate and high prediction accuracy.
Description
Technical Field
The invention relates to the field of big data processing and machine learning, in particular to an academic early warning method based on an artificial neural network and an LSTM network.
Background
Academic early warning predicts the future performance trend of students and judges whether they can graduate, and has very good application value in colleges and universities. Some colleges and universities already use academic early warning systems to help improve student graduation rates and reduce failure rates. As college enrollment grows, there are too many students for manual tracking of each student's academic progress to be practical; therefore, training massive data with modern deep learning algorithms to form an accurate academic early warning system gives students advance warning, lets them become aware in advance of the danger of not completing their studies normally, and effectively reduces cases of students failing to meet academic requirements.
In traditional machine-learning-based academic early warning methods, algorithms such as the support vector machine (SVM) and the random forest (RF) are applied to the academic early warning system; they cannot use temporal features such as college students' scores and library reading time, cannot effectively connect earlier and later timelines, cannot properly handle the influence of an earlier course on a later one, and cannot handle missing data well. In recent years, the university of oceanic labor has proposed an "academic early warning" mechanism, introducing the concept of "academic early warning" and issuing warnings of different degrees to students with a low degree of academic completion.
Sepp Hochreiter et al. proposed the long short-term memory (LSTM) algorithm, solving the recurrent neural network's (RNN) inability to forget information and to memorize information over long periods; Tao et al. proposed a KFCM-based improved SVM algorithm, applied it to the field of academic early warning, improved the traditional machine learning algorithm, and obtained a better prediction effect; Ren et al. proposed an algorithm based on the FT_BP neural network, using deep learning to improve the conventional BP network and applying it to the field of academic early warning.
Although these studies have solved some academic early warning problems to some extent in recent years, many shortcomings remain. Firstly, academic early warning involves complex conditions and non-uniform data; existing methods have narrow application scenarios, require data with uniform standards as support, and focus on the algorithm while ignoring the importance of the data set and the temporal and spatial correlations among different features. Secondly, the KFCM-SVM machine learning algorithm works well on small data sets but not in scenarios with large and complex data; owing to the characteristic limitations of the algorithm, it suffers a high false detection rate, and its accuracy needs further improvement.
Disclosure of Invention
The invention aims to provide an academic early warning method based on an artificial neural network and an LSTM network, addressing the defects of the prior art. The method has the advantages of good universality, a low false detection rate and high prediction accuracy.
The technical scheme for realizing the purpose of the invention is as follows:
an academic early warning method based on an artificial neural network and an LSTM network comprises the following steps:
1) processing missing data based on the RBF kernel: firstly, clean the student data: normalize the student information with complete data by x' = (x - μ)/(Max - Min); then map the original student information data from the low-dimensional space to the high-dimensional space by the RBF kernel K(v1, v2) = exp(-γ||v1 - v2||²). This mapping completely retains all information of the original data, and neither missing values nor linear inseparability need to be considered; the mapping process is shown in formula (1):
where x', y' are student information data arrays, y' is the corresponding array label, and α is the RBF kernel parameter γ value: the larger the α value, the smaller the range of influence of each sample, and the smaller the α value, the larger the range of influence; each element of the resulting high-dimensional space is thereby:
a data vector K(v1, v2, v3, ...) is further obtained, which is then normalized by x = (y - μ)/(Max - Min), where y is the student information data array to be normalized and x is the normalized student information array;
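The kernel mapping and normalization of step 1) can be sketched in NumPy as below; the γ value, the reference records, and the function names are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

def rbf_kernel(v1, v2, gamma=0.5):
    """RBF kernel K(v1, v2) = exp(-gamma * ||v1 - v2||^2)."""
    diff = np.asarray(v1, float) - np.asarray(v2, float)
    return np.exp(-gamma * np.dot(diff, diff))

def normalize(y):
    """x = (y - mu) / (Max - Min) scaling of a student-information array."""
    y = np.asarray(y, float)
    return (y - y.mean()) / (y.max() - y.min())

# Map one student record into kernel space against reference records,
# then normalize the resulting kernel vector.
record = [72.0, 85.0, 90.0]
references = [[70.0, 80.0, 88.0], [60.0, 65.0, 70.0]]
k_vec = np.array([rbf_kernel(record, r) for r in references])
k_norm = normalize(k_vec)
```

Because the kernel depends only on pairwise distances, records with missing entries could be compared over their shared dimensions, which is presumably why the mapping "need not consider missing values".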
2) adaptive feature extraction based on the multi-dimensional normal distribution: extract the score information vector G(x1, x2, x3, ...) from the student data cleaned and normalized in step 1) and perform multivariate normal distribution screening. The n-dimensional score information in G(x1, x2, x3, ...) is defined to satisfy: any linear combination Y = a1x1 + a2x2 + ... + anxn of x1, x2, ..., xn follows a normal distribution; there is a random vector Z = [Z1, ..., ZM]^T whose elements each follow a normal distribution; and there are a vector μ = [μ1, ..., μN]^T and an N × M matrix A satisfying X = AZ + μ. If the n-dimensional score information vector G(x1, x2, x3, ...) meets these three conditions, it is said to satisfy the multivariate normal distribution, i.e. G(x1, x2, x3, ...) obeys the density f_x. Student information whose scores satisfy the multivariate normal distribution is assigned to the standard class, and student information whose scores do not is assigned to the singular class, as shown in formula (2):
here, x is a normalized student information array, μ is an average value, and k is a constant index;
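A minimal sketch of the screening in step 2), assuming the standard/singular split is made by thresholding the fitted multivariate normal density f_x; the threshold, the regularization term, and all names are illustrative assumptions:

```python
import numpy as np

def mvn_pdf(x, mu, cov):
    """Density f_x of an n-dimensional normal distribution N(mu, cov)."""
    x, mu = np.asarray(x, float), np.asarray(mu, float)
    n = x.size
    diff = x - mu
    norm = 1.0 / np.sqrt((2.0 * np.pi) ** n * np.linalg.det(cov))
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff)

def screen_scores(score_vectors, threshold=1e-4):
    """Assign each score vector to the 'standard' class when its fitted
    multivariate-normal density is at least `threshold`, otherwise to
    the 'singular' class."""
    X = np.asarray(score_vectors, float)
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])  # keep invertible
    standard, singular = [], []
    for row in X:
        (standard if mvn_pdf(row, mu, cov) >= threshold else singular).append(row)
    return standard, singular
```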
3) artificial neural network training based on a refined network: train the student score information obtained in step 2) with an artificial neural network using the resilient backpropagation (Rprop) algorithm. In the ordinary backpropagation algorithm, the change of a weight during learning is determined by the partial derivative (gradient) of the error function with respect to that weight; in the Rprop algorithm, the magnitude of the weight change Δw_ij is directly equal to the learning rate η_ij(t), so the gradient of the error function does not affect the size of the weight change. The gradient only affects the sign of the change, i.e. the direction in which the weight moves: the gradient of the error function determines the direction of the update, not its strength. If the gradient is positive, the corresponding weight needs to be reduced, so η_ij(t) is subtracted from w_ij; if the gradient is negative, the corresponding weight is increased, so that the error function approaches its minimum, as shown in formula (3):
With the weight update clear, the update of the learning rate η_ij(t) is described next. Consider how the sign of the gradient changes between any two time points (t-1) and t; there are two cases. If the signs of the gradients of the error function at (t-1) and t differ, the minimum was crossed at t and the last update step was too large, so η_ij(t) should be smaller than η_ij(t-1) to make the search for the minimum more accurate: the previous learning rate is multiplied by a value η_up with 0 < η_up < 1 to obtain the current learning rate. If the two signs are the same, the lowest point of the error function has not yet been reached, and the learning rate can be increased to speed up learning: the previous learning rate is multiplied by a value η_down > 1 to obtain the current learning rate, as shown in formula (4):
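The Rprop update above can be sketched as follows; the step-size factors and bounds (η_up = 0.5, η_down = 1.2, following the document's naming where η_up < 1 shrinks the step on a sign flip and η_down > 1 grows it) are conventional Rprop defaults assumed for illustration, not values from the patent:

```python
import numpy as np

def rprop_step(w, grad, prev_grad, eta, eta_up=0.5, eta_down=1.2,
               eta_min=1e-6, eta_max=50.0):
    """One resilient-backpropagation (Rprop) weight update.

    The gradient's sign alone sets the update direction; each weight's
    step size eta is multiplied by eta_down (> 1) while the gradient
    sign stays the same, and by eta_up (< 1) when the sign flips, i.e.
    when a minimum of the error function has just been overshot.
    """
    same_sign = grad * prev_grad
    eta = np.where(same_sign > 0, np.minimum(eta * eta_down, eta_max), eta)
    eta = np.where(same_sign < 0, np.maximum(eta * eta_up, eta_min), eta)
    return w - np.sign(grad) * eta, eta
```

Note that the gradient enters only through `np.sign(grad)`, matching the statement that the gradient determines the direction of the update but not its strength.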
4) training an LSTM network with an adaptive excitation function: an LSTM network is used to train the students' daily campus-card consumption information, and the training result is cascaded with the artificial neural network to output the final graduation probability. The prior-art LSTM network structure is as follows: the LSTM module has three inputs, c_{t-1}, h_{t-1} and x_t, and the outputs through the LSTM module are c_t, h_t and y_t, where x_t is the input of the current round, h_{t-1} is the state-quantity output of the previous round, c_{t-1} is the carrier of global information in the previous round, y_t is the output of the current round, h_t is the state-quantity output of the current round, and c_t is the global information carrier of the current round. x_t and h_{t-1} are combined into one vector, multiplied by a weight matrix W, and wrapped in a tanh function to obtain the vector z; using the activation function sigmoid, the combined vector of x_t and h_{t-1} is multiplied by the matrices W_f, W_i and W_o to obtain z_f, z_i and z_o, where W_f, W_i and W_o are the weight matrices of the forget gate, input gate and output gate, multiplied with the variable input to each gate, and z_f, z_i and z_o are the outputs of the gates (the input multiplied by a weight plus an offset). These vectors then give c_t from formula (5):
c_t = z_f · c_{t-1} + z_i · z (5),
h_t is obtained from formula (6): h_t = z_o · tanh(c_t) (6),
and the output y_t of the current round is obtained from formula (7): y_t = σ(W·h_t) (7),
The original tanh excitation function is changed into a weighted-average adaptive excitation function combining Relu and tanh, exciting the data x passed into each LSTM gate in the form u·Relu(x) + v·tanh(x) with u + v = 1; this effectively avoids the vanishing-gradient problem of tanh while keeping tanh's nonlinear characteristic. The LSTM is thus used to train the students' campus-card consumption information, perform early warning classification for each student, and generate a targeted personal analysis report for each student; a student under high-risk early warning receives the corresponding early warning information. This network is cascaded with the artificial neural network of step 3) to predict the student's graduation probability;
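A single LSTM step with the Relu+tanh weighted-average excitation can be sketched as below; the equal weighting u = v = 0.5, the omission of bias terms, and the weight shapes are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adaptive_act(x, u=0.5, v=0.5):
    """Weighted average of Relu and tanh (u + v = 1): keeps tanh's
    nonlinearity while easing its vanishing-gradient problem."""
    return u * np.maximum(x, 0.0) + v * np.tanh(x)

def lstm_step(x_t, h_prev, c_prev, W, Wf, Wi, Wo, Wy):
    """One LSTM step following formulas (5)-(7), with the candidate
    state z produced by the adaptive Relu+tanh excitation."""
    concat = np.concatenate([x_t, h_prev])
    z = adaptive_act(W @ concat)   # candidate state (adaptive excitation)
    z_f = sigmoid(Wf @ concat)     # forget gate
    z_i = sigmoid(Wi @ concat)     # input gate
    z_o = sigmoid(Wo @ concat)     # output gate
    c_t = z_f * c_prev + z_i * z   # formula (5)
    h_t = z_o * np.tanh(c_t)       # formula (6)
    y_t = sigmoid(Wy @ h_t)        # formula (7)
    return y_t, h_t, c_t
```

Iterating `lstm_step` over a student's daily consumption sequence would yield the final y_t that is cascaded with the artificial neural network's output.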
5) combining with a software platform: a software platform suited to the above steps is built on existing early warning software, with the user port divided into a teacher end and a student end. A student may view his or her personal scores and related information, the score information of his or her class, and the corresponding subject rankings within class and grade; a student under high-risk early warning receives the corresponding early warning information. A teacher may view the information of all students in the courses he or she teaches, as well as all class and grade information; if a student receives an early warning, the teacher also receives a prompt, making it convenient to follow that student's scores. Individual, class and course charts are generated at the same time, so the teacher can track each student's learning condition and state in real time.
The technical scheme combines a neural network with an LSTM network algorithm: it cleans the data, maps missing data based on the RBF (radial basis function) kernel, normalizes the data, extracts adaptive features based on the multi-dimensional normal distribution, performs refined primary and secondary artificial neural network training and LSTM network training based on an adaptive excitation function, and predicts by cascading the artificial neural network with the LSTM network. Combined with the software platform, it provides a good visual interface with intuitive early warning data graphs, prediction graphs and reports for students and teachers, automatically assigns early warning grades to students, and offers corresponding early warning suggestions.
By cleaning the various student data, mapping missing data, normalizing, extracting adaptive features based on the multi-dimensional normal distribution, and then training the artificial neural network and the LSTM network separately and predicting in cascade, the method effectively improves early warning accuracy, and can achieve high accuracy even with little feature data.
The method has the advantages of good universality, low false detection rate and high prediction accuracy.
Drawings
FIG. 1 is a schematic flow chart of an exemplary method;
FIG. 2 is a diagram illustrating the comparison between the prediction accuracy of the embodiment and the prediction accuracy of other methods;
FIG. 3 is a schematic diagram of an LSTM module in the prior art;
fig. 4 is a schematic structural diagram of an LSTM module in the embodiment.
Detailed Description
The invention will be further elucidated with reference to the drawings and examples, without however being limited thereto.
Example (b):
referring to fig. 1, an academic early warning method based on an artificial neural network and an LSTM network includes the following steps:
1) processing missing data based on the RBF kernel: firstly, clean the student data: normalize the student information with complete data by x' = (x - μ)/(Max - Min); then map the original student information data from the low-dimensional space to the high-dimensional space by the RBF kernel K(v1, v2) = exp(-γ||v1 - v2||²). This mapping completely retains all information of the original data, and neither missing values nor linear inseparability need to be considered; the mapping process is shown in formula (1):
where x', y' are student information data arrays, y' is the corresponding array label, and α is the RBF kernel parameter γ value: the larger the α value, the smaller the range of influence of each sample, and the smaller the α value, the larger the range of influence; each element of the resulting high-dimensional space is thereby:
a data vector K(v1, v2, v3, ...) is further obtained, which is then normalized by x = (y - μ)/(Max - Min), where y is the student information data array to be normalized and x is the normalized student information array;
2) adaptive feature extraction based on the multi-dimensional normal distribution: extract the score information vector G(x1, x2, x3, ...) from the student data cleaned and normalized in step 1) and perform multivariate normal distribution screening. The n-dimensional score information in G(x1, x2, x3, ...) is defined to satisfy: any linear combination Y = a1x1 + a2x2 + ... + anxn of x1, x2, ..., xn follows a normal distribution; there is a random vector Z = [Z1, ..., ZM]^T whose elements each follow a normal distribution; and there are a vector μ = [μ1, ..., μN]^T and an N × M matrix A satisfying X = AZ + μ. If the n-dimensional score information vector G(x1, x2, x3, ...) meets these three conditions, it is said to satisfy the multivariate normal distribution, i.e. G(x1, x2, x3, ...) obeys the density f_x. Student information whose scores satisfy the multivariate normal distribution is assigned to the standard class, and student information whose scores do not is assigned to the singular class, as shown in formula (2):
here, x is a normalized student information array, and μ is an average value;
3) artificial neural network training based on a refined network: train the student score information obtained in step 2) with an artificial neural network using the resilient backpropagation (Rprop) algorithm. In the ordinary backpropagation algorithm, the change of a weight during learning is determined by the partial derivative (gradient) of the error function with respect to that weight; in the Rprop algorithm, the magnitude of the weight change Δw_ij is directly equal to the learning rate η_ij(t), so the gradient of the error function does not affect the size of the weight change. The gradient only affects the sign of the change, i.e. the direction in which the weight moves: the gradient of the error function determines the direction of the update, not its strength. If the gradient is positive, the corresponding weight needs to be reduced, so η_ij(t) is subtracted from w_ij; if the gradient is negative, the corresponding weight is increased, so that the error function approaches its minimum, as shown in formula (3):
With the weight update clear, the update of the learning rate η_ij(t) is described next. Consider how the sign of the gradient changes between any two time points (t-1) and t; there are two cases. If the signs of the gradients of the error function at (t-1) and t differ, the minimum was crossed at t and the last update step was too large, so η_ij(t) should be smaller than η_ij(t-1) to make the search for the minimum more accurate: the previous learning rate is multiplied by a value η_up with 0 < η_up < 1 to obtain the current learning rate. If the two signs are the same, the lowest point of the error function has not yet been reached, and the learning rate can be increased to speed up learning: the previous learning rate is multiplied by a value η_down > 1 to obtain the current learning rate, as shown in formula (4):
4) training an LSTM network with an adaptive excitation function: an LSTM network is used to train the students' daily campus-card consumption information, and the training result is cascaded with the artificial neural network to output the final graduation probability. As shown in figure 3, the prior-art LSTM network structure is as follows: the LSTM module has three inputs, c_{t-1}, h_{t-1} and x_t, and the outputs through the LSTM module are c_t, h_t and y_t, where x_t is the input of the current round, h_{t-1} is the state-quantity output of the previous round, c_{t-1} is the carrier of global information in the previous round, y_t is the output of the current round, h_t is the state-quantity output of the current round, and c_t is the global information carrier of the current round. x_t and h_{t-1} are combined into one vector, multiplied by a weight matrix W, and wrapped in a tanh function to obtain the vector z; using the activation function sigmoid, the combined vector of x_t and h_{t-1} is multiplied by the matrices W_f, W_i and W_o to obtain z_f, z_i and z_o, where W_f, W_i and W_o are the weight matrices of the forget gate, input gate and output gate, multiplied with the variable input to each gate, and z_f, z_i and z_o are the outputs of the gates (the input multiplied by a weight plus an offset). These vectors then give c_t from formula (5):
c_t = z_f · c_{t-1} + z_i · z (5),
h_t is obtained from formula (6): h_t = z_o · tanh(c_t) (6),
and the output y_t of the current round is obtained from formula (7): y_t = σ(W·h_t) (7),
As shown in FIG. 4, the LSTM network structure in this example modifies the structure of FIG. 3: the original tanh excitation function is changed into a weighted-average adaptive excitation function combining Relu and tanh, exciting the data x passed into each LSTM gate in the form u·Relu(x) + v·tanh(x) with u + v = 1; this effectively avoids the vanishing-gradient problem of tanh while keeping tanh's nonlinear characteristic. The LSTM is thus used to train the students' campus-card consumption information, perform early warning classification for each student, and generate a targeted personal analysis report for each student; a student under high-risk early warning receives the corresponding early warning information. This network is cascaded with the artificial neural network of step 3) to predict the student's graduation probability;
5) combining with a software platform: a software platform suited to the above steps is built on existing early warning software, with the user port divided into a teacher end and a student end. A student may view his or her personal scores and related information, the score information of his or her class, and the corresponding subject rankings within class and grade; a student under high-risk early warning receives the corresponding early warning information. A teacher may view the information of all students in the courses he or she teaches, as well as all class and grade information; if a student receives an early warning, the teacher also receives a prompt, making it convenient to follow that student's scores. Individual, class and course charts are generated at the same time, so the teacher can track each student's learning condition and state in real time.
Through repeated verification and tests, the accuracy of the method stably reaches 94.21%, with a highest accuracy of 98.17% and an average false detection rate stable at 1.97%; as shown in fig. 2, compared with the existing machine learning algorithms SVM and RF, the accuracy is obviously improved.
Claims (1)
1. An academic early warning method based on an artificial neural network and an LSTM network is characterized by comprising the following steps:
1) processing missing data based on the RBF kernel: first, the student data are cleaned: student records with complete data are normalized by x' = (x - μ)/(Max - Min); then the raw student information is mapped from a low-dimensional space to a high-dimensional space with the RBF kernel K(v1, v2) = exp(-γ‖v1 - v2‖²), the mapping process being shown in formula (1):
wherein x', y' are the student information data arrays, y' is the corresponding label array, and α is the value of the RBF kernel parameter γ; each element of the resulting high-dimensional space is given by the kernel, yielding the data vector K(v1, v2, v3, ...), which is then normalized by x = (y - μ)/(Max - Min), where y is the student information array to be normalized, x is the normalized student information array, μ is the mean, and Max and Min are the maximum and minimum over all elements;
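Step 1) can be sketched as follows; only the normalization and kernel formulas above are taken from the text, while the γ value and the use of NumPy are assumptions:

```python
import numpy as np

def min_max_normalize(y):
    """x = (y - mu) / (Max - Min) from step 1); mu is the mean of y."""
    y = np.asarray(y, dtype=float)
    return (y - y.mean()) / (y.max() - y.min())

def rbf_kernel(v1, v2, gamma=0.5):
    """K(v1, v2) = exp(-gamma * ||v1 - v2||^2); gamma = 0.5 is an assumption."""
    diff = np.asarray(v1, dtype=float) - np.asarray(v2, dtype=float)
    return float(np.exp(-gamma * diff.dot(diff)))
```

Identical vectors give K = 1, and the kernel value decays toward 0 as the records move apart, which is what makes it usable as a similarity score for imputing missing records from their nearest complete neighbours.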
2) self-adaptive feature extraction based on the multivariate normal distribution: the score information vector G(x1, x2, x3, ...) is extracted from the student data cleaned and normalized in step 1) and screened against the multivariate normal distribution. The n-dimensional score information in G(x1, x2, x3, ...) is required to satisfy: every linear combination Y = a1x1 + a2x2 + ... + anxn follows a normal distribution; there exists a random vector Z = [Z1, ..., ZM]^T whose elements each follow a normal distribution; and there exist a vector μ = [μ1, ..., μN]^T and an N × M matrix A such that X = AZ + μ. If the n-dimensional score information vector G(x1, x2, x3, ...) meets these three conditions, it is said to satisfy the multivariate normal distribution, i.e. G(x1, x2, x3, ...) obeys the density f_X. Students whose scores satisfy the multivariate normal distribution are assigned to the standard class, and those whose scores do not are assigned to the singular class, as shown in formula (2):
wherein x is a normalized student information array, mu is an average value, and k is a constant index;
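Formula (2) itself is not reproduced in the text, so the screening can only be sketched under assumptions: a common practical proxy for "fits the multivariate normal" is a Mahalanobis-distance cutoff, used here purely for illustration with the constant k playing the role of the threshold index:

```python
import numpy as np

def screen_multivariate_normal(G, k=3.0):
    """Split score vectors into a 'standard' class (plausibly multivariate
    normal) and a 'singular' class via a Mahalanobis-distance cutoff k.
    G: array of shape (n_students, n_scores). Returns a boolean mask that
    is True for rows assigned to the standard class."""
    G = np.asarray(G, dtype=float)
    mu = G.mean(axis=0)
    cov = np.atleast_2d(np.cov(G, rowvar=False))
    inv = np.linalg.pinv(cov)          # pseudo-inverse guards near-singular cov
    diff = G - mu
    d2 = np.einsum('ij,jk,ik->i', diff, inv, diff)  # squared Mahalanobis dist.
    return d2 <= k ** 2
```

Rows far from the bulk of the data (in covariance-scaled units) fail the cutoff and land in the singular class, mirroring the standard/singular split described above.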
3) artificial neural network training based on a refined network: the student score-related data obtained in step 2) are trained with an artificial neural network using resilient back propagation (the Rprop algorithm). In Rprop the weight change Δw_{i,j} is directly equal to the learning rate η_{i,j}(t): the gradient of the error function does not affect the magnitude of the weight change, only its sign, i.e. the direction in which the weight moves; the gradient therefore determines only the update direction, never the update strength. If the gradient of the error function is positive, the corresponding weight is decreased by subtracting η_{i,j}(t) from w_{i,j}; if the gradient is negative, the corresponding weight is increased, driving the error function toward its minimum, as shown in formula (3):
thus, it is clear how the weight is updated, and then the learning rate etai,j(t) update, the gradient at any two time points, t and (t-1), will change sign, and there are two cases of change in total: if the signs of the gradients of the error functions at two time points (t-1) and t are different, which indicates that the minimum value has been crossed at t, which indicates that the last update step of the weight value is too large, ηi,j(t) ratio ηi,j(t-1) smaller learning rate of the previous step and a value eta greater than 0 and smaller than 1upMultiplying to obtain current learning rate, when the signs of two times are identical, indicating that the lowest point of error function has not been reached yet, making the learning rate of previous step multiply by an eta greater than 1downObtaining the current learning rate as shown in formula (4):
4) training the adaptive excitation function LSTM network: an LSTM network is adopted to train each student's daily one-card consumption-related information, and its result is cascaded with the artificial neural network training result to output the final graduation probability. The LSTM network structure in the prior art is as follows: the LSTM module has three inputs, c_{t-1}, h_{t-1} and x_t, and three outputs, c_t, h_t and y_t, where x_t represents the input of the current round, h_{t-1} the state-quantity output of the previous round, c_{t-1} the carrier of the global information of the previous round, y_t the output of the current round, h_t the state-quantity output of the current round, and c_t the carrier of the global information of the current round. x_t and h_{t-1} are combined into one vector, multiplied by the weight matrix W and wrapped in a tanh function to obtain the vector z. With the activation function sigmoid, the prior-art LSTM likewise combines x_t and h_{t-1} into one vector and multiplies it by the matrices W_f, W_i and W_o to obtain z_f, z_i and z_o, where W_f, W_i, W_o are the weight matrices of the forget gate, the input gate and the output gate, multiplied with the variable input to each gate, and z_f, z_i, z_o are the outputs of each gate, each obtained from its input multiplied by the weight matrix plus an offset. c_t is obtained from formula (5):
c_t = z_f c_{t-1} + z_i z (5),
h_t is obtained from formula (6): h_t = z_o tanh(c_t) (6),
and the output y_t of the current round is obtained from formula (7): y_t = σ(W h_t) (7),
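Formulas (5)–(7) can be collected into one forward step of the prior-art cell; the weight shapes and the bias terms (the "offsets" mentioned above) are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, Wf, Wi, Wo, Wy, b, bf, bi, bo):
    """One forward step of the standard LSTM cell of step 4), before the
    Relu+tanh modification, following formulas (5)-(7)."""
    v = np.concatenate([x_t, h_prev])   # combine x_t and h_{t-1} into one vector
    z  = np.tanh(W @ v + b)             # candidate vector z
    zf = sigmoid(Wf @ v + bf)           # forget gate output z_f
    zi = sigmoid(Wi @ v + bi)           # input gate output z_i
    zo = sigmoid(Wo @ v + bo)           # output gate output z_o
    c_t = zf * c_prev + zi * z          # formula (5)
    h_t = zo * np.tanh(c_t)             # formula (6)
    y_t = sigmoid(Wy @ h_t)             # formula (7)
    return c_t, h_t, y_t
```

The adaptive-excitation variant described next would replace the tanh calls with the weighted Relu + tanh function while leaving formulas (5)–(7) otherwise unchanged.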
The original tanh excitation function is changed into a weighted average of the adaptive excitation functions Relu and tanh, and the data x entering each gate of the LSTM is excited in this weighted form with u + v = 1, so that the one-card consumption-related information of the students is trained through the LSTM and early-warning classification is carried out for each student, generating a targeted personal analysis report for each student; a student under high-risk early warning receives the corresponding warning information, which is cascaded with the artificial neural network of step 3) to predict the student's graduation probability;
5) in combination with a software platform: a software platform suited to the above steps is built on existing early-warning algorithm software, with the user portal divided into a teacher end and a student end. A student is entitled to view his or her personal learning scores and related personal information, the relevant score information of his or her class, and the corresponding subject rankings at class and grade level; a student under high-risk early warning receives the corresponding warning information on the student end. A teacher is entitled to view the information of all students in the courses he or she teaches, together with all information of the class and of the grade; if a student triggers an early warning, the teacher also receives the alert, making it convenient to follow that student's performance. Individual, class and course charts are generated at the same time, so that the teacher can track each student's learning situation and state in real time.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110101091.8A CN112801362B (en) | 2021-01-26 | 2021-01-26 | Academic early warning method based on artificial neural network and LSTM network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112801362A true CN112801362A (en) | 2021-05-14 |
CN112801362B CN112801362B (en) | 2022-03-22 |
Family
ID=75811697
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110101091.8A Expired - Fee Related CN112801362B (en) | 2021-01-26 | 2021-01-26 | Academic early warning method based on artificial neural network and LSTM network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112801362B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110110939A (en) * | 2019-05-15 | 2019-08-09 | 杭州华网信息技术有限公司 | The academic record prediction and warning method of behavior is serialized based on deep learning student |
CN111260230A (en) * | 2020-01-19 | 2020-06-09 | 西北大学 | Academic early warning method based on lifting tree model |
US20200302296A1 (en) * | 2019-03-21 | 2020-09-24 | D. Douglas Miller | Systems and method for optimizing educational outcomes using artificial intelligence |
US20200356852A1 (en) * | 2019-05-07 | 2020-11-12 | Samsung Electronics Co., Ltd. | Model training method and apparatus |
CN112257935A (en) * | 2020-10-26 | 2021-01-22 | 中国人民解放军空军工程大学 | Aviation safety prediction method based on LSTM-RBF neural network model |
Non-Patent Citations (2)
Title |
---|
Song Chuping et al.: "Application of an Improved RBF Neural Network Algorithm in College Learning Early Warning", Computer Applications and Software * |
Xiao Yifeng: "Research on Data Mining Technology for Grade-Retention Early Warning of College Students", China Masters' Theses Full-text Database (Information Science and Technology) * |
Also Published As
Publication number | Publication date |
---|---|
CN112801362B (en) | 2022-03-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2022077587A1 (en) | Data prediction method and apparatus, and terminal device | |
CN109446430B (en) | Product recommendation method and device, computer equipment and readable storage medium | |
CN110599336B (en) | Financial product purchase prediction method and system | |
Liang et al. | Multi-scale dynamic adaptive residual network for fault diagnosis | |
US11704570B2 (en) | Learning device, learning system, and learning method | |
CN110555459A (en) | Score prediction method based on fuzzy clustering and support vector regression | |
CN109284662B (en) | Underwater sound signal classification method based on transfer learning | |
CN111340107A (en) | Fault diagnosis method and system based on convolutional neural network cost sensitive learning | |
CN112149884A (en) | Academic early warning monitoring method for large-scale students | |
Liu et al. | Stock price trend prediction model based on deep residual network and stock price graph | |
CN109063750B (en) | SAR target classification method based on CNN and SVM decision fusion | |
CN112489689B (en) | Cross-database voice emotion recognition method and device based on multi-scale difference countermeasure | |
CN111462817B (en) | Classification model construction method and device, classification model and classification method | |
CN112801362B (en) | Academic early warning method based on artificial neural network and LSTM network | |
CN112381338B (en) | Event probability prediction model training method, event probability prediction method and related device | |
CN117011219A (en) | Method, apparatus, device, storage medium and program product for detecting quality of article | |
CN112085079B (en) | Rolling bearing fault diagnosis method based on multi-scale and multi-task learning | |
CN113010687B (en) | Exercise label prediction method and device, storage medium and computer equipment | |
CN111291838B (en) | Method and device for interpreting entity object classification result | |
CN111382761B (en) | CNN-based detector, image detection method and terminal | |
US20210133556A1 (en) | Feature-separated neural network processing of tabular data | |
CN110647630A (en) | Method and device for detecting same-style commodities | |
CN116405368B (en) | Network fault diagnosis method and system under high-dimensional unbalanced data condition | |
Tomar | A critical evaluation of activation functions for autoencoder neural networks | |
CN113469450B (en) | Data classification method, device, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
Granted publication date: 20220322 |