CN108170765A - Financial aid recommendation method for poor students based on multidimensional analysis of in-school behavioral data - Google Patents

Publication number
CN108170765A
Authority
CN
China
Prior art keywords
student
data
matrix
feature
eigenmatrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711415918.2A
Other languages
Chinese (zh)
Other versions
CN108170765B (en)
Inventor
孙浪
施星靓
刘胜军
李晓洁
孟虎
李海松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
HEFEI CITY CLOUD DATA CENTER Co Ltd
Original Assignee
HEFEI CITY CLOUD DATA CENTER Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by HEFEI CITY CLOUD DATA CENTER Co Ltd
Priority to CN201711415918.2A
Publication of CN108170765A
Application granted
Publication of CN108170765B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/20 Education
    • G06Q50/205 Education administration or guidance

Abstract

The present invention relates to a financial aid recommendation method for poor students based on multidimensional analysis of in-school behavioral data, which addresses the prior-art difficulty of accurately recommending which poor students should receive aid. The method comprises the following steps: acquiring historical behavioral data; extracting features from the historical behavioral data; training the recommendation model; acquiring the behavioral data to be analyzed; extracting features from the behavioral data to be analyzed; and obtaining the recommendation results. Working from the data that students generate at school, the invention extracts features along multiple dimensions, builds a classification model from those features, and uses the classification model to judge each student's poverty status and support the aid decision.

Description

Financial aid recommendation method for poor students based on multidimensional analysis of in-school behavioral data
Technical field
The present invention relates to the technical field of big data analysis, and specifically to a financial aid recommendation method for poor students based on multidimensional analysis of in-school behavioral data.
Background art
The arrival of the big data era provides new theory and technical support for student financial aid work, and gives colleges and universities a new opportunity to use big data to make aid work fast, convenient, efficient, and accurate. Big data mining and analysis techniques, together with mathematical modeling theory, help administrators understand students' real behavior patterns while at school, identify students in "hidden poverty" and students suspected of "false certification", and thereby deliver targeted aid.
At present, targeted aid for poor students at school is still at an exploratory stage. There is no unified national assessment method and no unified standard for identifying poor students at school, and the aid given to poor students lacks systematic, standardized management. As a result, aid work is very cumbersome and many data resources are wasted. Although some existing techniques propose viewpoints and approaches, none of them meets the needs of practical application, or they are difficult to implement. One example is patent application No. 201710223971.6, titled "Student poverty prediction method based on data mining". Although it also analyzes students' in-school data, it applies the big data platforms Hadoop and Spark directly with a random forest model, and does not classify and partition the in-school data in a targeted way, so its classification results are unsatisfactory.
Therefore, how to use big data technology to deliver aid to poor students accurately has become a technical problem that urgently needs to be solved.
Summary of the invention
The purpose of the present invention is to overcome the prior-art difficulty of accurately recommending which poor students should receive aid, by providing a financial aid recommendation method for poor students based on multidimensional analysis of in-school behavioral data.
To achieve this purpose, the technical solution of the present invention is as follows:
A financial aid recommendation method for poor students based on multidimensional analysis of in-school behavioral data, comprising the following steps:
Acquisition of historical behavioral data: obtain historical behavioral data of past students along multiple dimensions, the historical behavioral data including the past students' family economic data, all-in-one campus card consumption data, student grade data, and library borrowing data;
Feature extraction from the historical behavioral data: extract the dimensional features of the past students' historical behavioral data and build feature matrices;
Training of the recommendation model: train the recommendation model using the family economic status feature matrix A, the all-in-one campus card consumption feature matrix B, the student latent factor matrix P_i, and the borrowing matrix S;
Acquisition of the behavioral data to be analyzed: obtain behavioral data of the students to be analyzed along multiple dimensions, the behavioral data including the past students' family economic data, all-in-one campus card consumption data, student grade data, and library borrowing data;
Feature extraction from the behavioral data to be analyzed: extract the dimensional features of the behavioral data of the students to be analyzed and build the feature matrices to be analyzed;
Obtaining the recommendation results: input the feature matrices to be analyzed into the trained recommendation model to obtain the set of students recommended for aid and the set of students not recommended for aid.
The feature extraction from the historical behavioral data comprises the following steps:
Build the family economic status feature matrix A from the extracted family economic data;
Compute, for each past student's family economic consumption, the maximum Max_i, minimum Min_i, median Median_i, mean Avg_i, quartile Quartile_i, and standard deviation Standard_i, together with the ratio Rate_i of weekday to weekend spending, and store them in memory in matrix form,
The family economic status feature matrix A is defined as follows:
A = [Max_i, Min_i, Median_i, Avg_i, Quartile_i, Standard_i, Rate_i]^T
Build the all-in-one campus card consumption feature matrix B from the extracted all-in-one campus card consumption data, with the following specific steps:
Compute each past student's number of purchases Times_i, total spending Cost_Amount_i, largest single purchase Single_max_amount_i, smallest single purchase Single_min_amount_i, mean purchase Cost_avg_i, and median purchase Cost_median_i;
Compute the Engel coefficient of each past student as E_i = P_1 / P_2,
where P_1 is the student's canteen spending, P_2 is the student's total spending, and E_i is the Engel coefficient of each past student;
Build the all-in-one campus card consumption feature matrix B, expressed as follows:
B = [Times_i, Cost_Amount_i, Single_max_amount_i, Single_min_amount_i, Cost_avg_i, Cost_median_i, E_i]^T
Compute the student latent factor matrix P_i from the extracted past student grade data, with the following specific steps:
Build each element r of the past student grade matrix R,
where r_ij denotes the grade of student u_i in course c_j; the formula also uses the average grade of student p_i in course q_j;
Build each element W_ij of the past student course-selection matrix W,
where W_ij denotes whether student u_i has selected course c_j: 1 means the course was selected and 0 means it was not;
Factorize the grade matrix R by optimizing the following objective function:
where P_i denotes the student latent factor, Q_j denotes the course factor, and λ denotes the regularization (penalty) parameter. The parameters P_i and Q_j are obtained by alternating least squares and stochastic gradient descent; alternating least squares updates the parameters using the following formula:
where E_k is a k×k identity matrix and k is the given dimension of the features to be extracted, i.e., the grade profile of a single student is a k-dimensional vector; p_i is the student latent factor matrix, which represents the student's preference for courses, and q_j is the course latent factor matrix, which represents the quality of the course itself;
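The objective function and the alternating least squares update referred to above appear as images in the original publication and are not reproduced in this text. A standard weighted matrix-factorization form that is consistent with the symbols defined here is sketched below. It is an assumption, not the patent's exact formula: P and Q stack the vectors P_i and Q_j as rows, r_{i·} and r_{·j} are a row and a column of R, and diag(·) builds a diagonal matrix from the matching row or column of W (notation introduced for this sketch).

```latex
% Assumed objective: only grades of selected courses (W_{ij} = 1) contribute
\min_{P,Q}\; \sum_{i,j} W_{ij}\,\bigl(r_{ij} - P_i^{\top} Q_j\bigr)^{2}
  + \lambda \Bigl( \sum_{i} \lVert P_i \rVert^{2} + \sum_{j} \lVert Q_j \rVert^{2} \Bigr)

% Assumed alternating least squares updates: solve for P_i with Q fixed, then for Q_j with P fixed
\begin{aligned}
P_i &= \bigl( Q^{\top} \operatorname{diag}(W_{i\cdot})\, Q + \lambda E_k \bigr)^{-1}
       Q^{\top} \operatorname{diag}(W_{i\cdot})\, r_{i\cdot}^{\top} \\
Q_j &= \bigl( P^{\top} \operatorname{diag}(W_{\cdot j})\, P + \lambda E_k \bigr)^{-1}
       P^{\top} \operatorname{diag}(W_{\cdot j})\, r_{\cdot j}
\end{aligned}
```

Each update is the closed-form minimizer of the objective in one block of variables, which is why the identity matrix E_k scaled by λ appears inside the inverse.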
Build the borrowing matrix S from the extracted library borrowing data of past students, with the following specific steps:
Each row of the borrowing matrix S represents the borrowing record of one student, and each column represents how one book has been borrowed. For each element of the matrix, s_ui indicates whether student u has borrowed book i: s_ui = 1 means the book was borrowed, and s_ui = 0 means it was not;
Factorize the book borrowing matrix S by optimizing the following objective function,
where H_u denotes the student's preference factor for books, G_i denotes the latent factor of the book, and λ denotes the regularization (penalty) parameter. The parameters H_uk and G_ik are obtained by stochastic gradient descent, with the following gradient formula:
The objective is minimized by gradient descent:
where α is the iteration step size, H_uk is the preference score of student u for category k, and G_ki is the score of book i on category k;
Obtain the reading interest H_u of student u; H_u reflects that student's reading-interest features.
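The objective function, gradient, and update rule for the borrowing matrix are likewise missing from this text. The block below sketches the usual regularized factorization and its stochastic gradient descent step. It is an assumed form chosen to match the symbols α, λ, H_uk, and G_ki defined above; the error term e_ui is notation introduced here, and the indexing G_ki follows the definition given with α.

```latex
% Assumed objective for the borrowing matrix S
\min_{H,G}\; C = \sum_{u,i} \Bigl( s_{ui} - \sum_{k} H_{uk} G_{ki} \Bigr)^{2}
  + \lambda \bigl( \lVert H_u \rVert^{2} + \lVert G_i \rVert^{2} \bigr)

% Gradients, with e_{ui} = s_{ui} - \sum_k H_{uk} G_{ki}
\frac{\partial C}{\partial H_{uk}} = -2\, e_{ui}\, G_{ki} + 2 \lambda H_{uk}, \qquad
\frac{\partial C}{\partial G_{ki}} = -2\, e_{ui}\, H_{uk} + 2 \lambda G_{ki}

% Gradient descent step (the constant factor 2 is absorbed into \alpha)
H_{uk} \leftarrow H_{uk} + \alpha \bigl( e_{ui}\, G_{ki} - \lambda H_{uk} \bigr), \qquad
G_{ki} \leftarrow G_{ki} + \alpha \bigl( e_{ui}\, H_{uk} - \lambda G_{ki} \bigr)
```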
The training of the recommendation model comprises the following steps:
Model preprocessing stage: set the stop conditions of the model in recursive form, as follows:
Create the class label vector of the data set, denoted Clsaslist {c1, c2, ..., ck};
If all students in training set D belong to the same class label C_k, return that class label C_k directly;
If feature set X is empty, return the class label C_k that has the largest number of students in training set D;
Model training stage: build a tree model based on Shannon's information theory. Compute the information gain of each feature, select the feature with the largest information gain as the first-level node, and build the whole model recursively until all features are used up, at which point model training ends. The specific steps are as follows:
Compute the entropy of training set D for each feature in feature set X as info(D) = -Σ_{i=1..m} p_i log2(p_i),
where D is the training set, p_i is the frequency of the i-th class, and m is the number of classes;
Partition training set D according to feature set X and compute the expected information of that partition as info_X(D) = Σ_{j ∈ Values(X)} (|D_j| / |D|) info(D_j),
where Values(X) is the set of attribute values of feature set X, j is one attribute value, and D_j is the subset of training set D whose attribute value is j;
Compute the information gain gain() of the family economic status feature matrix A, the all-in-one campus card consumption feature matrix B, the student latent factor matrix P_i, and the borrowing matrix S respectively, using the formula:
Gain(D, X) = info(D) - info_X(D);
Compare the information gains of the family economic status feature matrix A, the all-in-one campus card consumption feature matrix B, the student latent factor matrix P_i, and the borrowing matrix S;
Take the feature matrix with the largest information gain as the root node of the tree model. The second-level child nodes of the tree model are divided into two nodes, one of which is the set of students not recommended for aid;
Recompute and compare the information gains of the remaining three feature matrices, and take the one with the largest gain as the second node of the second-level child nodes;
The second node of the second level extends downward to form the third-level child nodes of the tree model. The third-level child nodes are divided into two nodes, one of which is the set of students not recommended for aid;
Recompute and compare the information gains of the remaining two feature matrices, and take the one with the larger gain as the second node of the third-level child nodes;
The second node of the third level extends downward to form the fourth-level child nodes of the tree model. The fourth-level child nodes are divided into two nodes: one is the set of students not recommended for aid, and the other is the feature matrix with the smallest gain.
The method further comprises verification of the recommendation model, which comprises the following steps:
Obtain a test set for which the results are already known;
Apply the test set to the recommendation model and compute the F1 value on the test set,
which is computed as follows:
where M is the number of poverty classes, TP is the number of students in the test set who are predicted to be poor students and actually are poor students, FP is the number of students in the test set who are predicted to be poor students but actually are not, and FN is the number of students in the test set who are predicted not to be poor students but actually are;
Assess the accuracy of the model by the F1 value.
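The F1 formula itself is not reproduced in this text. Because the definitions mention both a class count M and per-class TP, FP, and FN, one plausible reading is a macro-averaged F1 over the M poverty classes. The form below is that assumption, with the subscript m (introduced here) indexing the classes; the original may instead weight the classes differently.

```latex
% Assumed macro-averaged F1 over M poverty classes
F1 = \frac{1}{M} \sum_{m=1}^{M} \frac{2\, P_m R_m}{P_m + R_m}, \qquad
P_m = \frac{TP_m}{TP_m + FP_m}, \qquad
R_m = \frac{TP_m}{TP_m + FN_m}
```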
Advantageous effects
Compared with the prior art, the financial aid recommendation method for poor students based on multidimensional analysis of in-school behavioral data works from the data that students generate at school, extracts features along multiple dimensions, builds a classification model from these features, and uses the classification model to judge each student's poverty status and support the aid decision.
The present invention uses data mining to process the various data records of poor students and identify the key factors in poverty certification. It extracts multiple dimensional features from data on students' family economic status, all-in-one campus card consumption, grades, and library borrowing, so that the recommendation method is closer to practical application, the underlying data are highly accurate, and the information obtained is precise. The classification model built from these features is based on a decision-tree model: it does not require a huge volume of data, and it does not require creating dummy variables or removing incomplete data. It can handle both numerical and categorical data and can handle multi-output problems. In addition, the classification model is a white-box model: if a given situation can be observed in the model, the explanation of the condition reduces to Boolean logic, which is easy to interpret. The method therefore needs little data, yields accurate recommendations, and is close to practical application.
Description of the drawings
Fig. 1 is a flowchart of the method of the present invention.
Detailed description of embodiments
To give a better understanding of the structural features of the present invention and the effects it achieves, preferred embodiments are described in detail below with reference to the drawing:
As shown in Fig. 1, the financial aid recommendation method for poor students based on multidimensional analysis of in-school behavioral data according to the present invention comprises the following steps:
Step 1: acquisition of historical behavioral data. Obtain historical behavioral data of past students along multiple dimensions, including the past students' family economic data, all-in-one campus card consumption data, student grade data, and library borrowing data. These data accurately reflect the student's family circumstances and the student's study and living conditions at school. Because the recommendation model in the present invention is constructed and trained on features extracted from behavioral data, the accurate selection of the underlying data here lays the foundation for accurate recommendation later.
Step 2: feature extraction from the historical behavioral data. Extract the dimensional features of the past students' historical behavioral data and build feature matrices. Matrix factorization is applied to the student grade matrix and the library borrowing matrix; this dimensionality reduction uncovers the latent relationships between students and courses and between students and borrowed books, which helps in building the model. The specific steps are as follows:
(1) Build the family economic status feature matrix A from the extracted family economic data;
Compute, for each past student's family economic consumption, the maximum Max_i, minimum Min_i, median Median_i, mean Avg_i, quartile Quartile_i, and standard deviation Standard_i, together with the ratio Rate_i of weekday to weekend spending, and store them in memory in matrix form,
The family economic status feature matrix A is defined as follows:
A = [Max_i, Min_i, Median_i, Avg_i, Quartile_i, Standard_i, Rate_i]^T
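The statistics that make up matrix A can be sketched in a few lines of NumPy. This is a minimal illustration rather than the patent's implementation: the function names, the input layout (a list of amounts plus a weekend flag per record), the choice of the first quartile for Quartile_i, the population standard deviation, and the direction of the ratio (weekday divided by weekend) are all assumptions.

```python
import numpy as np


def family_features(amounts, is_weekend):
    """Return [Max, Min, Median, Avg, Quartile, Standard, Rate] for one student."""
    amounts = np.asarray(amounts, dtype=float)
    is_weekend = np.asarray(is_weekend, dtype=bool)
    weekday_total = amounts[~is_weekend].sum()
    weekend_total = amounts[is_weekend].sum()
    # Assumption: Rate = weekday spending / weekend spending (NaN if no weekend spending).
    rate = weekday_total / weekend_total if weekend_total > 0 else np.nan
    return np.array([
        amounts.max(),
        amounts.min(),
        np.median(amounts),
        amounts.mean(),
        np.percentile(amounts, 25),  # assumption: "quartile" means the first quartile
        amounts.std(),               # population standard deviation (ddof=0)
        rate,
    ])


def build_matrix_a(records):
    """Stack one feature column per student, matching the transposed layout of A."""
    return np.column_stack([family_features(a, w) for a, w in records])
```

Calling `build_matrix_a` with one `(amounts, is_weekend)` pair per student yields a 7-row matrix with one column per student.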
(2) Build the all-in-one campus card consumption feature matrix B from the extracted all-in-one campus card consumption data, with the following specific steps:
A. Compute each past student's number of purchases Times_i, total spending Cost_Amount_i, largest single purchase Single_max_amount_i, smallest single purchase Single_min_amount_i, mean purchase Cost_avg_i, and median purchase Cost_median_i;
B. Compute the Engel coefficient of each past student as E_i = P_1 / P_2,
where P_1 is the student's canteen spending, P_2 is the student's total spending, and E_i is the Engel coefficient of each past student;
C. Build the all-in-one campus card consumption feature matrix B, expressed as follows:
B = [Times_i, Cost_Amount_i, Single_max_amount_i, Single_min_amount_i, Cost_avg_i, Cost_median_i, E_i]^T
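As an illustration of steps A to C, the sketch below computes one student's column of matrix B. The function name and the input layout (purchase amounts plus a flag marking canteen purchases) are hypothetical, and the Engel coefficient is taken as canteen spending divided by total spending, which follows the definitions of P_1 and P_2 but is not spelled out in this text.

```python
import numpy as np


def card_features(amounts, is_canteen):
    """Return [Times, Cost_Amount, Single_max, Single_min, Cost_avg, Cost_median, E]."""
    amounts = np.asarray(amounts, dtype=float)
    is_canteen = np.asarray(is_canteen, dtype=bool)
    total = amounts.sum()                      # P2: total spending
    canteen_total = amounts[is_canteen].sum()  # P1: canteen spending
    # Assumption: Engel coefficient E = P1 / P2 (NaN when nothing was spent).
    engel = canteen_total / total if total > 0 else np.nan
    return np.array([
        amounts.size,        # Times: number of purchases
        total,               # Cost_Amount
        amounts.max(),       # Single_max_amount
        amounts.min(),       # Single_min_amount
        amounts.mean(),      # Cost_avg
        np.median(amounts),  # Cost_median
        engel,               # E
    ])
```

A higher E means a larger share of the card spending goes to food, which is the signal this feature is meant to expose.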
(3) Compute the student latent factor matrix P_i from the extracted past student grade data, with the following specific steps:
A. Build each element r of the past student grade matrix R,
where r_ij denotes the grade of student u_i in course c_j; the formula also uses the average grade of student p_i in course q_j;
B. Build each element W_ij of the past student course-selection matrix W,
where W_ij denotes whether student u_i has selected course c_j: 1 means the course was selected and 0 means it was not;
C. Factorize the grade matrix R by optimizing the following objective function:
where P_i denotes the student latent factor, Q_j denotes the course factor, and λ denotes the regularization (penalty) parameter. The parameters P_i and Q_j are obtained by alternating least squares and stochastic gradient descent; alternating least squares updates the parameters using the following formula:
where E_k is a k×k identity matrix and k is the given dimension of the features to be extracted, i.e., the grade profile of a single student is a k-dimensional vector; p_i is the student latent factor matrix, which represents the student's preference for courses, and q_j is the course latent factor matrix, which represents the quality of the course itself;
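The factorization in step C can be illustrated with a small masked alternating least squares routine. This is a sketch under assumptions: the objective is taken to be the regularized squared error over selected courses only (W_ij = 1), which is the standard form but is not shown in this text, and the names `als_masked` and `objective` and all default values are hypothetical.

```python
import numpy as np


def objective(R, W, P, Q, lam):
    """Assumed objective: masked squared error plus an L2 penalty on P and Q."""
    err = W * (R - P @ Q.T)
    return float((err ** 2).sum() + lam * ((P ** 2).sum() + (Q ** 2).sum()))


def als_masked(R, W, k=2, lam=0.1, n_iters=20, seed=0):
    """Factorize R ~ P @ Q.T using only the entries where W == 1."""
    rng = np.random.default_rng(seed)
    n_students, n_courses = R.shape
    P = rng.normal(scale=0.1, size=(n_students, k))  # student latent factors
    Q = rng.normal(scale=0.1, size=(n_courses, k))   # course latent factors
    E_k = np.eye(k)                                  # k x k identity matrix
    for _ in range(n_iters):
        # Fix Q and solve a small regularized least squares problem per student.
        for i in range(n_students):
            Wi = np.diag(W[i])
            P[i] = np.linalg.solve(Q.T @ Wi @ Q + lam * E_k, Q.T @ Wi @ R[i])
        # Fix P and solve the same kind of problem per course.
        for j in range(n_courses):
            Wj = np.diag(W[:, j])
            Q[j] = np.linalg.solve(P.T @ Wj @ P + lam * E_k, P.T @ Wj @ R[:, j])
    return P, Q
```

Each inner solve is the exact minimizer of the assumed objective for one row of P or Q, so the objective does not increase from one sweep to the next. The rows of P are the k-dimensional student vectors that are later used as features.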
(4) Build the borrowing matrix S from the extracted library borrowing data of past students, with the following specific steps:
A. Each row of the borrowing matrix S represents the borrowing record of one student, and each column represents how one book has been borrowed. For each element of the matrix, s_ui indicates whether student u has borrowed book i: s_ui = 1 means the book was borrowed, and s_ui = 0 means it was not;
B. Factorize the book borrowing matrix S by optimizing the following objective function,
where H_u denotes the student's preference factor for books, G_i denotes the latent factor of the book, and λ denotes the regularization (penalty) parameter. The parameters H_uk and G_ik are obtained by stochastic gradient descent, with the following gradient formula:
The objective is minimized by gradient descent:
where α is the iteration step size, H_uk is the preference score of student u for category k, and G_ki is the score of book i on category k;
C. Obtain the reading interest H_u of student u; H_u reflects that student's reading-interest features.
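Steps B and C can be sketched with a plain stochastic gradient descent loop. The update rule below is the usual one for a regularized squared-error factorization and is an assumption, since the gradient formula is not reproduced in this text; the function name and the default values of k, alpha, lam, and n_epochs are hypothetical.

```python
import numpy as np


def sgd_factorize(S, k=2, alpha=0.05, lam=0.01, n_epochs=300, seed=0):
    """Factorize the 0/1 borrowing matrix S ~ H @ G by stochastic gradient descent."""
    rng = np.random.default_rng(seed)
    n_students, n_books = S.shape
    H = rng.normal(scale=0.1, size=(n_students, k))  # H[u, k]: student u's score on category k
    G = rng.normal(scale=0.1, size=(k, n_books))     # G[k, i]: book i's score on category k
    for _ in range(n_epochs):
        for u in range(n_students):
            for i in range(n_books):
                err = S[u, i] - H[u] @ G[:, i]       # prediction error for one entry
                h_old = H[u].copy()                  # keep the old value for the G update
                # Step against the gradient; alpha is the iteration step size.
                H[u] += alpha * (err * G[:, i] - lam * H[u])
                G[:, i] += alpha * (err * h_old - lam * G[:, i])
    return H, G
```

Row `H[u]` of the result plays the role of the reading interest H_u for student u.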
Step 3: training of the recommendation model. Train the recommendation model using the family economic status feature matrix A, the all-in-one campus card consumption feature matrix B, the student latent factor matrix P_i, and the borrowing matrix S. The specific steps are as follows:
(1) Model preprocessing stage: set the stop conditions of the model in recursive form, as follows:
A. Create the class label vector of the data set, denoted Clsaslist {c1, c2, ..., ck};
B. If all students in training set D belong to the same class label C_k, return that class label C_k directly;
C. If feature set X is empty, return the class label C_k that has the largest number of students in training set D.
Here, training set D is the historical behavioral data, and feature set X is the family economic status feature matrix A, the all-in-one campus card consumption feature matrix B, the student latent factor matrix P_i, or the borrowing matrix S.
(2) Model training stage: build a tree model based on Shannon's information theory. Compute the information gain of each feature, select the feature with the largest information gain as the first-level node, and build the whole model recursively until all features are used up, at which point model training ends. The specific steps are as follows:
A. Compute the entropy of training set D for each feature in feature set X as info(D) = -Σ_{i=1..m} p_i log2(p_i),
where D is the training set, p_i is the frequency of the i-th class, and m is the number of classes;
B. Partition training set D according to feature set X and compute the expected information of that partition as info_X(D) = Σ_{j ∈ Values(X)} (|D_j| / |D|) info(D_j),
where Values(X) is the set of attribute values of feature set X, j is one attribute value, and D_j is the subset of training set D whose attribute value is j;
C. Compute the information gain gain() of the family economic status feature matrix A, the all-in-one campus card consumption feature matrix B, the student latent factor matrix P_i, and the borrowing matrix S respectively, using the formula:
Gain(D, X) = info(D) - info_X(D);
D. Compare the information gains of the family economic status feature matrix A, the all-in-one campus card consumption feature matrix B, the student latent factor matrix P_i, and the borrowing matrix S;
E. Take the feature matrix with the largest information gain as the root node of the tree model. The second-level child nodes of the tree model are divided into two nodes, one of which is the set of students not recommended for aid;
F. Recompute and compare the information gains of the remaining three feature matrices, and take the one with the largest gain as the second node of the second-level child nodes;
G. The second node of the second level extends downward to form the third-level child nodes of the tree model. The third-level child nodes are divided into two nodes, one of which is the set of students not recommended for aid;
H. Recompute and compare the information gains of the remaining two feature matrices, and take the one with the larger gain as the second node of the third-level child nodes;
I. The second node of the third level extends downward to form the fourth-level child nodes of the tree model. The fourth-level child nodes are divided into two nodes: one is the set of students not recommended for aid, and the other is the feature matrix with the smallest gain.
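The gain computation in steps A to D, and the ordering of features that steps E to I rely on, can be sketched as follows. The code assumes that each feature has already been reduced to one discrete value per student (for example by binning), which this text does not describe; the function names and that preprocessing are hypothetical, and the entropy uses the base-2 Shannon form.

```python
from collections import Counter

import numpy as np


def info(labels):
    """Shannon entropy info(D) of the class labels, in bits."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def info_x(labels, values):
    """Expected information info_X(D) after splitting D by the values of feature X."""
    labels = np.asarray(labels)
    values = np.asarray(values)
    total = len(labels)
    return sum(
        (values == v).sum() / total * info(labels[values == v])
        for v in np.unique(values)
    )


def gain(labels, values):
    """Gain(D, X) = info(D) - info_X(D)."""
    return info(labels) - info_x(labels, values)


def rank_features(labels, features):
    """Order feature names from largest to smallest information gain."""
    gains = {name: gain(labels, vals) for name, vals in features.items()}
    return sorted(gains, key=gains.get, reverse=True)
```

This sketch ranks all features once against the full training set, whereas the steps above recompute the gains among the remaining features at each level. The first name returned corresponds to the root node and the last to the feature with the smallest gain at the bottom of the tree.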
Step 4: acquisition of the behavioral data to be analyzed. Obtain behavioral data of the students to be analyzed along multiple dimensions, the behavioral data including the past students' family economic data, all-in-one campus card consumption data, student grade data, and library borrowing data.
Step 5: feature extraction from the behavioral data to be analyzed. Using the same steps as for feature extraction from the historical behavioral data, extract the dimensional features of the behavioral data of the students to be analyzed and build the feature matrices to be analyzed.
Step 6: obtaining the recommendation results. Input the feature matrices to be analyzed into the trained recommendation model; the tree model yields the set of students recommended for aid and the set of students not recommended for aid.
To further improve recommendation accuracy, a method for verifying the accuracy of the recommendation model is also provided. The verification of the recommendation model comprises the following steps:
(1) Obtain a test set for which the results are already known, so that the actual results can be compared with the predicted results.
(2) Apply the test set to the recommendation model and compute the F1 value on the test set,
which is computed as follows:
where M is the number of poverty classes, TP is the number of students in the test set who are predicted to be poor students and actually are poor students, FP is the number of students in the test set who are predicted to be poor students but actually are not, and FN is the number of students in the test set who are predicted not to be poor students but actually are.
(3) Assess the accuracy of the model by the F1 value; if the F1 value exceeds a certain threshold, the model is considered reliable and effective.
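The verification step can be sketched in plain Python. Because the F1 formula is not reproduced in this text, the code assumes a macro-averaged F1 over the poverty classes, built from the TP, FP, and FN counts defined above; the function names are hypothetical, and any acceptance threshold would have to be chosen by the user since no value is given.

```python
def class_f1(y_true, y_pred, label):
    """F1 for one class, computed from its TP, FP and FN counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    denom = 2 * tp + fp + fn
    # 2*TP / (2*TP + FP + FN) equals 2PR / (P + R); defined as 0 when the class never occurs.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Assumed metric: unweighted mean of the per-class F1 over the M classes."""
    labels = sorted(set(y_true))
    return sum(class_f1(y_true, y_pred, c) for c in labels) / len(labels)
```

With this in place, the reliability check in step (3) is a single comparison of `macro_f1(y_true, y_pred)` against the chosen threshold.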
The basic principles, main features, and advantages of the present invention have been shown and described above. Those skilled in the art should understand that the present invention is not limited to the embodiments above; the embodiments and the description merely illustrate the principles of the invention. Various changes and improvements may be made without departing from the spirit and scope of the invention, and all such changes and improvements fall within the scope of the claimed invention. The scope of protection claimed is defined by the appended claims and their equivalents.

Claims (4)

1. A financial aid recommendation method for poor students based on multidimensional analysis of in-school behavioral data, characterized in that it comprises the following steps:
11) acquisition of historical behavioral data: obtaining historical behavioral data of past students along multiple dimensions, the historical behavioral data including the past students' family economic data, all-in-one campus card consumption data, student grade data, and library borrowing data;
12) feature extraction from the historical behavioral data: extracting the dimensional features of the past students' historical behavioral data and building feature matrices;
13) training of the recommendation model: training the recommendation model using the family economic status feature matrix A, the all-in-one campus card consumption feature matrix B, the student latent factor matrix P_i, and the borrowing matrix S;
14) acquisition of the behavioral data to be analyzed: obtaining behavioral data of the students to be analyzed along multiple dimensions, the behavioral data including the past students' family economic data, all-in-one campus card consumption data, student grade data, and library borrowing data;
15) feature extraction from the behavioral data to be analyzed: extracting the dimensional features of the behavioral data of the students to be analyzed and building the feature matrices to be analyzed;
16) obtaining the recommendation results: inputting the feature matrices to be analyzed into the trained recommendation model to obtain the set of students recommended for aid and the set of students not recommended for aid.
2. the poverty-stricken mountains according to claim 1 based in school behavioral data multidimensional analysis recommend method, feature It is, the feature extraction of the historical behavior data includes the following steps:
21) according to the household economy data structure household economy situation eigenmatrix A extracted;
Calculate the maximum value Max of each previous student's family economic consumptioni, minimum M ini, median Mediani, average Avgi, quartile Quartilei, standard deviation Standardi, working day and weekend spending amount ratio Ratei, with matrix Form deposit into memory,
Household economy situation eigenmatrix A is defined as follows:
A=[Maxi,Mini,Mediani,Avgi,Quartilei,Standardi,Ratei]T
22) constructing the campus card consumption feature matrix B from the extracted campus card consumption data, with the following specific steps:
221) calculating, for each previous student, the number of transactions Times_i, the total spending Cost_Amount_i, the maximum single transaction Single_max_amount_i, the minimum single transaction Single_min_amount_i, the mean spending Cost_avg_i and the median spending Cost_median_i;
222) calculating the Engel coefficient of each previous student by the following formula:
$E_i = P_1 / P_2$
where P_1 denotes each previous student's canteen spending, P_2 denotes each previous student's total spending, and E_i denotes the Engel coefficient of each previous student;
223) building the campus card consumption feature matrix B, expressed as follows:
$B = [Times_i, Cost\_Amount_i, Single\_max\_amount_i, Single\_min\_amount_i, Cost\_avg_i, Cost\_median_i, E_i]^T$
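The campus card features of step 22) can be sketched in the same way. The transaction layout (an amount plus a canteen flag) and the function name are hypothetical; the Engel coefficient is computed as canteen spending divided by total spending, following the definitions of P_1 and P_2.

```python
import statistics

def card_features(transactions):
    """Return [Times, Cost_Amount, Single_max, Single_min, Cost_avg, Cost_median, E].

    transactions: list of (amount, is_canteen) pairs for one student
    (hypothetical layout, at least one transaction).
    """
    amounts = [amount for amount, _ in transactions]
    total = sum(amounts)                                           # P_2
    canteen = sum(amount for amount, flag in transactions if flag)  # P_1
    engel = canteen / total if total else 0.0                      # E_i = P_1 / P_2
    return [
        len(amounts),
        total,
        max(amounts),
        min(amounts),
        statistics.mean(amounts),
        statistics.median(amounts),
        engel,
    ]

example = card_features([(8.0, True), (12.0, True), (20.0, False)])
```

A student who spends half of the card total in the canteen, as in `example`, gets an Engel coefficient of 0.5.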
23) calculating the student latent factor matrix P_i from the extracted grade data of previous students, as follows:
231) constructing each element r_ij of the previous-student grade matrix R,
where r_ij denotes the grade of student u_i in course c_j; the construction also uses the average grade of student p_i in course q_j;
232) constructing each element W_ij of the previous-student course selection matrix W,
where W_ij denotes whether student u_i selected course c_j: 1 means the course was selected and 0 means it was not;
233) performing matrix factorization on the grade matrix R by optimizing the following objective function:
where P_i denotes the student latent factor, Q_j denotes the course factor, and λ denotes the penalty parameter; the parameters P_i and Q_j are obtained by alternating least squares and stochastic gradient descent, and alternating least squares updates the parameters using the following formula:
where E_k is a k×k identity matrix and k is the given number of features to be extracted, i.e. the grade profile of a single student is a k-dimensional vector; p_i is the student latent factor matrix and represents the student's preference for courses; q_j is the course latent factor matrix and represents the quality of the course itself;
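The objective function and update formula of step 233) are not reproduced in this text. One standard weighted, regularized factorization that is consistent with the symbols defined above (W_ij, r_ij, P_i, Q_j, λ, E_k) is sketched below; it is an assumption offered for orientation, not necessarily the exact formula of the patent.

```latex
% Assumed objective: squared error over selected courses plus an L2 penalty
\min_{P,Q} \; \sum_{i,j} W_{ij}\,\bigl(r_{ij} - P_i^{T} Q_j\bigr)^{2}
  + \lambda \Bigl( \sum_i \lVert P_i \rVert^{2} + \sum_j \lVert Q_j \rVert^{2} \Bigr)

% Assumed alternating least squares updates (each solved with the other factor fixed)
P_i = \Bigl( \sum_j W_{ij}\, Q_j Q_j^{T} + \lambda E_k \Bigr)^{-1} \sum_j W_{ij}\, r_{ij}\, Q_j
\qquad
Q_j = \Bigl( \sum_i W_{ij}\, P_i P_i^{T} + \lambda E_k \Bigr)^{-1} \sum_i W_{ij}\, r_{ij}\, P_i
```

Each update follows from setting the gradient of the assumed objective with respect to P_i (or Q_j) to zero, which is where the k×k identity matrix E_k enters.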
24) constructing the borrowing matrix S from the extracted library borrowing data of previous students, as follows:
241) each row of the borrowing matrix S represents the borrowing record of one student, and each column represents how one book has been borrowed; for each element s_ui of the matrix, s_ui indicates whether student u has borrowed book i: s_ui = 1 means the book was borrowed and s_ui = 0 means it was not;
242) performing matrix factorization on the book borrowing matrix S by optimizing the following objective function,
where H_u denotes the student's preference factor for books, G_i denotes the latent factor of the book, and λ denotes the penalty parameter; the parameters H_uk and G_ik are obtained by stochastic gradient descent, with the following gradient formula:
the objective is minimized by gradient descent:
where α is the iteration step size, H_uk is the preference score of student u for category k, and G_ki is the score of book i in category k;
243) obtaining the reading interest H_u of student u, where H_u reflects the reading interest features of that student.
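The factorization of the borrowing matrix in step 24) can be illustrated with a small stochastic gradient descent loop. This is a minimal sketch that assumes a squared-error objective with an L2 penalty; the function names, initialization and hyperparameter values are hypothetical and are not taken from the patent.

```python
import random

def factorize_borrowing(S, k=2, alpha=0.05, lam=0.01, epochs=500, seed=0):
    """Factorize a 0/1 borrowing matrix S into student factors H and book factors G."""
    rng = random.Random(seed)
    n_students, n_books = len(S), len(S[0])
    # Small positive random initialization (assumption).
    H = [[rng.uniform(0, 0.1) for _ in range(k)] for _ in range(n_students)]
    G = [[rng.uniform(0, 0.1) for _ in range(k)] for _ in range(n_books)]
    for _ in range(epochs):
        for u in range(n_students):
            for i in range(n_books):
                err = S[u][i] - sum(H[u][f] * G[i][f] for f in range(k))
                for f in range(k):
                    h, g = H[u][f], G[i][f]
                    # Gradient step with step size alpha and penalty lam.
                    H[u][f] += alpha * (err * g - lam * h)
                    G[i][f] += alpha * (err * h - lam * g)
    return H, G

def squared_error(S, H, G):
    """Sum of squared reconstruction errors over all entries of S."""
    return sum(
        (S[u][i] - sum(h * g for h, g in zip(H[u], G[i]))) ** 2
        for u in range(len(S))
        for i in range(len(S[0]))
    )

S = [[1, 0, 1], [0, 1, 0], [1, 0, 1]]
H, G = factorize_borrowing(S)
```

Row `H[u]` then plays the role of the reading interest vector H_u of student u.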
3. The poor-student aid recommendation method based on multidimensional analysis of in-school behavior data according to claim 1, characterized in that the training of the recommendation model comprises the following steps:
31) model preprocessing stage: setting the stop conditions of the model in recursive form, as follows:
311) creating the class label vector of the data set, denoted Clsaslist {c_1, c_2, …, c_k};
312) if all students in training set D belong to the same class label C_k, directly returning that class label C_k;
313) if the feature set X is empty, returning the class label C_k that has the largest number of students in training set D;
32) model training stage: creating a tree model based on Shannon's information theory, calculating the information gain of each feature, selecting the feature with the largest information gain as the first-level node, and building the whole model recursively until all features are exhausted, at which point model training ends; the specific steps are as follows:
321) calculating the entropy of training set D for each feature in feature set X, by the following formula:
$info(D) = -\sum_{i=1}^{m} p_i \log_2 p_i$
where D is the training set, p_i denotes the frequency of the i-th class, and m denotes the number of classes;
322) partitioning training set D according to feature set X and calculating the expected information of the partition of D by X, by the following formula:
$info_X(D) = \sum_{j \in Values(X)} \frac{|D_j|}{|D|}\, info(D_j)$
where Values(X) denotes the set of attribute values of feature set X, j denotes an attribute value, and D_j is the subset of training set D whose attribute value is j;
323) calculating the information gain gain() of the family economic status feature matrix A, the campus card consumption feature matrix B, the student latent factor matrix P_i and the borrowing matrix S respectively, by the following formula:
$gain(D, X) = info(D) - info_X(D)$;
324) comparing the information gains of the family economic status feature matrix A, the campus card consumption feature matrix B, the student latent factor matrix P_i and the borrowing matrix S;
325) taking the feature matrix with the largest information gain as the root node of the tree model; the second-level child nodes of the tree model are split into two nodes, one of which is the poor-student aid non-recommendation set;
326) recalculating and comparing the information gains of the remaining three feature matrices, and taking the feature matrix with the largest gain among the three as the second node of the second-level child nodes;
327) extending the second node of the second-level child nodes downward to form the third-level child nodes of the tree model; the third-level child nodes are split into two nodes, one of which is the poor-student aid non-recommendation set;
328) recalculating and comparing the information gains of the remaining two feature matrices, and taking the feature matrix with the larger gain as the second node of the third-level child nodes;
329) extending the second node of the third-level child nodes downward to form the fourth-level child nodes of the tree model; the fourth-level child nodes are split into two nodes, one of which is the poor-student aid non-recommendation set and the other of which is the feature matrix with the smallest gain.
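The gain comparison used in steps 321) to 329) can be sketched as follows. The sketch assumes each feature has already been reduced to one discrete value per student (the patent works with whole feature matrices), uses a base-2 logarithm, and uses hypothetical function and feature names.

```python
import math
from collections import Counter

def info(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def info_x(values, labels):
    """Expected entropy after partitioning the labels by a feature's values."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    total = len(labels)
    return sum(len(group) / total * info(group) for group in groups.values())

def gain(values, labels):
    """Information gain: info(D) - info_X(D)."""
    return info(labels) - info_x(values, labels)

def rank_features(features, labels):
    """Order feature names from largest to smallest information gain."""
    return sorted(features, key=lambda name: gain(features[name], labels), reverse=True)

# Hypothetical discretized features for four students; label 1 = recommended for aid.
labels = [1, 1, 0, 0]
features = {
    "A": ["low", "low", "high", "high"],
    "B": ["x", "y", "x", "y"],
}
order = rank_features(features, labels)
```

The first name in `order` would serve as the root node, and the ranking is recomputed on the remaining features at each lower level, as the steps above describe.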
4. The poor-student aid recommendation method based on multidimensional analysis of in-school behavior data according to claim 1, characterized by further comprising verification of the recommendation model, wherein the verification of the recommendation model comprises the following steps:
41) obtaining a test set whose results are already known;
42) applying the test set to the recommendation model and calculating the F1 value on the test set,
calculated as follows:
where M is the number of poverty categories, TP is the number of students in the test set who are predicted to be poor students and actually are poor students, FP is the number of students in the test set who are predicted to be poor students but actually are not, and FN is the number of students in the test set who are predicted not to be poor students but actually are;
43) assessing the accuracy of the model by the F1 value.
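The F1 formula itself is not reproduced in this text. Since the definitions mention M poverty categories together with TP, FP and FN, one plausible reading is a macro-averaged F1 over the categories. The sketch below follows that assumption; the function names are hypothetical.

```python
def f1_for_class(y_true, y_pred, cls):
    """F1 score treating `cls` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == cls and t == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == cls and t != cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if p != cls and t == cls)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def macro_f1(y_true, y_pred, classes):
    """Assumed aggregation: unweighted mean of per-class F1 over the M classes."""
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)

# Hypothetical test set: 1 = poor student, 0 = not a poor student.
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 0]
score = macro_f1(y_true, y_pred, classes=[0, 1])
```

With a single positive class of interest, `f1_for_class(y_true, y_pred, 1)` alone gives the usual binary F1.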
CN201711415918.2A 2017-12-25 2017-12-25 Poverty-stricken and living fund assisting recommendation method based on multidimensional analysis of on-school behavior data Active CN108170765B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711415918.2A CN108170765B (en) 2017-12-25 2017-12-25 Poverty-stricken and living fund assisting recommendation method based on multidimensional analysis of on-school behavior data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711415918.2A CN108170765B (en) 2017-12-25 2017-12-25 Poverty-stricken and living fund assisting recommendation method based on multidimensional analysis of on-school behavior data

Publications (2)

Publication Number Publication Date
CN108170765A true CN108170765A (en) 2018-06-15
CN108170765B CN108170765B (en) 2021-11-12

Family

ID=62524004

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711415918.2A Active CN108170765B (en) 2017-12-25 2017-12-25 Poverty-stricken and living fund assisting recommendation method based on multidimensional analysis of on-school behavior data

Country Status (1)

Country Link
CN (1) CN108170765B (en)

Also Published As

Publication number Publication date
CN108170765B (en) 2021-11-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant