CN113742396A - Mining method and device for object learning behavior pattern - Google Patents

Mining method and device for object learning behavior pattern

Info

Publication number
CN113742396A
Authority
CN
China
Prior art keywords
behavior
sequence
mining
learning
behaviors
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110989581.6A
Other languages
Chinese (zh)
Other versions
CN113742396B (en)
Inventor
张立山
李亭亭
刘丽丽
冯硕
赵爱茹
戴志诚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central China Normal University
Original Assignee
Central China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central China Normal University
Priority to CN202110989581.6A (CN113742396B)
Priority to PCT/CN2021/128838 (WO2022160842A1)
Publication of CN113742396A
Application granted
Publication of CN113742396B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465Query processing support for facilitating data mining operations in structured databases
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474Sequence data queries, e.g. querying versioned data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/20Education
    • G06Q50/205Education administration or guidance

Abstract

The invention provides a mining method and a mining device for an object learning behavior pattern, belonging to the technical field of learning behavior pattern mining. The method comprises the following steps: arranging object behaviors in the chronological order in which each object performs its actions, and constructing a behavior sequence for each object to form a behavior sequence database; vectorizing the behaviors in the whole behavior sequence database by learning behavior embedding; clustering the vectorized behaviors and dividing them into different categories; assigning different digital codes to different behavior categories, and constructing a digital behavior sequence for each object; mining frequent subsequences from the digital behavior sequence database by a sequence pattern mining method; and analyzing the object learning behavior pattern based on the frequent subsequences. The invention reduces the occurrence of excessive mined subsequences and redundant learning behavior subsequences, so that the mined sequence patterns are more representative.

Description

Mining method and device for object learning behavior pattern
Technical Field
The invention belongs to the technical field of mining learning behavior patterns, and particularly relates to a mining method and device of an object learning behavior pattern.
Background
In recent years, online education platforms such as MOOCs, Smart Tree (Zhihuishu), and NetEase Open Course have developed rapidly. A major problem they face is that an educator teaching thousands of students cannot follow the learning condition of each one. Students adopt different learning patterns and generate a large amount of learning data, from which a teacher can hardly extract useful learning patterns unaided. The data must be analyzed so that teachers can understand students, make better decisions, and help students learn better. Research on learning behavior analysis is therefore imperative.
Existing technologies for analyzing online learning behavior mainly comprise educational data mining and learning analytics. Educational data mining mostly uses methods such as sequence mining, process mining, lag sequential analysis, and association rules. Learning analytics mainly uses methods such as network behavior modeling, social network analysis, discourse analysis, content analysis, and data visualization. For example, some researchers have used a differential sequence mining algorithm that combines the SPAMc algorithm with episode mining to mine students' learning behaviors, and analyzed the effectiveness and variation of learning behaviors between students with high and low levels of self-regulated learning; others have used the process mining algorithm of the pMineR package to generate a first-order transition probability matrix of students' learning behaviors and then analyzed those behaviors; still others have classified learners according to certain indicators and analyzed the learning patterns of the different groups.
Existing methods such as traditional sequence mining algorithms often yield too many mined subsequences, many of them redundant, and simply clustering learners makes it difficult to discover learning patterns. In addition, existing research uses only educational data mining or only learning analytics, studying the problem from a single technical or learning analysis level: different analysis methods and data models are used to interpret behavior data, learning rules are sought from the interpreted results, and learning performance is inferred from them.
Disclosure of Invention
In view of the defects of the prior art, the invention aims to provide a mining method and a mining device for an object learning behavior pattern, so as to solve the following problem of existing learning behavior pattern mining methods: because sequence mining often yields too many subsequences, with redundant learning behavior subsequences among them, the mining effect is poor when learning behaviors are complicated and the mined sequence patterns lack representativeness.
In order to achieve the above object, in one aspect, the present invention provides a method for mining a learning behavior pattern of an object, including the following steps:
arranging the object behaviors according to the time sequence of the execution actions of the objects, constructing a behavior sequence corresponding to each object, and constructing an object behavior sequence database;
adopting learning behavior embedding to conduct vectorization processing on behaviors in a behavior sequence database;
clustering the vectorized behaviors and dividing them into different categories;
assigning different digital codes to different behavior categories, and constructing a digital behavior sequence corresponding to each object;
mining frequent subsequences from a digital behavior sequence database by adopting a sequence pattern mining method; wherein each type of digital behavior sequence is used as an item set; the digital behavior sequence database is a set of digital behavior sequences corresponding to all objects;
the object learning behavior pattern is analyzed based on the frequent subsequence.
Preferably, the method for constructing the behavior sequence comprises the following steps:
extracting behavior data including an object ID, time, and action;
dividing the behavior data by taking the object as a unit according to the ID of the object;
and arranging the actions in the behavior data of each object according to the time of executing the actions by the object, deleting the object ID and the time in the behavior data, constructing a behavior sequence corresponding to each object, and constructing an object behavior sequence database.
Preferably, the method for vectorizing the behavior sequence corresponding to each object includes the following steps:
inputting the object behavior sequence database into a word2vec model;
the word2vec model interprets each behavior sequence as a sentence and each behavior as a word, and converts each action into a behavior vector through the learning behavior embedding process.
Preferably, the method for clustering the vectorized behaviors comprises the following steps:
clustering the behavior vectors with a k-means method, and determining the value of k in the k-means method by combining an elbow method with a silhouette coefficient method.
Preferably, the sequence pattern mining method is a SPAM method.
Preferably, a cosine value or a Euclidean distance is used to calculate the similarity distance between two behavior vectors and obtain a distance matrix, where the distance matrix reflects the similarity relations among the behavior vectors.
Preferably, the mining method of the object learning behavior pattern is used for mining of a student learning behavior pattern of online education.
In another aspect, the present invention provides a mining device for an object learning behavior pattern, comprising: a behavior sequence construction module, a vectorization processing module, a clustering processing module, a digital processing module, a sequence pattern mining module, and a behavior analysis module;
the behavior sequence construction module is used for arranging the object behaviors according to the time sequence of the execution actions of the objects, constructing a behavior sequence corresponding to each object and constructing an object behavior sequence database;
the vectorization processing module is used for vectorizing the behaviors in the object behavior sequence database by adopting learning behavior embedding;
the clustering processing module is used for clustering the vectorized behaviors and dividing them into different categories;
the digital processing module is used for assigning different digital codes to different behavior categories and constructing a digital behavior sequence corresponding to each object;
the sequence pattern mining module is used for mining frequent subsequences from the digital behavior sequence database by adopting a sequence pattern mining method; wherein each type of digital behavior sequence is used as an item set; the digital behavior sequence database is a set of digital behavior sequences corresponding to all objects;
the behavior analysis module is used for analyzing the object learning behavior pattern based on the frequent subsequence.
Preferably, the method for constructing the behavior sequence comprises the following steps:
extracting behavior data including an object ID, time, and action;
dividing the behavior data by taking the object as a unit according to the ID of the object;
and arranging the actions in the behavior data of each object according to the time of executing the actions by the object, deleting the object ID and the time in the behavior data, constructing a behavior sequence corresponding to each object, and constructing an object behavior sequence database.
Preferably, the vectorization processing module is a word2vec module, and the word2vec module is used for executing the operation of the word2vec model.
Preferably, the method for clustering the behavior vectors obtained by vectorization is as follows:
clustering the behavior vectors with a k-means method, and determining the value of k in the k-means method by combining an elbow method with a silhouette coefficient method.
Preferably, the sequence pattern mining method is a SPAM method.
Preferably, the mining device for the object learning behavior pattern is applied to mining student learning behavior patterns in online education.
Generally, compared with the prior art, the above technical solution conceived by the present invention has the following beneficial effects:
The invention first vectorizes the behaviors in the behavior sequence database by learning behavior embedding; then clusters the vectorized behaviors and divides them into different categories; next assigns different digital codes to different behavior categories and constructs a digital behavior sequence for each object; and finally mines frequent subsequences from the digital behavior sequence database by a sequence pattern mining method. Vectorization is thereby combined with sequence pattern mining. The prior art uses a sequence pattern mining method alone, and a traditional sequence pattern mining method cannot effectively group similar learning behaviors, so it performs poorly when learning behaviors are complicated and the mined sequence patterns lack representativeness. The invention uses learning behavior vectorization as a preprocessing step, so that the similarity relations among behaviors can be seen directly from the vectors; a behavior clustering method then classifies the complicated learning behavior types, which represents the relations among behaviors well; and the subsequent sequence mining produces fewer excessive and redundant learning behavior subsequences, which improves mining precision.
The model used to vectorize the behavior sequence database is a word2vec model. The word2vec model interprets each behavior sequence as a sentence and each behavior as a word, and converts each action into a behavior vector through the learning behavior embedding process. Because this simple method generates useful behavior embeddings, captures students' learning or browsing patterns, and reveals similarities among students who learn in similar ways, similar student behavior patterns can be captured intuitively after word2vec is trained on the behavior sequence database.
After the behavior sequences are vectorized, the similarity distance between two behavior vectors can be calculated with a cosine value or a Euclidean distance to obtain a distance matrix, which reflects the similarity relations among the behavior vectors. Although the distance matrix cannot classify the behaviors in a behavior sequence, it can serve as an aid for verifying the clustering result.
The invention clusters the behavior vectors with the k-means method and determines the value of k by combining the elbow method with the silhouette coefficient method. Clustering divides a data set into different classes or clusters according to a specific criterion (such as a distance criterion) so that data objects in the same cluster are as similar as possible and data objects in different clusters are as different as possible; that is, after clustering, data of the same class are gathered together and data of different classes are separated as far as possible. Different types of learning behavior can thus be partitioned effectively.
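For reference, the two quantities behind the elbow method and the silhouette coefficient method can be written in their standard textbook form. The notation is an assumption introduced here and does not appear in the patent: C_j is the j-th cluster, \mu_j its centroid, a(i) the mean distance from behavior vector i to the other vectors in its own cluster, and b(i) the smallest mean distance from i to the vectors of any other cluster.

```latex
\mathrm{SSE}(k) = \sum_{j=1}^{k} \sum_{x \in C_j} \lVert x - \mu_j \rVert^{2},
\qquad
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}
```

The elbow method looks for the k at which the decline of SSE(k) flattens, while the silhouette coefficient method prefers the k with the largest mean s(i); combining the two narrows the choice of k.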
Drawings
Fig. 1 is a schematic diagram of a mining method of a student learning behavior pattern provided by an embodiment of the present invention;
fig. 2 is a schematic diagram of a learning behavior clustering result provided in the embodiment of the present invention;
fig. 3 is a schematic diagram of the top 10 results of mining behavior patterns by using the SPAM method after vectorization and clustering according to an embodiment of the present invention;
FIG. 4 is a graph of SSE as a function of k value in the elbow method provided by an embodiment of the present invention;
FIG. 5 is a graph showing the variation of the Silhouette Coefficient with k value in the silhouette coefficient method according to the embodiment of the present invention;
FIG. 6 is a flow chart of a SPAM method provided by an embodiment of the present invention;
fig. 7 is an exemplary diagram of S extension and I extension processes provided by an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In one aspect, the invention provides a mining method for an object learning behavior pattern, which comprises the following steps:
arranging the object behaviors according to the time sequence of the execution actions of the objects, constructing a behavior sequence corresponding to each object, and constructing an object behavior sequence database;
adopting learning behavior embedding to carry out vectorization processing on behaviors in the object behavior sequence database;
clustering the vectorized behaviors and dividing them into different categories;
assigning different digital codes to different behavior categories, and constructing a digital behavior sequence corresponding to each object;
mining frequent subsequences from a digital behavior sequence database by adopting a sequence pattern mining method; wherein each type of digital behavior sequence is used as an item set; the digital behavior sequence database is a set of digital behavior sequences corresponding to all objects;
the object learning behavior pattern is analyzed based on the frequent subsequence.
Preferably, the method for constructing the behavior sequence comprises the following steps:
extracting behavior data including an object ID, time, and action;
dividing the behavior data by taking the object as a unit according to the ID of the object;
and arranging the actions in the behavior data of each object according to the time of executing the actions by the object, deleting the object ID and the time in the behavior data, constructing a behavior sequence corresponding to each object, and constructing an object behavior sequence database.
Preferably, the method for vectorizing the behavior sequence corresponding to each object includes the following steps:
inputting the object behavior sequence database into a word2vec model;
the word2vec model interprets each behavior sequence as a sentence and each behavior as a word, and converts each action into a behavior vector through the learning behavior embedding process.
Preferably, the method for clustering the vectorized behaviors comprises the following steps:
clustering the behavior vectors with a k-means method, and determining the value of k in the k-means method by combining an elbow method with a silhouette coefficient method.
Preferably, the sequence pattern mining method is a SPAM method.
Preferably, a cosine value or a Euclidean distance is used to calculate the similarity distance between two behavior vectors and obtain a distance matrix, where the distance matrix reflects the similarity relations among the behavior vectors.
Preferably, the mining method of the object learning behavior pattern is used for mining of a student learning behavior pattern of online education.
In another aspect, the present invention provides a mining device for an object learning behavior pattern, comprising: a behavior sequence construction module, a vectorization processing module, a clustering processing module, a digital processing module, a sequence pattern mining module, and a behavior analysis module;
the behavior sequence construction module is used for arranging the object behaviors according to the time sequence of the execution actions of the objects, constructing a behavior sequence corresponding to each object and constructing an object behavior sequence database;
the vectorization processing module is used for vectorizing each behavior in the object behavior sequence database by adopting learning behavior embedding;
the clustering processing module is used for clustering the vectorized behaviors and dividing them into different categories;
the digital processing module is used for assigning different digital codes to different behavior categories and constructing a digital behavior sequence corresponding to each object;
the sequence pattern mining module is used for mining frequent subsequences from the digital behavior sequence database by adopting a sequence pattern mining method; wherein each type of digital behavior sequence is used as an item set; the digital behavior sequence database is a set of digital behavior sequences corresponding to all objects;
the behavior analysis module is used for analyzing the object learning behavior pattern based on the frequent subsequence.
Preferably, the method for constructing the behavior sequence comprises the following steps:
extracting behavior data including an object ID, time, and action;
dividing the behavior data by taking the object as a unit according to the ID of the object;
and arranging the actions in the behavior data of each object according to the time of executing the actions by the object, deleting the object ID and the time in the behavior data, constructing a behavior sequence corresponding to each object, and constructing an object behavior sequence database.
Preferably, the vectorization processing module is a word2vec module, and the word2vec module is used for executing the operation of the word2vec model.
Preferably, the method for clustering the behavior vectors obtained by vectorization is as follows:
clustering the behavior vectors with a k-means method, and determining the value of k in the k-means method by combining an elbow method with a silhouette coefficient method.
Preferably, the sequence pattern mining method is a SPAM method.
Preferably, the mining device for the object learning behavior pattern is applied to mining student learning behavior patterns in online education.
Examples
As shown in fig. 1, this embodiment relates to a method for mining the learning behavior patterns of students in an online education environment, which includes vectorized representation of learning behaviors, clustering of the vectorized behaviors, and sequence pattern mining of the clustered learning behaviors.
The method for mining the learning behavior patterns of students in an online learning platform environment specifically comprises the following steps:
1. Data preprocessing
Learning behavior data of students in the online learning platform environment are extracted. The data source is the behavior data of some of the students, including student ID, time, and action.
The behavior data are divided by student according to student ID, and the actions in each student's behavior data are arranged in the order in which the student performed them. The student ID and time are then deleted, and a session is constructed for each student that contains only that student's action data. The session of each student serves as that student's behavior sequence, and the sessions together form the student learning behavior sequence database.
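The preprocessing in step 1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `build_behavior_sequences`, the tuple layout, and the sample actions are all hypothetical.

```python
from collections import defaultdict

def build_behavior_sequences(records):
    """Group (student_id, time, action) rows into one time-ordered action list per student."""
    by_student = defaultdict(list)
    for student_id, timestamp, action in records:
        by_student[student_id].append((timestamp, action))
    # Sort each student's events by time, then drop the ID and the time so that
    # only the ordered actions (the "session") remain.
    return [
        [action for _, action in sorted(events, key=lambda e: e[0])]
        for _, events in sorted(by_student.items())
    ]

# Hypothetical log rows: (student ID, time, action)
records = [
    ("s2", 3, "zoom_in"),
    ("s1", 2, "add_node"),
    ("s1", 1, "open_project"),
    ("s2", 1, "open_project"),
    ("s1", 3, "undo"),
]
sequence_db = build_behavior_sequences(records)
print(sequence_db)
```

The resulting list of lists plays the role of the behavior sequence database: one inner list per student, with IDs and timestamps removed.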
2. Vectorizing the behavior sequences by learning behavior embedding
Learning behavior embedding is performed with an NLP (natural language processing) method, specifically as follows:
The behavior sequences of all students form the behavior sequence database, and each behavior sequence contains multiple behaviors. This embodiment directly applies a standard word2vec model to implement learning behavior embedding: the model takes the behavior sequences as input, interprets each behavior sequence as a sentence and each behavior as a word, and outputs a behavior vector for each action, generating useful behavior embeddings. The behavior vectors capture students' learning or browsing patterns and reveal similarities among students who learn in similar ways. Specifically, learning behavior embedding with the word2vec model proceeds as follows:
the behavior sequences of the students are input into the word2vec model;
the word2vec model interprets each behavior sequence as a sentence and each behavior as a word, and converts each action into a behavior vector through the learning behavior embedding process.
To verify the behavior vector clustering result later, this embodiment may calculate the similarity relation between the behavior vectors of any two behaviors, specifically as follows:
the similarity distance between two behavior vectors is calculated with the cosine value or the Euclidean distance to obtain a distance matrix;
the distance matrix shows the similarity and cosine value between behavior vectors more intuitively, so that the similarity relations among students' learning behaviors can be judged better.
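The distance matrix used for verification can be sketched in plain Python. The two-dimensional vectors and action names are made up for illustration; real behavior vectors would come from the trained word2vec model and have more dimensions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two behavior vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def euclidean_distance(u, v):
    """Straight-line distance between two behavior vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def pairwise_matrix(vectors, metric):
    """Return the sorted behavior names and the matrix of metric values between them."""
    names = sorted(vectors)
    return names, [[metric(vectors[a], vectors[b]) for b in names] for a in names]

# Hypothetical behavior vectors (assumed values, not from the patent)
behavior_vectors = {
    "copy": [1.0, 0.0],
    "paste": [0.8, 0.6],
    "undo": [0.0, 1.0],
}
names, cos = pairwise_matrix(behavior_vectors, cosine_similarity)
_, dist = pairwise_matrix(behavior_vectors, euclidean_distance)
for name, row in zip(names, cos):
    print(name, [round(x, 2) for x in row])
```

In this toy data "copy" and "paste" have a high cosine value while "copy" and "undo" are orthogonal, which is the kind of relation the matrix is meant to expose before clustering.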
3. Clustering the vectorized behaviors
After the students' learning behaviors are vectorized by embedding, the behavior vectors are clustered with the k-means method, and the value of k is determined by combining the elbow method with the silhouette coefficient method.
Vectorizing and clustering all learning behaviors of the students yields six categories of learning behavior, which are named according to the clustering results: the behavior data in the original data set are clustered into classroom operation, task operation, knowledge construction a, knowledge construction b, analysis behavior, and reflection behavior. The actions from the data source contained in each category are shown in fig. 2; the six categories reflect the learning behaviors and states of the students.
The experimental scenario of this embodiment is a student drawing a concept map. Classroom operation covers students' basic operations on classroom tasks, such as importing and exporting files, creating projects, and closing. Task operation covers students downloading, answering, submitting, and otherwise acting on tasks issued by the teacher. Knowledge construction a covers copying and pasting, which is classed as a kind of knowledge construction because its thinking cost is low. Knowledge construction b covers operations such as creating, editing, deleting, and adding relations in the concept map the student is constructing. Analysis behavior covers zooming the concept map in and out to judge whether the overall construction is reasonable. Reflection behavior covers undo and redo, by which students reflect on the concept map they have drawn.
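The choice of k in step 3 can be illustrated with a small pure-Python sketch that runs k-means for several values of k and records both the SSE (elbow method) and the mean silhouette coefficient. The farthest-point initialisation, the toy 2-D points, and all function names are assumptions made to keep the example deterministic; they are not taken from the patent.

```python
import math

def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def kmeans(points, k, iters=50):
    # Farthest-point initialisation: an assumption that makes the run reproducible.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        # Assign every point to its nearest center, then recompute the centers.
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(
                [sum(c) / len(members) for c in zip(*members)] if members else centers[j]
            )
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

def sse(points, labels, centers):
    """Sum of squared distances from each point to its cluster center."""
    return sum(dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))

def mean_silhouette(points, labels):
    scores = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:  # singleton cluster: silhouette is 0 by convention
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(
            sum(dist(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Toy "behavior vectors": three well-separated groups (made-up data)
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10], [20, 0], [20, 1], [21, 0]]
sse_by_k, sil_by_k = {}, {}
for k in range(2, 6):
    labels, centers = kmeans(points, k)
    sse_by_k[k] = sse(points, labels, centers)
    sil_by_k[k] = mean_silhouette(points, labels)
best_k = max(sil_by_k, key=sil_by_k.get)
print(best_k, sse_by_k, sil_by_k)
```

On this toy data the SSE drops sharply up to k = 3 and only slowly afterwards, and the mean silhouette peaks at k = 3, so both criteria point to the same value. In practice a library implementation of k-means would normally be used instead of this hand-written loop.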
4. Mining behavior patterns over the six behavior categories with the SPAM method
After the students' learning behaviors are vectorized and clustered, the learning behaviors and the similarity relations among them can be seen intuitively. On this basis, different digital codes are assigned to different behavior categories, and each student behavior sequence is converted into the corresponding digital codes to construct a digital behavior sequence. A sequence pattern mining method is then applied to mine the students' learning behavior patterns; sequence mining is a knowledge discovery process that searches a digital behavior sequence database for frequent subsequences as patterns. After comprehensively comparing the attributes, number of scans, and time constraints of the classical sequence mining methods, the SPAM (Sequential PAttern Mining using a Bitmap Representation) method is selected. Its inputs are the clustered student digital behavior sequence database and a specified minimum support threshold minsup, where the support of a subsequence pattern is the proportion of sequences in the digital behavior sequence database that contain it; its output is the frequent subsequence patterns. In this embodiment, each category of learning behavior is taken as an item set and the support threshold minsup is set to 0.8. The number of frequent subsequences mined is 181, of which 125 have a support of 100%. Fig. 3 lists the top 10 mined subsequence patterns by support.
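The encoding and mining in step 4 can be sketched as follows. This is a simplified illustration in the spirit of SPAM, not the full algorithm: it keeps the bitmap representation and the S-step (sequence extension) but omits the I-step, which is not needed when every item set holds a single behavior category. The category codes, action names, and `max_len` cap are assumptions for the example.

```python
def encode_sequences(sequences, action_to_code):
    """Replace each action with the digital code of its behavior category."""
    return [[action_to_code[a] for a in seq] for seq in sequences]

def item_bitmaps(seq):
    """Map each item to an int whose bit i is set when the item occurs at position i."""
    bitmaps = {}
    for pos, item in enumerate(seq):
        bitmaps[item] = bitmaps.get(item, 0) | (1 << pos)
    return bitmaps

def s_step(bitmap):
    """S-step transform: a mask with every bit strictly after the first set bit."""
    if bitmap == 0:
        return 0
    first = (bitmap & -bitmap).bit_length() - 1  # index of the lowest set bit
    return ~((1 << (first + 1)) - 1)  # Python ints are unbounded, so this is "all higher bits"

def mine_frequent_subsequences(db, minsup, max_len=3):
    n = len(db)
    per_seq = [item_bitmaps(seq) for seq in db]
    items = sorted({item for seq in db for item in seq})
    results = {}

    def extend(pattern, bitmaps):
        if len(pattern) >= max_len:
            return
        for item in items:
            # Positions of `item` that come after the earliest match of `pattern`.
            new_bitmaps = [s_step(b) & maps.get(item, 0) for b, maps in zip(bitmaps, per_seq)]
            support = sum(1 for b in new_bitmaps if b) / n
            if support >= minsup:
                results[pattern + (item,)] = support
                extend(pattern + (item,), new_bitmaps)

    for item in items:
        bitmaps = [maps.get(item, 0) for maps in per_seq]
        support = sum(1 for b in bitmaps if b) / n
        if support >= minsup:
            results[(item,)] = support
            extend((item,), bitmaps)
    return results

# Hypothetical clustering outcome: action -> digital code of its category
action_to_code = {"open_project": 1, "add_node": 2, "edit_node": 2, "zoom_in": 3, "undo": 4}
sequences = [
    ["open_project", "add_node", "edit_node", "undo"],
    ["open_project", "edit_node", "zoom_in"],
    ["add_node", "open_project", "edit_node"],
]
digital_db = encode_sequences(sequences, action_to_code)
frequent = mine_frequent_subsequences(digital_db, minsup=0.8)
print(digital_db)
print(frequent)
```

Because "add_node" and "edit_node" share one category code, the pattern (1, 2) is supported by all three sequences even though the raw action pairs differ, which illustrates how clustering before mining merges near-duplicate subsequences.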
5. Analysis of the mining results after behavior vectorization
The mining results in fig. 3 show that most subsequence patterns consist of knowledge construction, task operation, or a combination of the two, which indicates that students operate on classroom tasks and on the web page more frequently. The results contain no analysis behavior or reflection behavior, which suggests that students focus single-mindedly on knowledge construction and lack the corresponding metacognitive abilities and strategies, resulting in a lack of planning, monitoring, and regulation. The mining results also agree with the results obtained after vectorization, which shows that the learning behavior analysis method of "behavior vectorization + sequence pattern mining" provided by this embodiment is effective. For example, the category containing the mined "drag-drop" action accounts for 55% of the total behavior patterns, whereas without the behavior vectorization preprocessing the behavior patterns obtained for this action account for only 0.4%. The mining results show that the method provided by the invention improves the mining of student learning behavior patterns in terms of the number of frequent subsequences mined, the number of entries with the same support, and the proportion of rarely occurring actions.
In summary, the method uses embedding to vectorize students' learning behaviors, obtaining their vectorized representation by word2vec modeling; it then clusters the vectors with k-means to obtain several clusters; and it compares the results of the embedding method with those of the sequence pattern mining method to judge whether the behaviors in the frequent subsequence patterns belong to the same cluster or to words with high similarity.
The following describes methods, models, and the like related to the present invention, specifically as follows:
(1) Learning behavior embedding converts a large sparse vector into a low-dimensional space that preserves semantic relationships; by mapping high-dimensional data to a lower-dimensional space, it solves the core problem of sparse input data; even a small multi-dimensional space can group semantically similar items together and keep dissimilar items apart, and in a good embedding the position of an item in the vector space encodes its semantics;
Learning behavior embedding can be implemented in many ways, for example: matrix factorization methods such as MF and SVD; deep neural network embedding (DNN embedding); graph embedding; and sequence embedding, typically word2vec and Item2vec. word2vec is used in this embodiment;
The word2vec model actually consists of two parts: the first builds the model, and the second obtains the embedded word vectors from the model. To build the model, this embodiment uses the skip-gram method, illustrated here with part of the behavior data of the first student, such as addopenedrecoverecovererset. First, a word in the middle of the student's learning behaviors is selected as the input word of the word2vec model, for example the behavior word 'openend'. Then a skip-window parameter is defined, which is the number of words selected on one side (left or right) of the current input word; for example, if skip-window is set to 2, the words in the final window are 'extended', 'open', 'receiver'. Next, num-skip is defined, which is the number of different words selected from the whole window as output words; with num-skip set to 2, two groups of training data of the form (input word, output word) are obtained, namely ('openend', 'extended') and ('openend', 'receiver'). Based on the training data, the neural network outputs a probability distribution that gives, for each word in the dictionary, the likelihood that it occurs together with the input word as an output word; the model learns these statistics from the number of times each (input word, output word) pair occurs. A vectorized representation of each learning behavior is obtained by the process described above.
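The construction of skip-gram training data described above can be sketched as follows. This is a minimal illustration that assumes a hypothetical behavior sequence; the function name `skipgram_pairs` and the action names are not from the embodiment, and the neural network training itself is not shown.

```python
import random
from typing import List, Tuple


def skipgram_pairs(sequence: List[str], skip_window: int, num_skips: int,
                   rng: random.Random) -> List[Tuple[str, str]]:
    """Build (input word, output word) pairs for one behavior sequence."""
    pairs = []
    for i, input_word in enumerate(sequence):
        # Up to `skip_window` words on each side of the input word.
        left = max(0, i - skip_window)
        right = min(len(sequence), i + skip_window + 1)
        context = [sequence[j] for j in range(left, right) if j != i]
        # Pick at most `num_skips` different context positions as output words.
        for output_word in rng.sample(context, min(num_skips, len(context))):
            pairs.append((input_word, output_word))
    return pairs


# Hypothetical behavior sequence of one student.
sequence = ["login", "open", "drag", "drop", "submit"]
pairs = skipgram_pairs(sequence, skip_window=2, num_skips=2, rng=random.Random(0))
for pair in pairs:
    print(pair)
```

Each behavior sequence plays the role of a sentence and each behavior the role of a word, so pairs that occur often across many students pull the corresponding behavior vectors closer together during training.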
(2) After the learning behaviors are vectorized, they need to be grouped so that behaviors with similar meanings can be identified and conveniently extracted; clustering is performed with the k-means method, and the value of k is determined by combining the elbow method with the silhouette (contour) coefficient method; FIGS. 4 and 5 show the values of k obtained by the elbow method and the silhouette coefficient method, respectively;
The core index of the elbow method is the sum of squared errors (SSE). As the number of clusters k increases, the samples are divided more finely and the cohesion of each cluster gradually improves, so the SSE naturally decreases. When k is smaller than the true number of clusters, increasing k greatly improves the cohesion of each cluster, so the SSE drops sharply; once k reaches the true number of clusters, the gain in cohesion from increasing k falls off rapidly, so the decrease in SSE shrinks abruptly and then levels off as k continues to grow. The value of k at this bend (the "elbow") of the SSE curve is taken as the number of clusters;
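The behavior of the SSE curve can be reproduced with a small pure-Python sketch. The two-dimensional points are hypothetical stand-ins for behavior vectors, and the deterministic farthest-point initialization is an assumption made only so that the example is reproducible; the embodiment does not specify how k-means is initialized.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j])) for p in points]


def kmeans(points, k, max_iter=100):
    # Farthest-point initialization (assumption, chosen for reproducibility).
    centers = [tuple(points[0])]
    while len(centers) < k:
        centers.append(tuple(max(points, key=lambda p: min(sq_dist(p, c) for c in centers))))
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            # Keep the old center if a cluster happens to be empty.
            new_centers.append(tuple(sum(c) / len(members) for c in zip(*members))
                               if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return assign(points, centers), centers


def sse(points, labels, centers):
    """Sum of squared errors of all points to their cluster centers."""
    return sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))


# Hypothetical data with three well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
sse_by_k = {}
for k in range(1, 6):
    labels, centers = kmeans(points, k)
    sse_by_k[k] = sse(points, labels, centers)
    print(k, round(sse_by_k[k], 2))
```

On this data the SSE falls steeply up to k = 3 and only marginally afterwards, which is the elbow shape the method looks for.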
The core index of the silhouette (contour) coefficient method is the silhouette coefficient: the silhouette coefficients of all samples are computed and averaged to obtain the average silhouette coefficient, whose value lies in [-1, 1]. The closer the samples within a cluster and the farther apart the samples of different clusters, the larger the average silhouette coefficient and the better the clustering. Therefore, the k with the largest average silhouette coefficient is the optimal number of clusters;
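The average silhouette coefficient can be computed directly from its definition. The sketch below assumes the usual formulation, in which a is the mean distance from a sample to the other samples of its own cluster, b is the smallest mean distance to the samples of any other cluster, and the coefficient of the sample is (b - a) / max(a, b); the points and labelings are hypothetical, and at least two clusters are required.

```python
import math


def mean_silhouette(points, labels):
    """Average silhouette coefficient of a labeled set of points (needs >= 2 clusters)."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    scores = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical data with three well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
three_clusters = [0, 0, 0, 1, 1, 1, 2, 2, 2]
two_clusters = [0, 0, 0, 0, 0, 0, 1, 1, 1]  # two groups merged into one
print(round(mean_silhouette(points, three_clusters), 3))
print(round(mean_silhouette(points, two_clusters), 3))
```

The labeling that matches the three groups scores higher than the one that merges two of them, so choosing the k with the largest average coefficient selects k = 3 here.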
(3) The sequence pattern mining method is a knowledge discovery process that searches a sequence database for frequent subsequences and takes them as patterns; that is, a sequence database is input, and all sequences whose support is not less than the minimum support are output. The various sequence pattern mining methods differ in whether they use breadth-first or depth-first search and in how they compute the support of a pattern to determine whether it satisfies the minimum support constraint. After the attributes, number of scans, time constraints and usage scenarios of the algorithms are comprehensively compared, the SPAM method is selected from the classical sequence pattern mining methods; the series of action data of each student is recorded as one sequence in the sequence database, and frequent subsequence behavior patterns are obtained by applying the SPAM method;
When the SPAM method is used to mine the students' learning behavior patterns, it uses a vertical bitmap representation of the database and a depth-first search, and the database needs to be scanned only 2 times. The flow of the SPAM method is shown in FIG. 6: the sequence database is scanned to find all frequent 1-sequence patterns; a bitmap is generated for each frequent 1-sequence pattern found; then, using depth-first search, sequence extension (S-extension) and item set extension (I-extension) are applied recursively to enumerate candidate sequences, and the support count of each extended sequence is obtained by counting the sequences whose bitmap is not 0. The core of the method is the depth-first traversal with S-extension and I-extension of sequence patterns; FIG. 7 shows the S-extension and I-extension processes. The inputs to the SPAM method are the sequence database and a user-specified threshold minsup (a value in [0, 1] representing a percentage); the output is all frequent sequence patterns in the sequence database, that is, the subsequences whose support in the database is not less than minsup.
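The bitmap idea can be illustrated with a simplified sketch. It is not a full SPAM implementation: it assumes that every item set holds a single behavior code, so only the S-extension is needed and the I-extension and the candidate-list pruning of the original algorithm are omitted. Python integers serve as bitmaps, and all function names and the example database are hypothetical.

```python
def build_bitmaps(database):
    """Vertical representation: item -> one position bitmap per sequence."""
    bitmaps = {}
    for sid, seq in enumerate(database):
        for pos, item in enumerate(seq):
            bitmaps.setdefault(item, [0] * len(database))[sid] |= 1 << pos
    return bitmaps


def s_step(prefix_bits, item_bits):
    """S-extension: keep occurrences of the item strictly after the prefix's first end."""
    out = []
    for p, q in zip(prefix_bits, item_bits):
        if p == 0:
            out.append(0)
            continue
        first = p & -p               # lowest set bit = earliest end position of the prefix
        after = ~((first << 1) - 1)  # mask of all positions strictly after it
        out.append(after & q)
    return out


def support_count(bits):
    """Number of sequences whose bitmap is not 0."""
    return sum(1 for b in bits if b != 0)


def spam(database, minsup):
    min_count = minsup * len(database)
    bitmaps = build_bitmaps(database)
    items = sorted(i for i, b in bitmaps.items() if support_count(b) >= min_count)
    frequent = {}

    def dfs(pattern, bits):
        frequent[pattern] = support_count(bits) / len(database)
        for item in items:
            extended = s_step(bits, bitmaps[item])
            if support_count(extended) >= min_count:
                dfs(pattern + (item,), extended)

    for item in items:
        dfs((item,), bitmaps[item])
    return frequent


# Hypothetical digital behavior sequence database.
database = [[0, 1, 1, 2], [0, 1, 2], [0, 0, 1]]
frequent = spam(database, minsup=1.0)
print(frequent)
```

Support counting reduces to checking which per-sequence bitmaps are non-zero after a bitwise AND, which is why the original data does not have to be rescanned for every candidate.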
In summary, compared with the prior art, the invention has the following advantages:
First, the behaviors of all objects are vectorized by learning behavior embedding; then the vectorized behaviors are clustered and divided into different categories; next, different digital codes are assigned to different behavior categories, and a digital behavior sequence is constructed for each object; finally, frequent subsequences are mined from the digital behavior sequence database by a sequence pattern mining method. The invention thus combines vectorization with sequence pattern mining. The prior art uses a sequence pattern mining method alone; a conventional sequence pattern mining method cannot effectively group similar learning behaviors, so it performs poorly when the learning behaviors are numerous and varied, and the mined sequence patterns lack representativeness and cannot represent the relationships among behaviors well. The invention uses learning behavior vectorization as a preprocessing step, so that the similarity relations among behaviors can be seen intuitively, and then applies behavior clustering to group the various learning behaviors, so that the relationships among behaviors are well represented; used together with a sequence mining method, this reduces the number of excessive and redundant mined learning behavior subsequences and improves the accuracy of the mining.
The model for vectorizing the behavior sequence is a word2vec model; the word2vec model interprets each behavior sequence as a sentence and each behavior as a word, and each action is converted into a behavior vector through the learning behavior embedding process. Because this simple method generates useful behavior embeddings, captures the learning or browsing patterns of students, and reveals the similarity among students who learn in a similar way, similar behavior patterns of students can be captured intuitively after word2vec training.
After the behavior sequences are vectorized, the similarity distance between two behavior vectors can be calculated using the cosine similarity or the Euclidean distance to obtain a distance matrix, which reflects the similarity relations among the behavior vectors. Although the distance matrix cannot by itself classify the behaviors in a behavior sequence, it can be used as an aid for verifying the clustering result.
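A distance matrix of this kind can be computed as in the following sketch. The three-dimensional behavior vectors are hypothetical placeholders for trained word2vec vectors, and the cosine distance is taken here as 1 minus the cosine similarity, which is one common convention rather than something the embodiment prescribes.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_matrix(vectors, metric):
    """Pairwise distances, with rows and columns in the order of `vectors`."""
    names = list(vectors)
    return [[metric(vectors[a], vectors[b]) for b in names] for a in names]


# Hypothetical behavior vectors.
vectors = {
    "drag": (0.9, 0.1, 0.0),
    "drop": (0.8, 0.2, 0.1),
    "submit": (0.0, 0.1, 0.9),
}
cos_matrix = distance_matrix(vectors, cosine_distance)
euc_matrix = distance_matrix(vectors, euclidean_distance)
for name, row in zip(vectors, cos_matrix):
    print(name, [round(d, 3) for d in row])
```

Behaviors that end up in the same cluster should show small mutual distances in such a matrix, which is how it can serve as a check on the clustering.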
The invention uses the k-means method to cluster the behavior vectors and determines the value of k by combining the elbow method with the silhouette (contour) coefficient method. Clustering divides a data set into different classes or clusters according to a specific criterion (such as a distance criterion), so that data objects in the same cluster are as similar as possible and data objects in different clusters are as different as possible; that is, after clustering, data of the same class are gathered together as much as possible and data of different classes are separated as much as possible. Thus, different types of learning behavior can be effectively partitioned.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A mining method for an object learning behavior pattern is characterized by comprising the following steps:
arranging the object behaviors according to the time sequence of the object execution actions, constructing a behavior sequence corresponding to each object, and constructing a behavior sequence database;
adopting learning behavior embedding to conduct vectorization processing on behaviors in a behavior sequence database;
clustering the behaviors after the vectorization processing, and dividing the behaviors into different categories;
assigning different digital codes to different behavior categories, and constructing a digital behavior sequence corresponding to each object;
mining frequent subsequences from a digital behavior sequence database by adopting a sequence pattern mining method; wherein each type of digital behavior sequence is used as an item set; the digital behavior sequence database is a set of digital behavior sequences corresponding to all objects;
the object learning behavior pattern is analyzed based on the frequent subsequence.
2. The mining method according to claim 1, wherein the construction method of the behavior sequence comprises the following steps:
extracting behavior data including an object ID, time, and action;
dividing the behavior data by taking the object as a unit according to the ID of the object;
and arranging the actions in the behavior data of each object according to the time of executing the actions by the object, deleting the object ID and the time in the behavior data, constructing a behavior sequence corresponding to each object, and constructing an object behavior sequence database.
3. The mining method according to claim 1 or 2, wherein the method for vectorizing the behavior sequence corresponding to each object comprises the following steps:
inputting the object behavior sequence database into a word2vec model;
the word2vec model interprets each behavior sequence as a sentence and each behavior as a word, and each action is converted into a behavior vector through a learning behavior embedding process.
4. The mining method according to claim 1, wherein the method for clustering the behaviors after the vectorization processing comprises the following steps:
selecting a k-means method to cluster the behavior vectors; and determining the value of k in the k-means method by combining an elbow method and a silhouette (contour) coefficient method.
5. The mining method according to claim 1 or 4, wherein the sequence pattern mining method is a SPAM method.
6. The mining method according to claim 3, wherein a similarity distance between two behavior vectors is calculated by using a cosine value or a Euclidean distance, and a distance matrix is obtained, and the distance matrix is used for reflecting the similarity relation of the behavior vectors.
7. The mining method according to claim 1, wherein the mining method is used for mining of student learning behavior patterns of online education.
8. A mining device for an object learning behavior pattern, comprising: a behavior sequence construction module, a vectorization processing module, a clustering processing module, a digital processing module, a sequence pattern mining module and a behavior analysis module;
the behavior sequence construction module is used for arranging the object behaviors according to the time sequence of the execution actions of the objects, constructing a behavior sequence corresponding to each object and constructing an object behavior sequence database;
the vectorization processing module is used for vectorizing the behaviors in the object behavior sequence database by adopting learning behavior embedding;
the clustering processing module is used for clustering the behaviors after the vectorization processing and dividing the behaviors into different categories;
the digital processing module is used for assigning different digital codes to different behavior categories and constructing a digital behavior sequence corresponding to each object;
the sequence pattern mining module is used for mining frequent subsequences from a digital behavior sequence database by adopting a sequence pattern mining method; wherein each type of digital behavior sequence is used as an item set; the digital behavior sequence database is a set of digital behavior sequences corresponding to all objects;
the behavior analysis module is used for analyzing the object learning behavior pattern based on the frequent subsequence.
9. The mining device of claim 8, wherein the method for clustering the behaviors after the vectorization processing comprises:
selecting a k-means method to cluster the behavior vectors; and determining the value of k in the k-means method by combining an elbow method and a silhouette (contour) coefficient method.
10. The mining device of claim 8 or 9, wherein the vectorization processing module is a word2vec module, and the word2vec module is configured to perform an operation of a word2vec model.
CN202110989581.6A 2021-01-26 2021-08-26 Mining method and device for object learning behavior mode Active CN113742396B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110989581.6A CN113742396B (en) 2021-08-26 2021-08-26 Mining method and device for object learning behavior mode
PCT/CN2021/128838 WO2022160842A1 (en) 2021-01-26 2021-11-05 Student collaboration state assessment method and system based on electroencephalogram data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110989581.6A CN113742396B (en) 2021-08-26 2021-08-26 Mining method and device for object learning behavior mode

Publications (2)

Publication Number Publication Date
CN113742396A true CN113742396A (en) 2021-12-03
CN113742396B CN113742396B (en) 2023-10-27

Family

ID=78733164

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110989581.6A Active CN113742396B (en) 2021-01-26 2021-08-26 Mining method and device for object learning behavior mode

Country Status (1)

Country Link
CN (1) CN113742396B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022160842A1 (en) * 2021-01-26 2022-08-04 华中师范大学 Student collaboration state assessment method and system based on electroencephalogram data
CN115600925A (en) * 2022-10-28 2023-01-13 广州宏途数字科技有限公司(Cn) In-class student behavior analysis auxiliary system and method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104281891A (en) * 2014-10-13 2015-01-14 安徽华贞信息科技有限公司 Time-series data mining method and system
US20190057085A1 (en) * 2016-05-10 2019-02-21 Beijing Information Science & Technology University Method for establishing a digitized interpretation base of dongba classic ancient books
CN111090679A (en) * 2019-10-31 2020-05-01 国网浙江省电力有限公司 Time sequence data representation learning method based on time sequence influence and graph embedding
AU2020103216A4 (en) * 2020-09-25 2021-01-14 Qilu University Of Technology A similarity analysis method of negative sequential patterns based on biological sequences and its implementation system and medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
陈方华;白雪;孟凡媛;韩营;: "数据挖掘技术在学习分析中的应用研究", 软件导刊(教育技术), no. 02 *

Also Published As

Publication number Publication date
CN113742396B (en) 2023-10-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant