CN113742396B - Mining method and device for object learning behavior mode - Google Patents


Info

Publication number
CN113742396B
Authority
CN
China
Prior art keywords
behavior
sequence
mining
learning
behaviors
Legal status
Active
Application number
CN202110989581.6A
Other languages
Chinese (zh)
Other versions
CN113742396A (en)
Inventor
张立山
李亭亭
刘丽丽
冯硕
赵爱茹
戴志诚
Current Assignee
Central China Normal University
Original Assignee
Central China Normal University
Application filed by Central China Normal University
Priority to CN202110989581.6A
Priority to PCT/CN2021/128838 (WO2022160842A1)
Publication of CN113742396A
Application granted
Publication of CN113742396B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474 Sequence data queries, e.g. querying versioned data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/20 Education
    • G06Q50/205 Education administration or guidance

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Tourism & Hospitality (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • Strategic Management (AREA)
  • Fuzzy Systems (AREA)
  • Human Resources & Organizations (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • General Business, Economics & Management (AREA)
  • Economics (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a method and device for mining learning behavior patterns of objects, belonging to the technical field of learning behavior pattern mining. The method comprises the following steps: arranging each object's behaviors in the chronological order in which the object executed its actions, and constructing a behavior sequence for each object to form a behavior sequence database; vectorizing the behaviors in the whole behavior sequence database by learning behavior embeddings; clustering the vectorized behaviors into different categories; assigning a distinct numeric code to each behavior category and constructing a digitized behavior sequence for each object; mining frequent subsequences from the digitized behavior sequence database with a sequential pattern mining method; and analyzing the object learning behavior patterns based on the frequent subsequences. The invention reduces the number of excess mined subsequences and redundant learning behavior subsequences, making the mined sequential patterns more representative.

Description

Mining method and device for object learning behavior mode
Technical Field
The invention belongs to the technical field of mining learning behavior patterns, and particularly relates to a mining method and device for an object learning behavior pattern.
Background
In recent years, online education platforms such as MOOCs (massive open online courses), Zhihuishu (Wisdom Tree) and open online courses have developed rapidly. A major problem these platforms face is that each educator faces thousands of students and cannot follow the learning of every student. Students develop different learning patterns during the learning process and generate a large amount of learning data, from which teachers can hardly extract useful learning patterns unaided; the data must be analyzed so that teachers can better understand their students and make decisions that help them learn better. Research on learning behavior analysis is therefore imperative.
Existing techniques for analyzing online learning behavior fall mainly into educational data mining and learning analytics. Educational data mining mostly uses sequence mining, process mining algorithms, lag sequential analysis, association rules and similar methods. Learning analytics mainly uses network behavior modeling, social network analysis, discourse analysis, content analysis, data visualization and similar methods. For example, researchers have mined students' learning behaviors with a differential sequence mining algorithm that combines the SPAMC algorithm with episode mining, and analyzed the effectiveness of, and changes in, the learning behaviors of students with high versus low levels of self-regulated learning; others have used the process mining algorithms of the pMineR package to generate a first-order transition probability matrix of student learning behaviors and analyzed those behaviors; and still others have grouped learners by some index and analyzed the learning patterns of the different groups.
Existing methods such as traditional sequence mining algorithms often yield an excessive number of mined subsequences and redundant learning behavior subsequences, while simply clustering learners makes it difficult to discover learning patterns at all. In addition, existing research uses either educational data mining or learning analytics alone, working only at the technical level or only at the learning analytics level: it applies different analysis methods and data models to interpret behavior data, looks for learning regularities in the interpreted results, and relates them to learning performance. It does not technically solve the poor quality of the results mined by traditional algorithms, and instead concentrates on how to interpret those results.
Disclosure of Invention
To address these shortcomings of the prior art, the invention provides a mining method and device for object learning behavior patterns. It aims to solve the following problem of conventional learning behavior pattern mining: sequence mining often produces an excessive number of mined subsequences and redundant learning behavior subsequences, so mining performs poorly when the types of learning behavior are complex, and the mined sequential patterns lack representativeness.
In order to achieve the above object, in one aspect, the present invention provides a method for mining learning behavior patterns of objects, comprising the following steps:
arranging each object's behaviors in the chronological order in which the object executed its actions, constructing a behavior sequence for each object, and thereby constructing an object behavior sequence database;
vectorizing the behaviors in the behavior sequence database by learning behavior embeddings;
clustering the vectorized behaviors and dividing them into different categories;
assigning a distinct numeric code to each behavior category and constructing a digitized behavior sequence for each object;
mining frequent subsequences from the digitized behavior sequence database with a sequential pattern mining method, wherein each behavior category serves as an itemset and the digitized behavior sequence database is the set of digitized behavior sequences of all objects;
analyzing the object learning behavior patterns based on the frequent subsequences.
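The coding step can be illustrated with a minimal Python sketch. It assumes each behavior already carries a category label (for instance a cluster index produced by k-means) and numbers the categories from 1 in sorted label order; the names `assign_codes` and `digitize` and the sample behaviors are hypothetical, and the patent does not specify how the codes are chosen.

```python
def assign_codes(cluster_of):
    """Give each behavior category a distinct integer code, starting at 1."""
    labels = sorted(set(cluster_of.values()))
    return {label: code for code, label in enumerate(labels, start=1)}


def digitize(sequences, cluster_of):
    """Replace every behavior in every sequence by the code of its category."""
    codes = assign_codes(cluster_of)
    return [[codes[cluster_of[b]] for b in seq] for seq in sequences]


# Hypothetical behaviors and category labels, for illustration only.
cluster_of = {"open": "ops", "edit": "build", "add_link": "build", "zoom": "analyze"}
print(digitize([["open", "edit", "add_link", "zoom"]], cluster_of))  # [[3, 2, 2, 1]]
```

Because several raw actions share a category, the digitized sequences use a much smaller alphabet than the raw logs, which is what lets the later mining step merge similar behaviors.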
Preferably, the method for constructing the behavior sequence comprises the following steps:
extracting behavior data including object ID, time, and action;
dividing the behavior data per object according to the object ID;
and arranging the actions in each object's behavior data in the order in which the object executed them, deleting the object ID and time from the behavior data, constructing a behavior sequence for each object, and thereby constructing an object behavior sequence database.
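A minimal Python sketch of this construction, assuming the raw log is available as (object ID, timestamp, action) tuples whose timestamps are mutually comparable; `build_sequence_database` and the sample records are illustrative, not taken from the patent.

```python
from collections import defaultdict


def build_sequence_database(records):
    """records: iterable of (object_id, timestamp, action) tuples."""
    events_by_object = defaultdict(list)
    for object_id, timestamp, action in records:
        events_by_object[object_id].append((timestamp, action))
    # Sort each object's events by time only, then keep just the actions,
    # dropping the object ID and timestamp as the method describes.
    return [
        [action for _, action in sorted(events, key=lambda e: e[0])]
        for events in events_by_object.values()
    ]


records = [
    ("s1", 3, "submit"), ("s2", 1, "open"), ("s1", 1, "open"),
    ("s1", 2, "edit"), ("s2", 2, "zoom"),
]
print(build_sequence_database(records))
# [['open', 'edit', 'submit'], ['open', 'zoom']]
```

Sequences appear in the order in which each object is first seen in the log; sorting on the timestamp alone keeps the original log order for events with equal timestamps.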
Preferably, the method for vectorizing the behavior sequence corresponding to each object comprises the following steps:
inputting an object behavior sequence database into a word2vec model;
the word2vec model treats each behavior sequence as a sentence and each behavior as a word, and converts each action into a behavior vector through learning behavior embedding.
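Treating sessions as sentences can be sketched with a toy skip-gram word2vec written in NumPy. This is an assumption-laden illustration, not the patent's implementation: it uses a full softmax instead of the negative sampling or hierarchical softmax of standard word2vec, and `train_skipgram`, its hyperparameters and the sample sessions are hypothetical. In practice a library implementation (for example gensim's `Word2Vec`) would normally be used.

```python
import numpy as np


def train_skipgram(sequences, dim=8, window=2, epochs=200, lr=0.05, seed=0):
    """Toy skip-gram word2vec with a full softmax; returns behavior -> vector."""
    vocab = sorted({b for seq in sequences for b in seq})
    index = {b: i for i, b in enumerate(vocab)}
    rng = np.random.default_rng(seed)
    w_in = rng.normal(scale=0.1, size=(len(vocab), dim))   # behavior embeddings
    w_out = rng.normal(scale=0.1, size=(dim, len(vocab)))  # context weights

    # (center, context) index pairs: each behavior predicts its neighbours
    # within `window` positions of the same sequence ("sentence").
    pairs = [
        (index[seq[i]], index[seq[j]])
        for seq in sequences
        for i in range(len(seq))
        for j in range(max(0, i - window), min(len(seq), i + window + 1))
        if j != i
    ]

    for _ in range(epochs):
        for center, context in pairs:
            h = w_in[center].copy()              # hidden layer = embedding row
            scores = h @ w_out
            probs = np.exp(scores - scores.max())
            probs /= probs.sum()                 # softmax over all behaviors
            probs[context] -= 1.0                # d(cross-entropy)/d(scores)
            grad_h = w_out @ probs
            w_out -= lr * np.outer(h, probs)
            w_in[center] -= lr * grad_h
    return {b: w_in[index[b]].copy() for b in vocab}


# Hypothetical sessions: each inner list is one student's ordered actions.
sessions = [
    ["open", "edit", "add_link", "save"],
    ["open", "edit", "delete", "save"],
    ["zoom_in", "zoom_out", "undo", "redo"],
]
vectors = train_skipgram(sessions)
print(vectors["edit"].shape)  # (8,)
```

Behaviors that occur in similar contexts receive similar gradient updates and so tend to end up with similar vectors, which is the property the clustering step relies on.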
Preferably, the behavior clustering method after vectorization is as follows:
selecting the k-means method to cluster the behavior vectors, and determining the value of k in the k-means method by combining the elbow method with the silhouette coefficient method.
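One way to combine the elbow and silhouette criteria is sketched below in NumPy: k-means with k-means++ seeding is run for several candidate values of k, the SSE of each run is recorded for an elbow plot, and the k with the highest mean silhouette coefficient is proposed. The patent does not state how the two criteria are combined, so the selection rule in `choose_k`, like all function names here, is an assumption.

```python
import numpy as np


def _sq_dists(X, centers):
    """Squared Euclidean distance from every row of X to every center."""
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's k-means with k-means++ seeding; returns (labels, SSE)."""
    rng = np.random.default_rng(seed)
    centers = X[[rng.integers(len(X))]]
    while len(centers) < k:                      # k-means++ initialisation
        d2 = _sq_dists(X, centers).min(axis=1)
        centers = np.vstack([centers, X[rng.choice(len(X), p=d2 / d2.sum())]])
    for _ in range(n_iter):
        labels = _sq_dists(X, centers).argmin(axis=1)
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels = _sq_dists(X, centers).argmin(axis=1)
    sse = float(((X - centers[labels]) ** 2).sum())
    return labels, sse


def silhouette(X, labels):
    """Mean silhouette coefficient, s(i) = (b - a) / max(a, b)."""
    D = np.sqrt(_sq_dists(X, X))
    scores = []
    for i in range(len(X)):
        own = labels == labels[i]
        if own.sum() == 1:                       # singleton cluster: s(i) = 0
            scores.append(0.0)
            continue
        a = D[i, own].sum() / (own.sum() - 1)    # D[i, i] = 0, so divide by n - 1
        b = min(D[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return float(np.mean(scores))


def choose_k(X, k_values):
    """SSE per k (for the elbow plot) and silhouette per k; pick max silhouette."""
    table = {}
    for k in k_values:
        labels, sse = kmeans(X, k)
        table[k] = (sse, silhouette(X, labels))
    best_k = max(table, key=lambda k: table[k][1])
    return best_k, table


# Synthetic stand-in for behavior vectors: three well-separated groups.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.3, size=(20, 2)) for c in [(0, 0), (10, 10), (-10, 10)]])
best_k, table = choose_k(X, range(2, 6))
print(best_k, table)
```

The SSE column of the returned table would normally be plotted against k and checked for a clear bend before the silhouette-based choice is accepted.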
Preferably, the sequence pattern mining method is a SPAM method.
Preferably, the similarity distance between two behavior vectors is calculated using cosine similarity or Euclidean distance to obtain a distance matrix, which reflects the similarity relationships between the behavior vectors.
Preferably, the method for mining object learning behavior patterns is used to mine the learning behavior patterns of students in online education.
In another aspect, the present invention provides an apparatus for mining learning behavior patterns of objects, comprising a behavior sequence construction module, a vectorization processing module, a clustering processing module, a digital processing module, a sequence pattern mining module and a behavior analysis module, wherein:
the behavior sequence construction module is used to arrange each object's behaviors in the chronological order in which the object executed its actions, construct a behavior sequence for each object, and thereby construct an object behavior sequence database;
the vectorization processing module is used to vectorize the behaviors in the object behavior sequence database by learning behavior embeddings;
the clustering processing module is used to cluster the vectorized behaviors and divide them into different categories;
the digital processing module is used to assign a distinct numeric code to each behavior category and construct a digitized behavior sequence for each object;
the sequence pattern mining module is used to mine frequent subsequences from the digitized behavior sequence database with a sequential pattern mining method, wherein each behavior category serves as an itemset and the digitized behavior sequence database is the set of digitized behavior sequences of all objects;
the behavior analysis module is used to analyze the object learning behavior patterns based on the frequent subsequences.
Preferably, the method for constructing the behavior sequence comprises the following steps:
extracting behavior data including object ID, time, and action;
dividing the behavior data per object according to the object ID;
and arranging the actions in each object's behavior data in the order in which the object executed them, deleting the object ID and time from the behavior data, constructing a behavior sequence for each object, and thereby constructing an object behavior sequence database.
Preferably, the vectorization processing module is a word2vec module, and the word2vec module is used for executing operations of the word2vec model.
Preferably, the clustering method for the vectorized behavior vector comprises the following steps:
selecting the k-means method to cluster the behavior vectors, and determining the value of k in the k-means method by combining the elbow method with the silhouette coefficient method.
Preferably, the sequence pattern mining method is a SPAM method.
Preferably, the apparatus for mining object learning behavior patterns is applied to mining the learning behavior patterns of students in online education.
In general, the above technical solutions conceived by the present invention have the following beneficial effects compared with the prior art:
First, the behaviors in the behavior sequence database are vectorized by learning behavior embeddings; next, the vectorized behaviors are clustered into different categories; then, a distinct numeric code is assigned to each behavior category and a digitized behavior sequence is constructed for each object; finally, frequent subsequences are mined from the digitized behavior sequence database with a sequential pattern mining method. The invention thus combines vectorization with sequential pattern mining. In the prior art, sequential pattern mining is used on its own; because it cannot group similar learning behaviors, it performs poorly when the learning behavior types are complex, and the mined sequential patterns lack representativeness. The invention uses learning behavior vectorization as preprocessing: the vectorized behaviors make the similarity between behaviors directly visible, and together with behavior clustering, complex learning behavior types can be categorized so that the relationships between behaviors are well represented. Combined with sequence mining, this reduces the number of excess mined subsequences and redundant learning behavior subsequences and improves mining accuracy.
The behavior sequence database is vectorized with a word2vec model, which treats each behavior sequence as a sentence and each behavior as a word, and converts each action into a behavior vector through learning behavior embedding. Because this simple method produces useful behavior embeddings that capture students' learning or browsing patterns and reveal which behaviors students use in similar ways, training word2vec on the behavior sequence database directly captures students' similar behavior patterns.
After the behaviors are vectorized, the similarity distance between any two behavior vectors can be calculated using cosine similarity or Euclidean distance, giving a distance matrix that reflects the similarity relationships between the behavior vectors. Although the distance matrix by itself cannot classify the behaviors in the behavior sequences, it can serve as an aid in verifying the clustering result.
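The distance matrix and its use as a clustering check can be sketched in NumPy as follows. `similarity_matrices` and `intra_vs_inter` are hypothetical helpers; comparing mean within-cluster and between-cluster cosine similarity is one assumed way of using the matrix to verify a clustering, not a procedure stated in the patent.

```python
import numpy as np


def similarity_matrices(vectors):
    """Pairwise cosine similarity and Euclidean distance between behavior vectors."""
    V = np.asarray(vectors, dtype=float)
    unit = V / np.linalg.norm(V, axis=1, keepdims=True)
    cosine = unit @ unit.T
    euclidean = np.sqrt(((V[:, None, :] - V[None, :, :]) ** 2).sum(axis=2))
    return cosine, euclidean


def intra_vs_inter(cosine, labels):
    """Mean cosine similarity within clusters versus between clusters."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    off_diagonal = ~np.eye(len(labels), dtype=bool)
    return cosine[same & off_diagonal].mean(), cosine[~same].mean()


V = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]   # toy behavior vectors
cos, dist = similarity_matrices(V)
intra, inter = intra_vs_inter(cos, [0, 0, 1])
print(intra > inter)  # True
```

A clustering that agrees with the embedding should show clearly higher within-cluster similarity than between-cluster similarity.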
The invention uses the k-means method to cluster the behavior vectors and combines the elbow method with the silhouette coefficient method to determine k. Clustering partitions a data set into classes or clusters according to a specific criterion (such as a distance criterion) so that data objects in the same cluster are as similar as possible and data objects in different clusters are as different as possible; that is, after clustering, data of the same class are gathered together and different data are separated. The elbow method looks for the value of k beyond which the within-cluster sum of squared errors (SSE) stops decreasing sharply, and the silhouette coefficient measures how well each vector fits its own cluster compared with the nearest other cluster. Together they allow the different types of learning behavior to be separated effectively.
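For reference, the standard quantities behind the two criteria are given below. C_j is the j-th cluster with mean μ_j, a(i) is the mean distance from vector i to the other vectors of its own cluster, b(i) is the smallest mean distance from vector i to the vectors of any other cluster, and N is the number of behavior vectors. The elbow method looks for the bend in SSE(k); the silhouette method prefers the k with the larger S(k), which lies in [-1, 1]. These are textbook definitions, not formulas stated in the patent.

```latex
\begin{aligned}
\mathrm{SSE}(k) &= \sum_{j=1}^{k} \sum_{\mathbf{x} \in C_j} \lVert \mathbf{x} - \boldsymbol{\mu}_j \rVert_2^2 \\
s(i) &= \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad
S(k) = \frac{1}{N} \sum_{i=1}^{N} s(i)
\end{aligned}
```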
Drawings
FIG. 1 is a schematic diagram of a mining method for learning behavior patterns of students according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a learning behavior clustering result provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of the first 10 results of mining a behavior pattern using the SPAM method after vectorized clustering according to an embodiment of the present invention;
FIG. 4 is a graph of SSE as a function of k for the elbow method provided by an embodiment of the present invention;
FIG. 5 is a graph of the silhouette coefficient as a function of k for the silhouette coefficient method provided by an embodiment of the present invention;
FIG. 6 is a flowchart of a SPAM method provided by an embodiment of the present invention;
FIG. 7 is an exemplary schematic diagram of the S-extension and I-extension processes provided by an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
In one aspect, the invention provides a method for mining learning behavior patterns of objects, comprising the following steps:
arranging each object's behaviors in the chronological order in which the object executed its actions, constructing a behavior sequence for each object, and thereby constructing an object behavior sequence database;
vectorizing the behaviors in the object behavior sequence database by learning behavior embeddings;
clustering the vectorized behaviors and dividing them into different categories;
assigning a distinct numeric code to each behavior category and constructing a digitized behavior sequence for each object;
mining frequent subsequences from the digitized behavior sequence database with a sequential pattern mining method, wherein each behavior category serves as an itemset and the digitized behavior sequence database is the set of digitized behavior sequences of all objects;
analyzing the object learning behavior patterns based on the frequent subsequences.
Preferably, the method for constructing the behavior sequence comprises the following steps:
extracting behavior data including object ID, time, and action;
dividing the behavior data per object according to the object ID;
and arranging the actions in each object's behavior data in the order in which the object executed them, deleting the object ID and time from the behavior data, constructing a behavior sequence for each object, and thereby constructing an object behavior sequence database.
Preferably, the method for vectorizing the behavior sequence corresponding to each object comprises the following steps:
inputting an object behavior sequence database into a word2vec model;
the word2vec model treats each behavior sequence as a sentence and each behavior as a word, and converts each action into a behavior vector through learning behavior embedding.
Preferably, the behavior clustering method after vectorization is as follows:
selecting the k-means method to cluster the behavior vectors, and determining the value of k in the k-means method by combining the elbow method with the silhouette coefficient method.
Preferably, the sequence pattern mining method is a SPAM method.
Preferably, the similarity distance between two behavior vectors is calculated using cosine similarity or Euclidean distance to obtain a distance matrix, which reflects the similarity relationships between the behavior vectors.
Preferably, the method for mining object learning behavior patterns is used to mine the learning behavior patterns of students in online education.
In another aspect, the present invention provides an apparatus for mining learning behavior patterns of objects, comprising a behavior sequence construction module, a vectorization processing module, a clustering processing module, a digital processing module, a sequence pattern mining module and a behavior analysis module, wherein:
the behavior sequence construction module is used to arrange each object's behaviors in the chronological order in which the object executed its actions, construct a behavior sequence for each object, and thereby construct an object behavior sequence database;
the vectorization processing module is used to vectorize each behavior in the object behavior sequence database by learning behavior embeddings;
the clustering processing module is used to cluster the vectorized behaviors and divide them into different categories;
the digital processing module is used to assign a distinct numeric code to each behavior category and construct a digitized behavior sequence for each object;
the sequence pattern mining module is used to mine frequent subsequences from the digitized behavior sequence database with a sequential pattern mining method, wherein each behavior category serves as an itemset and the digitized behavior sequence database is the set of digitized behavior sequences of all objects;
the behavior analysis module is used to analyze the object learning behavior patterns based on the frequent subsequences.
Preferably, the method for constructing the behavior sequence comprises the following steps:
extracting behavior data including object ID, time, and action;
dividing the behavior data per object according to the object ID;
and arranging the actions in each object's behavior data in the order in which the object executed them, deleting the object ID and time from the behavior data, constructing a behavior sequence for each object, and thereby constructing an object behavior sequence database.
Preferably, the vectorization processing module is a word2vec module, and the word2vec module is used for executing operations of the word2vec model.
Preferably, the clustering method for the vectorized behavior vector comprises the following steps:
selecting the k-means method to cluster the behavior vectors, and determining the value of k in the k-means method by combining the elbow method with the silhouette coefficient method.
Preferably, the sequence pattern mining method is a SPAM method.
Preferably, the apparatus for mining object learning behavior patterns is applied to mining the learning behavior patterns of students in online education.
Examples
As shown in FIG. 1, this embodiment relates to a method for mining the learning behavior patterns of students in an online education environment, comprising vector representation of learning behaviors, clustering of the vectorized behaviors, and sequential pattern mining of the clustered learning behaviors.
The method for mining students' learning behavior patterns in the online learning platform environment specifically comprises the following steps:
1. Data preprocessing
Extracting the learning behavior data of students in the online learning platform environment: the data source is the behavior data of a subset of students, including student ID, time, action and other fields;
dividing the behavior data per student according to student ID, arranging the actions in each student's behavior data in the order in which the student executed them, deleting the student ID and time, and constructing a session for each student that contains only the student's action data; each student's session serves as that student's behavior sequence, from which a student learning behavior sequence database is constructed;
2. Vectorizing the behavior sequences with learning behavior embedding
Behavior embeddings are learned with an NLP (natural language processing) method, as follows:
The behavior sequences of all students form a behavior sequence database in which each sequence contains multiple behaviors. In this embodiment, a standard word2vec model is applied directly to learn the behavior embeddings: the model takes the behavior sequences as input, interprets each sequence as a sentence and each behavior as a word, and outputs a behavior vector for each action, producing useful behavior embeddings. The behavior vectors capture students' learning or browsing patterns and reveal which behaviors students use in similar ways. The specific process of learning behavior embeddings with the word2vec model is as follows:
inputting a behavior sequence corresponding to the student into a word2vec model;
the word2vec model interprets each behavior sequence as a sentence, interprets each behavior as a word, and converts each action into a behavior vector through a learning behavior embedding process;
to verify the clustering of the behavior vectors, this embodiment can calculate the similarity between the behavior vectors of any two behaviors, as follows:
calculating the similarity distance between two behavior vectors using cosine similarity or Euclidean distance to obtain a distance matrix;
the distance matrix shows the similarity (cosine value) between every pair of behavior vectors more intuitively, making it easier to judge the similarity relationships between students' learning behaviors;
3. Clustering the vectorized behaviors
After the students' learning behaviors are represented as vectors, the k-means method is selected to cluster the behavior vectors, and the value of k is determined by combining the elbow method with the silhouette coefficient method;
after all of the students' learning behaviors are vectorized and clustered, six classes of learning behavior are obtained and named according to the clustering results: the behavior data in the original data set cluster into classroom operation, task operation, knowledge construction a, knowledge construction b, analysis behavior and reflection behavior. The actions from the data source contained in each class are shown in FIG. 2; the six classes reflect the students' learning behaviors and states.
The test scenario of this embodiment is students drawing concept maps. Classroom operation covers students' basic operations on classroom tasks, such as importing and exporting files, creating projects and closing; task operation covers behaviors related to tasks issued by the teacher, such as downloading, answering and submitting; knowledge construction a covers copy and paste, which is also classed as knowledge construction even though its cognitive cost is low; knowledge construction b covers students creating, editing and deleting elements and adding relations in the concept map; analysis behavior covers students zooming in and out of the concept map to judge whether the overall construction is reasonable; reflection behavior covers undo and redo, i.e., students' reflection on the concept map they have drawn;
4. Mining behavior patterns over the six behavior classes with the SPAM method
After the students' learning behaviors are vectorized and clustered, the behaviors and their similarity relationships can be seen directly. On this basis, a distinct numeric code is assigned to each behavior class, and each behavior sequence is converted into the corresponding numeric codes to construct the digitized behavior sequences. Students' learning behavior patterns are then mined with a sequential pattern mining method, i.e., a knowledge discovery process that searches a digitized behavior sequence database for frequent subsequences as patterns. After a comprehensive comparison of the properties, number of database scans and time constraints of classical sequence mining methods, the SPAM (Sequential PAttern Mining using A Bitmap Representation) method is selected. Its inputs are the clustered student digitized behavior sequence database and a minimum support threshold minsup, i.e., the minimum proportion of sequences in the digitized behavior sequence database that must contain a subsequence pattern; its output is the set of frequent subsequence patterns. In this embodiment, each class of learning behavior is treated as an itemset and the support threshold is set to minsup = 0.8; 181 frequent subsequences are mined, 125 of which have a support of 100%, and the mining results in FIG. 3 list the first 10 subsequence patterns in descending order of support;
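The bitmap idea behind SPAM can be sketched in Python as below. This is a simplified, hypothetical version: it assumes each event in a digitized sequence holds a single behavior code, so only S-extensions (appending a later event) are generated and I-extensions are omitted, and it interprets minsup as a proportion of sequences, as in the text. `spam_s_steps` and the toy database are illustrative; a complete SPAM implementation (for example the one in the SPMF library) would normally be used.

```python
import math


def spam_s_steps(db, minsup):
    """Simplified SPAM: depth-first search over S-extensions using per-sequence
    bitmaps stored as Python ints. Assumes one item per event, so no
    I-extensions are generated. Returns a list of (pattern, support_count)."""
    min_count = math.ceil(minsup * len(db) - 1e-9)   # minsup is a proportion
    lengths = [len(seq) for seq in db]
    items = sorted({x for seq in db for x in seq})
    # Vertical bitmaps: bit p of bitmap[item][s] is set if sequence s has item at position p.
    bitmap = {i: [sum(1 << p for p, x in enumerate(seq) if x == i) for seq in db]
              for i in items}

    def support(bits):
        return sum(1 for b in bits if b)

    def s_step_transform(bits):
        # For each sequence, set every bit strictly after the first set bit.
        out = []
        for b, n in zip(bits, lengths):
            lowest = b & -b
            out.append(((1 << n) - 1) & ~((lowest << 1) - 1))
        return out

    results = []

    def dfs(pattern, bits, candidates):
        results.append((pattern, support(bits)))
        shifted = s_step_transform(bits)
        extensions = []
        for i in candidates:
            new_bits = [t & b for t, b in zip(shifted, bitmap[i])]
            if support(new_bits) >= min_count:
                extensions.append((i, new_bits))
        # S-step pruning: only items that extended this prefix are tried deeper.
        next_candidates = [i for i, _ in extensions]
        for i, new_bits in extensions:
            dfs(pattern + (i,), new_bits, next_candidates)

    frequent = [i for i in items if support(bitmap[i]) >= min_count]
    for i in frequent:
        dfs((i,), bitmap[i], frequent)
    return results


db = [[1, 2, 3], [1, 3], [2, 1, 3], [1, 2]]   # toy digitized behavior sequences
patterns = spam_s_steps(db, minsup=0.5)
top10 = sorted(patterns, key=lambda r: -r[1])[:10]
print(top10)
```

Sorting the returned patterns by support and keeping the first ten gives the same kind of ranked listing as FIG. 3.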
5. Analysis of the mining results after behavior vectorization
The mining results in Fig. 3 show that most subsequence patterns consist of knowledge construction, task operation, or a combination of the two, indicating that students frequently perform classroom-task and web-page operations. However, the results contain no analysis or reflection behaviors, which suggests that the students engage uniformly in knowledge construction but lack the corresponding metacognitive abilities and strategies, and therefore lack planning, monitoring and regulation. The mining results are also consistent with the vectorization results, which indicates that the "behavior vectorization + sequence pattern mining" learning behavior analysis method of this embodiment is effective. For example, patterns involving the mined 'flag-driven' action class account for 55% of all behavior patterns, whereas without behavior vectorization as a preprocessing step, the patterns obtained for this action account for only 0.4%. Judging from the mining results, the proposed approach improves the mining of student learning behavior patterns in terms of the number of frequent subsequences mined, the number of patterns mined at the same support, and the proportion of infrequently occurring actions.
In summary, the invention vectorizes students' learning behaviors by embedding, using word2vec modeling to obtain a vector representation of each learning behavior; the vectors are then clustered with k-means into several clusters; finally, the embedding results are compared with the sequence pattern mining results to judge whether the behaviors in a frequent subsequence pattern belong to the same cluster or have high word similarity.
The methods and models involved in the invention are described below:
(1) Learning behavior embedding converts large sparse vectors into a low-dimensional space that preserves semantic relationships. Mapping high-dimensional data into a lower-dimensional space solves the core problem of sparse input data: even a small multidimensional space can group semantically similar items together and keep dissimilar items apart, and positions in the vector space encode semantics, which is what makes a good embedding;
Learning behavior embedding can be implemented in many ways, for example matrix factorization (MF, SVD), deep neural network embedding (DNN embedding), graph embedding, and sequence embedding, typically word2vec and Item2vec; this embodiment uses word2vec;
The word2vec model has two parts: the first builds the model, and the second obtains the embedded word vectors from the model. The model is built with the skip-gram method, taking part of the first student's behavior data as an example, e.g., "added openedrecoverrecoverset". First, a word in the middle of the student's behavior sequence is chosen as the input word of the word2vec model, for example the behavior word "open". Next, a skip-window parameter is defined, which is the number of words taken from one side of the current input word; with skip-window = 2, the words in the final window are 'added', 'open' and 'over'. Then a num-skip parameter is defined, which is the number of different words selected from the whole window as output words; with num-skip = 2, the training data take the form (input word, output word), giving two pairs, ('open', 'added') and ('open', 'recovery'). Trained on these data, the neural network outputs a probability distribution that gives, for each word in the dictionary, the likelihood that it is the output word, i.e., how likely it is to appear together with the input word. The model learns these statistics from the number of times each word pair co-occurs. Following this procedure yields a vector representation of each learning behavior.
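The training-pair step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's code: the function name `skipgram_pairs` and the example sequence are hypothetical, the window holds up to `skip_window` behaviors on each side, and `num_skips` is interpreted as the number of distinct window positions sampled as output words.

```python
import random
from typing import List, Tuple

def skipgram_pairs(sequence: List[str], skip_window: int, num_skips: int,
                   seed: int = 0) -> List[Tuple[str, str]]:
    """Generate (input word, output word) training pairs for skip-gram.

    For each behavior, the window holds up to `skip_window` behaviors on each
    side; `num_skips` distinct window positions are sampled as output words
    (all of them if the window is smaller than `num_skips`).
    """
    rng = random.Random(seed)  # fixed seed so the sampling is reproducible
    pairs = []
    for i, center in enumerate(sequence):
        lo = max(0, i - skip_window)
        hi = min(len(sequence), i + skip_window + 1)
        context = [j for j in range(lo, hi) if j != i]
        chosen = rng.sample(context, min(num_skips, len(context)))
        for j in sorted(chosen):
            pairs.append((center, sequence[j]))
    return pairs

# Hypothetical behavior sequence of one student.
seq = ["added", "open", "edit", "recover", "set"]
for pair in skipgram_pairs(seq, skip_window=2, num_skips=2)[:4]:
    print(pair)
```

In practice a library usually generates these pairs internally; for example, gensim's `Word2Vec(sentences=..., vector_size=..., window=..., sg=1)` trains a skip-gram model directly on lists of behavior tokens. The patent does not state which implementation it used.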
(2) After the learning behaviors are vectorized, they need to be classified; classification shows what each group of behaviors represents and makes it easy to extract similar behaviors in the text. Clustering is performed with the k-means method, and the value of k is determined by combining the elbow method with the silhouette coefficient (contour coefficient) method; Fig. 4 and Fig. 5 show the k values obtained by the elbow method and the silhouette coefficient method, respectively;
The core index of the elbow method is the sum of squared errors (SSE). As the number of clusters k increases, the samples are partitioned more finely and each cluster becomes more compact, so the SSE naturally decreases. While k is smaller than the true number of clusters, increasing k greatly improves the compactness of each cluster, so the SSE drops sharply; once k reaches the true number of clusters, the gain in compactness from increasing k falls off quickly, so the decrease in SSE shrinks abruptly and the SSE curve flattens as k continues to increase. The k at this bend (the "elbow") is taken as the candidate number of clusters;
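As an illustration of the elbow criterion, the sketch below runs a plain k-means (Lloyd's algorithm) for several k and records the SSE. It is a pure-Python sketch under assumptions: the farthest-first initialization is a deterministic simplification chosen here (it is not specified in the patent), and the 2-D points stand in for real behavior vectors.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]

def farthest_first(points: List[Point], k: int) -> List[List[float]]:
    """Deterministic initialization: start at points[0], then repeatedly add
    the point farthest from its nearest already-chosen center."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    return centers

def kmeans(points: List[Point], k: int, iters: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's k-means; returns (labels, centers)."""
    centers = farthest_first(points, k)
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centers

def sse(points: List[Point], labels: List[int], centers: List[List[float]]) -> float:
    """Sum of squared distances from each point to its cluster center."""
    return sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))

def elbow_curve(points: List[Point], k_values) -> Dict[int, float]:
    """SSE for each candidate k, to be plotted or inspected for the elbow."""
    return {k: sse(points, *kmeans(points, k)) for k in k_values}

# Hypothetical 2-D behavior vectors forming three well-separated groups.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (0, 10), (0, 11), (1, 10)]
for k, err in elbow_curve(pts, range(1, 6)).items():
    print(k, round(err, 2))
```

On these toy points the SSE falls steeply up to k = 3 and only slightly afterwards, so the elbow is at k = 3. A production version would typically use several random restarts or k-means++ initialization instead of the single deterministic start used here.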
The core index of the silhouette coefficient method is the silhouette coefficient; the average silhouette coefficient is obtained by computing the silhouette coefficient of every sample and averaging them. The average silhouette coefficient lies in [-1, 1]; the closer together the samples within each cluster and the farther apart the samples in different clusters, the larger the average silhouette coefficient and the better the clustering. Therefore, the k with the largest average silhouette coefficient is the optimal number of clusters;
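The average silhouette coefficient can be computed as in the sketch below, using the standard definition s(i) = (b - a) / max(a, b), where a is the mean distance from sample i to the other members of its own cluster and b is the smallest mean distance from i to the members of any other cluster. Scoring singleton clusters as 0 is a common convention assumed here; the function names and toy labelings are illustrative only.

```python
import math
from typing import List, Sequence

def _mean_dist(i: int, cluster: int, points: List[Sequence[float]], labels: List[int]) -> float:
    """Mean distance from points[i] to the members of `cluster` (excluding i itself)."""
    ds = [math.dist(points[i], q) for j, q in enumerate(points) if labels[j] == cluster and j != i]
    return sum(ds) / len(ds) if ds else 0.0

def silhouette(points: List[Sequence[float]], labels: List[int]) -> float:
    """Average silhouette coefficient over all samples, in [-1, 1]."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("the silhouette coefficient needs at least two clusters")
    scores = []
    for i in range(len(points)):
        if labels.count(labels[i]) == 1:
            scores.append(0.0)  # singleton cluster: s(i) is defined as 0
            continue
        a = _mean_dist(i, labels[i], points, labels)  # cohesion
        b = min(_mean_dist(i, c, points, labels) for c in clusters if c != labels[i])  # separation
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical 2-D behavior vectors and two candidate labelings.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (0, 10), (0, 11), (1, 10)]
good = [0, 0, 0, 1, 1, 1, 2, 2, 2]
mixed = [0, 1, 2, 0, 1, 2, 0, 1, 2]
print(round(silhouette(pts, good), 3), round(silhouette(pts, mixed), 3))
```

To choose k, one would cluster the vectors for each candidate k and keep the k whose labeling gives the largest average silhouette, cross-checked against the elbow of the SSE curve.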
(3) Sequence pattern mining is a knowledge discovery process that searches a sequence database for frequent subsequences as patterns: given a sequence database as input, it outputs all sequence patterns that satisfy the minimum support. Sequence pattern mining methods differ in whether they use breadth-first or depth-first search and in how they compute the support of a pattern to decide whether it meets the minimum support constraint. After comprehensively comparing the properties, number of scans, time constraints and usage scenarios of the algorithms, the SPAM method is selected from the classical sequence pattern mining methods; the series of actions of each student is recorded as one sequence in the sequence database, and SPAM is applied to obtain frequent subsequence behavior patterns;
When SPAM is used to mine students' learning behavior patterns, the database is represented as vertical bitmaps and a depth-first search strategy is used, so the database needs to be scanned only twice. The flow of the SPAM method is shown in Fig. 6: the sequence database is scanned to find all frequent 1-sequences; a bitmap is generated for each frequent 1-sequence; then, using depth-first search, sequence extension (S-extension) and itemset extension (I-extension) are applied recursively in enumeration order, and the support count of each extended sequence is obtained by counting the sequences whose section of its bitmap is non-zero. The core of the method is depth-first traversal with S-extension and I-extension of sequence patterns; Fig. 7 shows the S-extension and I-extension procedure. The inputs of SPAM are the sequence database and a user-specified threshold minsup in (0, 1), expressed as a percentage; the output is all frequent sequence patterns in the database, i.e., the subsequences that occur in at least a minsup fraction of the database sequences.
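The vertical-bitmap idea with S-steps and I-steps can be sketched as below. This is a simplified SPAM-style sketch, not the full published algorithm: each sequence's bitmap section is a Python integer (bit j set if itemset j contains the item) instead of SPAM's size-bucketed fixed-width bitmaps, and the function name, pattern encoding and toy database are assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

Pattern = Tuple[Tuple[int, ...], ...]  # e.g. ((1,), (3,)) means <{1}, {3}>

def spam(db: List[List[Set[int]]], minsup: float) -> Dict[Pattern, int]:
    """Depth-first sequential pattern mining over vertical bitmaps (SPAM-style sketch).

    db: one list of itemsets per object; minsup: fraction in (0, 1].
    Returns {pattern: support count} for every frequent pattern.
    """
    n = len(db)
    # Vertical bitmap per item: one int per sequence, bit j set if itemset j contains the item.
    bitmaps: Dict[int, List[int]] = {}
    for sid, seq in enumerate(db):
        for pos, itemset in enumerate(seq):
            for item in itemset:
                bitmaps.setdefault(item, [0] * n)[sid] |= 1 << pos

    def count(bm: List[int]) -> int:
        return sum(1 for m in bm if m)  # sequences whose bitmap section is non-zero

    def frequent(bm: List[int]) -> bool:
        return count(bm) / n >= minsup

    def s_transform(bm: List[int]) -> List[int]:
        # Keep only positions strictly after the first set bit of each sequence:
        # -(low << 1) is an unbounded run of 1 bits starting just above `low`.
        return [m and -((m & -m) << 1) for m in bm]

    results: Dict[Pattern, int] = {}

    def dfs(pattern: Pattern, bm: List[int], s_items: List[int], i_items: List[int]) -> None:
        results[pattern] = count(bm)
        # S-step: append a new itemset {i} after the pattern.
        shifted = s_transform(bm)
        s_ok = [(i, [a & b for a, b in zip(shifted, bitmaps[i])]) for i in s_items]
        s_ok = [(i, new) for i, new in s_ok if frequent(new)]
        # I-step: add item i (larger than any item already there) to the last itemset.
        i_ok = [(i, [a & b for a, b in zip(bm, bitmaps[i])]) for i in i_items]
        i_ok = [(i, new) for i, new in i_ok if frequent(new)]
        s_next = [i for i, _ in s_ok]  # pruned candidate list passed to children
        for i, new in s_ok:
            dfs(pattern + ((i,),), new, s_next, [j for j in s_next if j > i])
        for i, new in i_ok:
            dfs(pattern[:-1] + (pattern[-1] + (i,),), new, s_next,
                [j for j, _ in i_ok if j > i])

    items = sorted(i for i, bm in bitmaps.items() if frequent(bm))
    for i in items:
        dfs(((i,),), bitmaps[i], items, [j for j in items if j > i])
    return results

# Toy database of digitized behaviors (hypothetical codes).
db = [[{1}, {2}, {3}], [{1}, {3}], [{1, 2}, {3}], [{2}, {3}]]
print(spam(db, 0.75))
# {((1,),): 3, ((1,), (3,)): 3, ((2,),): 3, ((2,), (3,)): 3, ((3,),): 4}
```

The S-step shifts each sequence's bitmap past its first match and ANDs it with the item's bitmap, while the I-step ANDs directly; support is the number of sequences whose resulting section is non-zero, as described above.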
In summary, compared with the prior art, the invention has the following advantages:
According to the invention, learning behavior embedding is first used to vectorize the behaviors of all objects, and the vectorized behaviors are clustered into different categories; next, a different numerical code is assigned to each behavior category, and a digitized behavior sequence is constructed for each object; finally, a sequence pattern mining method mines frequent subsequences from the digitized behavior sequence database. Combining vectorization with sequence pattern mining addresses a weakness of the prior art: conventional sequence pattern mining cannot effectively group similar learning behaviors, so it performs poorly when the learning behavior types are complex, and the patterns it mines lack representativeness and fail to capture the relationships among behaviors.
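The data-preparation part of this pipeline (ordering each object's behaviors by time and dropping the ID and timestamp, as in claim 2, then replacing each behavior with a numeric code for its cluster) could look like the following minimal sketch. The record layout, the action names, and the choice "code = cluster index + 1" are assumptions for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, float, str]  # (object ID, timestamp, action): hypothetical layout

def build_sequences(records: Iterable[Record]) -> Dict[str, List[str]]:
    """Group records by object ID, order each group by time, and drop ID and time."""
    by_object: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for object_id, ts, action in records:
        by_object[object_id].append((ts, action))
    return {oid: [action for _, action in sorted(events, key=lambda e: e[0])]
            for oid, events in by_object.items()}

def digitize(sequences: Dict[str, List[str]], action_cluster: Dict[str, int]) -> Dict[str, List[int]]:
    """Replace each action by a numeric code for its cluster (here: cluster index + 1)."""
    return {oid: [action_cluster[a] + 1 for a in seq] for oid, seq in sequences.items()}

logs = [("s1", 3, "redo"), ("s1", 1, "open"), ("s2", 2, "copy"),
        ("s1", 2, "add_node"), ("s2", 1, "open")]
clusters = {"open": 0, "copy": 1, "add_node": 2, "redo": 5}  # e.g. labels from k-means
print(digitize(build_sequences(logs), clusters))
# {'s1': [1, 3, 6], 's2': [1, 2]}
```

The resulting dictionary of code lists is the digitized behavior sequence database that a sequence pattern miner would take as input.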
The model used to vectorize the behavior sequences is the word2vec model, which treats each behavior sequence as a sentence and each behavior as a word, and converts each action into a behavior vector through learning behavior embedding. This simple method produces useful behavior embeddings that capture students' learning or browsing patterns and reveal similarities among students who learn in similar ways, so after word2vec training, similar behavior patterns of students can be captured very intuitively.
After the behavior sequences are vectorized, the similarity distance between two behavior vectors can be computed with cosine similarity or Euclidean distance to obtain a distance matrix, which reflects the similarity relationships among the behavior vectors. Although the distance matrix cannot itself classify the behaviors in the behavior sequences, it can serve as an aid for verifying the clustering results.
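A distance matrix of the kind described here can be built as in the following sketch, which supports either cosine distance (1 minus cosine similarity) or Euclidean distance via the standard-library `math.dist`. The function names and example vectors are hypothetical; real behavior vectors would come from the trained embedding.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]

def cosine_distance(u: Vector, v: Vector) -> float:
    """1 - cosine similarity: 0 for vectors pointing the same way, 1 for orthogonal ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm

def distance_matrix(vectors: Dict[str, Vector],
                    metric: Callable[[Vector, Vector], float]) -> Tuple[List[str], List[List[float]]]:
    """Return behavior names and the pairwise distance matrix in that order."""
    names = list(vectors)
    return names, [[metric(vectors[a], vectors[b]) for b in names] for a in names]

vecs = {"open": [1.0, 0.0], "import": [2.0, 0.0], "undo": [0.0, 1.0]}  # hypothetical vectors
names, cos_m = distance_matrix(vecs, cosine_distance)
_, euc_m = distance_matrix(vecs, math.dist)
print(names)     # ['open', 'import', 'undo']
print(cos_m[0])  # [0.0, 0.0, 1.0]
print(euc_m[0])  # [0.0, 1.0, 1.4142135623730951]
```

The example also shows why the two metrics can disagree: "open" and "import" point the same way (cosine distance 0) but differ in length (Euclidean distance 1), so the choice of metric affects which behaviors look similar.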
The invention uses the k-means method to cluster the behavior vectors and combines the elbow method with the silhouette coefficient method to determine k. Clustering partitions a data set into different classes or clusters according to a specific criterion (such as a distance criterion), so that data objects in the same cluster are as similar as possible and data objects in different clusters are as different as possible; that is, after clustering, data of the same class are gathered together and different data are separated as far as possible. Different types of learning behavior can therefore be separated effectively.
It will be readily appreciated by those skilled in the art that the foregoing description is merely a preferred embodiment of the invention and is not intended to limit the invention, but any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (10)

1. A mining method for learning behavior patterns of an object, comprising the steps of:
arranging the behaviors of each object in the chronological order in which the object executes actions, constructing a behavior sequence corresponding to each object, and constructing a behavior sequence database;
vectorizing the behaviors in the behavior sequence database by learning behavior embedding;
clustering the vectorized behaviors, and dividing the behaviors into different categories;
assigning different digital codes to different behavior categories, and constructing a digital behavior sequence corresponding to each object;
adopting a sequence pattern mining method to mine frequent subsequences from a digitized behavior sequence database; wherein each class of digitized behaviors serves as an itemset; the digitized behavior sequence database is the set of digitized behavior sequences corresponding to all objects;
analyzing the learning behavior patterns of the objects based on the frequent subsequences.
2. The mining method according to claim 1, characterized in that the method of constructing the behavior sequence comprises the steps of:
extracting behavior data including object ID, time, and action;
dividing behavior data by taking an object as a unit according to the ID of the object;
and arranging the actions in each object behavior data according to the time of executing the actions by the objects, deleting the object ID and the time in the behavior data, constructing a behavior sequence corresponding to each object, and constructing an object behavior sequence database.
3. The mining method according to claim 1 or 2, characterized in that the method for vectorizing the behavior sequence corresponding to each object comprises the following steps:
inputting an object behavior sequence database into a word2vec model;
the word2vec model interprets each behavior sequence as a sentence and each behavior as a word, and converts each action into a behavior vector through learning behavior embedding.
4. The mining method according to claim 1, wherein the clustering method for the vectorized behavior vectors is as follows:
selecting a k-means method to cluster the behavior vectors; and determining the value of k in the k-means method by combining an elbow method with a silhouette coefficient method.
5. The mining method according to claim 1 or 4, characterized in that the sequence pattern mining method is a SPAM method.
6. The mining method according to claim 3, wherein a similarity distance between two behavior vectors is calculated using cosine similarity or Euclidean distance to obtain a distance matrix, the distance matrix being used to reflect the similarity relationship of the behavior vectors.
7. The mining method according to claim 1, characterized in that it is used for mining learning behavior patterns of students for online education.
8. A mining apparatus for learning behavior patterns of an object, comprising: a behavior sequence construction module, a vectorization processing module, a clustering processing module, a digital processing module, a sequence pattern mining module and a behavior analysis module;
the behavior sequence construction module is used for arranging the behaviors of each object in the chronological order in which the object executes actions, constructing a behavior sequence corresponding to each object, and constructing an object behavior sequence database;
the vectorization processing module is used for vectorizing the behaviors in the object behavior sequence database by learning behavior embedding;
the clustering processing module is used for clustering the behavior after the vectorization processing and dividing the behavior into different categories;
the digital processing module is used for assigning different digital codes to different behavior categories and constructing digital behavior sequences corresponding to the objects;
the sequence pattern mining module is used for mining frequent subsequences from the digitized behavior sequence database by adopting a sequence pattern mining method; wherein each class of digitized behaviors serves as an itemset; the digitized behavior sequence database is the set of digitized behavior sequences corresponding to all objects;
the behavior analysis module is used for analyzing the learning behavior patterns of the objects based on the frequent subsequences.
9. The mining apparatus of claim 8, wherein the clustering of the vectorized behavior vectors is performed by:
selecting a k-means method to cluster the behavior vectors; and determining the value of k in the k-means method by combining an elbow method with a silhouette coefficient method.
10. The mining apparatus according to claim 8 or 9, wherein the vectorization processing module is a word2vec module, and the word2vec module is configured to perform operations of the word2vec model.
CN202110989581.6A 2021-01-26 2021-08-26 Mining method and device for object learning behavior mode Active CN113742396B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110989581.6A CN113742396B (en) 2021-08-26 2021-08-26 Mining method and device for object learning behavior mode
PCT/CN2021/128838 WO2022160842A1 (en) 2021-01-26 2021-11-05 Student collaboration state assessment method and system based on electroencephalogram data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110989581.6A CN113742396B (en) 2021-08-26 2021-08-26 Mining method and device for object learning behavior mode

Publications (2)

Publication Number Publication Date
CN113742396A CN113742396A (en) 2021-12-03
CN113742396B true CN113742396B (en) 2023-10-27

Family

ID=78733164

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110989581.6A Active CN113742396B (en) 2021-01-26 2021-08-26 Mining method and device for object learning behavior mode

Country Status (1)

Country Link
CN (1) CN113742396B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022160842A1 (en) * 2021-01-26 2022-08-04 Central China Normal University Student collaboration state assessment method and system based on electroencephalogram data
CN115600925A (en) * 2022-10-28 2023-01-13 Guangzhou Hongtu Digital Technology Co., Ltd. In-class student behavior analysis auxiliary system and method

Citations (3)

Publication number Priority date Publication date Assignee Title
CN104281891A (en) * 2014-10-13 2015-01-14 Anhui Huazhen Information Technology Co., Ltd. Time-series data mining method and system
CN111090679A (en) * 2019-10-31 2020-05-01 State Grid Zhejiang Electric Power Co., Ltd. Time sequence data representation learning method based on time sequence influence and graph embedding
AU2020103216A4 (en) * 2020-09-25 2021-01-14 Qilu University Of Technology A similarity analysis method of negative sequential patterns based on biological sequences and its implementation system and medium

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
CN106021354A (en) * 2016-05-10 2016-10-12 Beijing Information Science and Technology University Establishment method of digital interpretation library of Dongba classical ancient books

Non-Patent Citations (1)

Title
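Research on the application of data mining technology in learning analytics; Chen Fanghua; Bai Xue; Meng Fanyuan; Han Ying; Software Guide (Educational Technology) (02); full text *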

Also Published As

Publication number Publication date
CN113742396A (en) 2021-12-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant