WO2023015483A1 - Method and apparatus for prerequisite relation discovery for concepts of a plurality of courses - Google Patents

Method and apparatus for prerequisite relation discovery for concepts of a plurality of courses

Info

Publication number
WO2023015483A1
WO2023015483A1 (PCT/CN2021/112049; CN2021112049W)
Authority
WO
WIPO (PCT)
Prior art keywords
concepts
prerequisite
concept
graph
courses
Prior art date
Application number
PCT/CN2021/112049
Other languages
French (fr)
Inventor
Evgeny Kharlamov
Jie Tang
Jifan YU
Juanzi LI
Lei HOU
Zhiyuan Liu
Maosong SUN
Gan LUO
Original Assignee
Robert Bosch Gmbh
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Robert Bosch Gmbh, Tsinghua University filed Critical Robert Bosch Gmbh
Priority to CN202180101578.4A priority Critical patent/CN118020097A/en
Priority to PCT/CN2021/112049 priority patent/WO2023015483A1/en
Publication of WO2023015483A1 publication Critical patent/WO2023015483A1/en

Classifications

    • G - PHYSICS
    • G09 - EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B - EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B 5/00 - Electrically-operated educational appliances

Definitions

  • the storage device 720 may store computer-executable instructions that, when executed, cause the processor 710 to perform any operations according to the embodiments of the present disclosure as described in connection with FIGs. 1-6.
  • the embodiments of the present disclosure may be embodied in a computer-readable medium such as non-transitory computer-readable medium.
  • the non-transitory computer-readable medium may comprise instructions that, when executed, cause one or more processors to perform any operations according to the embodiments of the present disclosure as described in connection with FIGs. 1-6.
  • the embodiments of the present disclosure may be embodied in a computer program product comprising computer-executable instructions that, when executed, cause one or more processors to perform any operations according to the embodiments of the present disclosure as described in connection with FIGs. 1-6.
  • modules in the apparatuses described above may be implemented in various approaches. These modules may be implemented as hardware, software, or a combination thereof. Moreover, any of these modules may be further functionally divided into sub-modules or combined together.

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A method for enabling a neural network system to discover prerequisite relations among concepts of a plurality of courses is provided, wherein each of the plurality of courses contains a series of videos. The method comprising: collecting student behavior data, including at least video watching behavior data across the plurality of courses; modeling the student behavior data into prerequisite features based at least on video watching behavior pattern, wherein the video watching behavior pattern is formed by one or more video pairs from the video watching behavior data; and training the neural network system based at least on the prerequisite features. Numerous other aspects are provided.

Description

[Title established by the ISA under Rule 37.2] METHOD AND APPARATUS FOR PREREQUISITE RELATION DISCOVERY FOR CONCEPTS OF A PLURALITY OF COURSES
FIELD
Aspects of the present disclosure relate generally to artificial intelligence, and more particularly, to a system having the ability of prerequisite relation discovery for concepts.
BACKGROUND
Many efforts from pedagogy have suggested that students should grasp prerequisite knowledge before moving forward to learn subsequent knowledge. Such prerequisite relations are described as the dependence among knowledge concepts, which are crucial for students to learn, organize, and apply knowledge.
In the era of intelligent education, prerequisite relations play an essential role in a series of educational applications such as curriculum planning, reading list generation, course recommendation, etc. With explicit prerequisite relations among concepts, a coherent and reasonable learning sequence can be recommended to a student or user. However, as the quantity of educational resources proliferates, the explosive growth of knowledge concepts makes it expensive and ineffective to obtain fine-grained prerequisite relations by expert annotations. Therefore, there exists the need for automatically discovering prerequisite relations among concepts, aiming to detect the dependence of concepts from courses.
Despite several attempts at the task of prerequisite relation discovery for concepts, including extracting such relations from the content of course videos and from the preset orders of courses, these methods are still far from sufficient for direct application in practical settings.
SUMMARY
The following presents a simplified summary of one or more aspects in order to provide a basic understanding of such aspects. This summary is not an extensive overview of all contemplated aspects, and is intended to neither identify key or critical elements of all aspects nor delineate the scope of any or all aspects. Its sole purpose is to present some concepts of one or more aspects in a simplified form as a prelude to the more detailed description that is presented later.
Unlike the factual relations of general entities, prerequisite relations are more cognitive than factual, which makes them rarely mentioned in texts and challenging to capture directly from courses. Moreover, the courses and videos considered to be prerequisite clues are often noisy, as a video usually teaches several concepts; it is common that some of these concepts are not prerequisite to the ones in later videos. Therefore, it is crucial to discover prerequisite relations from more effective sources other than courses or videos.
Based on the idea from educational psychology that students' learning behaviors are positively related to the cognitive structure of knowledge, by collecting actual data of the students' behavior and analyzing typical student behavior patterns, the students' behavior can be used as an effective source for helping prerequisite relation discovery. Furthermore, to better model student behaviors, a graph-based solution is provided by building concept graphs from student behaviors and conducting link prediction on them.
According to an aspect, a method for enabling a neural network system to discover prerequisite relations among concepts of a plurality of courses is provided, wherein each of the plurality of courses contains a series of videos. The method comprising: collecting student behavior data, including at least video watching behavior data across the plurality of courses; modeling the student behavior data into prerequisite features based at least on video watching behavior pattern, wherein the video watching behavior pattern is formed by one or more video pairs from the video watching behavior data; and training the neural network system based at least on the prerequisite features.
According to a further aspect, the video watching behavior pattern comprises one or more of: sequential watching pattern, cross course watching pattern, skipping watching pattern and backward watching pattern.
According to a further aspect, the neural network system is based partially on a graph-based model, which is constructed based on one or more concept graphs, each of the one or more concept graphs is a weighted directed graph, and wherein each node of the weighted directed graph represents one concept of the set of concepts, and each edge indicates a prerequisite relation among two concepts.
According to another aspect, a method for enabling a curriculum planning system including a neural network system to discover prerequisite relations among concepts of a plurality of courses is provided, wherein each of the plurality of courses contains a series of videos. The method comprising: collecting student behavior data, including at least the video watching behavior data across the plurality of courses; modeling the student behavior data into prerequisite features based at least on the video watching behavior pattern, wherein the video watching behavior pattern is formed by one or more video pairs from the video watching behavior data; and training the neural network system based at least on the prerequisite features.
According to another aspect, a method for enabling a reading/watching list generation system including a neural network system to discover prerequisite relations among  concepts of a plurality of courses is provided, wherein each of the plurality of courses contains a series of videos. The method comprising: collecting student behavior data, including at least the video watching behavior data across the plurality of courses; modeling the student behavior data into prerequisite features based at least on the video watching behavior pattern, wherein the video watching behavior pattern is formed by one or more video pairs from the video watching behavior data; and training the neural network system based at least on the prerequisite features.
According to another aspect, a method for enabling a course recommendation system including a neural network system to discover prerequisite relations among concepts of a plurality of courses is provided, wherein each of the plurality of courses contains a series of videos. The method comprising: collecting student behavior data, including at least the video watching behavior data across the plurality of courses; modeling the student behavior data into prerequisite features based at least on the video watching behavior pattern, wherein the video watching behavior pattern is formed by one or more video pairs from the video watching behavior data; and training the neural network system based at least on the prerequisite features.
The present disclosure enables a neural network system to automatically discover prerequisite relations among concepts based at least on actual students' behavior data, which reveal connections between the concepts that are more cognitive than factual.
BRIEF DESCRIPTION OF THE DRAWINGS
The disclosed aspects will be described in connection with the appended drawings that are provided to illustrate and not to limit the disclosed aspects.
Fig. 1 illustrates exemplary prerequisite relations among concepts of courses in Massive Open Online Courses (MOOCs) , in accordance with various aspects of the present disclosure.
Fig. 2 illustrates an exemplary flow chart of data construction, in accordance with various aspects of the present disclosure.
Fig. 3 illustrates an exemplary flow chart of data annotation, in accordance with various aspects of the present disclosure.
Fig. 4 illustrates exemplary video watching behavior patterns for students, in accordance with various aspects of the present disclosure.
Fig. 5 illustrates the framework of an exemplary graph-based model, in accordance with various aspects of the present disclosure.
Fig. 6 illustrates a flow chart of an exemplary method for a neural network system to discover prerequisite relations among concepts, in accordance with various aspects of the present disclosure.
Fig. 7 illustrates an exemplary computing system 700, in accordance with various aspects of the present disclosure.
DETAILED DESCRIPTION
The present disclosure will now be discussed with reference to several example implementations. It is to be understood that these implementations are discussed only for enabling those skilled in the art to better understand and thus implement the embodiments of the present disclosure, rather than suggesting any limitations on the scope of the present disclosure.
Various embodiments will be described in detail with reference to the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts. References made to particular examples and embodiments are for illustrative purposes, and are not intended to limit the scope of the disclosure.
Discovering prerequisite relations aims at detecting the dependence of concepts from different types of resources. The task of identifying prerequisite relations originates from educational data mining, and it could help in automatic curriculum planning and other educational applications.
Fig. 1 illustrates exemplary prerequisite relations among concepts of courses in Massive Open Online Courses (MOOCs), in accordance with various aspects of the present disclosure. It is noted that although most of the embodiments are discussed in the context of MOOCs, they are not limited thereto and could be applied to any type of online course website or system, or offline, as long as student behavior data are available.
As shown in Fig. 1, there are three courses C1-C3, and each of the courses comprises a series of videos V1-Vn. Further, each video comprises a plurality of concepts, and concepts of different videos may overlap with each other. For a student who wants to learn the concept C, as shown in the concept graph, he/she is expected to have had knowledge of its prerequisite concepts A and B, corresponding to video 12 of course 1 and video 25 of course 2 respectively. Thus, the student may be suggested to follow the prerequisite chain for learning the concept C, from video 12 of course 1 to video 25 of course 2 to video 18 of course 3.
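To make the notion of a coherent learning sequence concrete, the short sketch below (illustrative only, not taken from the patent; the edge from A to B is assumed for the example) shows how such a prerequisite chain could be turned into a learning order by topologically sorting a small concept graph.
```python
from graphlib import TopologicalSorter

# each concept maps to the set of concepts that must be learned first
prerequisites = {
    "C": {"A", "B"},   # A and B are prerequisites of C, as in Fig. 1
    "B": {"A"},        # assumed extra edge, for illustration only
    "A": set(),
}

learning_order = list(TopologicalSorter(prerequisites).static_order())
print(learning_order)  # -> ['A', 'B', 'C']
```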
Various kinds of methods have been explored to detect prerequisite relations from courses, especially online courses, including matrix optimization, feature engineering, and neural networks. Most of these methods consider the following sources as indispensable clues for discovering such relations: the static information of online courses, paper citation networks, textbooks' unit sequences and titles, etc.
For the purpose of discovering prerequisite relations among concepts, some basic definitions and formulations are given below. It is noted that although most of the embodiments are discussed in the context of MOOCs, they are not limited thereto and could be applied to any type of online course website or system, or offline, as long as student behavior data are available.
A MOOC corpus is composed of a set of courses, denoted as $\mathcal{C} = \{C_1, C_2, \ldots, C_{|\mathcal{C}|}\}$, where $C_i$ indicates the $i$-th course. Each course includes a sequence of videos, denoted as $C_i = (v_{i1}, v_{i2}, \ldots)$, where $v_{ij}$ refers to a video with its subtitles from the course.
Course dependence is defined as a prerequisite relation between courses, denoted as $C_i \to C_j$, which indicates that course $C_i$ is a prerequisite course of $C_j$; the set of all such course dependences is denoted as $\mathcal{D}$.
In an embodiment, this information may be obtained from the order in which the courses are listed. In another embodiment, this information may be provided by the teachers when setting up new courses. How the information is provided does not limit the scope of the invention.
Based on a cognitive learning hypothesis that students tend to follow the prerequisite cognitive structure to learn new knowledge, the student behavior data are also introduced into our method as a type of resource for prerequisite relation discovery. The student behavior is organized as the video watching behaviors, denoted as $\mathcal{B} = \{(u, v, t)\}$, where each behavior records that student $u \in U$ started to watch the video $v$ at time $t$, and $U$ is the set of all students.
Course concepts are the subjects taught in a course; for example, "convolutional neural network" would be a concept of the Machine Learning course. The concepts of a certain video, a course and the whole MOOC corpus are denoted as $K_{v_{ij}}$, $K_{C_i}$ and $K$ respectively. The video concepts $K_{v_{ij}}$ are the concepts taught in video $v_{ij}$, where $v_{ij}$ is the $j$-th video in course $C_i$. As a course consists of several videos, the course concepts of course $C_i$ are $K_{C_i} = \bigcup_{j} K_{v_{ij}}$, and all the concepts of the MOOC corpus are $K = \bigcup_{i} K_{C_i}$.
Based on the above, discovering prerequisite relations of course concepts in MOOCs is formulated as follows: given the MOOC corpus $\mathcal{C}$, the course dependence $\mathcal{D}$, the student behavior $\mathcal{B}$ and the corresponding course concepts $K$, the objective is to learn a function $f: K \times K \to \{0, 1\}$ that maps a concept pair $(c_a, c_b)$, where $c_a, c_b \in K$, to a binary class that indicates whether $c_a$ is a prerequisite concept of $c_b$.
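The formulation above can be summarized with a few illustrative Python types; every identifier here (Video, Course, WatchRecord, PrerequisiteFn) is an assumption introduced for exposition, not notation from the patent.
```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple

@dataclass
class Video:
    video_id: str
    subtitles: str
    concepts: Set[str]          # K_{v_ij}: concepts taught in the video

@dataclass
class Course:
    course_id: str
    videos: List[Video]         # preset video order v_i1, v_i2, ...

@dataclass
class WatchRecord:              # one element of the behavior set B
    student: str                # u in U
    video_id: str               # v
    timestamp: float            # t

# course dependence D: (prerequisite course, dependent course) pairs
CourseDependence = Set[Tuple[str, str]]

# the target function f: K x K -> {0, 1}
PrerequisiteFn = Callable[[str, str], int]
```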
To this end, a method for organizing multi-level annotations for concepts and constructing a prerequisite concept dataset is provided, and further a feature-based model for modeling student behavior data into student behavior features is provided hereinafter.
As it is challenging to ensure data quality while keeping the annotation cost low due to the sparsity of prerequisite relationships, there are two issues that should be considered in data construction: the connectivity of course concepts and the effectiveness of annotations. Fig. 2 illustrates an exemplary flow chart of data construction, in accordance with various aspects of the present disclosure.
It is noted that although most of the embodiments are discussed in the context of MOOCs, they are not limited thereto and could be applied to any type of online course website or system, or offline, as long as student behavior data are available.
The data construction begins at block 201, with selecting a set of courses from MOOCs. In an embodiment, the set of courses are selected based at least on their similar domains, making their concepts highly relevant, in order to lift the connectivity of course concepts.
The data construction then proceeds to block 202, with downloading materials for the selected set of courses, which include the video orders and subtitles, and obtaining the video watching logs of students for the selected set of courses as the user behavior data source. In an embodiment, the video watching logs of students include one or more of, but are not limited to, students' profiles, log in/out timestamps, watching durations, etc.
The data construction then proceeds to block 203, with annotating the dependence of courses. In an embodiment, the annotation may be derived from the order in which the courses are listed in MOOCs. In another embodiment, the annotation may be provided by the teachers who set up the courses.
The data construction then proceeds to block 204, with extracting concepts from the subtitles of the selected set of courses. The extracting may be achieved by any concept extraction method, and the extracted concept candidates can be further confirmed by annotators in order to discard the incorrect ones. In an embodiment, each concept's Wikipedia abstract is dumped as side information for the reproduction of baseline methods.
The data construction then proceeds to block 205, with annotating the prerequisite relations among the extracted concepts. In an embodiment, the annotated concept pairs may be used as a training dataset for a classifier to learn the prerequisite relations among concepts.
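Block 204 leaves the extractor open ("any concept extraction method"). As a hedged stand-in, the sketch below simply collects frequent unigrams and bigrams from the subtitles as concept candidates for annotators to confirm; a production system would likely use a stronger extractor.
```python
# Simple candidate-concept extraction from subtitle lines (stand-in only).
import re
from collections import Counter
from typing import Iterable, List

def candidate_concepts(subtitles: Iterable[str], top_k: int = 50) -> List[str]:
    counts: Counter = Counter()
    for line in subtitles:
        tokens = re.findall(r"[a-zA-Z][a-zA-Z-]+", line.lower())
        counts.update(tokens)                                        # unigrams
        counts.update(" ".join(p) for p in zip(tokens, tokens[1:]))  # bigrams
    return [term for term, _ in counts.most_common(top_k)]

# hypothetical usage:
# candidates = candidate_concepts(open("ml_course_subtitles.txt"))
```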
Fig. 3 illustrates an exemplary flow chart of data annotation, in accordance with various aspects of the present disclosure.
A critical issue for annotation is the huge quantity and sparsity of candidate pairs: if the number of concepts is n, there are n (n-1) /2 candidate pairs, which would require arduous human labeling work. Therefore, a multi-step strategy is provided for the purpose of annotating effectively while reducing the workload.
The data annotation begins at block 301, with clustering the concepts into several groups that maintain possible prerequisite relations. In an embodiment, the clustering may be achieved by any clustering method. In another embodiment, the clustering may be done by the teacher of the corresponding courses.
The data annotation then proceeds to block 302, with generating candidate concept pairs within each of the clusters and sampling a small portion of the candidate concept pairs as a gold standard to train more than one classifier as candidate filters. For example, the classifiers can be GlobalF, PREREQ, CPR-Recover, etc.
The data annotation then proceeds to block 303, with annotating the concept pairs that at least one trained classifier predicts to be prerequisite. In an embodiment, to ensure the accuracy of annotation, a concept pair is labeled as positive only when two annotators are in agreement.
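A minimal sketch of blocks 302 and 303 follows, assuming each trained candidate filter exposes a predict method returning 0 or 1; the filter objects standing in for GlobalF, PREREQ or CPR-Recover are hypothetical.
```python
from itertools import combinations
from typing import Iterable, List, Set, Tuple

def pairs_to_annotate(
    clusters: Iterable[Set[str]],
    filters: List,                      # trained classifiers with .predict(pair) -> 0/1
) -> List[Tuple[str, str]]:
    """Keep only candidate pairs that at least one filter predicts as prerequisite."""
    kept = []
    for cluster in clusters:
        for c_a, c_b in combinations(sorted(cluster), 2):
            for pair in ((c_a, c_b), (c_b, c_a)):        # check both directions
                if any(f.predict(pair) == 1 for f in filters):
                    kept.append(pair)
    return kept

def label_pair(annotator_votes: List[int]) -> int:
    """A pair is labeled positive only when two annotators agree it is prerequisite."""
    return 1 if sum(annotator_votes) >= 2 else 0
```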
The data construction in Fig. 2 and the data annotation in Fig. 3 could be used in combination to construct a prerequisite concept dataset with annotation.
Fig. 4 illustrates exemplary video watching behavior patterns for students, in accordance with various aspects of the present disclosure.
By analyzing the clues of the prerequisite concepts implied in student learning orders inferred from the video watching logs of students for the selected set of courses, it can be found that although MOOCs preset the video order, students often learn MOOC videos in their own orders. To leverage student behavior for discovering prerequisite relations, a feature-based model is provided.
A video watch behavior sequence $S_u = (v_1, v_2, \ldots)$ for each user $u$ is constructed from the student behavior records $\mathcal{B}$, where the video watch behaviors are sorted in time order. A video watching behavior pattern $P$ is formed by one or more video pairs. A video pair $(v_i, v_j)$ belongs to a pattern $P$ when it matches the corresponding conditions. As the student behavior patterns are at the video level, the prerequisite features of a concept pair $(c_a, c_b)$, where $c_a \in K_{v_i}$ and $c_b \in K_{v_j}$, can be inferred by considering videos as bags of concepts, where $K_{v_i}$ and $K_{v_j}$ correspond to the concepts taught in $v_i$ and $v_j$, and a concept may be taught in more than one video. By speculating on the causes of video watching behaviors from a cognitive perspective, prerequisite features $F_{seq}$, $F_{crs}$, $F_{skp}$ and $F_{bck}$ are built to model them.
Four patterns of video watching behaviors are shown in Fig. 4: sequential watching, cross course watching, skip watching and backward watching.
Sequential watching pattern 401 indicates that a student watches videos in the course's preset video order, suggesting that the concepts taught in these videos are in accordance with the prerequisite cognitive structure. To leverage this pattern, a prerequisite feature $F_{seq}$ is assigned to the concepts $c_a$ and $c_b$ as:
$$F_{seq}(c_a, c_b) = \sum_{u \in U} \sum_{(v_i, v_j)} \frac{\alpha^{j-i}\, Seq(u, v_i, v_j)}{|K_{v_i}|\, |K_{v_j}|} \quad (1)$$
where the function $Seq(u, v_i, v_j) = 1$ holds when 1) $v_i$ and $v_j$ are the $i$-th and $j$-th videos of a student's watching record $S_u$ and are in the same course, with $j > i$; and 2) $c_a \in K_{v_i}$ and $c_b \in K_{v_j}$; otherwise $Seq(u, v_i, v_j) = 0$.
Considering that there are multiple concepts taught in each video, $\frac{1}{|K_{v_i}||K_{v_j}|}$ is employed to normalize the feature of a certain concept pair. Furthermore, since the distance between watched videos corresponds to their relatedness, an attenuation coefficient $\alpha \in (0, 1)$ is employed to capture distant dependence from long sequences in this pattern.
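The following sketch computes a sequential-watching feature in the spirit of equation (1). Since the equation is reconstructed here from its verbal description, the exact placement of the attenuation term alpha^(j-i) and of the normalization is an assumption.
```python
from typing import Dict, List, Set

def f_seq(
    c_a: str,
    c_b: str,
    watch_sequences: Dict[str, List[str]],   # user -> watched video ids, in time order
    video_concepts: Dict[str, Set[str]],     # video id -> concepts taught in it
    video_course: Dict[str, str],            # video id -> course id
    alpha: float = 0.5,                      # attenuation coefficient in (0, 1)
) -> float:
    feature = 0.0
    for seq in watch_sequences.values():
        for i, v_i in enumerate(seq):
            for j in range(i + 1, len(seq)):
                v_j = seq[j]
                same_course = video_course[v_i] == video_course[v_j]
                if (same_course
                        and c_a in video_concepts[v_i]
                        and c_b in video_concepts[v_j]):
                    # attenuate distant pairs, normalize by concept-bag sizes
                    feature += alpha ** (j - i) / (
                        len(video_concepts[v_i]) * len(video_concepts[v_j]))
    return feature
```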
Cross course watching pattern 402 reflects the phenomenon that, besides watching videos within one course, some students choose to watch videos in other courses before continuing the present study. The main reason is that the knowledge provided by other courses' videos is helpful for studying this course. Hence, cross course watching behavior could reflect the dependence between concepts from different courses. To leverage this pattern, a prerequisite feature $F_{crs}$ is assigned to the concepts $c_a$ and $c_b$ as:
$$F_{crs}(c_a, c_b) = \sum_{u \in U} \sum_{(v_i, v_j)} \frac{\alpha^{j-i}\, Crs(u, v_i, v_j)}{|K_{v_i}|\, |K_{v_j}|} \quad (2)$$
where the function $Crs(u, v_i, v_j) = 1$ holds when 1) $v_i$ and $v_j$ are the $i$-th and $j$-th videos of a student's watching record $S_u$ and are in different courses; and 2) $c_a \in K_{v_i}$ and $c_b \in K_{v_j}$; otherwise $Crs(u, v_i, v_j) = 0$.
Considering that there are multiple concepts taught in each video, $\frac{1}{|K_{v_i}||K_{v_j}|}$ is employed to normalize the feature of a certain concept pair. Furthermore, since the distance between watched videos corresponds to their relatedness, an attenuation coefficient $\alpha \in (0, 1)$ is employed to capture distant dependence from long sequences in this pattern.
Skipping watching pattern 403 illustrates an abnormal student behavior, namely skipping some videos when learning a course, which drops a hint that the "skipped videos" are not so necessary for the comprehension of later videos. Given a student behavior sequence $S_u$ and the course video orders, the skipped video pairs can be detected and assigned a negative feature $F_{skp}$ for the concepts $c_a$ and $c_b$ as:
$$F_{skp}(c_a, c_b) = -\sum_{u \in U} \sum_{(v_i, v_j)} \frac{\alpha^{j-i}\, Skp(u, v_i, v_j)}{|K_{v_i}|\, |K_{v_j}|} \quad (3)$$
where the function $Skp(u, v_i, v_j) = 1$ holds when 1) $v_i$ and $v_j$ are the $i$-th and $j$-th videos of the same course and $j > i$; 2) $v_j$ is watched by student $u$ but $v_i$ is not watched; and 3) $c_a \in K_{v_i}$ and $c_b \in K_{v_j}$; otherwise $Skp(u, v_i, v_j) = 0$.
Considering that there are multiple concepts taught in each video, $\frac{1}{|K_{v_i}||K_{v_j}|}$ is employed to normalize the feature of a certain concept pair. Furthermore, since the distance between watched videos corresponds to their relatedness, an attenuation coefficient $\alpha \in (0, 1)$ is employed to capture distant dependence from long sequences in this pattern.
Backward watching pattern 404 means a student goes back to a video that he/she watched before. A possible explanation is that he/she jumps back to a video for re-learning prerequisite knowledge of the current video. Based on this assumption, a prerequisite feature $F_{bck}$ is assigned to the concepts $c_a$ and $c_b$ as:
$$F_{bck}(c_a, c_b) = \sum_{u \in U} \sum_{(v_i, v_j)} \frac{\alpha^{j-i}\, Bck(u, v_i, v_j)}{|K_{v_i}|\, |K_{v_j}|} \quad (4)$$
where the function $Bck(u, v_i, v_j) = 1$ holds when 1) $v_i$ and $v_j$ are the $i$-th and $j$-th videos of a student's watching record $S_u$ and $j > i$; 2) $v_i$ is watched again after $v_j$; and 3) $c_a \in K_{v_i}$ and $c_b \in K_{v_j}$; otherwise $Bck(u, v_i, v_j) = 0$.
Considering that there are multiple concepts taught in each video, $\frac{1}{|K_{v_i}||K_{v_j}|}$ is employed to normalize the feature of a certain concept pair. Furthermore, since the distance between watched videos corresponds to their relatedness, an attenuation coefficient $\alpha \in (0, 1)$ is employed to capture distant dependence from long sequences in this pattern.
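As a companion to the sequential-watching sketch above, the following illustrates the skipping (negative) and backward features in the spirit of equations (3) and (4); the cross course feature follows the same template as f_seq with the same-course test inverted. The attenuation term is omitted here for brevity, and the exact functional forms remain assumptions.
```python
from typing import Dict, List, Set

def f_skip(c_a: str, c_b: str,
           watched: Dict[str, Set[str]],            # student -> set of watched video ids
           course_videos: Dict[str, List[str]],     # course id -> preset video order
           video_concepts: Dict[str, Set[str]]) -> float:
    """Negative evidence: v_i was skipped although it precedes a watched v_j."""
    feature = 0.0
    for seen in watched.values():                    # per student
        for videos in course_videos.values():        # per course, in preset order
            for i, v_i in enumerate(videos):
                if v_i in seen:
                    continue                         # v_i was not skipped
                for v_j in videos[i + 1:]:
                    if (v_j in seen and c_a in video_concepts[v_i]
                            and c_b in video_concepts[v_j]):
                        feature -= 1.0 / (len(video_concepts[v_i])
                                          * len(video_concepts[v_j]))
    return feature

def f_back(c_a: str, c_b: str,
           watch_sequences: Dict[str, List[str]],   # student -> videos in time order
           video_concepts: Dict[str, Set[str]]) -> float:
    """A re-watch of v_i after v_j hints that v_i carries prerequisite knowledge."""
    feature = 0.0
    for seq in watch_sequences.values():
        for i, v_i in enumerate(seq):
            for j in range(i + 1, len(seq)):
                v_j = seq[j]
                if (v_i in seq[j + 1:]               # v_i is watched again after v_j
                        and c_a in video_concepts[v_i]
                        and c_b in video_concepts[v_j]):
                    feature += 1.0 / (len(video_concepts[v_i])
                                      * len(video_concepts[v_j]))
    return feature
```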
In an embodiment, the prerequisite features $F_{seq}$, $F_{crs}$, $F_{skp}$ and $F_{bck}$ derived from the student behavior data can be combined as additional features and directly input into a neural network system for training the system to learn the prerequisite relations among concepts. In another embodiment, only one or more of the prerequisite features $F_{seq}$, $F_{crs}$, $F_{skp}$ and $F_{bck}$ may be used. For example, the prerequisite features can be concatenated with the original features to train a classifier in different methods, including but not limited to GlobalF, PREREQ, LSTM, etc., as sketched below.
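A minimal sketch of this feature-combination step follows, with a plain logistic regression standing in for the GlobalF, PREREQ or LSTM classifiers named above; the shapes and toy data are illustrative only.
```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_pair_classifier(original_features: np.ndarray,   # shape (n_pairs, d)
                          behavior_features: np.ndarray,   # shape (n_pairs, 4)
                          labels: np.ndarray) -> LogisticRegression:
    # concatenate the behavior-based features with the classifier's original features
    X = np.concatenate([original_features, behavior_features], axis=1)
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X, labels)
    return clf

# toy call with random data, for shape checking only
rng = np.random.default_rng(0)
clf = train_pair_classifier(rng.normal(size=(100, 16)),
                            rng.normal(size=(100, 4)),
                            np.tile([0, 1], 50))
```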
In a further embodiment, since the prerequisite relations among concepts are transitive, i.e., if a→b and b→c then a→c, the student behaviors can be leveraged by building a concept graph to describe the dependence on a set of concepts. For example, a concept graph $G = (K, E)$ is defined as a directed graph whose nodes are the course concepts $K$. As another example, a concept graph $G = (K, E, W)$ is defined as a weighted directed graph whose nodes are the course concepts $K$, and each edge $e = (c_a \to c_b) \in E$ is associated with a weight $w_e$. The examples hereinafter are mainly based on the weighted directed graph but are not limited thereto.
By regarding prerequisite relation learning as a link prediction problem in a graph, the student behaviors can be better leveraged by utilizing Graph Convolutional Networks (GCNs) to model information propagation among the concepts. Meanwhile, as several other types of information could also be applied to detect prerequisite concepts, including but not limited to course dependence and video order, similar concept graphs could be designed for these resources as well.
Fig. 5 illustrates the framework of an exemplary graph-based model, in accordance with various aspects of the present disclosure.
As shown in Fig. 5, three concept graphs $G^S$ 501, $G^C$ 502 and $G^V$ 503 are constructed based on student behaviors, course dependence and video orders respectively. It is noted that the number of concept graphs and the resources used for constructing the concept graphs do not limit the scope of the invention; any type of resource could be used if appropriate. As the nodes of these concept graphs are the same course concepts $K$, the only difference lies in the settings of their edges. The main idea for assigning an edge weight to each concept pair in these graphs is to calculate all edges' weights in a graph based on the corresponding resources, and to preserve only the edges with positive weights, as they are helpful for relation reasoning.
The concept graph built from student behaviors is for modeling the prerequisite clues by combining the extracted features in equations (1) - (4). Hence, the weight $w^S_e$ for the edge $e = (c_a \to c_b)$ in graph $G^S$ 501 is assigned as:
$$w^S_e = \frac{1}{|U|} \left( F_{seq}(c_a, c_b) + F_{crs}(c_a, c_b) + F_{skp}(c_a, c_b) + F_{bck}(c_a, c_b) \right) \quad (5)$$
where $F(c_a, c_b)$ denotes the features of the concepts $c_a$ and $c_b$ from the four video watching behavior patterns, and $\frac{1}{|U|}$ is used to normalize the weight so that it can be combined with the other, user-independent graphs.
The concept graphs for static prerequisite clues can be built through similar methods. Course dependence could be used in prerequisite learning since, when a course is certain to be a prerequisite course of another one, there must be dependence relations between some of their concepts. A concept graph $G^C$ 502 can be built on course dependence to exploit this information; the weight $w^C_e$ for the edge $e = (c_a \to c_b)$ in graph $G^C$ 502 is assigned as:
$$w^C_e = \frac{Dep(C_i, C_j)}{|K_{C_i}|\, |K_{C_j}|} \quad (6)$$
where $c_a$ and $c_b$ are respective concepts of courses $C_i$ and $C_j$, the function $Dep(C_i, C_j) = 1$ only when the course pair $(C_i, C_j)$ is in the course dependence set $\mathcal{D}$, and otherwise $Dep(C_i, C_j) = 0$; $\frac{1}{|K_{C_i}||K_{C_j}|}$ is used to normalize such information to the concept level.
Video order indicates the dependence between videos. In general, the previous videos in a course are helpful for the latter ones, and such dependence tends to be stronger when two videos are closer. Thus, when calculating the weights for the concept graph $G^V$ 503, the attenuation coefficient $\alpha$ is also applied to obtain the edge weight $w^V_e$ for the edge $e$ between concepts $c_a$ and $c_b$:
$$w^V_e = \sum_{(v_i, v_j)} \frac{\alpha^{j-i}\, VO(v_i, v_j)}{|K_{v_i}|\, |K_{v_j}|} \quad (7)$$
where the function $VO(v_i, v_j) = 1$ only when 1) $v_i$ and $v_j$ are the $i$-th and $j$-th videos of the same course, with $j > i$; and 2) $c_a \in K_{v_i}$ and $c_b \in K_{v_j}$; otherwise $VO(v_i, v_j) = 0$.
After the concept graphs $G^S$ 501, $G^C$ 502 and $G^V$ 503 are constructed, GCNs could be utilized to reason prerequisite relations in these graphs. An adjacency matrix A of a graph and a feature matrix X of the concept nodes can be initialized for each graph. The adjacency matrix A, with a size of $|K| \times |K|$, can be derived from the edge weights; for example, for the adjacency matrix $A^S$ of the concept graph $G^S$ 501, $A^S_{ij} = w^S_e$, where $w^S_e$ is the weight of the edge $e = (c_i \to c_j)$. The $|K| \times d$ sized feature matrix X of the concept nodes in all graphs is initialized by any pre-trained d-dimension language model if appropriate, i.e., $X_i$ is the word embedding of the text of concept $c_i$.
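A sketch of this initialization step follows, assuming the edge weights have already been computed with equations (5) - (7) and that a pre-trained embedding lookup is available; both inputs are represented here by plain dictionaries.
```python
import numpy as np
from typing import Dict, List, Tuple

def build_graph_inputs(
    concepts: List[str],
    edge_weights: Dict[Tuple[str, str], float],  # (c_a, c_b) -> edge weight
    embed: Dict[str, np.ndarray],                # concept -> d-dim word embedding
) -> Tuple[np.ndarray, np.ndarray]:
    index = {c: i for i, c in enumerate(concepts)}
    A = np.zeros((len(concepts), len(concepts)))
    for (c_a, c_b), w in edge_weights.items():
        if w > 0:                                # keep only positive-weight edges
            A[index[c_a], index[c_b]] = w
    X = np.stack([embed[c] for c in concepts])   # |K| x d feature matrix
    return A, X
```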
The training of GCNs on the weighted directed concept graphs follows the propagation rule below:
$$Z = \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} X \Theta \quad (8)$$
where $\Theta$ is a matrix of filter parameters, $Z$ is the convolved signal matrix, $Z_i = h_i$ is the graph embedding of concept $c_i$, $\tilde{A} = A + I_{|K|}$ is the adjacency matrix with added self-connections, and the Laplacian is $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$.
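One propagation step of the rule in equation (8), written in plain NumPy under the assumption that the rule is the standard renormalized GCN layer; a full model would stack such layers with nonlinearities and learn Theta by backpropagation.
```python
import numpy as np

def gcn_layer(A: np.ndarray, X: np.ndarray, Theta: np.ndarray) -> np.ndarray:
    A_tilde = A + np.eye(A.shape[0])             # add self-connections
    deg = A_tilde.sum(axis=1)                    # degrees of A + I (always >= 1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    # Z = D^-1/2 (A + I) D^-1/2 X Theta, the convolved signal matrix
    return D_inv_sqrt @ A_tilde @ D_inv_sqrt @ X @ Theta
```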
After the graph-training stage, the graph embeddings $h_a$ and $h_b$ of a concept pair $(c_a, c_b)$ are input into a classifier 504 for classification; for example, the classifier could be a two-layer MLP followed by a sigmoid function:
$$Pr = \sigma\big(\mathrm{MLP}([h_a \oplus h_b])\big) \quad (9)$$
where Pr 505 is the probability of the concept pair $(c_a, c_b)$ being prerequisite, $\sigma(\cdot)$ is the sigmoid function, the two layers of the MLP use trainable matrices $W_1$ and $W_2$, and $\oplus$ denotes vector concatenation. In an embodiment, the graph embedding $h_a$ of the corresponding concept $c_a$ is the combination of the node embeddings learned on each of the concept graphs. For example, the node embeddings can be concatenated or weighted-summed to combine into the graph embedding.
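A sketch of classifier 504 operating on the combined graph embeddings; the hidden-layer activation (ReLU) and the use of a one-dimensional output weight vector are assumptions beyond what equation (9) specifies.
```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pair_probability(h_a: np.ndarray,     # graph embedding of c_a, shape (d,)
                     h_b: np.ndarray,     # graph embedding of c_b, shape (d,)
                     W1: np.ndarray,      # first MLP layer, shape (hidden, 2 * d)
                     W2: np.ndarray) -> float:  # second layer, shape (hidden,)
    """Probability that c_a is a prerequisite of c_b, in the spirit of equation (9)."""
    z = np.concatenate([h_a, h_b])        # vector concatenation h_a (+) h_b
    hidden = np.maximum(0.0, W1 @ z)      # assumed ReLU between the two layers
    return float(sigmoid(W2 @ hidden))
```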
Fig. 6 illustrates a flow chart of an exemplary method for a neural network system to discover prerequisite relations among concepts, in accordance with various aspects of the present disclosure.
At block 601, obtaining a set of concepts from a plurality of courses, wherein each of the courses contains a series of videos. In an embodiment, the plurality of courses comes from a MOOC website, and the series of videos of each course are listed in a preset order.
At block 602, collecting student behavior data, including at least the video watching behavior data across the plurality of courses. In an embodiment, the student behavior data are collected from a MOOC website.
At block 603, modeling the student behavior data into prerequisite features based at least on the video watching behavior pattern, wherein the video watching behavior pattern is formed by one or more video pairs from the video watching behavior data. In an embodiment, the video watching behavior pattern comprises one or more of: sequential watching pattern, cross course watching pattern, skipping watching pattern and backward watching pattern. In a further embodiment, the prerequisite features can be modeled with equation (1) - (4) herein.
At block 604, the neural network system is trained based at least on the prerequisite features.
In an embodiment, the neural network system comprises at least one classifier, wherein the prerequisite features are directly used as at least part of the features of the training data. In a further aspect, the training data are constructed and annotated with the method described in connection with Fig. 2 and Fig. 3.
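By way of illustration only, the Python sketch below trains a plain logistic-regression classifier directly on pair-level prerequisite features; it stands in for whatever classifier the system actually uses, and the data layout (one feature vector per concept pair, label 1 for an annotated prerequisite pair) is an assumption introduced for the sketch.

import numpy as np

def train_logistic_classifier(features, labels, lr=0.1, epochs=200):
    """Minimal sketch: fit a logistic regression on prerequisite features with
    gradient descent on the binary cross-entropy loss."""
    X = np.asarray(features, dtype=float)       # shape: (num_pairs, num_features)
    y = np.asarray(labels, dtype=float)         # shape: (num_pairs,), 1 = prerequisite
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted prerequisite probability
        grad = p - y                            # gradient of binary cross-entropy
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b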
In another embodiment, the neural network system is based partially on a graph-based model, which is constructed based on one or more concept graphs.
In an aspect, each of the one or more concept graphs is a weighted directed graph, each node of the weighted directed graph represents one concept of the set of concepts, and each edge indicates a prerequisite relation between two concepts.
In a further aspect, the graph-based model is constructed based on at least one concept graph, a weight of each edge of which is calculated based on the prerequisite features modeled based on the student behavior data.
In an aspect, a weight of each edge of one of the concept graphs is calculated based on the dependence of the plurality of courses.
In a further aspect, a weight of each edge of one of the concept graphs is calculated based on the order of the series of videos of each course.
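One plausible, purely illustrative way to turn video-pair behavior statistics into concept-level edge weights is sketched below in Python: each behavioral event observed on a video pair (v_i, v_j) is credited to every concept pair (c_a, c_b) with c_a taught in v_i and c_b taught in v_j. The video_concepts mapping and the absence of any normalization are assumptions of the sketch, not the weighting defined by the prerequisite features above.

from collections import defaultdict

def concept_edge_weights(pair_counts, video_concepts):
    """Lift video-pair statistics to concept-pair edge weights.

    pair_counts    -- dict mapping a video pair (v_i, v_j) to a behavior count
    video_concepts -- dict mapping a video id to the list of concepts it teaches"""
    weights = defaultdict(float)
    for (vi, vj), count in pair_counts.items():
        for ca in video_concepts.get(vi, []):
            for cb in video_concepts.get(vj, []):
                if ca != cb:
                    weights[(ca, cb)] += count   # edge c_a -> c_b accumulates the count
    return dict(weights)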
In an aspect, training the neural network system based at least on the prerequisite features comprises: initializing a word embedding of each concept of the set of concepts by a pre-trained language model; and inputting the word embeddings of all the concepts of the set of concepts to the graph-based model.
In a further aspect, training the neural network system based at least on the prerequisite features comprises: learning a graph embedding of each node of the set of concepts on the graph-based model by Graph Convolutional Networks (GCNs).
In an aspect, learning the graph embedding of each node of the set of concepts on the graph-based model by GCNs comprises: learning a node embedding of each node of the set of concepts on the one or more concept graphs respectively; and combining the node embeddings corresponding to the one or more concept graphs as the graph embedding on the graph-based model.
In an aspect, combining the node embeddings corresponding to the one or more concept graphs comprises: concatenating the node embeddings corresponding to the one or more concept graphs; or weighted summing the node embeddings corresponding to the one or more concept graphs.
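As a small illustration of these two combination options, the Python sketch below either concatenates the per-graph node embeddings of a concept or forms their weighted sum; the uniform default weights are an assumption introduced for the sketch.

import numpy as np

def combine_node_embeddings(embeddings, mode="concat", weights=None):
    """Combine the node embeddings one concept obtained on the individual
    concept graphs into a single graph embedding.

    embeddings -- list with one embedding vector per concept graph
    weights    -- one scalar per graph, only used for the weighted sum"""
    if mode == "concat":
        return np.concatenate(embeddings)
    if mode == "weighted_sum":
        if weights is None:
            weights = np.ones(len(embeddings)) / len(embeddings)  # uniform weights by default
        return sum(w * e for w, e in zip(weights, embeddings))
    raise ValueError("mode must be 'concat' or 'weighted_sum'")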
In an aspect, the graph embeddings corresponding to each concept pair are input into a classifier to determine whether the concept pair has a prerequisite relation between them. In an aspect, the classifier can be a two-layer MLP followed by a sigmoid function. In a further aspect, the classifier can be pre-trained with training data that are constructed and annotated with the method described in connection with Fig. 2 and Fig. 3.
Fig. 7 illustrates an exemplary computing system 700, in accordance with various aspects of the present disclosure. The computing system 700 may comprise at least one processor 710. The computing system 700 may further comprise at least one storage device 720. In an aspect, the storage device 720 may store computer-executable instructions that, when executed, cause the processor 710 to perform a method for enabling a neural network system to discover prerequisite relations among concepts of a plurality of courses, wherein each of the plurality of courses contains a series of videos, the method comprising: collecting student behavior data, including at least video watching behavior data across the plurality of courses; modeling the student behavior data into prerequisite features based at least on video watching behavior pattern, wherein the video watching behavior pattern is formed by one or more video pairs from the video watching behavior data; and training the neural network system based at least on the prerequisite features.
It should be appreciated that the storage device 720 may store computer-executable instructions that, when executed, cause the processor 710 to perform any operations according to the embodiments of the present disclosure as described in connection with FIGs. 1-6.
The embodiments of the present disclosure may be embodied in a computer-readable medium such as non-transitory computer-readable medium. The non-transitory computer-readable medium may comprise instructions that, when executed, cause one or more processors to perform any operations according to the embodiments of the present disclosure as described in connection with FIGs. 1-6.
The embodiments of the present disclosure may be embodied in a computer program product comprising computer-executable instructions that, when executed, cause one or more processors to perform any operations according to the embodiments of the present disclosure as described in connection with FIGs. 1-6.
It should be appreciated that all the operations in the methods described above are merely exemplary, and the present disclosure is not limited to any operations in the methods or sequence orders of these operations, and should cover all other equivalents under the same or similar concepts.
It should also be appreciated that all the modules in the apparatuses described above may be implemented in various approaches. These modules may be implemented as hardware, software, or a combination thereof. Moreover, any of these modules may be further functionally divided into sub-modules or combined together.
The previous description is provided to enable any person skilled in the art to practice the various aspects described herein. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects. Thus, the claims are not intended to be limited to the aspects shown herein. All structural and functional equivalents to the elements of the various aspects described throughout the present disclosure that are known or later come to be known to those of ordinary skill in the art are expressly incorporated herein by reference and are intended to be encompassed by the claims.

Claims (15)

  1. A method for enabling a neural network system to discover prerequisite relations among concepts of a plurality of courses, wherein each of the plurality of courses contains a series of videos, the method comprising:
    collecting student behavior data, including at least video watching behavior data across the plurality of courses;
    modeling the student behavior data into prerequisite features based at least on video watching behavior pattern, wherein the video watching behavior pattern is formed by one or more video pairs from the video watching behavior data; and
    training the neural network system based at least on the prerequisite features.
  2. The method of claim 1, wherein the video watching behavior pattern comprises one or more of: sequential watching pattern, cross course watching pattern, skipping watching pattern and backward watching pattern.
  3. The method of claim 1, wherein the neural network system is based partially on a graph-based model, which is constructed based on one or more concept graphs, each of the one or more concept graphs is a weighted directed graph, and
    wherein each node of the weighted directed graph represents one concept of the set of concepts, and each edge indicates a prerequisite relation between two concepts.
  4. The method of claim 3, wherein the graph-based model is constructed based on at least one concept graph, a weight of each edge of which is calculated based on the prerequisite features.
  5. The method of claim 3, wherein a weight of each edge of one of the concept graphs is calculated based on the dependence of the plurality of courses.
  6. The method of claim 3, wherein a weight of each edge of one of the concept graphs is calculated based on the order of the series of videos of each course.
  7. The method of claim 4, wherein training the neural network system based at least on the prerequisite features further comprises:
    initializing a word embedding of each concept of the set of concepts by a pre-trained language model;
    inputting the word embeddings of all the concepts of the set of concepts to the graph-based model constructed based on the at least one concept graph; and
    learning a graph embedding of each node of the set of concepts on the graph-based model by Graph Convolutional Networks (GCNs).
  8. The method of claim 7, wherein learning the graph embedding of each node of the set of concepts on the graph-based model by GCNs further comprises:
    learning a node embedding of each node of the set of concepts on the one or more concept graphs respectively; and
    combining the node embeddings corresponding to the one or more concept graphs as the graph embedding on the graph-based model.
  9. The method of claim 8, wherein combining the node embeddings corresponding to the one or more concept graphs further comprises:
    concatenating the node embeddings corresponding to the one or more concept graphs; or
    weighted summing the node embeddings corresponding to the one or more concept graphs.
  10. The method of claim 1, wherein the neural network system comprises at least one classifier.
  11. The method of claim 10, wherein training the neural network system further comprises:
    annotating the prerequisite relations among a portion of the concepts of the set of concepts; and
    training the at least one classifier with the annotated prerequisite relations among the portion of the concepts to discover prerequisite relations among all the concepts.
  12. The method of claim 11, wherein annotating the prerequisite relations among the portion of the concepts of the set of concepts further comprises:
    clustering the set of concepts into several groups of concepts;
    generating a set of candidate concept pairs for each of the groups of concepts;
    sampling a portion of the set of candidate concept pairs to train more than one classifier; and
    annotating a concept pair as prerequisite if at least one trained classifier indicates the concept pair as prerequisite.
  13. A computer system, comprising:
    one or more processors; and
    one or more storage devices storing computer-executable instructions that, when executed, cause the one or more processors to perform the operations of the method of one of claims 1-12.
  14. One or more computer readable storage media storing computer-executable instructions that, when executed, cause one or more processors to perform the operations of the method of one of claims 1-12.
  15. A computer program product comprising computer-executable instructions that, when executed, cause one or more processors to perform the operations of the method of one of claims 1-12.


