WO2023010514A1 - Method for establishing knowledge repository for online courses - Google Patents

Method for establishing knowledge repository for online courses

Info

Publication number
WO2023010514A1
Authority
WO
WIPO (PCT)
Prior art keywords
concepts
student
courses
videos
video
Prior art date
Application number
PCT/CN2021/111149
Other languages
French (fr)
Inventor
Evgeny Kharlamov
Jie Tang
Jifan YU
Juanzi LI
Lei HOU
Zhiyuan Liu
Maosong SUN
Yuquan WANG
Original Assignee
Robert Bosch Gmbh
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Robert Bosch Gmbh, Tsinghua University
Priority to CN202180101405.2A (patent CN118020080A)
Priority to PCT/CN2021/111149 (patent WO2023010514A1)
Publication of WO2023010514A1

Classifications

    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B5/00 Electrically-operated educational appliances
    • G09B5/06 Electrically-operated educational appliances with both visual and audible presentation of the material to be studied
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B7/00 Electrically-operated teaching apparatus or devices working with questions and answers
    • G09B7/02 Electrically-operated teaching apparatus or devices working with questions and answers of the type wherein the student is expected to construct an answer to the question which is presented or wherein the machine gives an answer to the question presented by a student

Definitions

  • aspects of the present disclosure relate generally to artificial intelligence, and more particularly, to a method for establishing a knowledge repository for online courses.
  • MOOCs: massive open online courses
  • NLP: natural language processing
  • AI: artificial intelligence
  • An object of the disclosure is to provide a method for establishing an online courses knowledge repository as well as a platform in order to provide an improved database for the development of various types of applications related to online courses.
  • the method comprises: obtaining a plurality of concepts from a plurality of courses, wherein the plurality of courses include videos and exercises; linking each of the videos and the exercises included in the courses to one or more related concepts of the plurality of concepts; and linking each of a plurality of student behaviors to one or more related concepts of the plurality of concepts; wherein the established knowledge repository comprises the plurality of courses including the videos and exercises, the plurality of student behaviors, the plurality of concepts, the links between the videos and the plurality of concepts, the links between the exercises and the plurality of concepts, and the links between the student behaviors and the plurality of concepts.
  • a computer system for providing a knowledge repository for online courses.
  • the computer system comprises one or more processors and one or more storage devices storing the knowledge repository, wherein the knowledge repository comprises a plurality of courses including videos and exercises, a plurality of student behaviors, a plurality of concepts, links between the videos and the plurality of concepts, links between the exercises and the plurality of concepts, and links between the student behaviors and the plurality of concepts.
  • a computer system which comprises one or more processors and one or more storage devices storing computer-executable instructions that, when executed, cause the one or more processors to perform the operations of the method as mentioned above as well as to perform the operations of the method according to aspects of the disclosure.
  • there are provided one or more computer-readable storage media storing computer-executable instructions that, when executed, cause one or more processors to perform the operations of the method as mentioned above as well as to perform the operations of the method according to aspects of the disclosure.
  • a computer program product comprising computer-executable instructions that, when executed, cause one or more processors to perform the operations of the method as mentioned above as well as to perform the operations of the method according to aspects of the disclosure.
  • Fig. 1 illustrates an exemplary structure of course material for online courses according to an embodiment of the disclosure.
  • Fig. 2 illustrates an exemplary structure of student material for online courses according to an embodiment of the disclosure.
  • Fig. 3 illustrates an exemplary data organization of knowledge repository for online courses according to an embodiment of the disclosure.
  • Fig. 4 illustrates an exemplary data organization of knowledge repository for online courses according to an embodiment of the disclosure.
  • Fig. 5 illustrates an exemplary process for establishing a knowledge repository for online courses according to an embodiment of the disclosure.
  • Fig. 6 illustrates an exemplary computing system according to an embodiment of the disclosure.
  • Fig. 1 illustrates an exemplary structure of course material for online courses according to an embodiment of the disclosure.
  • a course consists of multiple teaching units and a teaching unit is composed of a series of videos and exercises.
  • the teaching unit U may also be referred to as a chapter.
  • a course C1 includes teaching units U1, U2, U7 and so on, the teaching unit U1 includes videos V1, V2, V3 and exercise E1, the teaching unit U2 includes videos V4, V5 and exercise E2, the teaching unit U7 includes videos V22, V23, V24 and exercise E7.
  • the courses C2 and CN, as well as other courses C, have structures similar to that illustrated in Fig. 1.
  • the videos and exercises are considered as entities in the online course knowledge repository.
  • the course-related information, such as the syllabus of the course, and the entities, including videos and exercises, may be stored as a list type.
  • a course C may also include teacher information T and school information S.
  • the course C1 includes teacher description T1 and school information S1 related to the course C1.
  • the courses C2 and CN, as well as other courses, have teacher descriptions and school descriptions similar to those illustrated in Fig. 1.
  • the teacher description and school description of the courses are considered as attributes in the online course knowledge repository.
  • information of the teacher and school of the course may be crawled from the websites so as to supplement the teacher description and school description of the course. This kind of information can help build associations for courses and support related tasks such as teaching style detection.
  • the videos V may include subtitles or may be processed to obtain subtitles. For example, the subtitle files of the videos V may be converted into JSON format, with the timestamp of each sentence of the subtitle files recorded, so that the start time and end time of each sentence can be determined from the timestamps.
  • a video V as illustrated in Fig. 1 includes not only the video itself but also the related textual data such as the subtitles of the video and the timestamp of the sentences of the subtitle.
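As a minimal sketch of the subtitle processing described above, the snippet below converts SRT-style subtitle text into JSON records, one per sentence, each with a start and end time. The SRT input format, the sample sentences and the field names `start`, `end` and `text` are assumptions made for illustration; the disclosure only states that subtitles are stored as JSON with a timestamp per sentence.

```python
import json
import re

# Hypothetical SRT-style input; the disclosure does not specify the source format.
SRT_TEXT = """1
00:00:01,000 --> 00:00:04,500
An AVL tree is a balanced binary search tree.

2
00:00:04,500 --> 00:00:09,000
Each node stores a balancing factor.
"""

TIME_RE = re.compile(r"(\d+):(\d+):(\d+),(\d+)")


def to_seconds(stamp: str) -> float:
    """Convert an 'HH:MM:SS,mmm' timestamp to seconds."""
    h, m, s, ms = (int(g) for g in TIME_RE.fullmatch(stamp.strip()).groups())
    return h * 3600 + m * 60 + s + ms / 1000


def srt_to_records(srt_text: str) -> list:
    """Return one record per subtitle sentence with start/end times in seconds."""
    records = []
    for block in srt_text.strip().split("\n\n"):
        lines = block.splitlines()
        start, end = lines[1].split("-->")
        records.append({
            "start": to_seconds(start),
            "end": to_seconds(end),
            "text": " ".join(lines[2:]),
        })
    return records


subtitle_json = json.dumps(srt_to_records(SRT_TEXT), ensure_ascii=False)
```

Storing times as seconds keeps the records easy to align with second-scale watching logs; any other consistent time unit would serve equally well.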
  • the exercises E may include questions, options, standard answers and so on. It is appreciated that a teaching unit U may include one or more videos and one or more exercises; it is also possible for a teaching unit U to include one or more videos without any exercise.
  • discipline information may be annotated to the plurality of courses according to a discipline classification standard, such as the national discipline classification standard.
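The course structure of Fig. 1 could be modeled roughly as follows. This is a sketch under assumptions: the class and field names are hypothetical, and only the hierarchy (course, teaching unit, videos and exercises) and the teacher, school and discipline attributes come from the description.

```python
from dataclasses import dataclass, field


@dataclass
class Video:
    video_id: str
    subtitles: list = field(default_factory=list)  # sentence records with timestamps


@dataclass
class Exercise:
    exercise_id: str
    questions: list = field(default_factory=list)  # questions, options, standard answers


@dataclass
class TeachingUnit:
    unit_id: str
    videos: list = field(default_factory=list)
    exercises: list = field(default_factory=list)  # may be empty


@dataclass
class Course:
    course_id: str
    units: list = field(default_factory=list)
    teacher: str = ""     # teacher information T (an attribute)
    school: str = ""      # school information S (an attribute)
    discipline: str = ""  # annotated per a discipline classification standard

    def entities(self):
        """Return all video and exercise IDs of the course as a flat list."""
        ids = []
        for unit in self.units:
            ids += [v.video_id for v in unit.videos]
            ids += [e.exercise_id for e in unit.exercises]
        return ids


# Course C1 from Fig. 1, restricted to its first two teaching units.
c1 = Course("C1", teacher="T1", school="S1", units=[
    TeachingUnit("U1", [Video("V1"), Video("V2"), Video("V3")], [Exercise("E1")]),
    TeachingUnit("U2", [Video("V4"), Video("V5")], [Exercise("E2")]),
])
```

Here `c1.entities()` lists the videos and exercises unit by unit, which matches the idea of saving a course's entities as a list.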
  • Fig. 2 illustrates an exemplary structure of student material for online courses according to an embodiment of the disclosure.
  • the student information of each student, such as student S1, S2, SM and so on, includes a student profile P and student behavior B.
  • the student profile P includes each student’s gender, location, age and grade level.
  • the student behavior data B includes records of student behavior, which includes video watching behavior, exercising behavior and discussion behavior.
  • Video watching logs are collected as the video watching records.
  • video watching logs at the scale of seconds are recorded, for example, five-second video sections are recorded.
  • the system records what video the student is currently watching and what position the student is studying.
  • the video watching record V may also include information about the student's watching actions, such as jumps between video positions, watching speed, and so on.
  • the subtitle text, swiping, jumping actions corresponding to each video watching record may be included in the video watching record.
  • the video watching records may be used to infer specific learning trajectories of students.
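One way such second-scale logs might be turned into a learning trajectory is sketched below. The log format (a list of section start offsets in watching order) and the rule that any non-contiguous section counts as a jump are assumptions for illustration, not details given in the disclosure.

```python
SECTION = 5  # seconds per logged video section (the example length from the text)


def watched_intervals(section_starts):
    """Merge consecutively logged sections into continuous (start, end) intervals.

    `section_starts` lists the start offsets (in seconds) of the five-second
    sections in the order the student watched them. A new interval begins
    whenever the next section does not directly follow the previous one,
    which is treated here as a jump.
    """
    intervals = []
    for start in section_starts:
        if intervals and intervals[-1][1] == start:
            intervals[-1][1] = start + SECTION  # continuous playback
        else:
            intervals.append([start, start + SECTION])  # first section or a jump
    return [tuple(i) for i in intervals]


# Hypothetical log: watch 0-15 s, jump forward to 60 s, then jump back to 10 s.
log = [0, 5, 10, 60, 65, 10]
trajectory = watched_intervals(log)
jumps = len(trajectory) - 1
```

For this log the trajectory consists of three intervals separated by two jumps; the backward jump to 10 s could indicate a passage the student chose to rewatch.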
  • the record of doing exercises is the basis for modeling the students’ mastery of knowledge. It is a measure of the outcome of the learning process. For each student, the timestamp of completing each question, the submitted answer, and the score obtained may be recorded. Furthermore, for questions to which a student submitted answers multiple times, a history of each submission may be recorded. In this way, problem-solving records may be obtained and recorded.
  • the exercising record E includes the timestamp of completing each exercise and each question, the answer submitted by the student, the score obtained for the exercise, and the revision history of the student.
  • A student’s online learning process is often accompanied by discussions and communication. These discussions not only reflect the social relationships among students but also serve as important feedback on the course design.
  • the discussion content may be recorded by using the comments and replies obtained from the online course platform.
  • Each comment or posting is attached to a certain video or exercise, is started by a student, and may be replied to by other students. Therefore, for each comment, its text may be crawled, and the comment ID may be associated with the corresponding video ID or exercise ID as well as with the student ID of the student who posted the comment.
  • the text information of the reply, the student ID of the student who issues the reply and the original comment ID may be recorded. In this way, comment-reply records may be obtained.
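The comment-reply records could be organized along the following lines. All IDs, texts and field names in this sketch are hypothetical placeholders; only the recorded items (comment text, comment ID, video or exercise ID, student ID, and the original comment ID for each reply) follow the description.

```python
from collections import defaultdict

# All IDs and field names below are hypothetical placeholders.
comments = [
    {"comment_id": "cm1", "student_id": "s1", "resource_id": "v9",
     "text": "Why rebalance after insertion?"},
    {"comment_id": "cm2", "student_id": "s2", "resource_id": "e13",
     "text": "Is option B a typo?"},
]
replies = [
    {"student_id": "s2", "comment_id": "cm1", "text": "To keep the height logarithmic."},
    {"student_id": "s3", "comment_id": "cm1", "text": "See the balancing factor."},
]


def build_threads(comments, replies):
    """Group replies under the comment they answer, keyed by comment ID."""
    by_comment = defaultdict(list)
    for reply in replies:
        by_comment[reply["comment_id"]].append(reply)
    return {
        c["comment_id"]: {"comment": c, "replies": by_comment[c["comment_id"]]}
        for c in comments
    }


threads = build_threads(comments, replies)
```

Keeping the original comment ID on each reply is what allows a thread to be rebuilt later, for example to study which videos or exercises attract the most discussion.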
  • the discussion D includes the posted comments and replies of the student as well as the above-mentioned information.
  • the student may post a question or comment on a website such as the online course website or application, and users may reply to the posting of the student.
  • the student may also reply to the posting of other users.
  • the postings and replies records D of each student may be recorded as the discussion records.
  • the student data S1 includes the student profile P1 and student behavior B1.
  • the student profile P1 includes the gender, location, age and grade level of the student S1.
  • the student behavior data B1 includes behavior records of student S1.
  • the student behavior B1 includes an exemplary video watching record V9.
  • the content of the video watching record V9 includes a section of a video watched by the student S1 as well as the subtitle text, swiping and/or jumping actions corresponding to the video watching record V9. And the time when the video was watched by the student S1 may also be recorded.
  • the student behavior B1 includes an exemplary discussion record D221.
  • the content of the discussion record D221 includes a posted comment and/or reply of the student S1. The time when the discussion record D221 occurs may also be recorded.
  • the student behavior B1 includes an exemplary exercising record E13.
  • the content of the exercising record E13 includes the answer submitted by the student, the score obtained for the exercise, and the revision history of the student. The time of completing the exercise by the student may also be recorded.
  • the exemplary student S2 and student SM may have structures similar to that illustrated in Fig. 2. Besides the course resources, the various types of student behaviors are essential for adaptive learning research, as they help model students’ learning intents, cognitive levels and social activities.
  • Fig. 3 illustrates an exemplary data organization of knowledge repository for online courses according to an embodiment of the disclosure.
  • a concept extraction process may be used to obtain the concepts from the subtitles; for example, “AVL tree”, “balanced Binary Search Tree”, “balancing factor”, “node” and “tree” may be the extracted concepts.
  • the concepts are employed in the knowledge repository for online courses to better support retrieval, knowledge discovery, and summarization.
  • fine-grained concepts are extracted from the course material of available online course corpus.
  • the extracted concepts are denoted as the nodes such as the nodes A, B, C, D and so on in Fig. 3.
  • a prerequisite discovery process may be used to identify the prerequisite relations among the concepts. If a first concept can help in understanding a second concept, then there is a prerequisite relation from the first concept to the second concept.
  • a concept graph G may be constructed by discovering prerequisite relation among the nodes representing the concepts.
  • the concept graph G may be represented as a directional graph.
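A directional concept graph of this kind can be sketched with the Python standard library. The prerequisite pairs below are illustrative assumptions built from the example concepts; they are not relations stated in the disclosure.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Prerequisite edges (prerequisite -> dependent concept); the pairs are illustrative.
prerequisite_edges = [
    ("node", "tree"),
    ("tree", "balanced binary search tree"),
    ("balanced binary search tree", "AVL tree"),
    ("balancing factor", "AVL tree"),
]

# TopologicalSorter expects a mapping {node: set of predecessors}.
predecessors = {}
for prereq, concept in prerequisite_edges:
    predecessors.setdefault(concept, set()).add(prereq)
    predecessors.setdefault(prereq, set())

# Any topological order is a valid study order: prerequisites come first.
study_order = list(TopologicalSorter(predecessors).static_order())
```

`static_order()` raises `graphlib.CycleError` if the edges contain a cycle, which doubles as a sanity check on automatically discovered prerequisite relations.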
  • the concepts are extracted from the subtitles of the videos V of the courses C.
  • a suitable concept extraction method may be used to extract the concepts from the subtitles of the videos.
  • a neural network (NN) model may be trained to extract the concepts from the entities of the courses.
  • a weakly supervised fine-grained concept extraction method is employed to obtain the concepts from the texts of video subtitles.
  • the process includes two stages: candidate extraction and concept ranking. Multiple concept extraction methods may be used in the candidate extraction stage in order to extract as many concepts as possible. For example, phrase mining, entity linking, and named entity recognition, which are mainly supported by external knowledge graphs and a small amount of annotation, may be used for extracting the candidate concepts.
  • a phrase table is used in phrase mining.
  • the noun-phrase titles of Chinese Wikipedia are selected as the phrase table. These phrases are high-quality entities annotated through crowdsourcing.
  • in phrase mining, for each video, the phrases in the phrase table that appear in the subtitles of the video are preserved as candidate concepts.
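Phrase mining against a phrase table can be sketched as a simple lookup. The phrase table and subtitle text below are hypothetical English stand-ins for the Chinese Wikipedia titles and Chinese subtitles used in the disclosure, and plain substring matching is an assumption about how "appear in the subtitles" is tested.

```python
def mine_phrases(subtitle_text, phrase_table):
    """Return the phrases from `phrase_table` that occur in the subtitle text."""
    text = subtitle_text.lower()
    return sorted(p for p in phrase_table if p.lower() in text)


# Hypothetical phrase table; the disclosure uses noun-phrase titles of Chinese Wikipedia.
phrase_table = {"AVL tree", "balancing factor", "hash table", "node"}
subtitles = "An AVL tree keeps a balancing factor in every node."

candidates = mine_phrases(subtitles, phrase_table)
```

Substring matching also fires inside longer words (for example, "node" inside "nodes"), so a production version for English text would likely match on token boundaries instead.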
  • Entity linking is used to discover mentions of entities from an external knowledge base.
  • the entity linking is performed with XLink (Jing Zhang, etc., XLink: An Unsupervised Bilingual Entity Linking System. Springer International Publishing, Cham, 172–183.) and the linked entities are selected as candidate concepts.
  • the concepts of the large-scale knowledge base Xlore (Hailong Jin, etc., 2018. XLORE2: Large-scale cross-lingual knowledge graph construction and application. Data Intelligence 1, 1 (2018), 77–98.) may be extracted as candidate concepts for each video.
  • a pre-trained language model for fine-grained concept extraction may be used.
  • the training scheme can be regarded as Named Entity Recognition. If a phrase appearing in the video subtitle is a concept, then its span is annotated as a “named entity”.
  • concepts from the subtitles of a small subset of videos of a small, randomly selected subset of courses may be annotated.
  • a pre-trained language model such as RoBERTa (Yiming Cui, etc., 2019. Pre-training with whole word masking for Chinese BERT. arXiv preprint arXiv:1906.08101 (2019)) may be further trained with the annotated data.
  • the trained extractor runs prediction on all video subtitles and selects phrases with a confidence larger than a threshold as concept candidates.
  • Experimental results show that the RoBERTa-NER method extracts more fine-grained concepts than phrase mining and entity linking.
  • a cluster-based unsupervised method is employed to determine the extraction quality of the candidate concepts.
  • each course’s candidate concepts may be clustered into multiple clusters (for example, 15 clusters) by applying K-means to their BERT embeddings, and the concepts in the one or more highest-scored clusters (for example, the top 2 clusters) are selected as the finally extracted concepts of the course.
  • the score of cluster j ∈ {1, ..., 15} is calculated from d(s_i, c_j), where s_i and c_j are the centers of the i-th seed cluster and the j-th candidate cluster, respectively, and d(x, y) is the cosine similarity of the BERT embeddings of x and y.
  • the labeled seed concepts of each discipline are clustered into 10 clusters.
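The cluster-based ranking can be sketched as below. Note the assumptions: the scoring formula itself is not reproduced in this text, so the sketch takes a cluster's score to be its highest cosine similarity to any seed cluster center, and it uses toy 2-D vectors in place of BERT embeddings and four candidate clusters in place of 15.

```python
import math


def cosine(x, y):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


def top_clusters(candidate_centers, seed_centers, k=2):
    """Rank candidate clusters by similarity to the seed clusters and keep the top k.

    ASSUMPTION: a cluster's score is its highest cosine similarity to any seed
    cluster center; the aggregation actually used in the disclosure is not
    reproduced in this text.
    """
    scores = {
        j: max(cosine(s, c) for s in seed_centers)
        for j, c in candidate_centers.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k], scores


# Toy 2-D centers standing in for BERT embeddings (hypothetical values).
seed_centers = [(1.0, 0.0), (0.0, 1.0)]
candidate_centers = {1: (0.9, 0.1), 2: (-1.0, -1.0), 3: (0.5, 0.5), 4: (0.0, 2.0)}

selected, scores = top_clusters(candidate_centers, seed_centers)
```

Concepts belonging to the clusters in `selected` would be kept as the course's final concepts; candidates in clusters far from every seed cluster are discarded.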
  • candidate concepts may be extracted from subtitles of the videos included in the plurality of courses by using multiple concept extraction methods, and the precise concepts may be selected out of the candidate concepts based on a ranking of the candidate concepts.
  • the prerequisite relations among the concepts may be identified by using a suitable prerequisite discovery method.
  • a neural network (NN) model may be trained to discover the prerequisite relations among the concepts so as to construct the concept graph G, which is a directional concept graph.
  • a text-based method and a graph-based method are co-trained by using a small number of annotated concept pairs, and the trained model is used to identify the prerequisite relations among the concepts as illustrated by the concept pairs connected with the arrow lines in the graph G.
  • a text-based method and a graph-based method for prerequisite relation discovery may be produced.
  • a simple neural network classifier is applied to predict whether a concept pair has the prerequisite relation.
  • a concept consists of several tokens.
  • Neural models that use fixed word embeddings may be employed in an embodiment.
  • a text encoder of BERT (Jacob Devlin, etc., 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 4171–4186.) may be employed to obtain the embedding of a concept.
  • for an LSTM encoder, the output vector at the end position is taken as the embedding vector; for BERT, the output vector at a token such as the “[CLS]” token is taken as the embedding vector. Then the two embeddings of the concepts of the pair are concatenated to predict the binary score of the concept pair.
  • graph encoders are employed to obtain concept embeddings first. For example, Graph Attention Networks (GAT) (Petar Veličković, etc., 2017. Graph attention networks. arXiv preprint arXiv:1710.10903 (2017).) are employed to obtain concept embeddings from their initial representations and the graph structure between the concepts. Then the two embeddings of the concepts of the pair are concatenated to predict the binary score. For the initial representation of a concept, the embeddings of the concept from the text encoders may be averaged. For the graph structure between concepts, the order of the videos of each course is utilized. Since courses are taught in a cognitive order, it is reasonable to assume that concepts in a video may have prerequisite relations with concepts in the following videos.
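The graph structure derived from video order might be built as in the sketch below. It assumes that "following videos" means every later video of the same course (not only the next one) and uses a hypothetical three-video course; the resulting edges are only the input structure for a graph encoder, not confirmed prerequisite relations.

```python
from itertools import combinations


def candidate_edges(video_concepts):
    """Build candidate prerequisite edges from the teaching order of videos.

    `video_concepts` is a list of concept sets, one per video, in course order.
    Every concept of an earlier video is connected to every concept of a later
    video; a concept shared by both videos produces no self-loop.
    """
    edges = set()
    for earlier, later in combinations(video_concepts, 2):
        edges |= {(a, b) for a in earlier for b in later if a != b}
    return edges


# Hypothetical course with three videos, one concept each.
course = [
    {"gradient descent"},
    {"backward propagation"},
    {"convolutional neural network"},
]
edges = candidate_edges(course)
```

`itertools.combinations` preserves the input order, so each pair is always (earlier video, later video) and the edges point in the teaching direction.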
  • the text model results and the graph model results may be ensembled to automatically generate candidate pairs, and the positive pairs among them may be identified manually.
  • the co-training method takes the previously labeled positive pairs and randomly sampled negative pairs as training data and ranks the other pairs according to the predicted probability. Experienced annotators then label the top-ranked pairs, i.e., those with the highest predicted positive probability. After an initial iteration in which annotators manually label a small number of seed positive pairs, this generation and labeling process is repeated alternately multiple times so as to automatically find plenty of positive pairs.
  • a positive pair means there is a prerequisite relation between the pair of concepts.
  • each video V is linked to the concepts extracted from itself.
  • since concepts are extracted from video subtitles, a video is naturally annotated with the concepts extracted from its subtitles.
  • other types of resources, however, still lack links to concepts.
  • its concepts are the union of its video concepts.
  • for an exercise, the concepts of the videos in the teaching unit or chapter that contains the exercise are first selected as candidates, and their BERT embeddings may then be used to select the one or more best-matching concepts.
  • one or more concepts with a cosine similarity higher than a threshold may be selected as the one or more concepts to be linked to the exercise.
  • the threshold of cosine similarity is 0.8.
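The threshold-based linking can be sketched as follows. The 0.8 threshold comes from the text; the 2-D vectors are hypothetical stand-ins for BERT embeddings of the exercise and of the candidate concepts from its teaching unit.

```python
import math

THRESHOLD = 0.8  # cosine-similarity threshold given in the text


def cosine(x, y):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.hypot(*x) * math.hypot(*y))


def link_to_concepts(resource_vec, candidate_vecs, threshold=THRESHOLD):
    """Return the candidate concepts whose similarity to the resource exceeds the threshold."""
    return sorted(
        name for name, vec in candidate_vecs.items()
        if cosine(resource_vec, vec) > threshold
    )


# Hypothetical embeddings; in practice these would come from a BERT encoder.
exercise_vec = (0.5, 0.9)
unit_concepts = {
    "gradient descent": (1.0, 0.1),
    "backward propagation": (0.6, 0.8),
    "convolutional neural network": (-0.8, 0.6),
}

linked = link_to_concepts(exercise_vec, unit_concepts)
```

Restricting the candidates to concepts from the same teaching unit keeps the comparison small and avoids linking an exercise to a similarly worded concept from an unrelated course.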
  • the concepts may be selected for the discussion in the same way as the exercise if the discussion is related to a teaching unit.
  • the video, exercise and discussion resources are linked to the concept graph, enriching the knowledge-based resource connections.
  • node A represents a concept of “gradient descent”
  • node B represents a concept of “backward propagation”
  • node C represents a concept of “convolutional neural network”.
  • the prerequisite relations among the concepts A, B and C indicate that it’s advisable to study the concepts in the order of A to B to C.
  • the video V12 in unit U9 of course C2 is about “gradient”, and this video V12 is linked to the concept A.
  • the exercise E7 in unit U7 of course C1 is about “optimization”, and this exercise E7 is linked to the concept B.
  • the video V8 in unit U7 of course C2 is about “CNN”, and this video V8 is linked to the concept C. It is appreciated that a lot of links may be established among the entities such as the videos V and exercises E of the courses and the concepts, and one entity may be linked to one or more concepts.
  • a discussion record D of the student behavior may be linked to one or more concepts.
  • the discussion such as posting and reply may be issued in a discussion area of a teaching unit, then candidate concepts may be obtained from the videos V of the teaching unit, then the discussion D may be linked to one or more most relevant concepts out of the candidate concepts.
  • the BERT encoder may be used to encode the text of the discussion D, the relevance between the discussion D and the concepts may be evaluated based on BERT embedding, and the one or more most relevant concepts may be identified from the concepts based on the evaluated relevance.
  • node C represents a concept of “convolutional neural network”
  • the discussion D341 of student S2 is about “CNN”
  • this discussion D341 is linked to the concept C.
  • Node D represents a concept of “neural network”
  • the discussion D221 of student S1 is about “NN”
  • this discussion D221 is linked to the concept D. It is appreciated that a lot of links may be established among the discussions of the student behavior records and the concepts, and one discussion may be linked to one or more concepts.
  • the links from the courses and the student behaviors to the concepts may be implemented in suitable ways.
  • the videos, the exercises and the discussions may each be given pointers to the corresponding concepts, or may be annotated with the corresponding concepts, so that all of them together form a directional graph.
  • the concept-centric organization of the courses, the student behaviors and the concepts may help improve the development of the online course applications.
  • Fig. 4 illustrates an exemplary data organization of knowledge repository for online courses according to an embodiment of the disclosure.
  • the external resources ER for the concepts include books BK, blogs BL, papers PA, and questions and answers QA related to the concepts.
  • each of the external resources ER may be linked to one or more concepts of the concept graph G.
  • the BERT encoder may be used to encode the text of a book into a latent space, the relevance between the book and the concepts of graph G may be evaluated based on the BERT embedding, and the one or more most relevant concepts may be identified for the book based on the evaluated relevance.
  • the linking of the blogs, papers and QA to the concepts may be performed in a similar way.
  • the book BK3 is linked to the concept C
  • the blog BL3 is linked to the concept E. It is appreciated that a lot of links may be established among the external data and the concepts, and one item of external data may be linked to one or more concepts.
  • the concept-centric organization of the courses, the student behaviors, the external resources and the concepts may help improve the development of the online course applications.
  • Table 1 illustrates an example of course resource in the online course knowledge repository after concept-based organization according to an embodiment of the disclosure.
  • the data structure of table 1 may be utilized for the course data C in Figs. 1-4.
  • each course is assigned a course ID, example of which is C_1729.
  • the name “Artificial Intelligence” and the field “CS” of the course are stored as text type.
  • the title of a video, as well as the name of the teaching unit including the video, may be saved as text type for the video.
  • Each video has a video ID which serves as the identifier of the video.
  • the chapter ID and the video ID are stored in the table.
  • the video may be retrieved based on the video ID.
  • the video text is stored in the table.
  • An exercise E may be assigned an exercise identifier, and a question may be assigned a question identifier.
  • the correspondence between exercise ID and question ID may be preserved in the list of the course.
  • two exercises corresponding to the video are stored. For example, the exercise ID Ex_7552, question ID Qm_14512 and the question text are stored, where the question type “0” is also stored.
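Because Table 1 itself is not reproduced in this text, the following sketch shows one way such a course record could be laid out. The field names, the nesting and the `<...>` placeholders are hypothetical; only the IDs, the course name, the field and the question type are taken from the description, and only one of the two exercises is shown.

```python
# Field names and the "<...>" placeholders are hypothetical; only the IDs,
# the course name, the field and the question type come from the text.
course_record = {
    "course_id": "C_1729",
    "name": "Artificial Intelligence",
    "field": "CS",
    "videos": [
        {
            "video_id": "<video ID>",
            "chapter_id": "<chapter ID>",
            "chapter": "<teaching unit name>",
            "title": "<video title>",
            "text": "<video text>",
            "exercises": [
                {
                    "exercise_id": "Ex_7552",
                    "questions": [
                        {"question_id": "Qm_14512", "type": "0", "text": "<question text>"},
                    ],
                },
            ],
        },
    ],
}


def exercise_to_questions(course):
    """Map each exercise ID in a course record to its question IDs."""
    mapping = {}
    for video in course["videos"]:
        for exercise in video["exercises"]:
            mapping[exercise["exercise_id"]] = [
                q["question_id"] for q in exercise["questions"]
            ]
    return mapping


lookup = exercise_to_questions(course_record)
```

The `lookup` mapping corresponds to the exercise-to-question correspondence that the description says may be preserved in the list of the course.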
  • Table 2 illustrates an example of student behavior in the online course knowledge repository after concept-based organization according to an embodiment of the disclosure.
  • the data structure of table 2 may be utilized for the student behavior data B in Figs. 1-4.
  • the example student behavior shows the video watching behavior, exercise behavior and comment and reply behavior of student U_112 in course “Artificial Intelligence” .
  • the student behavior record includes the user ID of the student, the course ID and the course name related to this behavior record.
  • the student behavior record also includes the video ID of the video watched by the student. For example, the video ID V_59697 and the comment issued while watching the video are recorded, the video ID V_59703 and the reply issued while watching the video are recorded.
  • the comment or reply behavior may be commonly referred to as discussion behavior.
  • the table 2 may also include video section ID of a video section such as the above mentioned 5-second video section.
  • the video section ID may be recorded associated with the video ID shown in table 2, or may replace the video ID shown in table 2. Additional information mentioned in the disclosure such as the jumping operation, video playing speed, subtitles of the video section may be recorded associated with the video ID.
  • the student behavior record also includes the exercise ID, the question ID and the answer of the student.
  • the exercise ID Ex_7552, the question ID Qm_14512 and the answer “A” are recorded in association.
  • additional information such as the revision history, score and so on may also be recorded as the exercising behavior.
  • the concepts associated with a discussion may also be preserved. For example, the concepts contained in the comment text and reply text may be annotated.
  • Fig. 5 illustrates an exemplary process for establishing a knowledge repository for online courses according to an embodiment of the disclosure.
  • a plurality of concepts are obtained from a plurality of courses.
  • the plurality of courses may be available online courses and may include videos and exercises.
  • each of the videos and the exercises included in the courses is linked to one or more related concepts of the plurality of concepts.
  • each of a plurality of student behaviors is linked to one or more related concepts of the plurality of concepts.
  • the knowledge repository is established by organizing the courses, the student behavior and the concepts on the basis of the concepts.
  • the established knowledge repository comprises the plurality of courses including the videos and exercises, the plurality of student behaviors, the plurality of concepts, the links between the videos and the plurality of concepts, the links between the exercises and the plurality of concepts, the links between the student behaviors and the plurality of concepts.
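A concept-centric repository of this kind could be sketched as a small link index. The class and method names are hypothetical; the linked resources reuse the examples given for Fig. 3 (video V12, exercise E7, video V8 and discussion D341).

```python
from collections import defaultdict


class KnowledgeRepository:
    """Concept-centric store: every resource is reachable from the concepts it links to."""

    def __init__(self):
        self.concepts = set()
        self.links = defaultdict(set)  # concept -> {(resource type, resource ID)}

    def link(self, kind, resource_id, concepts):
        """Link one resource (video, exercise, discussion, ...) to its related concepts."""
        for concept in concepts:
            self.concepts.add(concept)
            self.links[concept].add((kind, resource_id))

    def resources_for(self, concept, kind=None):
        """Return the sorted IDs of resources linked to `concept`, optionally filtered by type."""
        return sorted(
            rid for k, rid in self.links.get(concept, ()) if kind in (None, k)
        )


repo = KnowledgeRepository()
repo.link("video", "V12", ["gradient descent"])
repo.link("exercise", "E7", ["backward propagation"])
repo.link("video", "V8", ["convolutional neural network"])
repo.link("discussion", "D341", ["convolutional neural network"])
```

Indexing by concept means an application can start from a single concept, such as "convolutional neural network", and retrieve every related video, exercise and student discussion in one lookup.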
  • external knowledge data for each of the plurality of concepts may be obtained from websites, and each of the external knowledge data may be linked to one or more related concepts of the plurality of concepts.
  • the established knowledge repository further comprises the external knowledge data and the links between the external knowledge data and the plurality of concepts.
  • the external knowledge data for each of the plurality of concepts comprise papers, blogs, technical questions and answers, and books.
  • prerequisite relations among the plurality of concepts may be obtained.
  • the established knowledge repository further comprises the prerequisite relations among the plurality of concepts.
  • prerequisite relations among the plurality of concepts may be obtained by using a text-based classifier and a graph-based classifier.
  • the established knowledge repository comprises student information, wherein the student information comprises student profiles of a plurality of students and the student behaviors of the plurality of students.
  • the student profile of a student comprises gender, location, age and grade level of the student.
  • the student behavior of a student comprises exercising records, discussion records and video watching records of the student.
  • the exercising records of the student comprise a timestamp of completing an exercise, answers submitted for the exercise, score obtained for the exercise, and revision history of the student.
  • the discussion records of the student comprise postings and replies of the student.
  • the video watching records of the student comprise video clips, subtitle text of the video clips, the student’s swiping and/or jumping actions to each of the video clips.
  • the video clips may have a length on the scale of seconds.
  • each of the plurality of courses comprises one or more teaching units, each of the one or more teaching units comprises one or more videos and one or more exercises. In an embodiment, each of the plurality of courses comprises teacher information and school information.
  • the concepts may be extracted from subtitles of the videos included in the plurality of courses.
  • candidate concepts may be extracted from subtitles of the videos included in the plurality of courses by using multiple concept extraction methods, and the concepts may be selected out of the candidate concepts based on a ranking of the candidate concepts.
  • the video may be linked to the one or more related concepts extracted from the video. Relevance between the exercise and concepts extracted from videos included in the teaching unit containing the exercise may be estimated, and the exercise may be linked to the one or more related concepts out of the extracted concepts based on the estimated relevance.
  • in step 530, relevance between the discussion record of the student behavior and at least a part of the plurality of concepts may be estimated, and the discussion record may be linked to the one or more related concepts out of the at least a part of concepts based on the estimated relevance.
  • in step 530, relevance between the discussion record of the student behavior and concepts extracted from videos included in the teaching unit related to the discussion record may be estimated, and the discussion record may be linked to the one or more related concepts out of the extracted concepts based on the estimated relevance.
  • discipline information may be annotated to the plurality of courses according to a discipline classification standard.
  • Fig. 6 illustrates an exemplary computing system according to an embodiment of the present disclosure.
  • the computing system 60 may comprise at least one processor 610.
  • the computing system 60 may further comprise at least one storage device 620.
  • the storage device 620 may store computer-executable instructions that, when executed, cause the processor 610 to obtain a plurality of concepts from a plurality of courses, wherein the plurality of courses include videos and exercises; link each of the videos and the exercises included in the courses to one or more related concepts of the plurality of concepts; and link each of a plurality of student behaviors to one or more related concepts of the plurality of concepts, wherein the established knowledge repository comprises the plurality of courses including the videos and exercises, the plurality of student behaviors, the plurality of concepts, the links between the videos and the plurality of concepts, the links between the exercises and the plurality of concepts, and the links between the student behaviors and the plurality of concepts.
  • the storage device 620 may store computer-executable instructions that, when executed, cause the processor 610 to perform any operations according to the embodiments of the present disclosure as described in connection with FIGs. 1-5.
  • the exemplary computing system 60 may be implemented as a platform for providing a knowledge repository for online courses.
  • the one or more storage devices 620 of the platform may store the knowledge repository, wherein the knowledge repository comprises a plurality of courses including videos and exercises, a plurality of student behaviors, a plurality of concepts, a plurality of prerequisite relations among the plurality of concepts, links between the videos and the plurality of concepts, links between the exercises and the plurality of concepts, links between the student behaviors and the plurality of concepts.
  • the storage devices 620 may store the knowledge repository according to various embodiments as described in connection with Figs. 1-5.
  • in response to a data request, the processor 610 may retrieve data from the knowledge repository stored in the storage device 620.
  • the exemplary computing system 60 may be implemented as an application development platform.
  • the processor 610 may be used to execute the instructions of an AI application, and the one or more storage devices 620 may store the knowledge repository according to various embodiments as described in connection with Figs. 1-5.
  • the processor 610 may execute the instructions of an AI application based on the knowledge repository stored in the storage devices 620, for example, models of the AI application may be trained based on the knowledge repository stored in the storage device 620.
  • the embodiments of the present disclosure may be embodied in a computer-readable medium such as a non-transitory computer-readable medium.
  • the non-transitory computer-readable medium may comprise instructions that, when executed, cause one or more processors to perform any operations according to the embodiments of the present disclosure as described in connection with FIGs. 1-6.
  • the embodiments of the present disclosure may be embodied in a computer program product comprising computer-executable instructions that, when executed, cause one or more processors to perform any operations according to the embodiments of the present disclosure as described in connection with FIGs. 1-6.
  • modules in the apparatuses described above may be implemented in various approaches. These modules may be implemented as hardware, software, or a combination thereof. Moreover, any of these modules may be further functionally divided into sub-modules or combined together.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides a method for establishing a knowledge repository for online courses. The method comprises: obtaining a plurality of concepts from a plurality of courses, wherein the plurality of courses include videos and exercises; linking each of the videos and the exercises included in the courses to one or more related concepts of the plurality of concepts; and linking each of a plurality of student behaviors to one or more related concepts of the plurality of concepts, wherein the established knowledge repository comprises the plurality of courses including the videos and exercises, the plurality of student behaviors, the plurality of concepts, the links between the videos and the plurality of concepts, the links between the exercises and the plurality of concepts, and the links between the student behaviors and the plurality of concepts.

Description

[Title established by the ISA under Rule 37.2] METHOD FOR ESTABLISHING KNOWLEDGE REPOSITORY FOR ONLINE COURSES
FIELD
Aspects of the present disclosure relate generally to artificial intelligence, and more particularly, to a method for establishing a knowledge repository for online courses.
BACKGROUND
Massive open online courses (MOOCs) have provided convenient education for users worldwide. As multimedia, large-scale online interactive systems, MOOC platforms are excellent settings for advanced application research. Since MOOCs are intended to help students learn implicit knowledge concepts from diverse courses, many efforts in natural language processing (NLP) and artificial intelligence (AI) have sought to build novel applications for learning assistance. From extracting course concepts and their prerequisite relations to analyzing student behaviors, MOOC-related topics, tasks, and methods have been developed in recent years.
Despite this research and development interest, the resources available from MOOCs remain scarce. Although some effort has been devoted to constructing MOOC datasets, these datasets are still insufficient to meet the needs of the various course understanding and learning analytics tasks.
SUMMARY
An object of the disclosure is to provide a method for establishing a knowledge repository for online courses, as well as a platform, in order to provide an improved database for the development of various types of applications related to online courses.
According to an aspect of the disclosure, there is provided a method for establishing a knowledge repository for online courses. The method comprises: obtaining a plurality of concepts from a plurality of courses, wherein the plurality of courses include videos and exercises; linking each of the videos and the exercises included in the courses to one or more related concepts of the plurality of concepts; and linking each of a plurality of student behaviors to one or more related concepts of the plurality of concepts; wherein the established knowledge repository comprises the plurality of courses including the videos and exercises, the plurality of student behaviors, the plurality of concepts, the links between the videos and the plurality of concepts, the links between the exercises and the plurality of concepts, and the links between the student behaviors and the plurality of concepts.
According to an aspect of the disclosure, there is provided a computer system for providing a knowledge repository for online courses. The computer system comprises one or more processors and one or more storage devices storing the knowledge repository, wherein the knowledge repository comprises a plurality of courses including videos and exercises, a plurality of student behaviors, a plurality of concepts, links between the videos and the plurality of concepts, links between the exercises and the plurality of concepts, and links between the student behaviors and the plurality of concepts.
According to an aspect of the disclosure, there is provided a computer system, which comprises one or more processors and one or more storage devices storing computer-executable instructions that, when executed, cause the one or more processors to perform the operations of the method as mentioned above as well as to perform the operations of the method according to aspects of the disclosure.
According to an aspect of the disclosure, there are provided one or more computer readable storage media storing computer-executable instructions that, when executed, cause one or more processors to perform the operations of the method as mentioned above as well as to perform the operations of the method according to aspects of the disclosure.
According to an aspect of the disclosure, there is provided a computer program product comprising computer-executable instructions that, when executed, cause one or more processors to perform the operations of the method as mentioned above as well as to perform the operations of the method according to aspects of the disclosure.
BRIEF DESCRIPTION OF THE DRAWINGS
The disclosed aspects will be described in connection with the appended drawings that are provided to illustrate and not to limit the disclosed aspects.
Fig. 1 illustrates an exemplary structure of course material for online courses according to an embodiment of the disclosure.
Fig. 2 illustrates an exemplary structure of student material for online courses according to an embodiment of the disclosure.
Fig. 3 illustrates an exemplary data organization of knowledge repository for online courses according to an embodiment of the disclosure.
Fig. 4 illustrates an exemplary data organization of knowledge repository for online courses according to an embodiment of the disclosure.
Fig. 5 illustrates an exemplary process for establishing a knowledge repository for online courses according to an embodiment of the disclosure.
Fig. 6 illustrates an exemplary computing system according to an embodiment of the disclosure.
DETAILED DESCRIPTION
The present disclosure will now be discussed with reference to several example implementations. It is to be understood that these implementations are discussed only for enabling those skilled in the art to better understand and thus implement the embodiments of the present disclosure, rather than suggesting any limitations on the scope of the present disclosure.
Various embodiments will be described in detail with reference to the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts. References made to particular examples and embodiments are for illustrative purposes, and are not intended to limit the scope of the disclosure.
Fig. 1 illustrates an exemplary structure of course material for online courses according to an embodiment of the disclosure.
As shown in Fig. 1, a course consists of multiple teaching units and a teaching unit is composed of a series of videos and exercises. The teaching unit U may also be referred to as a chapter. For example, a course C1 includes teaching units U1, U2, U7 and so on, the teaching unit U1 includes videos V1, V2, V3 and exercise E1, the teaching unit U2 includes videos V4, V5 and exercise E2, and the teaching unit U7 includes videos V22, V23, V24 and exercise E7. The courses C2 and CN, as well as other courses C, have structures similar to that illustrated in Fig. 1. The videos and exercises are considered as entities in the online course knowledge repository. In an embodiment, in order to preserve such structured information of the courses, the course-related information, such as the syllabus of the course, and entities including videos and exercises, may be saved as list-type data.
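The course structure described above can be illustrated with a minimal sketch. The class and field names below (`Course`, `TeachingUnit`, `Video`, `Exercise`, `entities`) are hypothetical and chosen only for illustration; the disclosure states only that the structured information may be saved as lists.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Video:
    video_id: str


@dataclass
class Exercise:
    exercise_id: str


@dataclass
class TeachingUnit:
    unit_id: str
    # Ordered lists preserve the syllabus order of the unit.
    videos: List[Video] = field(default_factory=list)
    exercises: List[Exercise] = field(default_factory=list)


@dataclass
class Course:
    course_id: str
    units: List[TeachingUnit] = field(default_factory=list)

    def entities(self) -> List[str]:
        """Return the IDs of all videos and exercises, unit by unit."""
        ids: List[str] = []
        for unit in self.units:
            ids.extend(v.video_id for v in unit.videos)
            ids.extend(e.exercise_id for e in unit.exercises)
        return ids


# Part of course C1 from Fig. 1 (units U1 and U2 only).
c1 = Course("C1", [
    TeachingUnit("U1", [Video("V1"), Video("V2"), Video("V3")], [Exercise("E1")]),
    TeachingUnit("U2", [Video("V4"), Video("V5")], [Exercise("E2")]),
])
print(c1.entities())
```

Keeping the units, videos and exercises in ordered lists retains the teaching order, which later steps such as prerequisite discovery may rely on.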
As shown in Fig. 1, a course C may also include teacher information T and school information S. For example, the course C1 includes teacher description T1 and school information S1 related to the course C1. The course C2 and CN as well as other courses have similar teacher description and school description related to the courses as illustrated in Fig. 1. The teacher description and school description of the courses are considered as attributes in the online course knowledge repository. In an embodiment, information of the teacher and school of the course may be crawled from the websites so as to supplement the teacher description and school description of the course. This kind of information can help build associations for courses and support related tasks such as teaching style detection.
The videos V may include subtitles or may be processed to obtain the subtitles. For example, the subtitle files of the videos V may be processed into JSON format, and the timestamp of each sentence of the subtitle files may be recorded, so that the start time and end time of a sentence can be determined from the timestamps. In an embodiment, a video V as illustrated in Fig. 1 includes not only the video itself but also the related textual data, such as the subtitles of the video and the timestamps of the sentences of the subtitles.
The exercises E may include the questions, options, standard answers and so on. It is appreciated that a teaching unit U may include one or more videos and one or more exercises; it is also possible that a teaching unit U includes one or more videos without any exercise.
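As an illustration of subtitles stored in JSON format with per-sentence timestamps, the following sketch assumes a hypothetical layout; the field names `video_id`, `sentences`, `start`, `end` and `text`, and the time values, are not taken from the disclosure.

```python
import json

# Hypothetical JSON layout for one video's subtitles (times in seconds).
subtitle_json = """
{
  "video_id": "V1",
  "sentences": [
    {"start": 0.0, "end": 4.5,
     "text": "At the end of the previous lesson, we had introduced the AVL tree."},
    {"start": 4.5, "end": 9.0,
     "text": "This is a typical moderately balanced binary search tree."}
  ]
}
"""


def sentence_at(subtitle: dict, t: float):
    """Return the sentence whose [start, end) interval contains time t, else None."""
    for sentence in subtitle["sentences"]:
        if sentence["start"] <= t < sentence["end"]:
            return sentence["text"]
    return None


subtitle = json.loads(subtitle_json)
print(sentence_at(subtitle, 5.0))
```

With start and end times recorded per sentence, a watching position in a video can be mapped back to the subtitle text at that position.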
In order to improve the courses data and for facilitation of the organization of the courses data, in an embodiment, discipline information may be annotated to the plurality of courses according to a discipline classification standard, such as the national discipline classification standard.
Although the exemplary videos in respective courses are indicated with similar labels in Fig. 1, for example, V1 in course C1 and V1 in course C2, it is appreciated that the different videos in respective courses have their own unique identifiers; the labels shown in Fig. 1 are just for the sake of illustration.
Fig. 2 illustrates an exemplary structure of student material for online courses according to an embodiment of the disclosure.
As shown in Fig. 2, the student information of each student, such as student S1, S2, SM and so on, includes a student profile P and student behavior B. The student profile P includes the student’s gender, location, age and grade level. The student behavior data B includes records of student behavior, which include video watching behavior, exercising behavior and discussion behavior.
When a student or user studies online courses, the main activity is watching the videos in the courses. Video watching logs are collected as the video watching records. In an embodiment, video watching logs at the scale of seconds are recorded, for example, five-second video sections are recorded. For example, the system records which video the student is currently watching and which position in the video the student has reached. The video watching record V may also include information about the student’s watching actions, such as jumping between video positions, watching speed, and so on. The subtitle text and the swiping and jumping actions corresponding to each video watching record may be included in the video watching record. The video watching records may be used to infer specific learning trajectories of students.
The record of doing exercises is the basis for modeling the students’ mastery of knowledge. It is a measure of the outcome of the learning process. For each student, the timestamp of completing each question, the submitted answer, and the score obtained may be recorded. Furthermore, for questions to which a student submitted answers multiple times, a history of each submission may be recorded. In this way, problem-solving records may be obtained and recorded. The exercising record E includes the timestamp of completing each exercise and each question, the answer submitted by the student, the score obtained for the exercise, and the revision history of the student.
A student’s online learning process is often accompanied by discussions and communication. These discussions not only reflect the social relationships among students but also serve as important feedback on the course design. In an embodiment, the discussion content may be recorded by using the comments and replies obtained from the online course platform. Each comment or posting is attached to a certain video or exercise, is started by a student, and may be replied to by other students. Therefore, for each comment, its text may be crawled and the comment ID may be attached to the corresponding video ID or exercise ID, as well as to the student ID of the student who posts the comment. For each reply, the text of the reply, the student ID of the student who issues the reply, and the original comment ID may be recorded. In this way, comment-reply records may be obtained. The discussion D includes the posted comments and replies of the student as well as the above-mentioned information. For example, the student may post a question or comment on a website such as the online course website or application, and users may reply to the posting of the student. The student may also reply to the postings of other users. The postings and replies D of each student may be recorded as the discussion records.
For example, the student data S1 includes the student profile P1 and student behavior B1. The student profile P1 includes the gender, location, age and grade level of the student S1. The student behavior data B1 includes behavior records of student S1. In the example as shown in Fig. 2, the student behavior B1 includes an exemplary video watching record V9. The content of the video watching record V9 includes a section of a video watched by the student S1 as well as the subtitle text and the swiping and/or jumping actions corresponding to the video watching record V9. The time when the video was watched by the student S1 may also be recorded.
The student behavior B1 includes an exemplary discussion record D221. The content of the discussion record D221 includes a posted comment and/or reply of the student S1. The time when the discussion record D221 occurs may also be recorded.
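The comment-reply records described above can be sketched as two small record types. The class names, field names and sample IDs below are hypothetical; only the recorded items (text, comment ID, student ID, and the video or exercise ID a comment is attached to) follow the description.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Comment:
    comment_id: str
    student_id: str   # student who posted the comment
    target_id: str    # video ID or exercise ID the comment is attached to
    text: str


@dataclass
class Reply:
    comment_id: str   # ID of the original comment being replied to
    student_id: str   # student who issued the reply
    text: str


def discussion_records(student_id: str,
                       comments: List[Comment],
                       replies: List[Reply]) -> List[Union[Comment, Reply]]:
    """Collect the postings and replies of one student as discussion records."""
    records: List[Union[Comment, Reply]] = []
    records.extend(c for c in comments if c.student_id == student_id)
    records.extend(r for r in replies if r.student_id == student_id)
    return records


comments = [
    Comment("D221", "S1", "V4", "How does an NN learn its weights?"),
    Comment("D341", "S2", "V8", "Why does a CNN share filters?"),
]
replies = [Reply("D341", "S1", "Sharing filters reduces the parameter count.")]

s1_records = discussion_records("S1", comments, replies)
print(len(s1_records))
```

Storing the original comment ID on each reply is what allows a comment thread to be reassembled later.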
The student behavior B1 includes an exemplary exercising record E13. The content of the exercising record E13 includes the answer submitted by the student, the score obtained for the exercise, and the revision history of the student. The time of completing the exercise by the student may also be recorded.
The exemplary student S2 and student SM may have similar structures as illustrated in Fig. 2. Besides the course resources, the various types of student behaviors are essential for adaptive learning research, which helps the modeling of student’s learning intents, cognitive levels and social activities.
Fig. 3 illustrates an exemplary data organization of knowledge repository for online courses according to an embodiment of the disclosure.
Concepts refer to the knowledge concepts taught in the courses such as courses C1, C2 and so on. Taking a course named “Data Structure and Algorithms” as an example, a part of the subtitles of the course is “At the end of the previous lesson, we had introduced the AVL tree. This is a typical moderately balanced binary search tree. You will recall that we need to define and introduce a metric called the balancing factor for each node and require that all nodes in the tree have a balancing factor between -1 and +1 …”. In this example, a concept extraction process may be used to obtain the concepts for the subtitles, for example, “AVL tree”, “balanced Binary Search Tree”, “balancing factor”, “node” and “tree” may be the extracted concepts.
The concepts are employed in the knowledge repository for online courses to better support retrieval, knowledge discovery, and summarization. According to an embodiment, in order to establish the knowledge repository for online courses, fine-grained concepts are extracted from the course material of available online course corpus. The extracted concepts are denoted as the nodes such as the nodes A, B, C, D and so on in Fig. 3.
A prerequisite discovering process may be used to identify the prerequisite relations among the concepts. If a first concept can help understand a second concept, then there is a prerequisite relation from the first concept to the second concept. As illustrated in Fig. 3, a concept graph G may be constructed by discovering prerequisite relations among the nodes representing the concepts. The concept graph G may be represented as a directional graph.
In an embodiment, the concepts are extracted from the subtitles of the videos V of the courses C. A suitable concept extraction method may be used to extract the concepts from the subtitles of the videos. In an embodiment, a neural network (NN) model may be trained to extract the concepts from the entities of the courses.
In an embodiment, in order to accomplish high-quality concept acquisition from the large-scale MOOC resources, a weakly supervised fine-grained concept extraction method is employed to obtain the concepts from the texts of video subtitles. In order to obtain high-quality concepts from texts, the process includes two stages: candidate extraction and concept ranking. Multiple concept extraction methods may be used in the candidate extraction stage in order to extract as many concepts as possible. For example, phrase mining, entity linking, and named entity recognition, which are mainly supported by external knowledge graphs and a small amount of annotation, may be used for extracting the candidate concepts.
A phrase table is used in phrase mining. In an example, the noun-phrase titles of Chinese Wikipedia are selected as the phrase table. These phrases are high-quality entities annotated by crowdsourcing. In phrase mining, for each video, the phrases in the phrase table that appear in the subtitles of the video are preserved as candidate concepts.
Entity linking is used to discover mentions of entities from an external knowledge base. In an example, for each video, the entity linking is performed with XLink (Jing Zhang, etc., XLink: An Unsupervised Bilingual Entity Linking System. Springer International Publishing, Cham, 172–183.) and the linked entities are selected as candidate concepts. The concepts of the large-scale knowledge base Xlore (Hailong Jin, etc., 2018. XLORE2: Large-scale cross-lingual knowledge graph construction and application. Data Intelligence 1, 1 (2018), 77–98.) may be extracted as candidate concepts for each video.
A pre-trained language model may be used for fine-grained concept extraction. The training scheme can be regarded as Named Entity Recognition (NER): if a phrase appearing in the video subtitle is a concept, its span is annotated as a “named entity”. In an example, for each discipline, concepts from a small subset of the video subtitles of a small subset of randomly chosen courses may be annotated, and a pre-trained language model such as RoBERTa (Yiming Cui, etc., 2019. Pre-training with whole word masking for chinese bert. arXiv preprint arXiv:1906.08101 (2019)) may be further trained with the annotated data. The trained extractor runs prediction on all video subtitles and selects phrases with confidence larger than a threshold as concept candidates. Experimental results show that the RoBERTa-NER method extracts more fine-grained concepts than phrase mining and entity linking.
As for the concept ranking stage, in an embodiment, a cluster-based unsupervised method is employed to determine the extraction quality of the candidate concepts. In order to improve the precision of the extracted concepts, each course’s candidate concepts may be clustered into multiple clusters (for example, 15 clusters) by applying K-means to their BERT embeddings, and the concepts in the one or more highest scored clusters (for example, the top 2 highest scored clusters) are selected as the finally extracted concepts of the course. Taking the 15 clusters as an example, the score of cluster j ∈ {1, …, 15} is calculated by
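The phrase mining step can be sketched as a lookup of phrase-table entries in a video's subtitle text. The function name `mine_phrases` and the small English phrase table below are hypothetical stand-ins (the example in the description uses noun-phrase titles of Chinese Wikipedia), and plain case-insensitive substring matching is an assumption made for brevity.

```python
from typing import Iterable, List


def mine_phrases(subtitle_text: str, phrase_table: Iterable[str]) -> List[str]:
    """Keep the phrase-table entries that appear in the subtitle text."""
    text = subtitle_text.lower()
    return sorted(p for p in phrase_table if p.lower() in text)


# Hypothetical phrase table; a real one would be far larger.
phrase_table = {"AVL tree", "binary search tree", "balancing factor", "hash table"}

subtitle_text = (
    "At the end of the previous lesson, we had introduced the AVL tree. "
    "This is a typical moderately balanced binary search tree. "
    "We introduce a metric called the balancing factor for each node."
)

candidates = mine_phrases(subtitle_text, phrase_table)
print(candidates)
```

Entries that never occur in the subtitles, such as "hash table" here, are dropped, so the output is a per-video list of candidate concepts.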
Figure PCTCN2021111149-appb-000001
where s_i and c_j are the centers of the i-th seed cluster and the j-th candidate cluster, respectively, d(x, y) is the cosine similarity of the BERT embeddings of x and y, and the labeled seed concepts of each discipline are clustered into 10 clusters.
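The cluster scoring step can be sketched as follows. The exact scoring formula is given only as a figure and is not reproduced in this text, so the aggregation used below (the maximum cosine similarity between a candidate cluster center and any seed cluster center) is purely an assumption for illustration. The two-dimensional vectors are toy stand-ins for BERT embedding centers, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(x: Sequence[float], y: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0


def score_clusters(seed_centers, candidate_centers) -> List[float]:
    """Score each candidate cluster against the seed clusters.

    ASSUMPTION: the score is the best similarity to any seed center;
    the aggregation in the original formula may differ.
    """
    return [max(cosine(s, c) for s in seed_centers) for c in candidate_centers]


def top_clusters(scores: List[float], k: int = 2) -> List[int]:
    """Indices of the k highest scored candidate clusters."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


seed_centers = [[1.0, 0.0]]                               # toy seed cluster center
candidate_centers = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy candidate centers

scores = score_clusters(seed_centers, candidate_centers)
selected = top_clusters(scores, k=2)
print(selected)
```

Concepts belonging to the selected clusters would then be kept as the finally extracted concepts of the course.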
By using the two-stage concept extraction process, candidate concepts may be extracted from subtitles of the videos included in the plurality of courses by using multiple concept extraction methods, and the precise concepts may be selected out of the candidate concepts based on a ranking of the candidate concepts.
In an embodiment, the prerequisite relations among the concepts may be identified by using a suitable prerequisite discovery method. In an embodiment, a neural network (NN) model may be trained to discover the prerequisite relations among the concepts so as to construct the concept graph G, which is a directional concept graph. In an embodiment, a text-based method and a graph-based method are co-trained by using a small number of annotated concept pairs, and the trained model is used to identify the prerequisite relations among the concepts as illustrated by the concept pairs connected with the arrow lines in the graph G.
A text-based method and a graph-based method may be used for prerequisite relation discovery. For the text-based method, a simple neural network classifier is applied to predict whether a concept pair has the prerequisite relation. In an embodiment, a concept consists of several tokens. Neural models that use fixed word embeddings, such as an LSTM, may be employed in an embodiment. In another embodiment, a text encoder of BERT (Jacob Devlin, etc., 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 4171–4186.) may be employed to obtain the embedding of a concept. The output vector at the end position and the output vector at a token such as the “[CLS]” token are taken as the embedding vector for the LSTM and BERT, respectively. Then the two embeddings of the concepts of the pair are concatenated to predict the binary score of the concept pair.
In an embodiment, for the graph-based method, similar to the text-based method, graph encoders are employed to obtain concept embeddings first. For example, Graph Attention Networks (GAT) (Petar Veličković, etc., 2017. Graph attention networks. arXiv preprint arXiv:1710.10903 (2017).) are employed to obtain concept embeddings from the initial representations of the concepts and the graph structure between the concepts. Then the two embeddings of the concepts of the pair are concatenated to predict the binary score. For the initial representation of a concept, the embeddings of the concept from the text encoders may be averaged. For the graph structure between concepts, the order of the videos of each course is utilized. Since courses are taught in a cognitive order, it is reasonable to assume that concepts in a video may have prerequisite relations with concepts in the following videos.
In order to reduce the labeling cost, the text model results and the graph model results may be ensembled to automatically generate candidate pairs, from which the positive pairs are then identified manually. Specifically, the co-training method takes the previously labeled positive pairs and randomly sampled negative pairs as training data and ranks the other pairs according to the predicted probability. Experienced annotators then label the top-ranked pairs, i.e. those with the highest predicted positive probability. After an initial iteration in which annotators manually label a small number of seed positive pairs, this generation and labeling process is repeated multiple times so as to discover a large number of positive pairs with limited manual effort. A positive pair means there is a prerequisite relation between the pair of concepts.
As illustrated in Fig. 3, the entities of the courses and the student behavior records are linked to the concepts. By linking various resources to the concept graph, their knowledge-level connections can be enriched. In an embodiment, each video V is linked to the concepts extracted from itself. As the concepts are extracted from video subtitles, a video is naturally annotated with the concepts extracted from the video’s subtitles. However, other types of resources still lack links to concepts. For each course, its concepts are the union of its video concepts. For each exercise, the concepts of the videos in the same teaching unit or chapter as the exercise are first selected as candidates, and then their BERT embeddings may be employed to select the one or more best matched concepts. For example, one or more concepts with a cosine similarity higher than a threshold may be selected as the one or more concepts to be linked to the exercise. For example, the threshold of cosine similarity is 0.8. For each discussion, i.e. comment or reply, the concepts may be selected for the discussion in the same way as for the exercise if the discussion is related to a teaching unit. As a result, the video, exercise and discussion resources are linked to the concept graph, enriching the knowledge-based resource connections.
In the example as illustrated in Fig. 3, node A represents a concept of “gradient descent”, node B represents a concept of “backward propagation”, and node C represents a concept of “convolutional neural network”. The prerequisite relations among the concepts A, B and C indicate that it is advisable to study the concepts in the order of A to B to C. The video V12 in unit U9 of course C2 is about “gradient”, and this video V12 is linked to the concept A. The exercise E7 in unit U7 of course C1 is about “optimization”, and this exercise E7 is linked to the concept B. The video V8 in unit U7 of course C2 is about “CNN”, and this video V8 is linked to the concept C. It is appreciated that a lot of links may be established among the entities such as the videos V and exercises E of the courses and the concepts, and one entity may be linked to one or more concepts.
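The use of video order as graph structure can be sketched as follows. The function name `candidate_edges` and the `window` parameter (how many following videos are considered) are hypothetical; the description states only that concepts in a video may have prerequisite relations with concepts in the following videos.

```python
from typing import List, Set, Tuple


def candidate_edges(video_concepts: List[List[str]],
                    window: int = 1) -> Set[Tuple[str, str]]:
    """Build directed edges from concepts of a video to concepts of later videos.

    video_concepts[i] holds the concepts of the i-th video in teaching order.
    ASSUMPTION: only the next `window` videos are considered.
    """
    edges: Set[Tuple[str, str]] = set()
    for i, earlier in enumerate(video_concepts):
        for later in video_concepts[i + 1: i + 1 + window]:
            for a in earlier:
                for b in later:
                    if a != b:  # a concept is not its own prerequisite
                        edges.add((a, b))
    return edges


videos_in_order = [
    ["gradient descent"],
    ["backward propagation"],
    ["convolutional neural network"],
]

edges = candidate_edges(videos_in_order, window=1)
print(sorted(edges))
```

These edges are only a structural prior for the graph encoder; whether a pair is an actual prerequisite relation is still decided by the trained classifier.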
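The alternating generation and labeling process can be sketched as a loop. Everything named below is hypothetical: `train` stands in for the co-trained text and graph models and returns a scoring function, `annotate` stands in for the human annotators, and `toy_train` is a deliberately trivial scorer used only to make the sketch runnable.

```python
import random


def discover_positive_pairs(all_pairs, seed_positives, train, annotate,
                            rounds=3, top_k=1, rng_seed=0):
    """Alternate between model ranking and manual labeling of the top pairs."""
    rng = random.Random(rng_seed)
    positives = set(seed_positives)   # seed pairs labeled in the initial iteration
    labeled = set(seed_positives)     # every pair an annotator has already seen
    for _ in range(rounds):
        unlabeled = [p for p in all_pairs if p not in labeled]
        if not unlabeled:
            break
        # Randomly sampled pairs serve as negative training examples.
        negatives = rng.sample(unlabeled, min(len(positives), len(unlabeled)))
        score = train(positives, negatives)
        # Rank the remaining pairs by predicted positive probability.
        ranked = sorted(unlabeled, key=score, reverse=True)
        for pair in ranked[:top_k]:
            labeled.add(pair)
            if annotate(pair):
                positives.add(pair)
    return positives


def toy_train(positives, negatives):
    """Toy stand-in model: favors pairs that extend a known prerequisite chain."""
    tails = {b for _, b in positives}
    return lambda pair: 1.0 if pair[0] in tails else 0.0


truth = {("A", "B"), ("B", "C"), ("C", "D")}
pairs = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
found = discover_positive_pairs(pairs, {("A", "B")}, toy_train,
                                annotate=lambda p: p in truth)
print(sorted(found))
```

Annotators inspect only the `top_k` highest ranked pairs in each round, which is where the reduction in labeling effort comes from.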
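The threshold-based linking of an exercise to concepts can be sketched as follows. The two-dimensional vectors are toy stand-ins for BERT embeddings, and the function names are hypothetical; the threshold of 0.8 is the example value given in the description.

```python
import math
from typing import Dict, List, Sequence


def cosine(x: Sequence[float], y: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0


def link_concepts(item_vec: Sequence[float],
                  candidates: Dict[str, Sequence[float]],
                  threshold: float = 0.8) -> List[str]:
    """Return candidate concepts whose similarity to the item exceeds the threshold.

    `candidates` maps each concept of the same teaching unit to its embedding.
    The result is ordered from most to least similar.
    """
    scored = [(name, cosine(item_vec, vec)) for name, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, sim in scored if sim > threshold]


# Toy embeddings of concepts from the videos of the exercise's teaching unit.
unit_concepts = {
    "gradient descent": [1.0, 0.0],
    "node": [0.0, 1.0],
    "backward propagation": [1.0, 0.5],
}
exercise_vec = [1.0, 0.1]  # toy embedding of the exercise text

linked = link_concepts(exercise_vec, unit_concepts)
print(linked)
```

Restricting the candidates to the concepts of the same teaching unit keeps the comparison small, and the same routine could be applied to a discussion attached to that unit.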
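The example of Fig. 3 can be sketched as a small directed graph with resource links. The dictionary layout, the resource keys and the helper `resources_for` are hypothetical; the study order is derived with `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later), which takes a mapping from each node to its predecessors.

```python
from graphlib import TopologicalSorter

# Each concept maps to the set of its prerequisite concepts (A -> B -> C).
prerequisites = {
    "backward propagation": {"gradient descent"},
    "convolutional neural network": {"backward propagation"},
}

# Links from course entities to concepts, following the example of Fig. 3.
links = {
    "C2/U9/V12": ["gradient descent"],
    "C1/U7/E7": ["backward propagation"],
    "C2/U7/V8": ["convolutional neural network"],
}


def resources_for(concept: str):
    """Return the entities linked to the given concept."""
    return sorted(r for r, concepts in links.items() if concept in concepts)


# Prerequisites come first in a topological order of the concept graph.
study_order = list(TopologicalSorter(prerequisites).static_order())
for concept in study_order:
    print(concept, resources_for(concept))
```

Because every entity points at concepts, a suggested study order over concepts translates directly into an ordered list of videos and exercises.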
In an embodiment, a discussion record D of the student behavior may be linked to one or more concepts. In an example, the discussion such as posting and reply may be issued in a discussion area of a teaching unit, then candidate concepts may be obtained from the videos V of the teaching unit, then the discussion D may be linked to one or more most relevant concepts out of the candidate concepts. In another example, the BERT encoder may be used to encode the text of the discussion D, the relevance between the discussion D and the concepts may be evaluated based on BERT embedding, and the one or more most relevant concepts may be identified from the concepts based on the evaluated relevance.
In the example illustrated in Fig. 3, node C represents the concept “convolutional neural network”, the discussion D341 of student S2 is about “CNN”, and this discussion D341 is linked to the concept C. Node D represents the concept “neural network”, the discussion D221 of student S1 is about “NN”, and this discussion D221 is linked to the concept D. It is appreciated that many links may be established between the discussions of the student behavior records and the concepts, and one discussion may be linked to one or more concepts.
The links from the courses and the student behaviors to the concepts may be implemented in any suitable way. For example, the videos, the exercises and the discussions may each be given pointers to the corresponding concepts, or may be annotated with the corresponding concepts, so that together they form a directed graph. The concept-centric organization of the courses, the student behaviors and the concepts may help improve the development of online course applications.
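One possible way to hold such pointers is a pair of dictionaries: one from each resource to its concepts, and a reverse index from each concept to the resources that point at it. The class name `LinkIndex` and its methods are hypothetical. The links added in the example are the ones described for Fig. 3.

```python
from collections import defaultdict

class LinkIndex:
    """Directed resource -> concept links, with a reverse index for lookups by concept."""

    def __init__(self):
        self._concepts_of = defaultdict(set)   # resource ID -> linked concepts
        self._resources_of = defaultdict(set)  # concept -> IDs of resources pointing at it

    def add_link(self, resource_id, concept):
        self._concepts_of[resource_id].add(concept)
        self._resources_of[concept].add(resource_id)

    def concepts_of(self, resource_id):
        # .get() avoids creating empty entries for unknown keys.
        return sorted(self._concepts_of.get(resource_id, ()))

    def resources_of(self, concept):
        return sorted(self._resources_of.get(concept, ()))

index = LinkIndex()
index.add_link("V12", "gradient descent")
index.add_link("E7", "backward propagation")
index.add_link("V8", "convolutional neural network")
index.add_link("D341", "convolutional neural network")
index.add_link("D221", "neural network")
```

The reverse index is what makes the organization concept-centric: given a concept, every video, exercise and discussion about it can be retrieved in one lookup.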
Fig. 4 illustrates an exemplary data organization of knowledge repository for online courses according to an embodiment of the disclosure.
In order to establish the online course knowledge repository, after concept extraction, external data may be obtained for the extracted concepts. In an example, a plurality of papers may be crawled from websites for each concept, and an abstract for the concept may be crawled from websites such as Wikipedia. Moreover, blogs, technical QA and books related to each concept may be crawled via search engines.
As illustrated in Fig. 4, the external resources ER for the concepts include books BK, blogs BL, pages PA, and question and answers QA related to the concepts. In an embodiment, each of the external resources ER may be linked to one or more concepts of the concept graph G. In an example, the BERT encoder may be used to encode the text of a book into a latent space, the relevance between the book and the concepts of graph G may be evaluated based on the BERT embedding, and the one or more most relevant concepts may be identified for the book based on the evaluated relevance. The linking of the blog, the paper and the QA to the concepts may be performed in a similar way. As illustrated in Fig. 4, the book BK3 is linked to the concept C, and the blog BL3 is linked to the concept E. It is appreciated that many links may be established between the external data and the concepts, and one item of external data may be linked to one or more concepts. The concept-centric organization of the courses, the student behaviors, the external resources and the concepts may help improve the development of online course applications.
Table 1
Figure PCTCN2021111149-appb-000002
Table 1 illustrates an example of course resource in the online course knowledge repository after concept-based organization according to an embodiment of the disclosure. The data structure of table 1 may be utilized for the course data C in Figs. 1-4.
As illustrated in table 1, each course is assigned a course ID, example of which is C_1729. The name “Artificial Intelligence” and the field “CS” of the course are stored as text type.
The title of a video, as well as the name of the teaching unit including the video, may be saved as text type for the video. Each video has a video ID which serves as the identifier of the video. In the example, the chapter ID and the video ID are stored in the table. The video may be retrieved based on the video ID. The video text is stored in the table.
An exercise E may be assigned an exercise identifier, and a question may be assigned a question identifier. The correspondence between exercise ID and question ID may be preserved in the list of the course. Information about question types may also be stored. For example, there may be three question types, single choice, multiple choice and subjective, which are identified with 0, 1 and 2, respectively. As shown in the course example of table 1, two exercises corresponding to the video are stored. For example, the exercise ID Ex_7552, the question ID Qm_14512 and the question text are stored, together with the question type “0”.
Although not shown in table 1, other information mentioned in the disclosure, such as the teacher and school descriptions of the course, the description of the chapter, and so on, may also be stored as text type in the table. Moreover, the concepts associated with a course, a video or an exercise may also be preserved. For example, the concepts contained in the course name, the video text and the question text may be annotated.
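The course record described for table 1 could be modeled with nested data classes, as sketched below. The class and attribute names are assumptions, since the table itself is only available as an image. The identifiers C_1729, Ex_7552 and Qm_14512 and the question type codes 0, 1 and 2 come from the description, the video ID V_59697 is borrowed from the table 2 example, and the chapter ID and all text values are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List, Set

class QuestionType(IntEnum):
    SINGLE_CHOICE = 0
    MULTIPLE_CHOICE = 1
    SUBJECTIVE = 2

@dataclass
class Question:
    question_id: str
    question_type: QuestionType
    text: str

@dataclass
class Exercise:
    exercise_id: str
    questions: List[Question] = field(default_factory=list)

@dataclass
class Video:
    video_id: str
    chapter_id: str
    title: str
    text: str
    exercises: List[Exercise] = field(default_factory=list)
    concepts: Set[str] = field(default_factory=set)

@dataclass
class Course:
    course_id: str
    name: str
    field_name: str  # discipline of the course, e.g. "CS"
    videos: List[Video] = field(default_factory=list)

    def concepts(self) -> Set[str]:
        """A course's concepts are the union of the concepts of its videos."""
        merged: Set[str] = set()
        for video in self.videos:
            merged |= video.concepts
        return merged

question = Question("Qm_14512", QuestionType.SINGLE_CHOICE, "placeholder question text")
exercise = Exercise("Ex_7552", [question])
video = Video("V_59697", "Ch_1", "placeholder title", "placeholder subtitle text",
              [exercise], {"gradient descent"})
course = Course("C_1729", "Artificial Intelligence", "CS", [video])
```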
Table 2
Figure PCTCN2021111149-appb-000003
Table 2 illustrates an example of student behavior in the online course knowledge repository after concept-based organization according to an embodiment of the disclosure. The data structure of table 2 may be utilized for the student behavior data B in the Figs. 1-4.
The example student behavior shows the video watching behavior, exercise behavior and comment and reply behavior of student U_112 in course “Artificial Intelligence” . As illustrated in table 2, the student behavior record includes the user ID of the student, the course ID and the course name related to this behavior record. The student behavior record also includes the video ID of the video watched by the student. For example, the video ID V_59697 and the comment issued while watching the video are recorded, the video ID V_59703 and the reply issued while watching the video are recorded. The comment or reply behavior may be commonly referred to as discussion behavior.
Although not shown in the table 2, the table 2 may also include video section ID of a video section such as the above mentioned 5-second video section. The video section ID may be recorded associated with the video ID shown in table 2, or may replace the video ID shown in table 2. Additional information mentioned in the disclosure such as the jumping operation, video playing speed, subtitles of the video section may be recorded associated with the video ID.
The student behavior record also includes the exercise ID, the question ID and the answer of the student. For example, the exercise ID Ex_7552, the question ID Qm_14512 and the answer “A” are recorded in association. Although not shown in table 2, additional information such as the revision history, score and so on may also be recorded as the exercising behavior. Moreover, the concepts associated with a discussion may also be preserved. For example, the concepts contained in the comment text and reply text may be annotated.
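A student behavior record of the kind described for table 2 might be represented as follows. This is a sketch only: the class and attribute names are hypothetical, the comment and reply texts and their concept annotations are placeholders, and reusing the course ID C_1729 from the table 1 example is an assumption. The user ID, video IDs, exercise ID, question ID and answer are those given in the description.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Discussion:
    video_id: str
    kind: str  # "comment" or "reply"
    text: str
    concepts: List[str] = field(default_factory=list)

@dataclass
class Answer:
    exercise_id: str
    question_id: str
    answer: str

@dataclass
class StudentBehavior:
    user_id: str
    course_id: str
    course_name: str
    watched_video_ids: List[str] = field(default_factory=list)
    discussions: List[Discussion] = field(default_factory=list)
    answers: List[Answer] = field(default_factory=list)

    def discussed_concepts(self) -> List[str]:
        """Sorted, de-duplicated concepts annotated on the student's comments and replies."""
        return sorted({c for d in self.discussions for c in d.concepts})

record = StudentBehavior(
    user_id="U_112",
    course_id="C_1729",  # assumed to match the table 1 example
    course_name="Artificial Intelligence",
    watched_video_ids=["V_59697", "V_59703"],
    discussions=[
        Discussion("V_59697", "comment", "placeholder comment about CNN",
                   ["convolutional neural network"]),
        Discussion("V_59703", "reply", "placeholder reply about NN",
                   ["neural network"]),
    ],
    answers=[Answer("Ex_7552", "Qm_14512", "A")],
)
```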
Fig. 5 illustrates an exemplary process for establishing a knowledge repository for online courses according to an embodiment of the disclosure.
At step 510, a plurality of concepts are obtained from a plurality of courses. The plurality of courses may be available online courses and may include videos and exercises.
At step 520, each of the videos and the exercises included in the courses is linked to one or more related concepts of the plurality of concepts.
At step 530, each of a plurality of student behaviors is linked to one or more related concepts of the plurality of concepts.
The knowledge repository is established by organizing the courses, the student behavior and the concepts on the basis of the concepts. The established knowledge repository comprises the plurality of courses including the videos and exercises, the plurality of student behaviors, the plurality of concepts, the links between the videos and the plurality of concepts, the links between the exercises and the plurality of concepts, the links between the student behaviors and the plurality of concepts.
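Steps 510 to 530 can be summarized in a short skeleton. The sketch below is not the disclosed implementation: `build_repository` and its dictionary layout are hypothetical, and the two stand-in helpers use plain substring matching against a fixed vocabulary purely so that the example runs, whereas the disclosure describes concept extraction from subtitles and embedding-based relevance.

```python
def build_repository(courses, behaviors, obtain_concepts, related_concepts):
    """Assemble a concept-centric repository from courses and student behaviors."""
    concepts = obtain_concepts(courses)  # step 510

    resource_links = {}  # step 520: videos and exercises -> concepts
    for course in courses:
        for item in course["videos"] + course["exercises"]:
            resource_links[item["id"]] = related_concepts(item["text"], concepts)

    behavior_links = {  # step 530: student behaviors -> concepts
        b["id"]: related_concepts(b["text"], concepts) for b in behaviors
    }
    return {
        "courses": courses,
        "student_behaviors": behaviors,
        "concepts": concepts,
        "resource_links": resource_links,
        "behavior_links": behavior_links,
    }

# Stand-in helpers (assumptions for illustration only).
VOCABULARY = ["gradient", "cnn"]

def obtain_concepts(courses):
    """Collect vocabulary terms that occur in any video text."""
    found = []
    for course in courses:
        for video in course["videos"]:
            for term in VOCABULARY:
                if term in video["text"].lower() and term not in found:
                    found.append(term)
    return found

def related_concepts(text, concepts):
    """Substring matching as a placeholder for a relevance estimate."""
    return [c for c in concepts if c in text.lower()]

courses = [{
    "id": "C2",
    "videos": [{"id": "V12", "text": "Gradient basics"},
               {"id": "V8", "text": "Intro to CNN"}],
    "exercises": [{"id": "E1", "text": "Compute the gradient of f"}],
}]
behaviors = [{"id": "D341", "text": "How does a CNN share weights?"}]
repo = build_repository(courses, behaviors, obtain_concepts, related_concepts)
```

Passing the extraction and linking functions as parameters keeps the skeleton independent of how concepts are actually obtained and matched.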
In an embodiment, external knowledge data for each of the plurality of concepts may be obtained from websites, and each of the external knowledge data may be linked to one or more related concepts of the plurality of concepts. The established knowledge repository further comprises the external knowledge data and the links between the external knowledge data and the plurality of concepts. In an embodiment, the external knowledge data for each of the plurality of concepts comprise papers, blogs, technical questions and answers, and books.
In an embodiment, prerequisite relations among the plurality of concepts may be obtained. The established knowledge repository further comprises the prerequisite relations among the plurality of concepts. In an embodiment, prerequisite relations among the plurality of concepts may be obtained by using a text-based classifier and a graph-based classifier.
In an embodiment, the established knowledge repository comprises student information, wherein the student information comprises student profiles of a plurality of students and the student behaviors of the plurality of students. In an embodiment, the student profile of a student comprises gender, location, age and grade level of the student. In an embodiment, the student behavior of a student comprises exercising records, discussion records and video watching records of the student. In an embodiment, the exercising records of the student comprise a timestamp of completing an exercise, answers submitted for the exercise, score obtained for the exercise, and revision history of the student. The discussion records of the student comprise postings and replies of the student. The video watching records of the student comprise video clips, subtitle text of the video clips, the student’s swiping and/or jumping actions to each of the video clips. The video clips may have a length on the scale of seconds.
In an embodiment, each of the plurality of courses comprises one or more teaching units, each of the one or more teaching units comprises one or more videos and one or more exercises. In an embodiment, each of the plurality of courses comprises teacher information and school information.
In an embodiment, at step 510, the concepts may be extracted from subtitles of the videos included in the plurality of courses. In an embodiment, at step 510, candidate concepts may be extracted from subtitles of the videos included in the plurality of courses by using multiple concept extraction methods, and the concepts may be selected out of the candidate concepts based on a ranking of the candidate concepts.
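The combination of multiple extraction methods followed by ranking might look like the sketch below. The ranking criterion used here, the number of methods that proposed a candidate with ties broken alphabetically, is an assumption chosen for illustration, as is the function name `select_concepts`. The passage above does not fix a particular ranking.

```python
from collections import Counter

def select_concepts(candidate_lists, top_k):
    """Rank candidates by how many extraction methods proposed them and keep the top_k."""
    votes = Counter()
    for candidates in candidate_lists:
        votes.update(set(candidates))  # each method votes at most once per candidate
    ranked = sorted(votes, key=lambda c: (-votes[c], c))
    return ranked[:top_k]

# Hypothetical outputs of three concept extraction methods run on the same subtitles.
method_a = ["gradient descent", "neural network", "learning rate"]
method_b = ["neural network", "gradient descent", "epoch"]
method_c = ["neural network", "loss function"]
selected = select_concepts([method_a, method_b, method_c], top_k=2)
```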
In an embodiment, at step 520, the video may be linked to the one or more related concepts extracted from the video. Relevance between the exercise and concepts extracted from videos included in the teaching unit containing the exercise may be estimated, and the exercise may be linked to the one or more related concepts out of the extracted concepts based on the estimated relevance.
In an embodiment, at step 530, relevance between the discussion record of the student behavior and at least a part of the plurality of concepts may be estimated, and the discussion record may be linked to the one or more related concepts out of the at least a part of concepts based on the estimated relevance. In an embodiment, at step 530, relevance between the discussion record of the student behavior and concepts extracted from videos included in the teaching unit related to the discussion record may be estimated, and the discussion record may be linked to the one or more related concepts out of the extracted concepts based on the estimated relevance.
In an embodiment, discipline information may be annotated to the plurality of courses according to a discipline classification standard.
Fig. 6 illustrates an exemplary computing system according to an embodiment of the present disclosure. The computing system 60 may comprise at least one processor 610. The computing system 60 may further comprise at least one storage device 620. In an aspect, the storage device 620 may store computer-executable instructions that, when executed, cause the processor 610 to obtain a plurality of concepts from a plurality of courses, wherein the plurality of courses include videos and exercises; link each of the videos and the exercises included in the courses to one or more related concepts of the plurality of concepts; and link each of a plurality of student behaviors to one or more related concepts of the plurality of concepts, wherein the established knowledge repository comprises the plurality of courses including the videos and exercises, the plurality of student behaviors, the plurality of concepts, the links between the videos and the plurality of concepts, the links between the exercises and the plurality of concepts, and the links between the student behaviors and the plurality of concepts.
It should be appreciated that the storage device 620 may store computer-executable instructions that, when executed, cause the processor 610 to perform any operations according to the embodiments of the present disclosure as described in connection with FIGs. 1-5.
The exemplary computing system 60 may be implemented as a platform for providing a knowledge repository for online courses. The one or more storage devices 620 of the platform may store the knowledge repository, wherein the knowledge repository comprises a plurality of courses including videos and exercises, a plurality of student behaviors, a plurality of concepts, a plurality of prerequisite relations among the plurality of concepts, links between the videos and the plurality of concepts, links between the exercises and the plurality of concepts, and links between the student behaviors and the plurality of concepts. It is appreciated that the storage devices 620 may store the knowledge repository according to various embodiments as described in connection with Figs. 1-5. In an embodiment, in response to a data request, the processor 610 may retrieve data from the knowledge repository stored in the storage device 620.
The exemplary computing system 60 may be implemented as an application development platform. The processor 610 may be used to execute the instructions of an AI application, and the one or more storage devices 620 may store the knowledge repository according to various embodiments as described in connection with Figs. 1-5. In an embodiment, the processor 610 may execute the instructions of an AI application based on the knowledge repository stored in the storage devices 620. For example, models of the AI application may be trained based on the knowledge repository stored in the storage device 620.
The embodiments of the present disclosure may be embodied in a computer-readable medium such as non-transitory computer-readable medium. The non-transitory computer-readable medium may comprise instructions that, when executed, cause one or more processors to perform any operations according to the embodiments of the present disclosure as described in connection with FIGs. 1-6.
The embodiments of the present disclosure may be embodied in a computer program product comprising computer-executable instructions that, when executed, cause one or more processors to perform any operations according to the embodiments of the present disclosure as described in connection with FIGs. 1-6.
It should be appreciated that all the operations in the methods described above are merely exemplary, and the present disclosure is not limited to any operations in the methods or sequence orders of these operations, and should cover all other equivalents under the same or similar concepts.
It should also be appreciated that all the modules in the apparatuses described above may be implemented in various approaches. These modules may be implemented as hardware, software, or a combination thereof. Moreover, any of these modules may be further functionally divided into sub-modules or combined together.
The previous description is provided to enable any person skilled in the art to practice the various aspects described herein. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects. Thus, the claims are not intended to be limited to the aspects shown herein. All structural and functional equivalents to the elements of the various aspects described throughout the present disclosure that are known or later come to be known to those of ordinary skill in the art are expressly incorporated herein by reference and are intended to be encompassed by the claims.

Claims (20)

  1. A method for establishing a knowledge repository for online courses, comprising:
    obtaining a plurality of concepts from a plurality of courses, wherein the plurality of courses include videos and exercises;
    linking each of the videos and the exercises included in the courses to one or more related concepts of the plurality of concepts; and
    linking each of a plurality of student behaviors to one or more related concepts of the plurality of concepts;
    wherein the established knowledge repository comprises the plurality of courses including the videos and exercises, the plurality of student behaviors, the plurality of concepts, the links between the videos and the plurality of concepts, the links between the exercises and the plurality of concepts, the links between the student behaviors and the plurality of concepts.
  2. The method of claim 1, further comprising:
    obtaining external knowledge data for each of the plurality of concepts from websites; and
    linking each of the external knowledge data to one or more related concepts of the plurality of concepts,
    wherein the established knowledge repository further comprises the external knowledge data and the links between the external knowledge data and the plurality of concepts.
  3. The method of claim 1, further comprising:
    obtaining prerequisite relations among the plurality of concepts;
    wherein the established knowledge repository further comprises the prerequisite relations among the plurality of concepts.
  4. The method of one of claims 1 to 3, wherein the established knowledge repository comprises student information, wherein the student information comprises student profiles of a plurality of students and the student behaviors of the plurality of students.
  5. The method of claim 4, wherein the student profile of a student comprises gender, location, age and grade level of the student.
  6. The method of claim 4, wherein the student behavior of a student comprises exercising records, discussion records and video watching records of the student.
  7. The method of claim 6, wherein the exercising records of the student comprise a timestamp of completing an exercise, answers submitted for the exercise, score obtained for the exercise, and revision history of the student,
    the discussion records of the student comprise postings and replies of the student,
    the video watching records of the student comprise video clips, subtitle text of the video clips, the student’s swiping and/or jumping actions to each of the video clips.
  8. The method of claim 4, wherein each of the plurality of courses comprises one or more teaching units, each of the one or more teaching units comprises one or more videos and one or more exercises.
  9. The method of claim 8, wherein each of the plurality of courses comprises teacher information and school information.
  10. The method of claim 8, wherein the obtaining a plurality of concepts from a plurality of courses comprises extracting the concepts from subtitles of the videos included in the plurality of courses.
  11. The method of claim 10, wherein the obtaining a plurality of concepts from a plurality of courses comprises:
    extracting candidate concepts from subtitles of the videos included in the plurality of courses by using multiple concept extraction methods; and
    selecting the concepts out of the candidate concepts based on a ranking of the candidate concepts.
  12. The method of claim 10, wherein the linking each of the videos and the exercises included in the courses to one or more related concepts of the plurality of concepts comprises:
    linking the video to the one or more related concepts extracted from the video; and
    estimating relevance between the exercise and concepts extracted from videos included in the teaching unit containing the exercise, and linking the exercise to the one or more related concepts out of the extracted concepts based on the estimated relevance.
  13. A computer system for providing a knowledge repository for online courses, comprising:
    one or more processors; and
    one or more storage devices storing the knowledge repository, wherein the knowledge repository comprises a plurality of courses including videos and exercises, a plurality of student behaviors, a plurality of concepts, links between the videos and the plurality of concepts, links between the exercises and the plurality of concepts, links between the student behaviors and the plurality of concepts.
  14. The computer system of claim 13, wherein the knowledge repository further comprises external knowledge data and links between the external knowledge data and the plurality of concepts.
  15. The computer system of claim 14, wherein the knowledge repository further comprises prerequisite relations among the plurality of concepts.
  16. The computer system of one of claims 13 to 15, wherein the knowledge repository comprises student information, wherein the student information comprises student profiles of a plurality of students and the student behaviors of the plurality of students.
  17. The computer system of claim 16, wherein the student behavior of a student comprises exercising records, discussion records and video watching records of the student.
  18. The computer system of claim 17, wherein the exercising records of the student comprise a timestamp of completing an exercise, answers submitted for the exercise, score obtained for the exercise, and revision history of the student,
    the discussion records of the student comprise postings and replies of the student,
    the video watching records of the student comprise video clips, subtitle text of the video clips, the student’s swiping and/or jumping actions to each of the video clips.
  19. A computer system, comprising:
    one or more processors; and
    one or more storage devices storing computer-executable instructions that, when executed, cause the one or more processors to perform the operations of the method of one of claims 1-12.
  20. One or more computer readable storage media storing computer-executable instructions that, when executed, cause one or more processors to perform the operations of the method of one of claims 1-12.
PCT/CN2021/111149 2021-08-06 2021-08-06 Method for establishing knowledge repository for online courses WO2023010514A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202180101405.2A CN118020080A (en) 2021-08-06 2021-08-06 Method for establishing knowledge base for online courses
PCT/CN2021/111149 WO2023010514A1 (en) 2021-08-06 2021-08-06 Method for establishing knowledge repository for online courses

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/111149 WO2023010514A1 (en) 2021-08-06 2021-08-06 Method for establishing knowledge repository for online courses

Publications (1)

Publication Number Publication Date
WO2023010514A1 true WO2023010514A1 (en) 2023-02-09

Family

ID=85154155

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/111149 WO2023010514A1 (en) 2021-08-06 2021-08-06 Method for establishing knowledge repository for online courses

Country Status (2)

Country Link
CN (1) CN118020080A (en)
WO (1) WO2023010514A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117911915A (en) * 2023-12-07 2024-04-19 华南师范大学 Cross-course knowledge tracking method and device based on transfer learning

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002056278A2 (en) * 2001-01-12 2002-07-18 Discourse Technologies Inc Method and system for online teaching using web pages
CN101542512A (en) * 2007-04-23 2009-09-23 株式会社Happyedu On-line education method coupled with an item pool and a lecture
CN108959270A (en) * 2018-08-10 2018-12-07 新华智云科技有限公司 A kind of entity link method based on deep learning
CN109191929A (en) * 2018-09-25 2019-01-11 上海优谦智能科技有限公司 The intellectual education system of knowledge based map
CN113010580A (en) * 2021-03-31 2021-06-22 东北大学 Knowledge tracking method based on learning migration

Also Published As

Publication number Publication date
CN118020080A (en) 2024-05-10

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21952389

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE