CN110929161B

CN110929161B - Large-scale user-oriented personalized teaching resource recommendation method

Info

Publication number: CN110929161B
Application number: CN201911212608.XA
Authority: CN
Inventors: 龚少麟; 贲伟; 赵文涛
Original assignee: Nanjing Laiwangxin Technology Research Institute Co ltd
Current assignee: Nanjing Laiwangxin Technology Research Institute Co ltd
Priority date: 2019-12-02
Filing date: 2019-12-02
Publication date: 2023-04-07
Anticipated expiration: 2039-12-02
Also published as: WO2021109464A1; CN110929161A

Abstract

The invention discloses a large-scale user-oriented personalized teaching resource recommendation method comprising the following steps: acquiring user interaction data and preprocessing it to obtain a user resource scoring matrix; performing feature dimensionality reduction on the user resource scoring matrix to obtain a teaching resource feature matrix; clustering the teaching resource feature matrix to obtain teaching resource clusters, and sorting the teaching resources within each cluster; and obtaining the user's scores on the teaching resources, calculating the user's interest degree in each teaching resource with a teaching resource interest degree model, and generating a teaching resource recommendation list by sorting all teaching resources in descending order of interest degree. Compared with the prior art, the method can provide fast and accurate digital teaching resource recommendations to a large number of users, improve the user experience, and offer an effective solution for the personalized use of smart campus teaching resources.

Description

Large-scale user-oriented personalized teaching resource recommendation method

Technical Field

The invention relates to the field of personalized recommendation of smart campus teaching resources, and in particular to a large-scale user-oriented personalized teaching resource recommendation method.

Background

A recommendation system is an information filtering system that recommends personalized information according to users' information needs, interests and similar signals, and it has been applied successfully to online video, social networks, online music, e-commerce and other fields. As the teaching resource libraries of smart campuses keep growing, rich teaching resources such as e-books, documents, electronic courseware and micro-lecture videos can be combined with collaborative-filtering-based personalized recommendation to improve students' learning outcomes.

One of the most popular techniques for recommendation engines is collaborative filtering, which relies only on past user behavior, such as past transactions or feedback on educational resources. Conventional collaborative filtering algorithms, such as neighborhood methods and latent factor models, typically suffer from two major problems. The first is the sparsity of the user-item scoring matrix, because most users score only a small percentage of all items. The second is the inability to provide recommendations for very large data sets in real time or near real time.

To address these problems, researchers have tried various methods such as clustering and hybrid techniques, but these methods do not scale to massive data sets. Recent work has used Hadoop to parallelize collaborative filtering algorithms, but MapReduce suffers from long computation times and low efficiency.

Disclosure of Invention

The invention provides a large-scale user-oriented personalized teaching resource recommendation method. In practice, no user interacts with every educational resource; user-resource pairs with interactions make up only a small fraction of all pairs. In other words, the user-resource relationship list, and therefore the teaching resource scoring matrix, is very sparse, and this sparsity directly affects the accuracy of the model.

Colleges and universities hold a large number of teaching resources, such as the institution's own teaching resources, Massive Open Online Course (MOOC) or Small Private Online Course (SPOC) resources, and recorded lecture resources. In a big data environment, searching all resources for those most similar to a target resource greatly reduces the efficiency of a recommendation system.

The invention aims to provide a teaching resource recommendation method that suits very large sparse feature matrices and computes efficiently on large-scale data sets. It addresses two shortcomings of existing collaborative filtering schemes: inaccurate models caused by data sparsity, and the inability to compute efficiently on large-scale data sets.

A large-scale user-oriented personalized teaching resource recommendation method comprises the following steps:

step 1, acquiring user interaction data, and performing data preprocessing on the user interaction data to acquire a user resource scoring matrix;

step 2, performing feature dimension reduction on the user resource scoring matrix to obtain a teaching resource feature matrix of the user;

step 3, clustering the teaching resource feature matrix to obtain teaching resource clusters, and sequencing the teaching resources in the teaching resource clusters;

step 4, acquiring the user's scores on the teaching resources, calculating the user's interest degree in every teaching resource with the teaching resource interest degree model, and generating a teaching resource recommendation list by sorting all teaching resources in descending order of interest degree.

Further, in one implementation, the step 1 includes:

step 101, collecting a teaching resource scoring data set, and loading the teaching resource scoring data set into a data warehouse for storage, wherein the teaching resource scoring data set comprises teaching resources, scoring data and user information corresponding to the scoring data;

step 102, analyzing the teaching resources in the teaching resource scoring data set, and finding and deleting teaching resources and users with abnormal scoring data, which include: out-of-range scoring data and the corresponding users, and malicious scoring users and their scoring data;

step 103, extracting the user ID, the resource ID and the score value, and constructing the user resource scoring matrix.
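As an illustration of step 103, a minimal Python sketch (not the patented Spark implementation) can hold the user resource scoring matrix as a dictionary of dictionaries, so that only observed scores take up memory. The function names `build_rating_matrix` and `density` and the (user ID, resource ID, score) tuple layout are hypothetical; the density value simply makes the sparsity discussed above measurable.

```python
from collections import defaultdict


def build_rating_matrix(records):
    """Build a sparse user resource scoring matrix {user_id: {resource_id: score}}
    from (user_id, resource_id, score) triples; missing cells are simply absent."""
    matrix = defaultdict(dict)
    for user_id, resource_id, score in records:
        matrix[user_id][resource_id] = score  # a later duplicate overwrites an earlier one
    return dict(matrix)


def density(matrix, n_resources):
    """Fraction of user-resource cells that actually hold a score."""
    observed = sum(len(row) for row in matrix.values())
    return observed / (len(matrix) * n_resources)


m = build_rating_matrix([("u1", "r1", 5), ("u1", "r3", 3), ("u2", "r2", 4)])
print(m)               # {'u1': {'r1': 5, 'r3': 3}, 'u2': {'r2': 4}}
print(density(m, 3))   # 0.5
```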

Further, in one implementation, the step 2 includes:

step 201, mapping the user resource scoring matrix to a low-dimensional latent factor space with the alternating least squares (ALS) method on the Spark big data analysis platform;

step 202, minimizing a first objective function to obtain a user feature matrix and a teaching resource feature matrix, where the first objective function, a squared error loss function, is expressed as:

$$\min_{P,Q}\sum_{(u,i)\in K}\left[\left(r_{ui}-p_u^{\top}q_i\right)^2+\lambda\left(\lVert p_u\rVert^2+\lVert q_i\rVert^2\right)\right]$$

where $K$ is the set of user-resource pairs $(u, i)$ with an observed score, $r_{ui}$ is the score of user $u$ on teaching resource $i$ and is an integer in $[0, 5]$, $p_u$ is the feature vector of user $u$, $q_i$ is the feature vector of teaching resource $i$, $\lambda$ is the regularization parameter with a value in $[0, 1]$, and $p_u^{\top}q_i$ represents the interaction between user $u$ and teaching resource $i$.
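The alternating idea behind steps 201 and 202 can be sketched in plain Python for a toy data set: with the resource factors fixed, each user vector has a closed-form least-squares solution, and vice versa, so no half-step can increase the first objective function. This is a hedged sketch, not Spark's ALS; the helper names, the random initialization and the small in-memory solver are illustrative assumptions. It also assumes the regularization term is applied once per observed score, i.e. the loss sums (r_ui − p_u^T q_i)^2 + λ(‖p_u‖^2 + ‖q_i‖^2) over observed pairs; under that assumption the ridge term in each solve is λ times the number of observed scores in that row.

```python
import random


def solve(a, b):
    """Solve the square linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def update(fixed, rows, g, lam):
    """Hold the other side's factors fixed and solve each row's least-squares problem:
    (sum_j f_j f_j^T + lam * n_row * I) v = sum_j r_j f_j over the row's observed scores."""
    new = {}
    for key, observed in rows.items():
        a = [[lam * len(observed) if i == j else 0.0 for j in range(g)] for i in range(g)]
        b = [0.0] * g
        for other, r in observed.items():
            f = fixed[other]
            for i in range(g):
                b[i] += r * f[i]
                for j in range(g):
                    a[i][j] += f[i] * f[j]
        new[key] = solve(a, b)
    return new


def loss(ratings, p, q, lam):
    """Squared error plus L2 penalty, summed over the observed (u, i) pairs."""
    total = 0.0
    for u, row in ratings.items():
        for i, r in row.items():
            pred = sum(x * y for x, y in zip(p[u], q[i]))  # p_u^T q_i
            total += (r - pred) ** 2 + lam * (sum(x * x for x in p[u]) + sum(y * y for y in q[i]))
    return total


def als(ratings, g=2, lam=0.1, iters=10, seed=0):
    """ratings: {user: {resource: score}}. Returns (P, Q) as dicts of g-dimensional vectors."""
    by_item = {}
    for u, row in ratings.items():
        for i, r in row.items():
            by_item.setdefault(i, {})[u] = r
    rng = random.Random(seed)
    q = {i: [rng.uniform(0.0, 1.0) for _ in range(g)] for i in sorted(by_item)}
    p = {}
    for _ in range(iters):
        p = update(q, ratings, g, lam)  # fix Q, solve every p_u
        q = update(p, by_item, g, lam)  # fix P, solve every q_i
    return p, q


ratings = {"u1": {"r1": 5, "r2": 3}, "u2": {"r1": 4, "r3": 2}, "u3": {"r2": 1, "r3": 5}}
P, Q = als(ratings, g=2, lam=0.05, iters=20)
print(round(loss(ratings, P, Q, 0.05), 3))
```

Here g plays the role of the number of latent factors; on Spark the same model would be trained by MLlib's ALS estimator rather than by this loop.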

Further, in one implementation, the step 3 includes:

step 301, on the Spark big data analysis platform, applying the K-Means clustering algorithm to the teaching resource feature matrix to group teaching resources with similar features, and obtaining the teaching resource clusters by minimizing a second objective function, a squared error function, expressed as:

$$J=\sum_{b=1}^{k}\sum_{a=1}^{n}\lVert x_a-c_b\rVert^2$$

where $k$ is the number of clusters, $b$ is the cluster index, $n$ is the number of data points in cluster $b$, $a$ is the data point index, $x_a$ is the value of data point $a$ in cluster $b$, $c_b$ is the center of cluster $b$, and $\lVert x_a-c_b\rVert$ is the Euclidean distance between $x_a$ and $c_b$;

step 302, retrieving the teaching resources in each teaching resource cluster, sorting them in ascending order of their distance to the cluster center, and storing the sorted result in the data warehouse.
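Steps 301 and 302 can be illustrated with Lloyd's algorithm, the standard iterative scheme for K-Means, in plain Python. The sketch below is a hedged, single-machine stand-in for the Spark implementation; the point layout {resource ID: feature vector}, the seeding and the helper names are assumptions. `sse` evaluates the second objective function J, and `rank_by_center` performs the ascending-distance sort of step 302.

```python
import math
import random


def dist(x, y):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def assign(points, centers):
    """Put every resource into the cluster of its nearest center."""
    clusters = [[] for _ in centers]
    for rid, x in points.items():
        b = min(range(len(centers)), key=lambda j: dist(x, centers[j]))
        clusters[b].append(rid)
    return clusters


def kmeans(points, k, iters=20, seed=0):
    """Lloyd's algorithm on {resource_id: feature_vector}; returns (centers, clusters)."""
    rng = random.Random(seed)
    centers = [list(points[r]) for r in rng.sample(sorted(points), k)]
    for _ in range(iters):
        clusters = assign(points, centers)
        for b, members in enumerate(clusters):
            if members:  # keep the old center if a cluster becomes empty
                centers[b] = [sum(col) / len(members) for col in zip(*(points[r] for r in members))]
    return centers, assign(points, centers)  # final assignment matches the final centers


def sse(points, centers, clusters):
    """Second objective function J: sum of squared distances to the cluster centers."""
    return sum(dist(points[r], centers[b]) ** 2 for b, members in enumerate(clusters) for r in members)


def rank_by_center(points, centers, clusters):
    """Step 302: sort each cluster's resources by ascending distance to its center."""
    return [sorted(members, key=lambda r: dist(points[r], centers[b])) for b, members in enumerate(clusters)]


points = {"r1": [0.0, 0.0], "r2": [0.1, 0.0], "r3": [5.0, 5.0], "r4": [5.2, 5.0], "r5": [0.0, 0.3]}
centers, clusters = kmeans(points, 2)
print(rank_by_center(points, centers, clusters))
```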

Further, in an implementation manner, the step 4 includes:

step 401, determining, from the user's historical scoring resources, the teaching resource clusters to which those resources belong, where a historical scoring resource is a teaching resource for which the user has scoring data;

step 402, taking the first K teaching resources that best represent the cluster features in each teaching resource cluster to which the historical scoring resources belong, and calculating the interest degree of these K teaching resources with the teaching resource interest degree model, where $0 < K < N$ and $N$ is the number of teaching resources in the teaching resource cluster;

step 403, calculating the user's interest degree in all teaching resources with the teaching resource interest degree model, based on all historical scoring resources of the user;

the teaching resource interest degree model is expressed as:

$$p_{ud}=\sum_{c\in N(u)\cap S(d)} w_d\,r_{uc}$$

where $p_{ud}$ is the interest degree of user $u$ in teaching resource $d$, $N(u)$ is the set of teaching resources rated by user $u$, $S(d)$ is the set of resources in the cluster to which teaching resource $d$ belongs, $c$ is a teaching resource that user $u$ has rated and that lies in the cluster of $d$, $w_d$ is the feature matching degree of teaching resource $d$, and $r_{uc}$ is user $u$'s score on teaching resource $c$.
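Steps 401 to 403 and the interest degree model can be traced on toy data with the hedged Python sketch below. It assumes each resource belongs to exactly one cluster, that each cluster's members are already sorted by step 302, and that w_d is supplied from outside; the function and argument names are hypothetical. Because w_d does not depend on c, p_ud reduces to w_d times the sum of the user's scores on the rated resources in d's cluster.

```python
def interest_degrees(user_scores, cluster_of, ranked_clusters, w, top_k):
    """Compute p_ud = sum over c in N(u) intersect S(d) of w_d * r_uc.

    user_scores: {resource: score} for user u, i.e. N(u) together with r_uc
    cluster_of: {resource: cluster id}
    ranked_clusters: {cluster id: [resources sorted by distance to the center]} (step 302)
    w: {resource: feature matching degree w_d}
    top_k: K, the number of representative resources taken from each cluster
    """
    scores = {}
    history_clusters = {cluster_of[c] for c in user_scores}  # step 401
    for b in sorted(history_clusters):
        members = set(ranked_clusters[b])
        rated = [c for c in user_scores if c in members]  # N(u) intersect S(d)
        for d in ranked_clusters[b][:top_k]:  # step 402: first K resources of the cluster
            scores[d] = sum(w[d] * user_scores[c] for c in rated)
    return scores


ranked = {0: ["r1", "r2", "r5"], 1: ["r3", "r4"]}
cluster_of = {"r1": 0, "r2": 0, "r5": 0, "r3": 1, "r4": 1}
w = {"r1": 2.0, "r2": 1.0, "r5": 0.5, "r3": 1.5, "r4": 1.0}
print(interest_degrees({"r2": 4, "r5": 2}, cluster_of, ranked, w, top_k=2))  # {'r1': 12.0, 'r2': 6.0}
```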

According to the above technical solution, the embodiment of the invention provides a large-scale user-oriented personalized teaching resource recommendation method comprising: step 1, acquiring user interaction data and preprocessing it to obtain a user resource scoring matrix; step 2, performing feature dimensionality reduction on the user resource scoring matrix to obtain a teaching resource feature matrix and a user feature matrix; step 3, clustering the teaching resource feature matrix to obtain teaching resource clusters, and sorting the teaching resources within each cluster; and step 4, acquiring the user's scores on the teaching resources, calculating the user's interest degree in all teaching resources with the teaching resource interest degree model, and sorting all teaching resources in descending order of interest degree to generate a teaching resource recommendation list.

The method constructs a user-resource scoring matrix by preprocessing a teaching resource scoring data set, uses the ALS dimensionality reduction algorithm to obtain the users' preference matrix over latent features and the latent feature matrix of the teaching resources, applies the K-Means clustering algorithm to the teaching resources' latent feature matrix to form teaching resource clusters, and finally produces the user's recommended resource interest degree list through a resource recommendation model. Building the method on Apache Spark enables efficient parallel computation and addresses scalability.

In summary, by using Spark's highly concurrent processing, the ALS dimensionality reduction algorithm, the K-Means clustering algorithm and related techniques, the invention, compared with the prior art, can provide fast and accurate digital teaching resource recommendations to a large number of users, improve the user experience and learning outcomes, and offer an effective solution for the personalized use of smart campus teaching resources.

Drawings

In order to more clearly illustrate the technical solution of the present invention, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious to those skilled in the art that other drawings can be obtained based on these drawings without creative efforts.

Fig. 1 is a schematic workflow diagram of a large-scale user-oriented personalized teaching resource recommendation method according to an embodiment of the present invention;

Fig. 2 is a schematic workflow diagram of the feature dimensionality reduction algorithm in the large-scale user-oriented personalized teaching resource recommendation method according to the embodiment of the present invention;

Fig. 3 is a schematic workflow diagram of the resource clustering algorithm in the large-scale user-oriented personalized teaching resource recommendation method according to the embodiment of the present invention;

Fig. 4 is a schematic diagram of the finally generated recommendation list in the large-scale user-oriented personalized teaching resource recommendation method according to the embodiment of the present invention.

Detailed Description

In order to make the aforementioned objects, features and advantages of the present invention more comprehensible, the present invention is described in detail with reference to the accompanying drawings and the detailed description thereof.

The embodiment of the invention discloses a large-scale user-oriented personalized teaching resource recommendation method that is applied to online learning platforms of colleges and universities and can provide users with efficient and accurate teaching resource recommendations during self-directed learning, in scenarios with large numbers of users and teaching resources.

In practice, no user interacts with every educational resource; user-resource pairs with interactions make up only a small part of all pairs. In other words, the user-resource relationship list is very sparse, and this sparsity of the teaching resource scoring matrix directly affects the accuracy of the model. Colleges and universities hold a large number of teaching resources, such as the institution's own teaching resources, MOOC/SPOC resources and recorded lecture resources. In a big data environment, searching all resources for those most similar to a target resource greatly reduces the efficiency of a recommendation system.

The embodiment of the invention uses the Spark big data analysis platform. Spark is an open-source big data analytics framework with four major modules: the Spark SQL module provides SQL-like queries; Spark Streaming is a stream computing module, mainly used to process online real-time sequence data; the MLlib module provides machine learning models and tuning tools for classification, regression, clustering, collaborative filtering, dimensionality reduction and so on; and the GraphX module provides graph-based algorithms. Spark builds on the Hadoop MapReduce model and extends it so that it can be used efficiently for more types of computation, including interactive queries and stream processing. In addition, Spark's in-memory computation can speed up applications. Apache Spark enables efficient parallel computation of the recommendation algorithm, and as the volume of teaching resource data grows, the recommendation scheme can scale by adding Spark cluster nodes.

As shown in fig. 1, a large-scale user-oriented personalized teaching resource recommendation method according to an embodiment of the present invention includes:

step 1, acquiring user interaction data, and preprocessing it to obtain a user resource scoring matrix;

step 2, performing feature dimensionality reduction on the user resource scoring matrix to obtain a teaching resource feature matrix and a user feature matrix;

step 3, clustering the teaching resource feature matrix to obtain teaching resource clusters, and sorting the teaching resources within each cluster;

step 4, acquiring the user's scores on the teaching resources, calculating the user's interest degree in all teaching resources with the teaching resource interest degree model, and generating a teaching resource recommendation list by sorting all teaching resources in descending order of interest degree. Fig. 4 is a schematic diagram of the finally generated recommendation list in the large-scale user-oriented personalized teaching resource recommendation method according to the embodiment of the present invention; other forms of presentation may also be used in a specific implementation.

In the method for recommending personalized teaching resources for large-scale users according to the embodiment of the present invention, the step 1 includes:

step 101, collecting a teaching resource scoring data set, and loading the teaching resource scoring data set into a data warehouse for storage, wherein the teaching resource scoring data set comprises teaching resources, scoring data and user information corresponding to the scoring data; in this step, a teaching resource scoring data set is collected through a teaching resource access platform of a school.

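step 102, analyzing the teaching resources in the teaching resource scoring data set, and finding and deleting teaching resources and users with abnormal scoring data, which include: out-of-range scoring data and the corresponding users, and malicious scoring users and their scoring data;

step 103, extracting the user ID, the resource ID and the score value, and constructing the user resource scoring matrix.

In this embodiment, out-of-range scoring data are scores smaller than 1 or larger than 5, and malicious scoring users are users whose scoring data are smaller than 2.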
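The cleaning rules of steps 102 and 103 can be sketched as follows. The reading of the rules is an assumption: a user who submitted any score outside [1, 5] is dropped together with their data, and "scoring data smaller than 2" is read as "fewer than two scores". Both thresholds are parameters so that another reading can be substituted; the function name `clean_scores` is hypothetical.

```python
from collections import Counter


def clean_scores(records, low=1, high=5, min_scores=2):
    """records: list of (user_id, resource_id, score) tuples; returns the retained tuples."""
    # Assumed reading: a user who gave any out-of-range score is removed along with that data.
    bad_users = {u for u, _, s in records if not low <= s <= high}
    kept = [(u, r, s) for u, r, s in records if u not in bad_users]
    # Assumed reading of the malicious-user rule: users with fewer than min_scores scores.
    counts = Counter(u for u, _, _ in kept)
    return [(u, r, s) for u, r, s in kept if counts[u] >= min_scores]


records = [("u1", "r1", 5), ("u1", "r2", 3), ("u2", "r1", 7), ("u2", "r3", 4), ("u3", "r2", 2)]
print(clean_scores(records))  # [('u1', 'r1', 5), ('u1', 'r2', 3)]
```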

As shown in fig. 2, in the method for recommending personalized teaching resources for large-scale users according to the embodiment of the present invention, the step 2 includes:

step 201, mapping the user resource scoring matrix to a low-dimensional latent factor space with the Alternating Least Squares (ALS) method on the Spark big data analysis platform. Specifically, in this step the teaching resource scoring data set is divided into a training set and a test set; the model is trained on the training set and its evaluation metrics are computed on the test set. In this embodiment, the training set and the test set can be obtained with a 60/40 split.
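A 60/40 split of the scoring records can be sketched as a seeded shuffle followed by a cut at 60 percent. This is a hedged illustration; the function name and the fixed seed are assumptions. On Spark, `DataFrame.randomSplit([0.6, 0.4], seed)` serves the same purpose, though it splits probabilistically, so the proportions are only approximate.

```python
import random


def split_60_40(records, seed=42):
    """Shuffle the scoring records reproducibly and split them 60/40 into training and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.6)
    return shuffled[:cut], shuffled[cut:]


train, test = split_60_40(range(10))
print(len(train), len(test))  # 6 4
```

A random split can leave some users or resources only in the test set, where no factors have been learned for them; Spark's ALS exposes `coldStartStrategy="drop"` to exclude such rows when the RMSE is computed.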

Step 202, minimizing a first objective function to obtain a user feature matrix and a teaching resource feature matrix, where the first objective function, a squared error loss function, is expressed as:

$$\min_{P,Q}\sum_{(u,i)\in K}\left[\left(r_{ui}-p_u^{\top}q_i\right)^2+\lambda\left(\lVert p_u\rVert^2+\lVert q_i\rVert^2\right)\right]$$

where $K$ is the set of user-resource pairs $(u, i)$ with an observed score, $r_{ui}$ is the score of user $u$ on teaching resource $i$ and is an integer in $[0, 5]$, $p_u$ is the feature vector of user $u$, $q_i$ is the feature vector of teaching resource $i$, $\lambda$ is the regularization parameter with a value in $[0, 1]$, and $p_u^{\top}q_i$ represents the interaction between user $u$ and teaching resource $i$.

In this step, the number of latent factors g is first fixed in advance: g may be an integer in the range [10, 100], for example one of {20, 50, 70, 100}, and λ one of {0.01, 0.1, 1}. A model is trained on the training set for each setting, and the Root Mean Square Error (RMSE) on the test set is computed as the evaluation metric. Finally, the g value that yields the smallest RMSE is taken as the optimal solution.
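The selection procedure above amounts to a grid search that keeps the setting with the smallest test-set RMSE. The sketch below is hedged: `evaluate` is a hypothetical callable standing in for "train on the training set with (g, λ), then return the test-set RMSE", and the default grids are the example values from the text.

```python
import math
from itertools import product


def rmse(pairs):
    """Root Mean Square Error over (actual, predicted) score pairs."""
    pairs = list(pairs)
    return math.sqrt(sum((a - p) ** 2 for a, p in pairs) / len(pairs))


def grid_search(evaluate, gs=(20, 50, 70, 100), lams=(0.01, 0.1, 1.0)):
    """Return (best_rmse, g, lam); evaluate(g, lam) must train a model and return its test RMSE."""
    return min((evaluate(g, lam), g, lam) for g, lam in product(gs, lams))


print(rmse([(3, 1), (5, 5)]))  # sqrt(2), about 1.414
print(grid_search(lambda g, lam: abs(g - 50) / 100 + lam))  # (0.01, 50, 0.01)
```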

Step 2 addresses the sparsity problem of conventional collaborative filtering methods, lends itself to parallel processing, and can process implicit data sets efficiently.

As shown in fig. 3, in the method for recommending personalized teaching resources for large-scale users according to the embodiment of the present invention, the step 3 includes:

step 301, on the Spark big data analysis platform, applying the K-Means clustering algorithm to the teaching resource feature matrix to group teaching resources with similar features, and obtaining the teaching resource clusters by minimizing a second objective function, a squared error function, expressed as:

$$J=\sum_{b=1}^{k}\sum_{a=1}^{n}\lVert x_a-c_b\rVert^2$$

where $k$ is the number of clusters, $b$ is the cluster index, $n$ is the number of data points in cluster $b$, $a$ is the data point index, $x_a$ is the value of data point $a$ in cluster $b$, $c_b$ is the center of cluster $b$, and $\lVert x_a-c_b\rVert$ is the Euclidean distance between $x_a$ and $c_b$;

In this step, the number of clusters k is first fixed in advance: k may be an integer in the range [10, 100], for example one of {10, 20, 40, 60, 80}. A model is trained on the training set for each value, and the objective function is then computed on the test set. Finally, the k value that yields the smallest objective function is taken as the optimal solution.
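Evaluating the second objective function on the test set means assigning each held-out feature vector to its nearest trained center and summing the squared Euclidean distances. The hedged sketch below shows that evaluation and the final choice of k; the function names and the `objective_by_k` mapping are assumptions for illustration.

```python
def held_out_objective(test_points, centers):
    """Second objective function on held-out data: each test feature vector is assigned to
    its nearest trained center and the squared Euclidean distances are summed."""
    return sum(
        min(sum((x - c) ** 2 for x, c in zip(point, center)) for center in centers)
        for point in test_points
    )


def choose_k(objective_by_k):
    """Pick the k whose test-set objective is smallest."""
    return min(objective_by_k, key=objective_by_k.get)


print(held_out_objective([[0, 1], [5, 4]], [[0, 0], [5, 5]]))  # 2
print(choose_k({10: 5.0, 20: 3.5, 40: 4.1}))  # 20
```

Squared-error objectives usually keep falling as k grows, so in practice the minimum may sit at the largest candidate in the grid; the chosen grid bounds therefore matter.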

Before this step, the teaching resource feature matrix is divided into a training set and a test set; the model is trained on the training set and its evaluation metrics are computed on the test set. Specifically, the training set and the test set can be obtained with a 60/40 split.

Step 302, retrieving the teaching resources in each teaching resource cluster, sorting them in ascending order of their distance to the cluster center, and storing the sorted result in the data warehouse.

In the method for recommending personalized teaching resources for large-scale users according to the embodiment of the present invention, step 4 includes:

step 401, obtaining a teaching resource cluster to which a history scoring resource belongs according to the history scoring resource of a user, wherein the history scoring resource is the teaching resource with user scoring data; in this embodiment, before recommending resources to a user, the user must score at least one resource.

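step 402, taking the first K teaching resources that best represent the cluster features in each teaching resource cluster to which the historical scoring resources belong, and calculating the interest degree of these K teaching resources with the teaching resource interest degree model, where $0 < K < N$ and $N$ is the number of teaching resources in the teaching resource cluster;

step 403, calculating the user's interest degree in all teaching resources with the teaching resource interest degree model, based on all historical scoring resources of the user;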

the teaching resource interest degree model is expressed as:

$$p_{ud}=\sum_{c\in N(u)\cap S(d)} w_d\,r_{uc}$$

where $p_{ud}$ is the interest degree of user $u$ in teaching resource $d$, $N(u)$ is the set of teaching resources rated by user $u$, $S(d)$ is the set of resources in the cluster to which teaching resource $d$ belongs, $c$ is a teaching resource that user $u$ has rated and that lies in the cluster of $d$, $w_d$ is the feature matching degree of teaching resource $d$, and $r_{uc}$ is user $u$'s score on teaching resource $c$.

In this embodiment, the feature matching degree can be expressed as the reciprocal of the distance between the teaching resource and the cluster center: the smaller the distance, the higher the matching degree.
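Taking w_d as the reciprocal of the distance to the cluster center can be sketched in one line with Python's `math.dist` (Python 3.8+). The small `eps` guard is an assumption not stated in the source; it only prevents division by zero when a resource coincides with its center.

```python
import math


def matching_degree(feature, center, eps=1e-9):
    """w_d as the reciprocal of the Euclidean distance between resource d and its cluster center.
    eps is an added assumption that avoids division by zero when d coincides with the center."""
    return 1.0 / (math.dist(feature, center) + eps)


print(round(matching_degree([3.0, 4.0], [0.0, 0.0]), 6))  # 0.2
```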

Fig. 4 is a schematic diagram of the finally generated recommendation list in the large-scale user-oriented personalized teaching resource recommendation method according to an embodiment of the present invention; other forms of presentation may also be used in a specific implementation. The method sums the user's interest degrees for the same teaching resource and sorts all teaching resources in descending order of this sum to generate the teaching resource recommendation list, which provides the user with fast and accurate teaching resource recommendations.
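The final aggregation can be sketched as summing the interest contributions per resource and sorting in descending order. This is a hedged illustration: the input format, a flat list of (resource ID, interest) pairs with possibly several entries per resource, is an assumption, as is the function name.

```python
from collections import defaultdict


def recommendation_list(contributions):
    """contributions: iterable of (resource_id, interest) pairs; a resource may appear several
    times, e.g. once per historical scoring resource c that contributed w_d * r_uc.
    Returns [(resource_id, summed interest)] sorted in descending order of interest."""
    totals = defaultdict(float)
    for resource_id, interest in contributions:
        totals[resource_id] += interest
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


print(recommendation_list([("r1", 2.0), ("r3", 1.5), ("r1", 1.0), ("r2", 2.5)]))
# [('r1', 3.0), ('r2', 2.5), ('r3', 1.5)]
```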

The invention thus provides a large-scale user-oriented personalized teaching resource recommendation method. It constructs a user-resource scoring matrix by preprocessing a teaching resource scoring data set, obtains a teaching resource feature matrix and a user feature matrix with the ALS (Alternating Least Squares) dimensionality reduction algorithm, forms teaching resource clusters by applying the K-Means clustering algorithm to the teaching resource feature matrix, and finally produces the user's teaching resource recommendation list through a resource recommendation model.

In this process, ALS yields the users' preference matrix over latent features and the latent feature matrix of the teaching resources, the K-Means clusters are formed on the latter, and the resource recommendation model turns them into the user's recommended resource interest degree list. Building on Apache Spark enables efficient parallel computation and addresses scalability.

By using Spark's highly concurrent processing, the ALS dimensionality reduction algorithm, the K-Means clustering algorithm and related techniques, the method, compared with the prior art, can provide fast and accurate digital teaching resource recommendations to a large number of users, improve the user experience and learning outcomes, and offer an effective solution for the personalized use of smart campus teaching resources.

In a specific implementation, the present invention further provides a computer storage medium, where the computer storage medium may store a program, and when the program is executed, the program may include some or all of the steps in each embodiment of the personalized teaching resource recommendation method for large-scale users provided by the present invention. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM) or a Random Access Memory (RAM).

Those skilled in the art will readily appreciate that the techniques of the embodiments of the present invention may be implemented as software plus a required general purpose hardware platform. Based on such understanding, the technical solutions in the embodiments of the present invention may be essentially or partially implemented in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.

The same and similar parts among the various embodiments in this specification may be referred to each other. The above-described embodiments of the present invention do not limit the scope of the present invention.

Claims

1. A large-scale user-oriented personalized teaching resource recommendation method, characterized by comprising the following steps:

step 1, acquiring user interaction data, and performing data preprocessing on the user interaction data to obtain a user resource scoring matrix;

step 2, performing feature dimensionality reduction on the user resource scoring matrix to obtain a teaching resource feature matrix of the user;

step 3, clustering the teaching resource feature matrix to obtain teaching resource clusters, and sorting the teaching resources within the teaching resource clusters;

step 4, obtaining the user's scores on the teaching resources, calculating the user's interest degree in all teaching resources with a teaching resource interest degree model, and generating a teaching resource recommendation list by arranging all teaching resources in descending order of interest degree;

the step 1 comprises the following steps:

step 101, collecting a teaching resource scoring data set and loading it into a data warehouse for storage, wherein the teaching resource scoring data set comprises teaching resources, scoring data and user information corresponding to the scoring data;

step 102, analyzing the teaching resources in the teaching resource scoring data set, and finding and deleting teaching resources and users with abnormal scoring data, wherein the abnormal scoring data comprise: out-of-range scoring data and the corresponding users, and malicious scoring users and their scoring data;

step 103, extracting a user ID, a resource ID and a score value, and constructing a user resource scoring matrix;

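the step 2 includes:

step 201, mapping the user resource scoring matrix to a low-dimensional latent factor space with the alternating least squares method on a Spark big data analysis platform;

step 202, minimizing a first objective function to obtain a user feature matrix and a teaching resource feature matrix, wherein the first objective function, a squared error loss function, is expressed as:

$$\min_{P,Q}\sum_{(u,i)\in K}\left[\left(r_{ui}-p_u^{\top}q_i\right)^2+\lambda\left(\lVert p_u\rVert^2+\lVert q_i\rVert^2\right)\right]$$

wherein $K$ is the set of user-resource pairs $(u, i)$ with an observed score, $r_{ui}$ is the score of user $u$ on teaching resource $i$ and is an integer in $[0, 5]$, $p_u$ is the feature vector of user $u$, $q_i$ is the feature vector of teaching resource $i$, $\lambda$ is the regularization parameter with a value in $[0, 1]$, and $p_u^{\top}q_i$ represents the interaction between user $u$ and teaching resource $i$;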

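the step 3 comprises the following steps:

step 301, on a Spark big data analysis platform, applying the K-Means clustering algorithm to the teaching resource feature matrix to group teaching resources with similar features, and obtaining the teaching resource clusters by minimizing a second objective function, a squared error function, expressed as:

$$J=\sum_{b=1}^{k}\sum_{a=1}^{n}\lVert x_a-c_b\rVert^2$$

wherein $k$ is the number of clusters, $b$ is the cluster index, $n$ is the number of data points in cluster $b$, $a$ is the data point index, $x_a$ is the value of data point $a$ in cluster $b$, $c_b$ is the center of cluster $b$, and $\lVert x_a-c_b\rVert$ is the Euclidean distance between $x_a$ and $c_b$;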

step 302, retrieving the teaching resources in each teaching resource cluster, sorting them in ascending order of their distance to the cluster center, and storing the sorted result in a data warehouse;

the step 4 comprises the following steps:

step 401, obtaining teaching resource clusters to which historical scoring resources belong according to the historical scoring resources of users, wherein the historical scoring resources are teaching resources with user scoring data;

step 402, taking the first K teaching resources that best represent the cluster features in each teaching resource cluster to which the historical scoring resources belong, and calculating the interest degree of these K teaching resources with the teaching resource interest degree model, wherein $0 < K < N$ and $N$ is the number of teaching resources in the teaching resource cluster;

step 403, calculating the user's interest degree in all teaching resources with the teaching resource interest degree model, based on all historical scoring resources of the user;

the teaching resource interest degree model is expressed as:

$$p_{ud}=\sum_{c\in N(u)\cap S(d)} w_d\,r_{uc}$$

wherein $p_{ud}$ is the interest degree of user $u$ in teaching resource $d$, $N(u)$ is the set of teaching resources rated by user $u$, $S(d)$ is the set of resources in the cluster to which teaching resource $d$ belongs, $c$ is a teaching resource that user $u$ has rated and that lies in the cluster of $d$, $w_d$ is the feature matching degree of teaching resource $d$, and $r_{uc}$ is user $u$'s score on teaching resource $c$.