CN110929161B - Large-scale user-oriented personalized teaching resource recommendation method - Google Patents


Info

Publication number
CN110929161B
Authority
CN
China
Legal status
Active
Application number
CN201911212608.XA
Other languages
Chinese (zh)
Other versions
CN110929161A (en)
Inventor
龚少麟
贲伟
赵文涛
Current Assignee
Nanjing Laiwangxin Technology Research Institute Co ltd
Original Assignee
Nanjing Laiwangxin Technology Research Institute Co ltd
Application filed by Nanjing Laiwangxin Technology Research Institute Co ltd
Priority to CN201911212608.XA
Publication of CN110929161A
Priority to PCT/CN2020/090567 (WO2021109464A1)
Application granted
Publication of CN110929161B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/20 Education
    • G06Q50/205 Education administration or guidance
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Tourism & Hospitality (AREA)
  • Strategic Management (AREA)
  • Educational Technology (AREA)
  • Educational Administration (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a large-scale user-oriented personalized teaching resource recommendation method comprising the following steps: acquiring user interaction data and preprocessing it to obtain a user-resource scoring matrix; performing feature dimensionality reduction on the user-resource scoring matrix to obtain a teaching resource feature matrix; clustering the teaching resource feature matrix to obtain teaching resource clusters and sorting the teaching resources within each cluster; and obtaining the user's scores on teaching resources, calculating the user's interest degree in each teaching resource with a teaching resource interest degree model, and generating a teaching resource recommendation list by sorting all teaching resources in descending order of interest degree. Compared with the prior art, the method can provide fast and accurate digital teaching resource recommendations for a large number of users, enhance user experience, and offer an effective solution for the personalized use of smart campus teaching resources.

Description

Large-scale user-oriented personalized teaching resource recommendation method
Technical Field
The invention relates to the field of personalized recommendation of smart campus teaching resources, and in particular to a large-scale user-oriented personalized teaching resource recommendation method.
Background
A recommendation system is an information filtering system that recommends personalized information according to users' information needs, interests, and so on; it has been applied successfully to online video, social networks, online music, e-commerce, and other fields. As teaching resource libraries in smart campuses continue to grow, rich teaching resources such as e-books, documents, electronic courseware, and micro-lecture videos become available, and personalized recommendation based on collaborative filtering can use them to improve students' learning outcomes.
Collaborative filtering is one of the most popular techniques for recommendation engines; it relies only on past user behavior, such as past transactions or feedback on educational resources. Conventional collaborative filtering algorithms, such as neighborhood methods and latent factor models, typically suffer from two major problems. The first is the sparsity of the user-item scoring matrix, because most users score only a small fraction of all items. The second is the inability to provide recommendations for very large data sets in real time or near real time.
To address these problems, researchers have tried various approaches, such as clustering and hybrid techniques, but these methods do not scale to massive data sets. Recent work has parallelized collaborative filtering with Hadoop, but MapReduce computation is time-consuming and inefficient.
Disclosure of Invention
The invention provides a large-scale user-oriented personalized teaching resource recommendation method. In practice, no user interacts with every educational resource; user-resource pairs with interactions make up only a small fraction of all pairs. In other words, the user-resource relationship list is very sparse, and this sparsity of the teaching resource scoring matrix directly affects model accuracy.
Colleges and universities hold a large number of teaching resources, such as the institution's own teaching resources, Massive Open Online Course (MOOC) and Small Private Online Course (SPOC) resources, and recorded lecture resources. In a big data environment, searching all resources for those most similar to a target resource severely degrades the efficiency of a recommendation system.
The invention aims to provide a teaching resource recommendation method that is suitable for very large sparse feature matrices and computationally efficient on large-scale data sets. It addresses two problems of existing collaborative filtering schemes: inaccurate models caused by data sparsity, and the inability to compute efficiently on large-scale data sets.
A large-scale user-oriented personalized teaching resource recommendation method comprises the following steps:
step 1, acquiring user interaction data, and performing data preprocessing on the user interaction data to acquire a user resource scoring matrix;
step 2, performing feature dimension reduction on the user resource scoring matrix to obtain a teaching resource feature matrix of the user;
step 3, clustering the teaching resource feature matrix to obtain teaching resource clusters, and sorting the teaching resources within each teaching resource cluster;
step 4, obtaining the user's scores on all teaching resources, calculating the user's interest degree in each teaching resource with the teaching resource interest degree model, and generating a teaching resource recommendation list by sorting all teaching resources in descending order of interest degree.
Further, in one implementation, the step 1 includes:
step 101, collecting a teaching resource scoring data set, and loading the teaching resource scoring data set into a data warehouse for storage, wherein the teaching resource scoring data set comprises teaching resources, scoring data and user information corresponding to the scoring data;
step 102, analyzing the teaching resources in the teaching resource scoring data set, and finding and deleting teaching resources and users with abnormal scoring data, where abnormal scoring data comprise: (1) out-of-range scoring data and the corresponding users, and (2) malicious scoring users and their scoring data;
step 103, extracting the user ID, resource ID, and score value to construct the user-resource scoring matrix.
Further, in one implementation, the step 2 includes:
step 201, mapping the user-resource scoring matrix to a low-dimensional latent factor space using the alternating least squares method on the Spark big data analysis platform;
step 202, minimizing a first objective function, namely a squared error loss function, to obtain a user feature matrix and a teaching resource feature matrix, where the first objective function is expressed as:
$$\min_{P,Q} \sum_{(u,i)\in\kappa} \left(r_{ui} - p_u^{\top} q_i\right)^2 + \lambda\left(\lVert p_u\rVert^2 + \lVert q_i\rVert^2\right)$$
where $\kappa$ is the set of user-resource pairs $(u, i)$ with an observed score; $r_{ui}$ is user u's score for teaching resource i, an integer in the range [0, 5]; $p_u$ is the feature vector of user u; $q_i$ is the feature vector of teaching resource i; $\lambda$ is the regularization parameter, with a value in the range [0, 1]; and $p_u^{\top} q_i$ represents the interaction between user u and teaching resource i.
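For illustration, the first objective function can be evaluated directly for given factor matrices. The following is a minimal Python sketch, not the embodiment's implementation; it assumes scores are stored in a dictionary keyed by (user index, resource index) over observed pairs only, adds the regularization term once per observed pair, and uses a hypothetical function name `als_loss`.

```python
from typing import Dict, List, Tuple

Ratings = Dict[Tuple[int, int], float]  # (user u, resource i) -> r_ui, observed pairs only


def als_loss(ratings: Ratings, P: List[List[float]], Q: List[List[float]], lam: float) -> float:
    """Regularized squared-error loss over observed (u, i) pairs.

    P[u] is the user feature vector p_u and Q[i] the resource feature vector q_i.
    """
    loss = 0.0
    for (u, i), r_ui in ratings.items():
        # p_u^T q_i: predicted interaction between user u and resource i
        pred = sum(pu * qi for pu, qi in zip(P[u], Q[i]))
        loss += (r_ui - pred) ** 2
        # L2 penalty on both factor vectors, added once per observed pair
        loss += lam * (sum(x * x for x in P[u]) + sum(x * x for x in Q[i]))
    return loss


# Usage: one user, two scored resources, two latent factors (made-up numbers)
ratings = {(0, 0): 5.0, (0, 1): 3.0}
P = [[1.0, 2.0]]
Q = [[1.0, 2.0], [1.0, 1.0]]
print(als_loss(ratings, P, Q, lam=0.1))
```

Here both predictions are exact, so the whole loss comes from the regularization term, which grows with λ.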
Further, in one implementation, the step 3 includes:
step 301, clustering teaching resources with similar features in the teaching resource feature matrix using the K-Means clustering algorithm on the Spark big data analysis platform, and obtaining teaching resource clusters by minimizing a second objective function, namely a squared error function, expressed as:
$$J = \sum_{b=1}^{k} \sum_{a=1}^{n} \lVert x_a - c_b \rVert^2$$
where k is the number of clusters, b is the cluster index, n is the number of data points in cluster b, a is the data point index, $x_a$ is the value of data point a in cluster b, $c_b$ is the center of cluster b, and $\lVert x_a - c_b \rVert$ is the Euclidean distance between $x_a$ and $c_b$;
step 302, retrieving the teaching resources in each teaching resource cluster, sorting them in ascending order of distance to the cluster center, and storing the sorted result in the data warehouse.
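As a small worked example of the second objective function, with made-up points chosen only for illustration: take k = 2, let cluster 1 contain the points (0, 0) and (2, 0) with center $c_1 = (1, 0)$, and let cluster 2 contain the single point (10, 10) with center $c_2 = (10, 10)$. Each point contributes its squared Euclidean distance to its own cluster center:

```latex
J = \lVert (0,0) - (1,0) \rVert^2 + \lVert (2,0) - (1,0) \rVert^2 + \lVert (10,10) - (10,10) \rVert^2
  = 1 + 1 + 0 = 2
```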
Further, in an implementation manner, the step 4 includes:
step 401, determining, from the user's historically scored resources, the teaching resource clusters to which those resources belong, where a historically scored resource is a teaching resource for which the user has scoring data;
step 402, obtaining, from each teaching resource cluster to which the historically scored resources belong, the first K teaching resources that can represent the cluster's characteristics, and calculating the user's interest degree in these K teaching resources with the teaching resource interest degree model, where 0 < K < N and N is the number of teaching resources in the cluster;
step 403, calculating the user's interest degree in all teaching resources with the teaching resource interest degree model, based on all of the user's historically scored resources;
the teaching resource interestingness model is expressed as:
$$p_{ud} = \sum_{c \in N(u) \cap S(d)} w_d \, r_{uc}$$
where $p_{ud}$ is the interest degree of user u in teaching resource d; $N(u)$ is the set of teaching resources scored by user u; $S(d)$ is the resource cluster set to which teaching resource d belongs; c is a teaching resource that user u has scored and that lies in the resource cluster set to which d belongs; $w_d$ is the feature matching degree of teaching resource d; and $r_{uc}$ is user u's score for teaching resource c.
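A small worked example of the interest degree model, with made-up numbers chosen only for illustration: suppose user u has scored resources $c_1$ (score 4) and $c_2$ (score 2), both in the cluster of resource d, and resource $c_3$ (score 5) in a different cluster, and suppose $w_d = 0.5$. Only $c_1$ and $c_2$ lie in $N(u) \cap S(d)$, so $c_3$ contributes nothing:

```latex
p_{ud} = \sum_{c \in N(u) \cap S(d)} w_d \, r_{uc}
       = 0.5 \times 4 + 0.5 \times 2 = 3
```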
According to the technical solution above, the embodiment of the invention provides a large-scale user-oriented personalized teaching resource recommendation method comprising: step 1, acquiring user interaction data and preprocessing it to obtain a user-resource scoring matrix; step 2, performing feature dimensionality reduction on the user-resource scoring matrix to obtain a teaching resource feature matrix and a user feature matrix; step 3, clustering the teaching resource feature matrix to obtain teaching resource clusters and sorting the teaching resources within each cluster; and step 4, obtaining the user's scores on all teaching resources, calculating the user's interest degree in every teaching resource with the teaching resource interest degree model, and sorting all teaching resources in descending order of interest degree to generate a teaching resource recommendation list.
The method constructs a user-resource scoring matrix by preprocessing a teaching resource scoring data set, uses the ALS dimensionality reduction algorithm to obtain the users' preference matrix over latent features and the latent feature matrix of the teaching resources, applies the K-Means clustering algorithm to the teaching resources' latent feature matrix to form teaching resource clusters, and finally produces the user's resource interest degree list through a resource recommendation model. Building the method on Apache Spark enables efficient parallel computation and addresses the scalability problem.
In summary, by using Spark high-concurrency processing, the ALS dimensionality reduction algorithm, the K-Means clustering algorithm, and related techniques, the invention can, compared with the prior art, provide fast and accurate digital teaching resource recommendations for a large number of users, enhance user experience, improve learning outcomes, and offer an effective solution for the personalized use of smart campus teaching resources.
Drawings
To illustrate the technical solution of the present invention more clearly, the drawings used in the embodiments are briefly described below; those of ordinary skill in the art can derive other drawings from these drawings without creative effort.
Fig. 1 is a schematic workflow diagram of a large-scale user-oriented personalized teaching resource recommendation method according to an embodiment of the present invention;
Fig. 2 is a schematic workflow diagram of the feature dimensionality reduction algorithm in the large-scale user-oriented personalized teaching resource recommendation method according to the embodiment of the present invention;
Fig. 3 is a schematic workflow diagram of the resource clustering algorithm in the large-scale user-oriented personalized teaching resource recommendation method according to the embodiment of the present invention;
Fig. 4 is a schematic diagram of the finally generated recommendation list in the large-scale user-oriented personalized teaching resource recommendation method according to the embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention more comprehensible, the present invention is described in detail with reference to the accompanying drawings and the detailed description thereof.
The embodiment of the invention discloses a large-scale user-oriented personalized teaching resource recommendation method that is applied to online learning platforms in colleges and universities and can provide users with efficient and accurate teaching resource recommendations during self-directed learning, in scenarios with large numbers of users and teaching resources.
In practice, no user interacts with every educational resource; user-resource pairs with interactions make up only a small part of all pairs. In other words, the user-resource relationship list is very sparse, and this sparsity of the teaching resource scoring matrix directly affects model accuracy. Colleges and universities hold a large number of teaching resources, such as their own local teaching resources, MOOC/SPOC resources, and recorded lecture resources. In a big data environment, searching all resources for those most similar to a target resource severely degrades the efficiency of a recommendation system.
The embodiment of the invention adopts the Spark big data analysis platform. Spark is an open-source big data analytics framework with four main modules: the Spark SQL module provides SQL-like queries; Spark Streaming is a stream computing module mainly used to process online, real-time sequence data; the MLlib module provides models and tuning tools for machine learning, such as classification, regression, clustering, collaborative filtering, and dimensionality reduction; and the GraphX module provides graph-based algorithms. Spark builds on the Hadoop MapReduce model and extends it to efficiently support more types of computation, including interactive queries and stream processing. In addition, Spark's in-memory computation can speed up applications. Apache Spark enables efficient parallel computation of the recommendation algorithm, and as the volume of teaching resource data grows, the scalability of the recommendation scheme can be maintained by adding Spark cluster nodes.
As shown in fig. 1, a large-scale user-oriented personalized teaching resource recommendation method according to an embodiment of the present invention includes:
step 1, acquiring user interaction data, and performing data preprocessing on the user interaction data to acquire a user resource scoring matrix;
step 2, performing feature dimension reduction on the user resource scoring matrix to obtain a teaching resource feature matrix and a user feature matrix of the user;
step 3, clustering the teaching resource feature matrix to obtain teaching resource clusters, and sorting the teaching resources within each teaching resource cluster;
step 4, obtaining the user's scores on all teaching resources, calculating the user's interest degree in each teaching resource with the teaching resource interest degree model, and generating a teaching resource recommendation list by sorting all teaching resources in descending order of interest degree. Fig. 4 is a schematic diagram of the finally generated recommendation list in the large-scale user-oriented personalized teaching resource recommendation method according to the embodiment of the present invention; other forms of presentation may also be adopted in specific implementations.
In the method for recommending personalized teaching resources for large-scale users according to the embodiment of the present invention, the step 1 includes:
step 101, collecting a teaching resource scoring data set, and loading the teaching resource scoring data set into a data warehouse for storage, wherein the teaching resource scoring data set comprises teaching resources, scoring data and user information corresponding to the scoring data; in this step, a teaching resource scoring data set is collected through a teaching resource access platform of a school.
Step 102, analyzing the teaching resources in the teaching resource scoring data set, and finding and deleting teaching resources and users with abnormal scoring data, where abnormal scoring data comprise: (1) out-of-range scoring data and the corresponding users, and (2) malicious scoring users and their scoring data;
in this embodiment, out-of-range scoring data refers to teaching resources with a score value smaller than 1 or larger than 5, and malicious scoring users are users whose scoring data are smaller than 2.
Step 103, extracting the user ID, resource ID, and score value to construct the user-resource scoring matrix.
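The preprocessing of steps 101 to 103 can be sketched in plain Python as follows. This is an illustration under assumptions, not the embodiment's implementation: records are assumed to be (user ID, resource ID, score) tuples; a user who submitted any out-of-range score is dropped together with all of their scores; and the "scoring data smaller than 2" criterion for malicious users is read, hypothetically, as "fewer than 2 scores". The sparse scoring matrix is stored as a nested dictionary, and the name `preprocess` is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Record = Tuple[str, str, float]  # (user ID, resource ID, score)


def preprocess(records: List[Record], lo: float = 1.0, hi: float = 5.0,
               min_scores: int = 2) -> Dict[str, Dict[str, float]]:
    # Step 102 (1): a user who submitted any out-of-range score is removed with all their scores
    out_of_range_users = {u for u, _, s in records if s < lo or s > hi}
    kept = [rec for rec in records if rec[0] not in out_of_range_users]
    # Step 102 (2), ASSUMED reading: "malicious" users are those with fewer than min_scores scores
    counts = Counter(u for u, _, _ in kept)
    kept = [rec for rec in kept if counts[rec[0]] >= min_scores]
    # Step 103: sparse user-resource scoring matrix as {user ID: {resource ID: score}}
    matrix: Dict[str, Dict[str, float]] = {}
    for u, rid, s in kept:
        matrix.setdefault(u, {})[rid] = s
    return matrix


records = [("u1", "a", 4.0), ("u1", "b", 5.0), ("u2", "a", 7.0), ("u2", "b", 3.0), ("u3", "a", 2.0)]
print(preprocess(records))  # {'u1': {'a': 4.0, 'b': 5.0}}
```

User u2 is removed because of the out-of-range score 7.0, and u3 is removed under the assumed fewer-than-two-scores rule.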
As shown in fig. 2, in the method for recommending personalized teaching resources for large-scale users according to the embodiment of the present invention, the step 2 includes:
step 201, mapping the user-resource scoring matrix to a low-dimensional latent factor space using the Alternating Least Squares (ALS) method on the Spark big data analysis platform. Specifically, in this step the teaching resource scoring data set is first divided into a training set and a test set; a model is then trained on the training set and its performance is evaluated on the test set. In this embodiment, the training set and test set can be obtained with a 60/40 split.
Step 202, minimizing a first objective function to calculate and obtain a user characteristic matrix and a teaching resource characteristic matrix, where the first objective function is a squared error loss function, and is expressed as:
$$\min_{P,Q} \sum_{(u,i)\in\kappa} \left(r_{ui} - p_u^{\top} q_i\right)^2 + \lambda\left(\lVert p_u\rVert^2 + \lVert q_i\rVert^2\right)$$
where $\kappa$ is the set of user-resource pairs $(u, i)$ with an observed score; $r_{ui}$ is user u's score for teaching resource i, an integer in the range [0, 5]; $p_u$ is the feature vector of user u; $q_i$ is the feature vector of teaching resource i; $\lambda$ is the regularization parameter, with a value in the range [0, 1]; and $p_u^{\top} q_i$ represents the interaction between user u and teaching resource i.
In this step, the number of latent factors g must first be chosen; g may be an integer in the range [10, 100], for example {20, 50, 70, 100}, and λ may be taken from {0.01, 0.1, 1}. A model is trained on the training set for each setting, and the root mean square error (RMSE) on the test set is computed as the evaluation metric. Finally, the evaluation results are compared, and the g value that yields the minimum RMSE is taken as the optimal solution.
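Steps 201 and 202, together with the 60/40 split and the hyperparameter search described above, can be sketched in NumPy as follows. This is a single-machine illustration under stated assumptions, not the Spark-based embodiment: the helper names `split_60_40`, `train_als`, `rmse`, and `grid_search` are hypothetical; each ridge update scales λ by the number of scores of that user (or resource), because the penalty in the first objective function is added once per observed pair; and g and λ are searched jointly. On Spark, the corresponding estimator is `pyspark.ml.recommendation.ALS`, whose `rank` and `regParam` parameters play the roles of g and λ.

```python
import random
from typing import Dict, List, Optional, Tuple

import numpy as np

Triple = Tuple[int, int, float]  # (user index, resource index, score)


def split_60_40(rows: List[Triple], seed: int = 42) -> Tuple[List[Triple], List[Triple]]:
    """Shuffle the scoring records and split them 60/40 into training and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(0.6 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def train_als(train: List[Triple], n_users: int, n_items: int, g: int, lam: float,
              iters: int = 15, seed: int = 0) -> Tuple[np.ndarray, np.ndarray]:
    """Alternate between solving every p_u with Q fixed and every q_i with P fixed."""
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=0.1, size=(n_users, g))
    Q = rng.normal(scale=0.1, size=(n_items, g))
    by_user: Dict[int, List[Tuple[int, float]]] = {}
    by_item: Dict[int, List[Tuple[int, float]]] = {}
    for u, i, r in train:
        by_user.setdefault(u, []).append((i, r))
        by_item.setdefault(i, []).append((u, r))
    eye = np.eye(g)
    for _ in range(iters):
        for u, obs in by_user.items():
            Qo = Q[[i for i, _ in obs]]          # factors of the resources u has scored
            r_u = np.array([r for _, r in obs])
            # Ridge regression; lam is scaled by u's score count because the
            # penalty appears once per observed pair in the objective
            P[u] = np.linalg.solve(Qo.T @ Qo + lam * len(obs) * eye, Qo.T @ r_u)
        for i, obs in by_item.items():
            Po = P[[u for u, _ in obs]]          # factors of the users who scored i
            r_i = np.array([r for _, r in obs])
            Q[i] = np.linalg.solve(Po.T @ Po + lam * len(obs) * eye, Po.T @ r_i)
    return P, Q


def rmse(test: List[Triple], P: np.ndarray, Q: np.ndarray) -> float:
    """Root mean square error of p_u^T q_i against the held-out scores."""
    errors = [(r - P[u] @ Q[i]) ** 2 for u, i, r in test]
    return float(np.sqrt(np.mean(errors)))


def grid_search(train: List[Triple], test: List[Triple], n_users: int, n_items: int,
                gs=(20, 50, 70, 100), lams=(0.01, 0.1, 1.0)) -> Optional[Tuple[float, int, float]]:
    """Return (lowest test RMSE, g, lambda) over every candidate combination."""
    best: Optional[Tuple[float, int, float]] = None
    for g in gs:
        for lam in lams:
            P, Q = train_als(train, n_users, n_items, g, lam)
            score = rmse(test, P, Q)
            if best is None or score < best[0]:
                best = (score, g, lam)
    return best


# Usage on a made-up, exactly rank-2 score matrix (6 users x 5 resources)
U = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [0.5, 0.5]])
V = np.array([[1.0, 2.0], [2.0, 1.0], [1.0, 1.0], [2.0, 2.0], [0.0, 1.0]])
R = U @ V.T
data = [(u, i, float(R[u, i])) for u in range(6) for i in range(5)]
train, test = split_60_40(data)
print(grid_search(train, test, n_users=6, n_items=5, gs=(2, 3)))
```

The candidate g values here are shrunk to fit the toy matrix; with real scoring data the candidate set from the text would be passed instead.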
Step 2 addresses the sparsity problem of conventional collaborative filtering methods; it also lends itself to parallel processing and can handle implicit data sets efficiently.
As shown in fig. 3, in the method for recommending personalized teaching resources for large-scale users according to the embodiment of the present invention, the step 3 includes:
step 301, clustering teaching resources with similar features in the teaching resource feature matrix using the K-Means clustering algorithm on the Spark big data analysis platform, and obtaining teaching resource clusters by minimizing a second objective function, namely a squared error function, expressed as:
$$J = \sum_{b=1}^{k} \sum_{a=1}^{n} \lVert x_a - c_b \rVert^2$$
where k is the number of clusters, b is the cluster index, n is the number of data points in cluster b, a is the data point index, $x_a$ is the value of data point a in cluster b, $c_b$ is the center of cluster b, and $\lVert x_a - c_b \rVert$ is the Euclidean distance between $x_a$ and $c_b$;
in this step, the number of clusters k must first be chosen; k may be an integer in the range [10, 100], for example {10, 20, 40, 60, 80}. A model is trained on the training set for each candidate, and the objective function is then computed on the test set. Finally, the objective values are compared, and the k value that yields the minimum is taken as the optimal solution.
Before this step, the teaching resource feature matrix is divided into a training set and a test set; a model is then trained on the training set and its performance is evaluated on the test set. Specifically, the training set and test set can be obtained with a 60/40 split.
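Step 301 and the selection of k described above can be sketched with a plain-Python version of Lloyd's algorithm, the standard K-Means procedure. The sketch is an illustration only: it runs on a single machine rather than Spark (where `pyspark.ml.clustering.KMeans` would be the natural choice), initializes centers by random sampling, assigns test points to their nearest trained center when evaluating the objective on the test set, and uses hypothetical names `kmeans`, `sse`, and `choose_k`.

```python
import math
import random
from typing import Dict, Iterable, List, Sequence, Tuple

Vec = Sequence[float]


def nearest(x: Vec, centers: List[List[float]]) -> int:
    """Index b of the center c_b closest to x in Euclidean distance."""
    return min(range(len(centers)), key=lambda b: math.dist(x, centers[b]))


def kmeans(points: List[Vec], k: int, iters: int = 100, seed: int = 0) -> List[List[float]]:
    """Lloyd's algorithm: assign points to the nearest center, then move each center to its cluster mean."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups: List[List[Vec]] = [[] for _ in range(k)]
        for x in points:
            groups[nearest(x, centers)].append(x)
        new_centers = [
            [sum(col) / len(grp) for col in zip(*grp)] if grp else centers[b]  # empty cluster keeps its center
            for b, grp in enumerate(groups)
        ]
        if new_centers == centers:  # assignments have stabilized
            break
        centers = new_centers
    return centers


def sse(points: List[Vec], centers: List[List[float]]) -> float:
    """Second objective function: squared distance from each point to its nearest center."""
    return sum(math.dist(x, centers[nearest(x, centers)]) ** 2 for x in points)


def choose_k(train: List[Vec], test: List[Vec],
             candidates: Iterable[int] = (10, 20, 40, 60, 80)) -> Tuple[int, Dict[int, float]]:
    """Train on the training set for each candidate k and keep the k with the lowest test objective."""
    scores = {k: sse(test, kmeans(train, k)) for k in candidates if k <= len(train)}
    best_k = min(scores, key=scores.get)
    return best_k, scores


train = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
centers = kmeans(train, 2)
print(sse(train, centers))
print(choose_k(train, train, candidates=(1, 2))[0])  # 2
```

The squared-error objective usually keeps decreasing as k grows, so taking the minimum over the candidates tends to favor the largest k in the candidate set.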
Step 302, retrieving the teaching resources in each teaching resource cluster, sorting them in ascending order of distance to the cluster center, and storing the sorted result in the data warehouse.
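Step 302, and the selection of the first K resources in step 402, can be sketched as follows. The sketch assumes, hypothetically, that resource features, cluster labels, and centers are available as plain Python dictionaries and lists, and that the "first K" resources are the K closest to the cluster center in the step 302 ordering; storing the result in the data warehouse is omitted, and the helper names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def rank_within_clusters(features: Dict[str, Sequence[float]],
                         labels: Dict[str, int],
                         centers: List[Sequence[float]]) -> Dict[int, List[str]]:
    """Group resource IDs by cluster and sort each group by ascending distance to its center."""
    ranked: Dict[int, List[str]] = {}
    for rid, b in labels.items():
        ranked.setdefault(b, []).append(rid)
    for b, members in ranked.items():
        center = centers[b]
        members.sort(key=lambda rid: math.dist(features[rid], center))
    return ranked


def representatives(ranked: Dict[int, List[str]], cluster: int, k: int) -> List[str]:
    """First K resources of a cluster (step 402), enforcing 0 < K < N as stated in the text."""
    members = ranked[cluster]
    if not 0 < k < len(members):
        raise ValueError("K must satisfy 0 < K < N")
    return members[:k]


features = {"a": [0.0, 0.0], "b": [3.0, 0.0], "c": [1.0, 0.0], "d": [10.0, 10.0]}
labels = {"a": 0, "b": 0, "c": 0, "d": 1}
centers = [[0.9, 0.0], [10.0, 10.0]]
ranked = rank_within_clusters(features, labels, centers)
print(ranked[0])                      # ['c', 'a', 'b']
print(representatives(ranked, 0, 2))  # ['c', 'a']
```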
In the method for recommending personalized teaching resources for large-scale users according to the embodiment of the present invention, step 4 includes:
step 401, determining, from the user's historically scored resources, the teaching resource clusters to which those resources belong, where a historically scored resource is a teaching resource for which the user has scoring data; in this embodiment, the user must have scored at least one resource before resources can be recommended to the user.
Step 402, obtaining, from each teaching resource cluster to which the historically scored resources belong, the first K teaching resources that can represent the cluster's characteristics, and calculating the user's interest degree in these K teaching resources with the teaching resource interest degree model, where 0 < K < N and N is the number of teaching resources in the cluster;
step 403, calculating the user's interest degree in all teaching resources with the teaching resource interest degree model, based on all of the user's historically scored resources;
the teaching resource interestingness model is expressed as:
$$p_{ud} = \sum_{c \in N(u) \cap S(d)} w_d \, r_{uc}$$
where $p_{ud}$ is the interest degree of user u in teaching resource d; $N(u)$ is the set of teaching resources scored by user u; $S(d)$ is the resource cluster set to which teaching resource d belongs; c is a teaching resource that user u has scored and that lies in the resource cluster set to which d belongs; $w_d$ is the feature matching degree of teaching resource d; and $r_{uc}$ is user u's score for teaching resource c.
In this embodiment, the feature matching degree can be expressed as the reciprocal of the distance between the teaching resource and its cluster center: the smaller the distance, the higher the matching degree.
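The reciprocal-distance matching degree fits in a one-line function. This is a minimal sketch; the name `matching_degree` and the small `eps` term, which avoids division by zero when a resource coincides with its cluster center, are assumptions not stated in the text.

```python
import math
from typing import Sequence


def matching_degree(feature: Sequence[float], center: Sequence[float], eps: float = 1e-9) -> float:
    """w_d as the reciprocal of the distance between resource d and its cluster center.

    eps is a hypothetical guard against division by zero when d lies exactly on the center.
    """
    return 1.0 / (math.dist(feature, center) + eps)


near = matching_degree([1.0, 0.0], [0.0, 0.0])  # distance 1
far = matching_degree([3.0, 4.0], [0.0, 0.0])   # distance 5
print(near > far)  # True
```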
Fig. 4 is a schematic diagram of the finally generated recommendation list in the large-scale user-oriented personalized teaching resource recommendation method according to an embodiment of the present invention; other forms of presentation may also be adopted in specific implementations. The method sums the user's interest degrees for each teaching resource and sorts all teaching resources in descending order of this sum to generate the teaching resource recommendation list. This list gives the user fast and accurate teaching resource recommendations.
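The final ranking can be sketched by combining the interest degree model with a per-resource matching degree. This minimal Python sketch makes several assumptions not stated in the text: each resource belongs to exactly one cluster (so S(d) is d's cluster), resources the user has already scored are excluded from the list, resources with zero interest degree are dropped, and every candidate is scored directly (step 403) rather than first restricting to the top-K representatives of step 402. The function name `recommend` is hypothetical.

```python
from typing import Dict, List, Tuple


def recommend(user_scores: Dict[str, float], cluster_of: Dict[str, int],
              w: Dict[str, float], top_n: int = 10) -> List[Tuple[str, float]]:
    """Rank unscored resources d by p_ud = sum of w_d * r_uc over scored c in d's cluster."""
    interest: Dict[str, float] = {}
    for d, b in cluster_of.items():
        if d in user_scores:
            continue  # ASSUMPTION: resources the user already scored are not recommended
        # c ranges over N(u) ∩ S(d): scored resources that share d's cluster
        p_ud = sum(w.get(d, 0.0) * r_uc for c, r_uc in user_scores.items()
                   if cluster_of.get(c) == b)
        if p_ud > 0:
            interest[d] = p_ud
    # Descending order of interest degree gives the recommendation list
    return sorted(interest.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


user_scores = {"a": 4.0, "b": 2.0, "x": 5.0}
cluster_of = {"a": 0, "b": 0, "d": 0, "x": 1, "y": 1, "z": 2}
w = {"d": 0.5, "y": 0.25, "z": 1.0}
print(recommend(user_scores, cluster_of, w))  # [('d', 3.0), ('y', 1.25)]
```

Resource z receives no score because the user has not scored anything in its cluster, which mirrors the requirement that the user score at least one resource first.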
The invention comprises a large-scale user-oriented personalized teaching resource recommendation method. The method comprises the steps of constructing a user-resource scoring matrix by preprocessing a teaching resource scoring data set, obtaining a teaching resource feature matrix and a user feature matrix by using an ALS (adaptive energy storage) dimensionality reduction algorithm, forming teaching resource clusters by using a K-means clustering algorithm (K-means clustering algorithm) for the teaching resource feature matrix, and finally forming a teaching resource recommendation list of a user through a resource recommendation model.
The method comprises the steps of constructing a user-resource scoring matrix by preprocessing a teaching resource scoring data set, obtaining a preference matrix of a user for implicit characteristics and an implicit characteristic matrix contained in teaching resources by using an ALS dimensionality reduction algorithm, forming teaching resource clusters by using a K-Means clustering algorithm for the implicit characteristic matrix contained in the teaching resources, and finally forming a resource interest degree list recommended by the user through a resource recommendation model. Efficient parallel computation is achieved and the problem of expandability is solved through the Apache Spark.
By combining Spark high-concurrency processing, the ALS dimensionality reduction algorithm and the K-Means clustering algorithm, the method can, compared with the prior art, provide fast and accurate digital teaching resource recommendations to a large number of users, improving the user experience and learning outcomes and offering an effective solution for the personalized use of smart-campus teaching resources.
In a specific implementation, the present invention further provides a computer storage medium, where the computer storage medium may store a program, and when the program is executed, the program may include some or all of the steps in each embodiment of the personalized teaching resource recommendation method for large-scale users provided by the present invention. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM) or a Random Access Memory (RAM).
Those skilled in the art will readily appreciate that the techniques of the embodiments of the present invention may be implemented as software plus a required general purpose hardware platform. Based on such understanding, the technical solutions in the embodiments of the present invention may be essentially or partially implemented in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
The same and similar parts among the various embodiments in this specification may be referred to each other. The above-described embodiments of the present invention do not limit the scope of the present invention.

Claims (1)

1. A large-scale user-oriented personalized teaching resource recommendation method is characterized by comprising the following steps:
step 1, acquiring user interaction data, and performing data preprocessing on the user interaction data to acquire a user resource scoring matrix;
step 2, performing feature dimensionality reduction on the user resource scoring matrix to obtain a user feature matrix and a teaching resource feature matrix;
step 3, clustering the teaching resource feature matrix to obtain teaching resource clusters, and sorting the teaching resources within each teaching resource cluster;
step 4, obtaining the user's scores of teaching resources, calculating the user's interest degree in each teaching resource with a teaching resource interest degree model, and generating a teaching resource recommendation list by sorting all teaching resources in descending order of interest degree;
the step 1 comprises the following steps:
step 101, collecting a teaching resource scoring data set, and loading the teaching resource scoring data set into a data warehouse for storage, wherein the teaching resource scoring data set comprises teaching resources, scoring data and user information corresponding to the scoring data;
step 102, analyzing the teaching resources in the teaching resource scoring data set, and identifying and deleting teaching resources and users with abnormal scoring data, where the abnormal scoring data comprise: out-of-range scoring data and the corresponding users, and malicious scoring users and their corresponding scoring data;
step 103, extracting the user ID, resource ID and score value, and constructing the user resource scoring matrix;
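The preprocessing of steps 102 and 103 can be sketched in plain Python as below. This is a minimal single-machine sketch under stated assumptions, not the patent's implementation: the function name `preprocess` is hypothetical, malicious users are assumed to be flagged beforehand (the claim does not give the detection rule), and "the corresponding users" of out-of-range scores is read as meaning every record of such a user is dropped. The valid range follows the [0, 5] integer scores defined in step 202; missing scores are left as `None`.

```python
def preprocess(records, malicious_users=frozenset(), lo=0, hi=5):
    """Steps 102-103 (sketch): drop abnormal users, then build a user-resource matrix.

    records: iterable of (user_id, resource_id, score) tuples.
    malicious_users: user IDs already flagged as malicious (hypothetical input).
    Returns (users, resources, matrix); matrix[row][col] is a score or None.
    """
    records = list(records)
    valid = range(lo, hi + 1)  # integer scores in [lo, hi]

    # Assumption: a user with any out-of-range score is abnormal, and all of
    # that user's records are removed along with those of malicious users.
    abnormal = set(malicious_users)
    abnormal.update(u for u, _, s in records if s not in valid)
    clean = [(u, i, s) for u, i, s in records if u not in abnormal]

    # Resources left without any score disappear from the matrix as well.
    users = sorted({u for u, _, _ in clean})
    resources = sorted({i for _, i, _ in clean})
    row = {u: k for k, u in enumerate(users)}
    col = {i: k for k, i in enumerate(resources)}

    matrix = [[None] * len(resources) for _ in users]
    for u, i, s in clean:
        matrix[row[u]][col[i]] = s
    return users, resources, matrix
```

For example, with `malicious_users={"u4"}`, a user who gave a score of 7 is removed entirely, and a resource scored only by removed users no longer appears as a column.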
the step 2 includes:
step 201, mapping the user resource scoring matrix to a low-dimensional latent factor space by using the alternating least squares (ALS) method on a Spark big data analysis platform;
step 202, minimizing a first objective function, namely a squared-error loss function, to obtain a user feature matrix and a teaching resource feature matrix, where the first objective function is expressed as:
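$$\sum_{(u,i):\,r_{ui}\ \text{known}}\left[\left(r_{ui}-q_i^{\mathrm{T}}p_u\right)^{2}+\lambda\left(\lVert p_u\rVert^{2}+\lVert q_i\rVert^{2}\right)\right]$$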
where $r_{ui}$ denotes the score given by user u to teaching resource i, an integer in the range [0, 5]; $p_u$ denotes the feature vector of user u; $q_i$ denotes the feature vector of teaching resource i; λ denotes the regularization parameter, with a value range of [0, 1]; and $q_i^{\mathrm{T}}p_u$ represents the interaction between user u and teaching resource i;
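Step 202 can be illustrated with a small, single-machine alternating least squares sketch in plain Python. It alternately fixes the resource vectors $q_i$ and solves a ridge regression for each $p_u$, then the reverse; each half-step exactly minimizes the squared-error loss of step 202 over one factor, so the loss never increases. Because the penalty sits inside the sum over observed pairs, each row's penalty is scaled by its number of scores, which is also how Spark MLlib scales `regParam` in its ALS. The names `solve`, `half_step`, `als` and `loss`, the rank `k`, the random initialization and the iteration count are illustrative assumptions; on a Spark platform one would use `pyspark.ml.recommendation.ALS` instead.

```python
import random


def solve(A, b):
    """Solve the small dense system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(A[r]) + [b[r]] for r in range(n)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def half_step(fixed, observed, k, lam):
    """Fix one factor set and solve (Y^T Y + lam * n_row * I) x = Y^T r for every row."""
    out = {}
    for key, pairs in observed.items():
        A = [[lam * len(pairs) if a == b else 0.0 for b in range(k)] for a in range(k)]
        rhs = [0.0] * k
        for other, r in pairs:
            y = fixed[other]
            for a in range(k):
                rhs[a] += r * y[a]
                for b in range(k):
                    A[a][b] += y[a] * y[b]
        out[key] = solve(A, rhs)
    return out


def als(ratings, k=2, lam=0.1, iters=10, seed=0):
    """ratings: list of (user, resource, score). Returns (P, Q) as dicts of k-vectors."""
    by_user, by_item = {}, {}
    for u, i, r in ratings:
        by_user.setdefault(u, []).append((i, r))
        by_item.setdefault(i, []).append((u, r))
    rng = random.Random(seed)
    Q = {i: [rng.uniform(0.1, 1.0) for _ in range(k)] for i in by_item}
    P = {}
    for _ in range(iters):
        P = half_step(Q, by_user, k, lam)  # update every p_u with Q fixed
        Q = half_step(P, by_item, k, lam)  # update every q_i with P fixed
    return P, Q


def loss(ratings, P, Q, lam):
    """Regularized squared-error objective evaluated on the observed scores."""
    total = 0.0
    for u, i, r in ratings:
        pred = sum(q * p for q, p in zip(Q[i], P[u]))  # q_i^T p_u
        total += (r - pred) ** 2 + lam * (sum(p * p for p in P[u]) + sum(q * q for q in Q[i]))
    return total
```

Calling `als(ratings, k=2, lam=0.1)` returns one latent vector per user and per resource; the resource vectors form the teaching resource feature matrix that step 3 clusters.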
the step 3 comprises the following steps:
step 301, clustering teaching resources with similar features in the teaching resource feature matrix by using the K-Means clustering algorithm on a Spark big data analysis platform, and obtaining teaching resource clusters by minimizing a second objective function, namely a squared-error function, expressed as:
$$\sum_{b=1}^{k}\sum_{a=1}^{n}\left\lVert x_a-c_b\right\rVert^{2}$$
where k denotes the number of clusters, b denotes the index of a cluster, n denotes the number of data points in cluster b, a denotes the index of a data point, $x_a$ denotes the value of data point a in cluster b, $c_b$ denotes the center of cluster b, and $\lVert x_a-c_b\rVert$ is the Euclidean distance between $x_a$ and $c_b$;
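Step 301's clustering can be sketched with Lloyd's K-Means algorithm applied to the rows of the teaching resource feature matrix. This plain-Python version is an illustrative assumption (random initial centers, a fixed iteration cap, and empty clusters keeping their previous center); `pyspark.ml.clustering.KMeans` would be the distributed counterpart on Spark. The helper `sse` evaluates the second objective function, the sum of squared Euclidean distances from each point to its assigned center.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's K-Means: alternately assign points to the nearest center and recompute centers.

    points: list of equal-length tuples (e.g. resource feature vectors q_i).
    Returns (labels, centers), where labels[j] is the cluster index of points[j].
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # random initial centers

    def assign():
        # Index of the nearest center (Euclidean distance) for every point.
        return [min(range(k), key=lambda b: math.dist(p, centers[b])) for p in points]

    for _ in range(iters):
        labels = assign()
        new_centers = []
        for b in range(k):
            members = [p for p, lab in zip(points, labels) if lab == b]
            # Cluster mean; an empty cluster keeps its previous center.
            new_centers.append([sum(xs) / len(members) for xs in zip(*members)] if members else centers[b])
        if new_centers == centers:
            break
        centers = new_centers
    return assign(), centers


def sse(points, labels, centers):
    """Second objective function: total squared Euclidean distance to the assigned centers."""
    return sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
```

Lloyd's iterations only reach a local minimum of the objective, so in practice the number of clusters k and the initialization would need tuning on real resource vectors.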
step 302, retrieving the teaching resources in each teaching resource cluster, sorting them in ascending order of their distance from the cluster center, and storing the sorting result in the data warehouse;
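The ordering of step 302 amounts to sorting each cluster's members by Euclidean distance to its center. In the sketch below (the function name is hypothetical and the data-warehouse storage is omitted), ties are broken by resource ID so the ordering is deterministic; the first K entries of each list are the representative resources used later in step 402.

```python
import math


def rank_within_clusters(vectors, labels, centers):
    """Step 302 (sketch): order each cluster's resources by ascending distance to its center.

    vectors: {resource_id: feature vector}
    labels: {resource_id: cluster index}
    centers: list of center vectors indexed by cluster.
    """
    ranked = {}
    for rid, vec in vectors.items():
        b = labels[rid]
        ranked.setdefault(b, []).append((math.dist(vec, centers[b]), rid))
    # Sorting (distance, id) tuples gives ascending distance, ties broken by ID.
    return {b: [rid for _, rid in sorted(items)] for b, items in ranked.items()}
```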
the step 4 comprises the following steps:
step 401, obtaining, from the user's historically scored resources, the teaching resource clusters to which those resources belong, where historically scored resources are teaching resources for which the user has scoring data;
step 402, obtaining the first K teaching resources of each such cluster, i.e. the K resources most representative of the cluster's characteristics (the closest to the cluster center in the ordering of step 302), and calculating the interest degree of these K teaching resources with the teaching resource interest degree model, where 0 < K < N and N denotes the number of teaching resources in the cluster;
step 403, applying the teaching resource interest degree model over all of the user's historically scored resources to obtain the user's interest degree in all teaching resources;
the teaching resource interestingness model is expressed as:
$$p_{ud}=\sum_{c\in N(u)\cap S(d)} w_d\, r_{uc}$$
where $p_{ud}$ denotes the interest degree of user u in teaching resource d, $N(u)$ denotes the set of teaching resources scored by user u, $S(d)$ denotes the resource cluster (set of teaching resources) to which teaching resource d belongs, c denotes a teaching resource that has been scored by user u and lies in the resource cluster of d, $w_d$ denotes the feature matching degree of teaching resource d, and $r_{uc}$ denotes user u's score of teaching resource c.
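Steps 401 to 403 and the final descending sort can be sketched as follows. The sketch assumes each resource belongs to exactly one cluster (as K-Means produces), so $N(u)\cap S(d)$ is the set of resources the user has scored in d's cluster. The claim does not say how $w_d$ is computed, so it is passed in as a hypothetical input; the function names and dictionary layout are likewise illustrative.

```python
def interest_degrees(user_scores, cluster_of, top_k, w):
    """Steps 401-403 (sketch): p_ud = sum over c in N(u) ∩ S(d) of w_d * r_uc.

    user_scores: {resource_id: r_uc}, i.e. N(u) together with the user's scores.
    cluster_of: {resource_id: cluster index}; S(d) = resources sharing d's cluster.
    top_k: {cluster index: first K resource IDs from the step-302 ordering}.
    w: {resource_id: w_d}, the feature matching degree (hypothetical input).
    """
    clusters = {cluster_of[c] for c in user_scores}  # step 401
    p = {}
    for b in clusters:
        for d in top_k[b]:  # step 402: representative resources of cluster b
            p[d] = sum(w[d] * r for c, r in user_scores.items() if cluster_of[c] == cluster_of[d])
    return p


def recommendation_list(p):
    """Sort resources by descending interest degree (ties broken by resource ID)."""
    return sorted(p, key=lambda d: (-p[d], d))
```

Whether resources the user has already scored should be filtered out of the final list is not stated in the claim; the sketch keeps them.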
CN201911212608.XA 2019-12-02 2019-12-02 Large-scale user-oriented personalized teaching resource recommendation method Active CN110929161B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201911212608.XA CN110929161B (en) 2019-12-02 2019-12-02 Large-scale user-oriented personalized teaching resource recommendation method
PCT/CN2020/090567 WO2021109464A1 (en) 2019-12-02 2020-05-15 Personalized teaching resource recommendation method for large-scale users

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911212608.XA CN110929161B (en) 2019-12-02 2019-12-02 Large-scale user-oriented personalized teaching resource recommendation method

Publications (2)

Publication Number Publication Date
CN110929161A CN110929161A (en) 2020-03-27
CN110929161B true CN110929161B (en) 2023-04-07

Family

ID=69848115

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911212608.XA Active CN110929161B (en) 2019-12-02 2019-12-02 Large-scale user-oriented personalized teaching resource recommendation method

Country Status (2)

Country Link
CN (1) CN110929161B (en)
WO (1) WO2021109464A1 (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110929161B (en) * 2019-12-02 2023-04-07 南京莱斯网信技术研究院有限公司 Large-scale user-oriented personalized teaching resource recommendation method
CN111460145A (en) * 2020-03-18 2020-07-28 天闻数媒科技(北京)有限公司 Learning resource recommendation method, device and storage medium
CN111931043B (en) * 2020-07-23 2023-09-29 重庆邮电大学 Recommending method and system for science and technology resources
CN112732867B (en) * 2020-12-29 2024-03-15 广州视源电子科技股份有限公司 File processing method and device
CN112650948B (en) * 2020-12-30 2022-04-29 华中师范大学 Information network construction method, system and application for education informatization evaluation
CN112749342A (en) * 2021-01-20 2021-05-04 北京工业大学 Personalized recommendation method for network education and teaching resources
CN113672809A (en) * 2021-08-18 2021-11-19 广州创显科教股份有限公司 Intelligent learning guiding method and system based on personalized recommendation algorithm
CN116401567B (en) * 2023-06-02 2023-09-08 支付宝(杭州)信息技术有限公司 Clustering model training, user clustering and information pushing method and device
CN116628339B (en) * 2023-06-09 2023-11-17 国信蓝桥教育科技股份有限公司 Educational resource recommendation method and system based on artificial intelligence
CN117575745B (en) * 2024-01-17 2024-04-30 山东正禾大教育科技有限公司 Course teaching resource individual recommendation method based on AI big data

Citations (1)

Publication number Priority date Publication date Assignee Title
CN106528693A (en) * 2016-10-25 2017-03-22 广东科海信息科技股份有限公司 Individualized learning-oriented educational resource recommendation method and system

Family Cites Families (9)

Publication number Priority date Publication date Assignee Title
US20140143012A1 (en) * 2012-11-21 2014-05-22 Insightera Ltd. Method and system for predictive marketing campigns based on users online behavior and profile
CN103049865A (en) * 2012-12-17 2013-04-17 中国农业大学 Method and system for initiatively recommending product information service
EP2770472A1 (en) * 2013-02-25 2014-08-27 Thomson Licensing Method and system for item recommendation
CN106570653B (en) * 2016-11-10 2018-08-07 国网山东省电力公司济南供电公司 Distribution repairing work order distributes support system and optimization method
CN106339829B (en) * 2016-11-10 2018-09-21 国网山东省电力公司济南供电公司 The power distribution network that technology is moved based on great Yun objects actively repairs overall view monitoring system
CN106919699A (en) * 2017-03-09 2017-07-04 华北电力大学 A kind of recommendation method for personalized information towards large-scale consumer
CN108491547A (en) * 2018-04-04 2018-09-04 深圳明创自控技术有限公司 A kind of internet teaching system based on big data
CN109241405B (en) * 2018-08-13 2021-11-23 华中师范大学 Learning resource collaborative filtering recommendation method and system based on knowledge association
CN110929161B (en) * 2019-12-02 2023-04-07 南京莱斯网信技术研究院有限公司 Large-scale user-oriented personalized teaching resource recommendation method

Non-Patent Citations (2)

Title
Qingmei Zhou et al. Spectral Clustering-based Matrix Completion Method for Top-n Recommendation. ICCDE '19: Proceedings of the 2019 5th International Conference on Computing and Data Engineering, 2019, pp. 1-6. *
Tao Li et al. Hybrid Recommendation Algorithm Based on Hamming Clustering for User's Access Log and Weighted User Behavior. 2018 15th International Conference on Service Systems and Service Management (ICSSSM), 2018, pp. 1-7. *

Also Published As

Publication number Publication date
WO2021109464A1 (en) 2021-06-10
CN110929161A (en) 2020-03-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant