CN116127192A - Personalized recommendation method based on big data - Google Patents
- Publication number
- CN116127192A (application CN202211741970.8A)
- Authority
- CN
- China
- Prior art keywords
- data
- user
- users
- recommendation
- total number
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a personalized recommendation method based on big data. For new users, the 15 data items with the highest click rate in the Elasticsearch database are recommended. For existing users, the recommendation results of a user-based collaborative filtering algorithm and an item-based collaborative filtering algorithm are fused by linear weighting, and the top 10 data items are selected for recommendation. First, the high degree of automation of the web crawler allows the database to be updated at regular intervals; second, combining the TF-IDF algorithm with the collaborative filtering algorithm alleviates the data sparsity problem to some extent.
Description
Technical Field
The invention relates to the technical field of big data research, in particular to a personalized recommendation method based on big data.
Background
With the rapid development of network technology, the internet has spread into every aspect of daily life, and massive amounts of data are generated every day. The information a user actually wants is only a very small part of this total, and users are often at a loss when faced with information overload. It is therefore important to be able to retrieve the information a user is looking for efficiently.
The recommendation methods commonly used at present are content-based recommendation, association-rule-based recommendation, and collaborative filtering recommendation. Content-based recommendation and collaborative filtering recommendation suffer from the data sparsity problem and the new-user problem, while association-rule-based recommendation suffers from difficult and time-consuming rule extraction and a low degree of personalization.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention combines Elasticsearch data storage and search with a user-based collaborative filtering algorithm and an item-based collaborative filtering algorithm. For new users, the 15 data items with the highest click rate in the Elasticsearch database are recommended; for existing users, the recommendation results of the user-based collaborative filtering algorithm and the item-based collaborative filtering algorithm are fused by linear weighting, and the top 10 data items are selected for recommendation. The technical solution is as follows:
a personalized recommendation method based on big data, comprising the following steps:
step 1: using web crawler technology, specify the fields to collect, including title, release time, administration time, timeliness, and body text, and obtain data from relevant websites;
step 2: first, process the fields of the crawled data, remove meaningless data, and de-duplicate the data; second, store the cleaned data in JSON format under new fields such as title and data source region;
import the data stored in JSON format into an Elasticsearch database, select the IK analyzer matching the Elasticsearch version, create the index with the finest-grained analyzer ik_max_word, and search with the coarsest-grained analyzer ik_smart;
step 3: construct the user-based collaborative filtering algorithm. According to the historical behavior of users on the data, including searching, commenting, and collecting, an m × n user-data matrix U is constructed as follows:
$$U = \begin{pmatrix} u_{11} & u_{12} & \cdots & u_{1n} \\ u_{21} & u_{22} & \cdots & u_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ u_{m1} & u_{m2} & \cdots & u_{mn} \end{pmatrix}$$
where m is the number of users and n is the total number of data items; if a user has historical behavior on a data item, a score is assigned, with $u_{mn}$ denoting the score of user m for data item n; if there is no historical behavior, the entry is 0;
an n × s data-tag matrix C is constructed from the tag information of the data, as follows:
$$C = \begin{pmatrix} c_{11} & c_{12} & \cdots & c_{1s} \\ c_{21} & c_{22} & \cdots & c_{2s} \\ \vdots & \vdots & \ddots & \vdots \\ c_{n1} & c_{n2} & \cdots & c_{ns} \end{pmatrix}$$
where n is the total number of data items, s is the total number of tags, and $c_{ns}$ indicates whether data item n carries tag s: 1 if it does, 0 otherwise;
from the matrices U and C, an m × s user-tag preference matrix P is constructed as follows:
$$P = \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1s} \\ p_{21} & p_{22} & \cdots & p_{2s} \\ \vdots & \vdots & \ddots & \vdots \\ p_{m1} & p_{m2} & \cdots & p_{ms} \end{pmatrix}$$
where m is the total number of users, s is the total number of tags, and $p_{ms}$ denotes the degree of preference of user m for tag s;
the user-tag preference matrix P is improved with the TF-IDF algorithm, specifically as follows:
where $p_{ua}$ denotes the degree of preference of user u for tag a; the other quantities in the formula are the number of times user u marked tag a, the total number of times user u marked tags, the total number of times the tag is used, and the total number of tags; $n_{ua}$ denotes the number of users who marked tag a, and $n_m$ denotes the total number of users;
the similarity between users is computed from the improved user-tag matrix using cosine similarity, with the following formula:
$$w_{u,v} = \frac{|n_u \cap n_v|}{\sqrt{|n_u|\,|n_v|}}$$
where a larger value of $w_{u,v}$ means that users u and v are more similar, and hence that data preferred by user v is more likely to be recommended to user u; $n_u$ is the set of data preferred by user u, $n_v$ is the set of data preferred by user v, and $|n_u \cap n_v|$ is the number of data items preferred by both users u and v;
the similarities are sorted in descending order to find the k users most similar to the target user, denoted by the set S(u, k), and the degree of preference of the target user u for data item i is computed with the following formula:
$$p(u, i) = \sum_{v \in S(u,k) \cap n_i} w_{u,v}\, u_{vi}$$
where $n_i$ is the set of users who have historical behavior on data item i, and $u_{vi}$ is the score of user v's historical behavior on data item i;
step 4: construct the recommendation result of the item-based collaborative filtering algorithm, whose implementation is essentially the same as that of the user-based collaborative filtering algorithm in step 3: first, construct a data-user matrix from the historical behavior of different users on different data; second, compute the similarity between data items using cosine similarity; third, obtain the recommendation result, in which the data recommended to the target user are items on which the target user has no historical behavior but which are highly similar to items on which the target user does have historical behavior; the higher the recommendation score, the more interested the target user is in the recommended data;
step 5: for new users, recommend the 15 data items with the highest click rate in the Elasticsearch database; for existing users, fuse the recommendation results of the user-based collaborative filtering algorithm and the item-based collaborative filtering algorithm by linear weighting, and select the top 10 data items for recommendation.
Preferably, in step 1, different crawling targets are selected according to the department to which the user belongs, and relevant laws and regulations, news updates, microblog posts, and case data are obtained.
Preferably, the scoring rule for the user-data matrix U in step 3 is: a score of 1 is assigned after a data item is searched, 2 after it is commented on, and 3 after it is collected.
Preferably, in step 3, k is set to 20 when selecting the k users most similar to the target user.
Preferably, as an alternative for new users in step 5, the user can be asked to select fields of interest when the platform is initialized, and data from those fields is then recommended.
Preferably, a step 6 can be added to iteratively optimize the model in two parts: the first part is data optimization, in which the database is continuously updated by the crawler program; the second part is updating the user-data matrix: after data is recommended to a user with behavior information, whether the target user gives behavior feedback on the recommended data is recorded in the user behavior information log table, and the user-data matrix and the final recommendation list are continuously updated accordingly.
Compared with the prior art, the invention has the following beneficial effects: first, the high degree of automation of the web crawler allows the database to be updated at regular intervals; second, combining the TF-IDF algorithm with the collaborative filtering algorithm alleviates the data sparsity problem to some extent.
Drawings
FIG. 1 is a schematic diagram of a model implementation process.
FIG. 2 is a schematic diagram of a model optimization process.
Detailed Description
In order to clarify the technical scheme and working principle of the present invention, the following describes the embodiments of the present disclosure in further detail. Any combination of the above-mentioned optional solutions may be adopted to form an optional embodiment of the present disclosure, which is not described herein in detail.
The terms "step 1," "step 2," "step 3," and the like in the description and in the claims are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the present application described herein may be implemented in sequences other than those described herein.
The embodiment of the disclosure provides a personalized recommendation method based on big data, which comprises the following steps:
Step 1: Acquire relevant data using crawler technology.
Using web crawler technology, the fields to collect are specified, such as title, release time, administration time, timeliness, and body text, and data is obtained from relevant websites.
Preferably, different crawling targets are selected according to the department to which the user belongs, and relevant laws and regulations, news updates, microblog posts, and case data are obtained.
Step 2: Clean the data and store it in an Elasticsearch database.
First, the fields of the crawled data are processed, meaningless data is removed, and the data is de-duplicated; second, the cleaned data is stored in JSON format under new fields such as title and data source region.
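The cleaning step can be illustrated with a minimal sketch. The field names `title`, `text`, and `region`, the rule that a record is meaningless when a required field is blank, and the use of the (title, text) pair as the duplicate key are all assumptions made for illustration; the text does not specify them.

```python
import json


def clean_records(records, required=("title", "text")):
    """Drop records with blank required fields and exact duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        # Normalise surrounding whitespace in every string field.
        rec = {k: v.strip() if isinstance(v, str) else v for k, v in rec.items()}
        if any(not rec.get(field) for field in required):
            continue  # assumed rule: a blank required field makes the record meaningless
        key = (rec["title"], rec["text"])  # assumed duplicate key
        if key in seen:
            continue  # duplicate of a record that was already kept
        seen.add(key)
        cleaned.append(rec)
    return cleaned


raw = [
    {"title": " Notice A ", "text": "Body of notice A", "region": "Beijing"},
    {"title": "Notice A", "text": "Body of notice A", "region": "Beijing"},
    {"title": "", "text": "orphan text", "region": "Shanghai"},
]
cleaned = clean_records(raw)
# Serialise the cleaned records as JSON, keeping non-ASCII text readable.
payload = json.dumps(cleaned, ensure_ascii=False, indent=2)
```

In this sketch the second record is dropped as a duplicate of the first and the third is dropped because its title is blank, so one record remains in `payload`.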
The data stored in JSON format is imported into an Elasticsearch database. The IK analyzer matching the Elasticsearch version is selected; the index is created with the finest-grained analyzer ik_max_word, and searches use the coarsest-grained analyzer ik_smart. The dictionary can also be extended with custom entries to improve search accuracy.
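As an illustration of the indexing choice described above, the sketch below builds an index definition in which text fields are indexed with `ik_max_word` and searched with `ik_smart`, using the standard `analyzer` and `search_analyzer` mapping parameters. The index name, the field names, and the `click_count` field are hypothetical; only the two analyzer names come from the text. The body is only constructed and printed here, not sent to a cluster.

```python
import json

# Hypothetical index name; not given in the text.
INDEX_NAME = "documents"

# Text fields are analysed with ik_max_word at index time and ik_smart at search time.
ik_text_field = {
    "type": "text",
    "analyzer": "ik_max_word",
    "search_analyzer": "ik_smart",
}

index_body = {
    "mappings": {
        "properties": {
            "title": dict(ik_text_field),
            "text": dict(ik_text_field),
            # Assumed numeric field used later to rank items by click rate.
            "click_count": {"type": "integer"},
        }
    }
}

print(json.dumps(index_body, indent=2))
```

The IK analysis plugin has to be installed on the cluster, in a version matching Elasticsearch, for these analyzer names to resolve.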
Step 3: Construct the user-tag matrix using the collaborative filtering algorithm, combined with the user's latent interests and the TF-IDF algorithm.
According to the historical behavior of users on the data, including searching, commenting, and collecting, an m × n user-data matrix U is constructed as follows:
$$U = \begin{pmatrix} u_{11} & u_{12} & \cdots & u_{1n} \\ u_{21} & u_{22} & \cdots & u_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ u_{m1} & u_{m2} & \cdots & u_{mn} \end{pmatrix}$$
where m is the number of users and n is the total number of data items. If a user has historical behavior on a data item, a score is assigned, with $u_{mn}$ denoting the score of user m for data item n; if there is no historical behavior, the entry is 0.
Preferably, a score of 1 is assigned after a data item is searched, 2 after it is commented on, and 3 after it is collected.
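A minimal sketch of building the user-data matrix U from a behavior log follows. The event format `(user, item, action)` and the action names are hypothetical. The text does not say which score applies when a user has acted on the same item several times; keeping the highest score is an assumption.

```python
# Scores for searching, commenting, and collecting, as given in the text.
SCORES = {"search": 1, "comment": 2, "collect": 3}


def build_user_data_matrix(events, users, items):
    """Return the m x n matrix U (list of lists) from (user, item, action) events."""
    row = {u: i for i, u in enumerate(users)}
    col = {d: j for j, d in enumerate(items)}
    U = [[0] * len(items) for _ in users]  # 0 means no historical behavior
    for user, item, action in events:
        i, j = row[user], col[item]
        # Assumption: keep the highest score when a user acted several times.
        U[i][j] = max(U[i][j], SCORES[action])
    return U


events = [
    ("alice", "d1", "search"),
    ("alice", "d1", "collect"),
    ("bob", "d2", "comment"),
]
U = build_user_data_matrix(events, users=["alice", "bob"], items=["d1", "d2", "d3"])
```

Here `alice` both searched and collected `d1`, so her entry for `d1` is 3 under the keep-the-maximum assumption.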
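An n × s data-tag matrix C is constructed from the tag information of the data, as follows:
$$C = \begin{pmatrix} c_{11} & c_{12} & \cdots & c_{1s} \\ c_{21} & c_{22} & \cdots & c_{2s} \\ \vdots & \vdots & \ddots & \vdots \\ c_{n1} & c_{n2} & \cdots & c_{ns} \end{pmatrix}$$
where n is the total number of data items, s is the total number of tags, and $c_{ns}$ indicates whether data item n carries tag s: 1 if it does, 0 otherwise.
From the matrices U and C, an m × s user-tag preference matrix P is constructed as follows:
$$P = \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1s} \\ p_{21} & p_{22} & \cdots & p_{2s} \\ \vdots & \vdots & \ddots & \vdots \\ p_{m1} & p_{m2} & \cdots & p_{ms} \end{pmatrix}$$
where m is the total number of users, s is the total number of tags, and $p_{ms}$ denotes the degree of preference of user m for tag s.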
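The text does not spell out how the entries of P are computed from U and C. One natural reading, given the shapes m × n and n × s, is the matrix product P = U · C, in which a user's preference for a tag is the sum of that user's scores over all data items carrying the tag. The sketch below shows this reading; it is an assumption, not a rule stated in the text.

```python
def matmul(A, B):
    """Multiply an m x n matrix by an n x s matrix (both lists of lists)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


# U: 2 users x 3 data items (scores 0 to 3).
U = [[3, 0, 1],
     [0, 2, 0]]
# C: 3 data items x 2 tags (1 if the item carries the tag).
C = [[1, 0],
     [0, 1],
     [1, 1]]

# Assumed construction: P[u][a] = sum over items d of U[u][d] * C[d][a].
P = matmul(U, C)
```

With these inputs the first user scores items 1 and 3, which both carry the first tag, so that user's preference for the first tag is 3 + 1 = 4.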
The user-tag preference matrix P is improved with the TF-IDF algorithm, specifically as follows:
where $p_{ua}$ denotes the degree of preference of user u for tag a; the other quantities in the formula are the number of times user u marked tag a, the total number of times user u marked tags, the total number of times the tag is used, and the total number of tags; $n_{ua}$ denotes the number of users who marked tag a, and $n_m$ denotes the total number of users.
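To illustrate the idea of TF-IDF weighting on user-tag counts, the sketch below uses the textbook form: term frequency is the share of a user's markings that fall on tag a, and inverse document frequency is log(n_m / n_ua), with n_m the total number of users and n_ua the number of users who marked tag a. This textbook form is an assumption and may differ from the exact formula intended here; the function and variable names are hypothetical.

```python
import math


def tfidf_preferences(counts):
    """counts[u][a] is the number of times user u marked tag a."""
    n_m = len(counts)            # total number of users
    n_tags = len(counts[0])
    # n_ua for each tag a: number of users who marked the tag at least once.
    users_per_tag = [sum(1 for row in counts if row[a] > 0) for a in range(n_tags)]
    P = []
    for row in counts:
        total = sum(row)         # total number of markings by this user
        weights = []
        for a, c in enumerate(row):
            if total == 0 or users_per_tag[a] == 0:
                weights.append(0.0)
                continue
            tf = c / total                            # assumed term frequency
            idf = math.log(n_m / users_per_tag[a])    # assumed inverse frequency
            weights.append(tf * idf)
        P.append(weights)
    return P


counts = [[4, 1],
          [0, 2]]
P = tfidf_preferences(counts)
```

Under this form a tag marked by every user receives weight 0, which is the usual TF-IDF effect of down-weighting tags that do not distinguish between users.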
The similarity between users is computed from the improved user-tag matrix using cosine similarity, with the following formula:
$$w_{u,v} = \frac{|n_u \cap n_v|}{\sqrt{|n_u|\,|n_v|}}$$
where a larger value of $w_{u,v}$ means that users u and v are more similar, and hence that data preferred by user v is more likely to be recommended to user u; $n_u$ is the set of data preferred by user u, $n_v$ is the set of data preferred by user v, and $|n_u \cap n_v|$ is the number of data items preferred by both users u and v.
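A minimal sketch of the user similarity follows, assuming the common set-based cosine form |n_u ∩ n_v| / sqrt(|n_u| · |n_v|), which matches the set definitions of n_u and n_v given above. The user and item identifiers are made up for illustration.

```python
import math


def set_cosine(n_u, n_v):
    """Set-based cosine similarity between two users' preferred-item sets."""
    if not n_u or not n_v:
        return 0.0  # a user with no preferences is similar to no one
    return len(n_u & n_v) / math.sqrt(len(n_u) * len(n_v))


# Hypothetical preferred-item sets.
prefs = {
    "u": {"d1", "d2", "d3"},
    "v": {"d2", "d3", "d4"},
    "w": {"d5"},
}
w_uv = set_cosine(prefs["u"], prefs["v"])  # two shared items out of three each
w_uw = set_cosine(prefs["u"], prefs["w"])  # no shared items
```

The value lies between 0 and 1, reaching 1 only when the two sets are identical.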
The similarities are sorted in descending order to find the k users most similar to the target user, denoted by the set S(u, k), and the degree of preference of the target user u for data item i is computed with the following formula:
$$p(u, i) = \sum_{v \in S(u,k) \cap n_i} w_{u,v}\, u_{vi}$$
where $n_i$ is the set of users who have historical behavior on data item i, and $u_{vi}$ is the score of user v's historical behavior on data item i; searching, commenting, and collecting are scored 1, 2, and 3 respectively.
Preferably, k is set to 20.
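The neighbor selection and scoring can be sketched as below. The sketch assumes the usual user-based collaborative filtering score, the sum of w_{u,v} · u_{vi} over the k most similar users v who have acted on item i; the dictionary layout and the sample values are hypothetical.

```python
def top_k_neighbours(similarity, target, k=20):
    """Return the k users most similar to target as (user, weight) pairs: S(u, k)."""
    others = [(v, w) for v, w in similarity[target].items() if v != target]
    others.sort(key=lambda vw: vw[1], reverse=True)  # descending similarity
    return others[:k]


def preference(target, item, similarity, scores, k=20):
    """Assumed score: sum of w_uv * u_vi over neighbours v who acted on item."""
    total = 0.0
    for v, w in top_k_neighbours(similarity, target, k):
        u_vi = scores.get(v, {}).get(item, 0)  # 0 if v has no behavior on item
        total += w * u_vi
    return total


# Hypothetical similarities of user "u" to three other users.
similarity = {"u": {"v": 0.8, "w": 0.5, "x": 0.1}}
# Hypothetical behavior scores (1 = search, 2 = comment, 3 = collect).
scores = {"v": {"d1": 3}, "w": {"d1": 1, "d2": 2}, "x": {"d2": 3}}

p_d1 = preference("u", "d1", similarity, scores, k=2)
p_d2 = preference("u", "d2", similarity, scores, k=2)
```

With k = 2 only users `v` and `w` contribute, so user `x`, despite a high score on `d2`, does not affect the result.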
Step 4: The item-based collaborative filtering recommendation follows essentially the same implementation process as the user-based collaborative filtering algorithm. First, a data-user matrix is constructed from the historical behavior of different users on different data; second, the similarity between data items is computed using cosine similarity; third, the recommendation result is obtained: the data recommended to the target user are items on which the target user has no historical behavior but which are highly similar to items on which the target user does have historical behavior. The higher the recommendation score, the more interested the target user is in the recommended data.
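The three sub-steps of the item-based variant can be sketched as follows. The scoring rule, which sums similarity times the user's score over the items the user has already acted on, is the common item-based form and is an assumption here; the matrix values are made up.

```python
import math


def column_cosine(U, i, j):
    """Cosine similarity between data items i and j (columns of U)."""
    dot = sum(row[i] * row[j] for row in U)
    norm_i = math.sqrt(sum(row[i] ** 2 for row in U))
    norm_j = math.sqrt(sum(row[j] ** 2 for row in U))
    return dot / (norm_i * norm_j) if norm_i and norm_j else 0.0


def item_based_scores(U, user):
    """Score every item on which the user has no historical behavior."""
    n = len(U[0])
    seen = [j for j in range(n) if U[user][j] > 0]
    scores = {}
    for i in range(n):
        if U[user][i] > 0:
            continue  # only recommend items the user has not acted on
        # Assumed rule: similarity to each seen item, weighted by the user's score.
        scores[i] = sum(column_cosine(U, i, j) * U[user][j] for j in seen)
    return scores


# Rows are users, columns are data items.
U = [[3, 0, 0],
     [3, 2, 0],
     [0, 1, 3]]
scores = item_based_scores(U, user=0)
```

For user 0, item 1 scores higher than item 2 because item 1 shares a user with the item that user 0 already collected, while item 2 does not.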
Finally, the recommendation results of the user-based collaborative filtering algorithm and the item-based collaborative filtering algorithm are fused by linear weighting, and the top 10 data items are selected for recommendation.
Step 5: Different recommendation algorithms are used depending on the type of user.
For new users, the 15 data items with the highest click rate in the Elasticsearch database are recommended. Alternatively, tags can be set up when the platform is initialized, for example by letting the user select fields of interest, and data from those fields is then recommended. For existing users, the recommendation results of the user-based collaborative filtering algorithm and the item-based collaborative filtering algorithm are fused by linear weighting, and the top 10 data items are selected for recommendation.
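The branching between new and existing users, together with the linear weighted fusion, can be sketched as below. The weight `alpha`, its default of 0.5, the tie-breaking by item identifier, and treating an empty behavior history as the definition of a new user are all assumptions; the text gives only the list sizes 15 and 10. In this sketch click counts are passed in as a plain dictionary rather than queried from Elasticsearch.

```python
def fuse(user_cf, item_cf, alpha=0.5, top_n=10):
    """Linear weighted fusion: alpha * user-based + (1 - alpha) * item-based."""
    items = set(user_cf) | set(item_cf)
    fused = {
        i: alpha * user_cf.get(i, 0.0) + (1 - alpha) * item_cf.get(i, 0.0)
        for i in items
    }
    # Highest fused score first; ties broken by item id for determinism.
    return sorted(fused, key=lambda i: (-fused[i], i))[:top_n]


def recommend(user_history, click_counts, user_cf, item_cf):
    """Top 15 by click count for new users, fused top 10 for existing users."""
    if not user_history:  # assumed definition of a new user
        return sorted(click_counts, key=lambda i: (-click_counts[i], i))[:15]
    return fuse(user_cf, item_cf)


click_counts = {"d1": 120, "d2": 300, "d3": 45}
new_user = recommend([], click_counts, {}, {})
old_user = recommend(
    ["d9"],
    click_counts,
    user_cf={"d1": 2.0, "d2": 1.0},
    item_cf={"d3": 4.0, "d2": 1.0},
)
```

If the two algorithms produce scores on different scales, normalizing each score list before fusion would be a reasonable addition; the sketch fuses the raw scores.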
Step 6: iterative optimization of the model.
Iterative optimization of the model is divided into two parts. The first part is data optimization, in which the database is continuously updated by the crawler program. The second part is updating the user-data matrix, as shown in FIG. 2: after data is recommended to a user with behavior information, whether the target user gives behavior feedback on the recommended data is recorded in the user behavior information log table, and the user-data matrix and the final recommendation list are continuously updated accordingly.
The invention has been described above by way of example, but it is not limited to the particular embodiments described. Any insubstantial modification made using the method concept and technical solution of the invention, or any direct application of that concept and technical solution to other situations without modification, including equivalent substitutions, falls within the scope of protection of the invention.
Claims (6)
1. A personalized recommendation method based on big data, characterized by comprising the following steps:
step 1: using web crawler technology, specify the fields to collect, including title, release time, administration time, timeliness, and body text, and obtain data from relevant websites;
step 2: first, process the fields of the crawled data, remove meaningless data, and de-duplicate the data; second, store the cleaned data in JSON format under new fields such as title and data source region;
import the data stored in JSON format into an Elasticsearch database, select the IK analyzer matching the Elasticsearch version, create the index with the finest-grained analyzer ik_max_word, and search with the coarsest-grained analyzer ik_smart;
step 3: construct the user-based collaborative filtering algorithm. According to the historical behavior of users on the data, including searching, commenting, and collecting, an m × n user-data matrix U is constructed as follows:
$$U = \begin{pmatrix} u_{11} & u_{12} & \cdots & u_{1n} \\ u_{21} & u_{22} & \cdots & u_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ u_{m1} & u_{m2} & \cdots & u_{mn} \end{pmatrix}$$
where m is the number of users and n is the total number of data items; if a user has historical behavior on a data item, a score is assigned, with $u_{mn}$ denoting the score of user m for data item n; if there is no historical behavior, the entry is 0;
an n × s data-tag matrix C is constructed from the tag information of the data, as follows:
$$C = \begin{pmatrix} c_{11} & c_{12} & \cdots & c_{1s} \\ c_{21} & c_{22} & \cdots & c_{2s} \\ \vdots & \vdots & \ddots & \vdots \\ c_{n1} & c_{n2} & \cdots & c_{ns} \end{pmatrix}$$
where n is the total number of data items, s is the total number of tags, and $c_{ns}$ indicates whether data item n carries tag s: 1 if it does, 0 otherwise;
from the matrices U and C, an m × s user-tag preference matrix P is constructed as follows:
$$P = \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1s} \\ p_{21} & p_{22} & \cdots & p_{2s} \\ \vdots & \vdots & \ddots & \vdots \\ p_{m1} & p_{m2} & \cdots & p_{ms} \end{pmatrix}$$
where m is the total number of users, s is the total number of tags, and $p_{ms}$ denotes the degree of preference of user m for tag s;
the user-tag preference matrix P is improved with the TF-IDF algorithm, specifically as follows:
where $p_{ua}$ denotes the degree of preference of user u for tag a; the other quantities in the formula are the number of times user u marked tag a, the total number of times user u marked tags, the total number of times the tag is used, and the total number of tags; $n_{ua}$ denotes the number of users who marked tag a, and $n_m$ denotes the total number of users;
the similarity between users is computed from the improved user-tag matrix using cosine similarity, with the following formula:
$$w_{u,v} = \frac{|n_u \cap n_v|}{\sqrt{|n_u|\,|n_v|}}$$
where a larger value of $w_{u,v}$ means that users u and v are more similar, and hence that data preferred by user v is more likely to be recommended to user u; $n_u$ is the set of data preferred by user u, $n_v$ is the set of data preferred by user v, and $|n_u \cap n_v|$ is the number of data items preferred by both users u and v;
the similarities are sorted in descending order to find the k users most similar to the target user, denoted by the set S(u, k), and the degree of preference of the target user u for data item i is computed with the following formula:
$$p(u, i) = \sum_{v \in S(u,k) \cap n_i} w_{u,v}\, u_{vi}$$
where $n_i$ is the set of users who have historical behavior on data item i, and $u_{vi}$ is the score of user v's historical behavior on data item i;
step 4: construct the recommendation result of the item-based collaborative filtering algorithm, whose implementation is essentially the same as that of the user-based collaborative filtering algorithm in step 3: first, construct a data-user matrix from the historical behavior of different users on different data; second, compute the similarity between data items using cosine similarity; third, obtain the recommendation result, in which the data recommended to the target user are items on which the target user has no historical behavior but which are highly similar to items on which the target user does have historical behavior; the higher the recommendation score, the more interested the target user is in the recommended data;
step 5: for new users, recommend the 15 data items with the highest click rate in the Elasticsearch database; for existing users, fuse the recommendation results of the user-based collaborative filtering algorithm and the item-based collaborative filtering algorithm by linear weighting, and select the top 10 data items for recommendation.
2. The personalized recommendation method based on big data according to claim 1, wherein in step 1, different crawling targets are selected according to the department to which the user belongs, and relevant laws and regulations, news updates, microblog posts, and case data are obtained.
3. The personalized recommendation method based on big data according to claim 1, wherein the scoring rule for the user-data matrix U in step 3 is: a score of 1 is assigned after a data item is searched, 2 after it is commented on, and 3 after it is collected.
4. The personalized recommendation method based on big data according to claim 3, wherein in step 3, k is set to 20 when selecting the k users most similar to the target user.
5. The personalized recommendation method based on big data according to claim 1, wherein, as an alternative for new users in step 5, the user is asked to select fields of interest when the platform is initialized, and data from those fields is then recommended.
6. The personalized recommendation method based on big data according to claim 1, wherein a step 6 is added to iteratively optimize the model in two parts: the first part is data optimization, in which the database is continuously updated by the crawler program; the second part is updating the user-data matrix: after data is recommended to a user with behavior information, whether the target user gives behavior feedback on the recommended data is recorded in the user behavior information log table, and the user-data matrix and the final recommendation list are continuously updated accordingly.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211741970.8A CN116127192A (en) | 2022-12-31 | 2022-12-31 | Personalized recommendation method based on big data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211741970.8A CN116127192A (en) | 2022-12-31 | 2022-12-31 | Personalized recommendation method based on big data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116127192A (en) | 2023-05-16 |
Family
ID=86305911
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211741970.8A Pending CN116127192A (en) | 2022-12-31 | 2022-12-31 | Personalized recommendation method based on big data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116127192A (en) |
-
2022
- 2022-12-31 CN CN202211741970.8A patent/CN116127192A/en active Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||