CN116127192A - Personalized recommendation method based on big data - Google Patents

Personalized recommendation method based on big data

Info

Publication number
CN116127192A
Authority
CN
China
Prior art keywords
data
user
users
recommendation
total number
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211741970.8A
Other languages
Chinese (zh)
Inventor
熊林海
周金明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Xingzheyi Intelligent Transportation Technology Co ltd
Original Assignee
Nanjing Xingzheyi Intelligent Transportation Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Xingzheyi Intelligent Transportation Technology Co ltd
Priority to CN202211741970.8A
Publication of CN116127192A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a personalized recommendation method based on big data. For new users, the 15 pieces of data with the highest click rate in the Elasticsearch database are recommended. For existing users, the recommendation results of a user-based collaborative filtering algorithm and an item-based collaborative filtering algorithm are fused by linear weighting, and the top 10 pieces of data are selected for recommendation. The invention has two advantages: first, the high degree of automation of the web crawler allows the database to be updated on a regular schedule; second, combining the TF-IDF algorithm with the collaborative filtering algorithm alleviates the data sparsity problem to some extent.

Description

Personalized recommendation method based on big data
Technical Field
The invention relates to the technical field of big data research, in particular to a personalized recommendation method based on big data.
Background
With the rapid development of network technology, the internet has spread into every aspect of life, and massive amounts of data are generated every day. Within this huge volume of information, what a user actually wants is only a tiny fraction of the total, and users are often at a loss in the face of information overload. It is therefore important to be able to obtain the information a user wants efficiently.
The recommendation methods in common use today are mainly content-based recommendation, association-rule-based recommendation, and collaborative filtering recommendation. Content-based recommendation and collaborative filtering suffer from the data sparsity problem and the new-user problem, while association-rule-based recommendation suffers from difficult and time-consuming rule extraction and a low degree of personalization.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention combines Elasticsearch data storage and search with a user-based collaborative filtering algorithm and an item-based collaborative filtering algorithm. For new users, the 15 pieces of data with the highest click rate in the Elasticsearch database are recommended; for existing users, the recommendation results of the user-based collaborative filtering algorithm and the item-based collaborative filtering algorithm are fused by linear weighting, and the top 10 pieces of data are selected for recommendation. The technical solution is as follows:
A personalized recommendation method based on big data comprises the following steps:
step 1: using web crawler technology, specify the fields to collect, including title, release time, administration time, timeliness, text, etc., and obtain data from the relevant websites;
step 2: first, process the fields of the crawled data, remove meaningless data, and de-duplicate the data; second, store the cleaned data in JSON format under new fields such as title and data source region;
import the data stored in JSON format into the Elasticsearch database, select the IK analyzer (word segmenter) matching the Elasticsearch version, create the index with the finest-grained mode ik_max_word, and search with the minimal-segmentation mode ik_smart;
step 3: construct the user-based collaborative filtering algorithm: from each user's historical behavior on the data, including searching, commenting, and collecting, construct a user-data matrix U of size m × n, as follows:
\[
U = \begin{pmatrix}
u_{11} & u_{12} & \cdots & u_{1n} \\
u_{21} & u_{22} & \cdots & u_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
u_{m1} & u_{m2} & \cdots & u_{mn}
\end{pmatrix}
\]
where m is the total number of users and n is the total number of pieces of data; if a user has historical behavior on a piece of data, that behavior is scored, and \(u_{mn}\) is the score of user m for data n; if there is no historical behavior, the entry is 0;
construct a data-tag matrix C of size n × s from the tag information of the data, as follows:
\[
C = \begin{pmatrix}
c_{11} & c_{12} & \cdots & c_{1s} \\
c_{21} & c_{22} & \cdots & c_{2s} \\
\vdots & \vdots & \ddots & \vdots \\
c_{n1} & c_{n2} & \cdots & c_{ns}
\end{pmatrix}
\]
where n is the total number of pieces of data and s is the total number of tags; \(c_{ns}\) indicates whether data n carries tag s: 1 if it does, 0 otherwise;
from the matrices U and C, construct a user-tag preference matrix P of size m × s, as follows:
\[
P = \begin{pmatrix}
p_{11} & p_{12} & \cdots & p_{1s} \\
p_{21} & p_{22} & \cdots & p_{2s} \\
\vdots & \vdots & \ddots & \vdots \\
p_{m1} & p_{m2} & \cdots & p_{ms}
\end{pmatrix}
\]
where m is the total number of users, s is the total number of tags, and \(p_{ms}\) is the degree of preference of user m for tag s;
the user-tag preference matrix P is then improved with the TF-IDF algorithm, as follows:
Figure BDA0004033361870000024
where \(p_{ua}\) denotes the degree of preference of user u for tag a; the symbol shown in Figure BDA0004033361870000025 denotes the number of times user u has marked tag a; the symbol shown in Figure BDA0004033361870000026 denotes the total number of times user u has marked tags; the symbol shown in Figure BDA0004033361870000027 denotes the total number of times the tag has been used; the symbol shown in Figure BDA0004033361870000028 denotes the total number of tags; \(n_{ua}\) denotes the number of users who have marked tag a; and \(n_m\) denotes the total number of users;
compute the similarity between users from the improved user-tag matrix using cosine similarity, with the following formula:
\[
w_{u,v} = \frac{|n_u \cap n_v|}{\sqrt{|n_u|\,|n_v|}}
\]
where the larger the value of \(w_{u,v}\), the more similar users u and v are, and hence the more likely data preferred by user v is to be recommended to user u; \(n_u\) denotes the set of data preferred by user u, \(n_v\) denotes the set of data preferred by user v, and \(|n_u \cap n_v|\) denotes the size of the set of data preferred by both u and v;
sort the similarities in descending order, find the k users most similar to the target user, denoted by the set S(u, k), and compute the degree of preference of the target user u for data i, with the following formula:
Figure BDA0004033361870000031
where \(n_i\) denotes the set of users who have historical behavior on data i, and \(u_{vi}\) denotes the score of user v's historical behavior on data i;
step 4: construct the recommendation results of the item-based collaborative filtering algorithm, whose implementation is essentially the same as that of the user-based collaborative filtering algorithm in step 3: first, construct a data-user matrix from the historical behavior of different users on different data; second, compute the similarity between pieces of data using cosine similarity; third, obtain the recommendation results: the data recommended to the target user are pieces of data on which the target user has no historical behavior but which are highly similar to data on which the target user does have historical behavior, and the higher the recommendation score, the more interested the target user is expected to be in the recommended data;
step 5: for new users, recommend the 15 pieces of data with the highest click rate in the Elasticsearch database; for existing users, fuse the recommendation results of the user-based collaborative filtering algorithm and the item-based collaborative filtering algorithm by linear weighting, and select the top 10 pieces of data for recommendation.
Preferably, in step 1, different crawling targets are selected according to the department to which the user belongs, and relevant laws and regulations, news updates, microblog posts, and case data are obtained.
Preferably, the assignment rule for the user-data matrix U in step 3 is: 1 after the user searches for a piece of data, 2 after the user comments on it, and 3 after the user collects it.
Preferably, in step 3, k is set to 20 when selecting the k users most similar to the target user.
Preferably, as an alternative recommendation method for new users in step 5, the user selects fields of interest when the platform is initialized, and data from those fields is then selected for recommendation.
Preferably, a step 6 is added to iteratively optimize the model afterwards, in two parts: the first part is data optimization, in which the database is continuously updated by the crawler program; the second part is updating the user-data matrix: after data is recommended to a user with behavior information, whether the target user gives behavior feedback on the recommended data is recorded in the user behavior information log table, and the user-data matrix and the final recommendation list are continuously updated accordingly.
Compared with the prior art, the invention has the following beneficial effects: first, the high degree of automation of the web crawler allows the database to be updated on a regular schedule; second, combining the TF-IDF algorithm with the collaborative filtering algorithm alleviates the data sparsity problem to some extent.
Drawings
FIG. 1 is a schematic diagram of a model implementation process.
FIG. 2 is a schematic diagram of a model optimization process.
Detailed Description
In order to clarify the technical scheme and working principle of the present invention, the following describes the embodiments of the present disclosure in further detail. Any combination of the above-mentioned optional solutions may be adopted to form an optional embodiment of the present disclosure, which is not described herein in detail.
The terms "step 1," "step 2," "step 3," and the like in the description and in the claims are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the present application described herein may be implemented in sequences other than those described herein.
The embodiment of the disclosure provides a personalized recommendation method based on big data, which comprises the following steps:
Step 1: acquire relevant data using crawler technology.
Using web crawler technology, fields are specified, such as title, release time, administration time, timeliness, and text, and data is obtained from the relevant websites.
Preferably, different crawling targets are selected according to the department to which the user belongs, and relevant laws and regulations, news updates, microblog posts, and case data are obtained.
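As an illustration of field-driven extraction, the following minimal sketch pulls a fixed list of fields out of one page. It is not the crawler described in the disclosure: the class names `title`, `release_time`, and `text`, the sample HTML, and the `FieldExtractor` helper are all hypothetical, and a real crawler would fetch the page over HTTP rather than use an inline string.

```python
from html.parser import HTMLParser


class FieldExtractor(HTMLParser):
    """Collect text from elements whose class attribute names a wanted field."""

    def __init__(self, fields):
        super().__init__()
        self.fields = set(fields)   # hypothetical field names to look for
        self.current = None         # field currently being read, if any
        self.record = {}

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if cls in self.fields:
            self.current = cls

    def handle_data(self, data):
        if self.current and data.strip():
            previous = self.record.get(self.current, "")
            self.record[self.current] = previous + data.strip()

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the current field.
        self.current = None


# Hypothetical page content standing in for a fetched document.
page = (
    '<div><h1 class="title">Notice A</h1>'
    '<span class="release_time">2022-12-31</span>'
    '<p class="text">Body of notice A</p></div>'
)
parser = FieldExtractor(["title", "release_time", "text"])
parser.feed(page)
record = parser.record
```

Because any closing tag ends the current field, nested markup inside a field would be truncated; a production extractor would need to track tag depth.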
Step 2: clean the data and store it in the Elasticsearch database.
First, the fields of the crawled data are processed, meaningless data is removed, and the data is de-duplicated; second, the cleaned data is stored in JSON format under new fields such as title and data source region.
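The cleaning step can be sketched as follows. This is a minimal example under stated assumptions: the field names `title`, `text`, and `region` are illustrative, a record is treated as meaningless when a required field is empty, and two records count as duplicates when their title and text match. None of these rules is specified in the source.

```python
import json


def clean_records(raw_records, required=("title", "text")):
    """Drop incomplete records, strip whitespace, and de-duplicate."""
    seen = set()
    cleaned = []
    for rec in raw_records:
        # Normalise string fields by stripping surrounding whitespace.
        rec = {k: v.strip() if isinstance(v, str) else v for k, v in rec.items()}
        # Assumption: a record is meaningless if a required field is empty.
        if any(not rec.get(field) for field in required):
            continue
        # Assumption: same title and text means the same document.
        key = (rec["title"], rec["text"])
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(rec)
    return cleaned


raw = [
    {"title": " Notice A ", "text": "Body of notice A", "region": "Nanjing"},
    {"title": "Notice A", "text": "Body of notice A", "region": "Nanjing"},
    {"title": "", "text": "orphan text", "region": "Nanjing"},
]
cleaned = clean_records(raw)
# One JSON document per line, ready for bulk import.
json_lines = [json.dumps(rec, ensure_ascii=False) for rec in cleaned]
```

`ensure_ascii=False` keeps Chinese text readable in the stored JSON instead of escaping it.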
The data stored in JSON format is imported into the Elasticsearch database. The IK analyzer (word segmenter) matching the Elasticsearch version is selected, the index is created with the finest-grained mode ik_max_word, and searches use the minimal-segmentation mode ik_smart. The dictionary can also be extended with custom entries to improve search accuracy.
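One way to express this analyzer choice is through the `analyzer` and `search_analyzer` mapping parameters of an Elasticsearch text field, with the IK plugin installed. The index body below is a sketch only: the field names `title`, `text`, and `region` are assumptions, and the body would still have to be sent to the cluster when the index is created.

```python
import json

# Hypothetical index body; field names are illustrative, not from the source.
index_body = {
    "mappings": {
        "properties": {
            "title": {
                "type": "text",
                "analyzer": "ik_max_word",      # finest-grained split at index time
                "search_analyzer": "ik_smart",  # fewest splits at query time
            },
            "text": {
                "type": "text",
                "analyzer": "ik_max_word",
                "search_analyzer": "ik_smart",
            },
            "region": {"type": "keyword"},      # exact-match field, not analyzed
        }
    }
}

# Serialized form, as it would appear in the request body.
index_body_json = json.dumps(index_body, indent=2)
```

Indexing with the fine-grained analyzer stores many candidate terms per document, while the coarser query-time analyzer keeps searches from matching on incidental fragments.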
Step 3: construct a user-tag matrix using a collaborative filtering algorithm, combining the user's potential interests with the TF-IDF algorithm.
From each user's historical behavior on the data, including searching, commenting, and collecting, a user-data matrix U of size m × n is constructed as follows:
\[
U = \begin{pmatrix}
u_{11} & u_{12} & \cdots & u_{1n} \\
u_{21} & u_{22} & \cdots & u_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
u_{m1} & u_{m2} & \cdots & u_{mn}
\end{pmatrix}
\]
where m is the total number of users and n is the total number of pieces of data. If a user has historical behavior on a piece of data, that behavior is scored, and \(u_{mn}\) is the score of user m for data n; if there is no historical behavior, the entry is 0.
Preferably, the entry is 1 after the user searches for a piece of data, 2 after the user comments on it, and 3 after the user collects it.
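Building U from a behavior log can be sketched as below. The scores 1, 2, and 3 come from the text; the event tuples, the identifiers, and the rule of keeping the highest score when a user acts on the same data several times are assumptions, since the source does not say how repeated behavior is combined.

```python
# Scores from the text: search = 1, comment = 2, collect = 3.
SCORES = {"search": 1, "comment": 2, "collect": 3}


def build_user_data_matrix(events, users, items):
    """Return the m x n matrix U as a list of rows."""
    u_index = {user: row for row, user in enumerate(users)}
    i_index = {item: col for col, item in enumerate(items)}
    U = [[0] * len(items) for _ in users]   # 0 means no historical behavior
    for user, item, action in events:
        row, col = u_index[user], i_index[item]
        # Assumption: keep the highest score seen for this user and item.
        U[row][col] = max(U[row][col], SCORES[action])
    return U


# Hypothetical behavior log.
events = [
    ("u1", "d1", "search"),
    ("u1", "d1", "collect"),
    ("u2", "d2", "comment"),
]
U = build_user_data_matrix(events, ["u1", "u2"], ["d1", "d2", "d3"])
```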
A data-tag matrix C of size n × s is constructed from the tag information of the data, as follows:
\[
C = \begin{pmatrix}
c_{11} & c_{12} & \cdots & c_{1s} \\
c_{21} & c_{22} & \cdots & c_{2s} \\
\vdots & \vdots & \ddots & \vdots \\
c_{n1} & c_{n2} & \cdots & c_{ns}
\end{pmatrix}
\]
where n is the total number of pieces of data and s is the total number of tags; \(c_{ns}\) indicates whether data n carries tag s: 1 if it does, 0 otherwise.
From the matrices U and C, a user-tag preference matrix P of size m × s is constructed as follows:
\[
P = \begin{pmatrix}
p_{11} & p_{12} & \cdots & p_{1s} \\
p_{21} & p_{22} & \cdots & p_{2s} \\
\vdots & \vdots & \ddots & \vdots \\
p_{m1} & p_{m2} & \cdots & p_{ms}
\end{pmatrix}
\]
where m is the total number of users, s is the total number of tags, and \(p_{ms}\) is the degree of preference of user m for tag s.
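The text does not state how P is derived from U and C. One natural reading, given the m × n and n × s shapes, is the matrix product P = U · C, so that each entry sums a user's scores over the data carrying a tag. The sketch below uses that reading as an explicit assumption, with made-up values.

```python
def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [
        [sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
        for row in A
    ]


# Hypothetical 2 x 3 user-data matrix (scores 0..3).
U = [[3, 0, 1],
     [0, 2, 0]]
# Hypothetical 3 x 2 data-tag matrix (1 = data carries the tag).
C = [[1, 0],
     [0, 1],
     [1, 1]]

# Assumption: preference = sum of scores over data carrying the tag.
P = matmul(U, C)
```

Here user 1 gets preference 4 for the first tag (scores 3 and 1 on the two items carrying it) and 1 for the second.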
The user-tag preference matrix P is then improved with the TF-IDF algorithm, as follows:
Figure BDA0004033361870000054
where \(p_{ua}\) denotes the degree of preference of user u for tag a; the symbol shown in Figure BDA0004033361870000055 denotes the number of times user u has marked tag a; the symbol shown in Figure BDA0004033361870000056 denotes the total number of times user u has marked tags; the symbol shown in Figure BDA0004033361870000057 denotes the total number of times the tag has been used; the symbol shown in Figure BDA0004033361870000058 denotes the total number of tags; \(n_{ua}\) denotes the number of users who have marked tag a; and \(n_m\) denotes the total number of users.
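The exact formula is not recoverable from this copy, so the following sketch uses the textbook TF-IDF form as a stand-in, not the patent's formula. The term frequency is taken as the share of user u's marks that fall on tag a, and the inverse document frequency as log(n_m / n_ua), where n_m is the number of users and n_ua the number of users who marked tag a. The count matrix is made up.

```python
import math


def tfidf_preferences(counts):
    """counts[u][a] = number of times user u marked tag a."""
    n_users = len(counts)
    n_tags = len(counts[0])
    # Number of users who marked each tag at least once (n_ua in the text).
    users_per_tag = [
        sum(1 for row in counts if row[a] > 0) for a in range(n_tags)
    ]
    P = []
    for row in counts:
        total = sum(row)                     # all marks made by this user
        weighted = []
        for a, count in enumerate(row):
            if total == 0 or users_per_tag[a] == 0:
                weighted.append(0.0)
                continue
            tf = count / total
            idf = math.log(n_users / users_per_tag[a])
            weighted.append(tf * idf)
        P.append(weighted)
    return P


counts = [[4, 1],
          [0, 2]]
P = tfidf_preferences(counts)
```

With this unsmoothed form, a tag marked by every user gets weight 0, which is the intended damping of popular tags; a smoothed variant would be needed if such tags should keep some weight.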
The similarity between users is computed from the improved user-tag matrix using cosine similarity, with the following formula:
\[
w_{u,v} = \frac{|n_u \cap n_v|}{\sqrt{|n_u|\,|n_v|}}
\]
where the larger the value of \(w_{u,v}\), the more similar users u and v are, and hence the more likely data preferred by user v is to be recommended to user u; \(n_u\) denotes the set of data preferred by user u, \(n_v\) denotes the set of data preferred by user v, and \(|n_u \cap n_v|\) denotes the size of the set of data preferred by both u and v.
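The symbol definitions here suggest the set-based form of cosine similarity, the size of the intersection divided by the square root of the product of the set sizes. The sketch below implements that form as an assumption; the data identifiers are made up, and returning 0 for an empty set is a convention chosen for the example.

```python
import math


def user_similarity(prefs_u, prefs_v):
    """Set-based cosine similarity between two users' preferred data."""
    if not prefs_u or not prefs_v:
        return 0.0   # convention for the example: no data, no similarity
    common = len(prefs_u & prefs_v)
    return common / math.sqrt(len(prefs_u) * len(prefs_v))


n_u = {"d1", "d2", "d3", "d4"}
n_v = {"d2", "d3", "d5", "d6"}
w_uv = user_similarity(n_u, n_v)   # 2 shared items out of 4 and 4
```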
The similarities are sorted in descending order, the k users most similar to the target user are found and denoted by the set S(u, k), and the degree of preference of the target user u for data i is computed with the following formula:
Figure BDA0004033361870000062
where \(n_i\) denotes the set of users who have historical behavior on data i, and \(u_{vi}\) denotes the score of user v's historical behavior on data i; the behaviors are searching, commenting, and collecting, scored 1, 2, and 3 respectively.
Preferably, k is set to 20.
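A common way to combine these quantities in user-based collaborative filtering is to sum, over the k most similar users who have acted on data i, the similarity times that user's score. The sketch below uses that standard form as an assumption, since the formula image is not recoverable here; the similarity values and scores are made up, and k is set to 2 only to keep the example small.

```python
def top_k_similar(u, sim, k):
    """Return the k (user, similarity) pairs most similar to u: S(u, k)."""
    others = [(v, w) for v, w in sim[u].items() if v != u]
    others.sort(key=lambda pair: pair[1], reverse=True)
    return others[:k]


def preference(u, item, sim, scores, k=20):
    """Similarity-weighted sum of neighbours' scores for one item."""
    total = 0.0
    for v, w in top_k_similar(u, sim, k):
        # Neighbours with no behavior on the item contribute 0.
        total += w * scores.get(v, {}).get(item, 0)
    return total


# Hypothetical similarities of target user "u" to users a, b, c.
sim = {"u": {"a": 0.9, "b": 0.5, "c": 0.1}}
# Hypothetical behavior scores (1 = search, 2 = comment, 3 = collect).
scores = {"a": {"d1": 3}, "b": {"d1": 1, "d2": 2}, "c": {"d2": 3}}

p_d1 = preference("u", "d1", sim, scores, k=2)   # 0.9*3 + 0.5*1
p_d2 = preference("u", "d2", sim, scores, k=2)   # 0.5*2; user c is outside the top 2
```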
Step 4: the recommendation results of the item-based collaborative filtering algorithm are obtained by a process essentially the same as that of the user-based collaborative filtering algorithm. First, a data-user matrix is constructed from the historical behavior of different users on different data; second, the similarity between pieces of data is computed using cosine similarity; third, the recommendation results are obtained: the data recommended to the target user are pieces of data on which the target user has no historical behavior but which are highly similar to data on which the target user does have historical behavior, and the higher the recommendation score, the more interested the target user is expected to be in the recommended data.
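The three stages of the item-based variant can be sketched as follows. The matrix values are made up, the columns of U serve as the item vectors (equivalently, the rows of the data-user matrix), and weighting each similarity by the user's own score is an assumption borrowed from standard item-based collaborative filtering.

```python
import math


def item_similarity(U):
    """Cosine similarity between the columns (data items) of U."""
    cols = list(zip(*U))
    n = len(cols)
    norms = [math.sqrt(sum(x * x for x in col)) for col in cols]
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and norms[i] and norms[j]:
                dot = sum(a * b for a, b in zip(cols[i], cols[j]))
                S[i][j] = dot / (norms[i] * norms[j])
    return S


def item_based_scores(user_row, S):
    """Score items the user has not acted on by similarity to those they have."""
    scores = {}
    for j, seen in enumerate(user_row):
        if seen:
            continue   # only recommend data without historical behavior
        scores[j] = sum(S[j][i] * r for i, r in enumerate(user_row) if r)
    return scores


# Hypothetical 3 users x 3 items score matrix.
U = [[3, 0, 1],
     [3, 2, 0],
     [0, 2, 0]]
S = item_similarity(U)
scores = item_based_scores(U[0], S)   # candidates for the first user
```

For the first user only item index 1 is a candidate; its similarity to item 0 is 0.5, giving a score of 0.5 × 3 = 1.5.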
Finally, the recommendation results of the user-based collaborative filtering algorithm and the item-based collaborative filtering algorithm are fused by linear weighting, and the top 10 pieces of data are selected for recommendation.
Step 5: apply different recommendation algorithms depending on the type of user.
For new users, the 15 pieces of data with the highest click rate in the Elasticsearch database are recommended; alternatively, tags can be set up when the platform is initialized, for example by having the user select fields of interest, and data from those fields is then selected for recommendation. For existing users, the recommendation results of the user-based collaborative filtering algorithm and the item-based collaborative filtering algorithm are fused by linear weighting, and the top 10 pieces of data are selected for recommendation.
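The branching and the fusion can be sketched as below. The list sizes 15 and 10 come from the text; the weight `alpha = 0.5`, the tie-breaking by identifier, and all scores are assumptions, since the source does not give the fusion weights.

```python
def fuse(user_cf, item_cf, alpha=0.5, top_n=10):
    """Linear weighted fusion of two score dictionaries."""
    items = set(user_cf) | set(item_cf)
    fused = {
        i: alpha * user_cf.get(i, 0.0) + (1 - alpha) * item_cf.get(i, 0.0)
        for i in items
    }
    # Highest fused score first; ties broken by identifier for determinism.
    return sorted(fused, key=lambda i: (-fused[i], i))[:top_n]


def recommend(is_new_user, click_rates, user_cf, item_cf):
    if is_new_user:
        # New users: the 15 pieces of data with the highest click rate.
        return sorted(click_rates, key=lambda i: (-click_rates[i], i))[:15]
    # Existing users: top 10 after fusing both collaborative filtering results.
    return fuse(user_cf, item_cf)


# Hypothetical inputs.
click_rates = {"d1": 0.30, "d2": 0.45, "d3": 0.10}
user_cf = {"d1": 3.2, "d2": 1.0}
item_cf = {"d2": 1.5, "d3": 4.0}

new_list = recommend(True, click_rates, user_cf, item_cf)
old_list = recommend(False, click_rates, user_cf, item_cf)
```

If the two algorithms score on different scales, one of them can dominate the fused ranking, so normalizing each score set before fusion may be worth considering.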
Step 6: iteratively optimize the model.
Iterative optimization of the model has two parts. The first part is data optimization, in which the database is continuously updated by the crawler program. The second part is updating the user-data matrix, as shown in FIG. 2: after data is recommended to a user with behavior information, whether the target user gives behavior feedback on the recommended data is recorded in the user behavior information log table, and the user-data matrix and the final recommendation list are continuously updated accordingly.
Although the invention has been described above by way of example, it is evidently not limited to the specific embodiments described. Any insubstantial modification made using the method concept and technical solution of the invention, any direct application of that concept and solution to other settings without modification, and any equivalent replacement fall within the scope of protection of the invention.

Claims (6)

1. A personalized recommendation method based on big data, characterized by comprising the following steps:
step 1: using web crawler technology, specify the fields to collect, including title, release time, administration time, timeliness, text, etc., and obtain data from the relevant websites;
step 2: first, process the fields of the crawled data, remove meaningless data, and de-duplicate the data; second, store the cleaned data in JSON format under new fields such as title and data source region;
import the data stored in JSON format into the Elasticsearch database, select the IK analyzer (word segmenter) matching the Elasticsearch version, create the index with the finest-grained mode ik_max_word, and search with the minimal-segmentation mode ik_smart;
step 3: construct the user-based collaborative filtering algorithm: from each user's historical behavior on the data, including searching, commenting, and collecting, construct a user-data matrix U of size m × n, as follows:
\[
U = \begin{pmatrix}
u_{11} & u_{12} & \cdots & u_{1n} \\
u_{21} & u_{22} & \cdots & u_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
u_{m1} & u_{m2} & \cdots & u_{mn}
\end{pmatrix}
\]
where m is the total number of users and n is the total number of pieces of data; if a user has historical behavior on a piece of data, that behavior is scored, and \(u_{mn}\) is the score of user m for data n; if there is no historical behavior, the entry is 0;
construct a data-tag matrix C of size n × s from the tag information of the data, as follows:
\[
C = \begin{pmatrix}
c_{11} & c_{12} & \cdots & c_{1s} \\
c_{21} & c_{22} & \cdots & c_{2s} \\
\vdots & \vdots & \ddots & \vdots \\
c_{n1} & c_{n2} & \cdots & c_{ns}
\end{pmatrix}
\]
where n is the total number of pieces of data and s is the total number of tags; \(c_{ns}\) indicates whether data n carries tag s: 1 if it does, 0 otherwise;
from the matrices U and C, construct a user-tag preference matrix P of size m × s, as follows:
\[
P = \begin{pmatrix}
p_{11} & p_{12} & \cdots & p_{1s} \\
p_{21} & p_{22} & \cdots & p_{2s} \\
\vdots & \vdots & \ddots & \vdots \\
p_{m1} & p_{m2} & \cdots & p_{ms}
\end{pmatrix}
\]
where m is the total number of users, s is the total number of tags, and \(p_{ms}\) is the degree of preference of user m for tag s;
the user-tag preference matrix P is then improved with the TF-IDF algorithm, as follows:
Figure FDA0004033361860000021
where \(p_{ua}\) denotes the degree of preference of user u for tag a; the symbol shown in Figure FDA0004033361860000022 denotes the number of times user u has marked tag a; the symbol shown in Figure FDA0004033361860000023 denotes the total number of times user u has marked tags; the symbol shown in Figure FDA0004033361860000024 denotes the total number of times the tag has been used; the symbol shown in Figure FDA0004033361860000025 denotes the total number of tags; \(n_{ua}\) denotes the number of users who have marked tag a; and \(n_m\) denotes the total number of users;
compute the similarity between users from the improved user-tag matrix using cosine similarity, with the following formula:
\[
w_{u,v} = \frac{|n_u \cap n_v|}{\sqrt{|n_u|\,|n_v|}}
\]
where the larger the value of \(w_{u,v}\), the more similar users u and v are, and hence the more likely data preferred by user v is to be recommended to user u; \(n_u\) denotes the set of data preferred by user u, \(n_v\) denotes the set of data preferred by user v, and \(|n_u \cap n_v|\) denotes the size of the set of data preferred by both u and v;
sort the similarities in descending order, find the k users most similar to the target user, denoted by the set S(u, k), and compute the degree of preference of the target user u for data i, with the following formula:
Figure FDA0004033361860000027
where \(n_i\) denotes the set of users who have historical behavior on data i, and \(u_{vi}\) denotes the score of user v's historical behavior on data i;
step 4: construct the recommendation results of the item-based collaborative filtering algorithm, whose implementation is essentially the same as that of the user-based collaborative filtering algorithm in step 3: first, construct a data-user matrix from the historical behavior of different users on different data; second, compute the similarity between pieces of data using cosine similarity; third, obtain the recommendation results: the data recommended to the target user are pieces of data on which the target user has no historical behavior but which are highly similar to data on which the target user does have historical behavior, and the higher the recommendation score, the more interested the target user is expected to be in the recommended data;
step 5: for new users, recommend the 15 pieces of data with the highest click rate in the Elasticsearch database; for existing users, fuse the recommendation results of the user-based collaborative filtering algorithm and the item-based collaborative filtering algorithm by linear weighting, and select the top 10 pieces of data for recommendation.
2. The personalized recommendation method based on big data according to claim 1, wherein in step 1, different crawling targets are selected according to the department to which the user belongs, and relevant laws and regulations, news updates, microblog posts, and case data are obtained.
3. The personalized recommendation method based on big data according to claim 1, wherein the assignment rule for the user-data matrix U in step 3 is: 1 after the user searches for a piece of data, 2 after the user comments on it, and 3 after the user collects it.
4. The personalized recommendation method based on big data according to claim 3, wherein in step 3, k is set to 20 when selecting the k users most similar to the target user.
5. The personalized recommendation method based on big data according to claim 1, wherein, as an alternative recommendation method for new users in step 5, the user selects fields of interest when the platform is initialized, and data from those fields is then selected for recommendation.
6. The personalized recommendation method based on big data according to claim 1, wherein a step 6 is added to iteratively optimize the model afterwards, in two parts: the first part is data optimization, in which the database is continuously updated by the crawler program; the second part is updating the user-data matrix: after data is recommended to a user with behavior information, whether the target user gives behavior feedback on the recommended data is recorded in the user behavior information log table, and the user-data matrix and the final recommendation list are continuously updated accordingly.
CN202211741970.8A 2022-12-31 2022-12-31 Personalized recommendation method based on big data Pending CN116127192A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211741970.8A CN116127192A (en) 2022-12-31 2022-12-31 Personalized recommendation method based on big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211741970.8A CN116127192A (en) 2022-12-31 2022-12-31 Personalized recommendation method based on big data

Publications (1)

Publication Number Publication Date
CN116127192A (en) 2023-05-16

Family

ID=86305911

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211741970.8A Pending CN116127192A (en) 2022-12-31 2022-12-31 Personalized recommendation method based on big data

Country Status (1)

Country Link
CN (1) CN116127192A (en)

Similar Documents

Publication Publication Date Title
US11580104B2 (en) Method, apparatus, device, and storage medium for intention recommendation
CN111008265B (en) Enterprise information searching method and device
JP4637969B1 (en) Properly understand the intent of web pages and user preferences, and recommend the best information in real time
US9317613B2 (en) Large scale entity-specific resource classification
CN104035927B (en) Search method and system based on user behaviors
JP5721818B2 (en) Use of model information group in search
CN109522480B (en) Information recommendation method and device, electronic equipment and storage medium
US8271495B1 (en) System and method for automating categorization and aggregation of content from network sites
US20170154116A1 (en) Method and system for recommending contents based on social network
CN101189608A (en) Systems and methods for analyzing a user's Web history
CN104036038A (en) News recommendation method and system
CN101727454A (en) Method for automatic classification of objects and system
CN105930469A (en) Hadoop-based individualized tourism recommendation system and method
CN103400286A (en) Recommendation system and method for user-behavior-based article characteristic marking
US9858332B1 (en) Extracting and leveraging knowledge from unstructured data
CN104063476A (en) Social network-based content recommending method and system
KR101355945B1 (en) On line context aware advertising apparatus and method
CN105378730A (en) Social media content analysis and output
US20180139296A1 (en) Method of producing browsing attributes of users, and non-transitory computer-readable storage medium
US20160299951A1 (en) Processing a search query and retrieving targeted records from a networked database system
CN112818230A (en) Content recommendation method and device, electronic equipment and storage medium
CN114329207A (en) Multi-service information sequencing system, method, storage medium and electronic equipment
CN112732995A (en) Animal husbandry news information recommendation system
CN114090877A (en) Position information recommendation method and device, electronic equipment and storage medium
Nawazish et al. Integrating “Random Forest” with Indexing and Query Processing for Personalized Search

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination