CN107562912B - Sina microblog event recommendation method - Google Patents

Info

Publication number
CN107562912B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710816042.6A
Other languages
Chinese (zh)
Other versions
CN107562912A (en)
Inventor
于富财
刘�东
胡光岷
费高雷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN201710816042.6A
Publication of CN107562912A
Application granted
Publication of CN107562912B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a Sina microblog event recommendation method that addresses the low accuracy of current recommendation algorithms for short social texts. The similarity between a user model and an event vector is calculated with an improved cosine angle algorithm; if the similarity is higher than a set threshold, the event is pushed to the user. The user model is updated with the events that newly arrived within the most recent period, so that it tracks the latest development of the events. It is then updated again according to the user's like behavior, so that the recommendation results better match the user's expectations. The method recommends Sina microblog events with high accuracy, lets the model drift reasonably, and responds promptly to user feedback on the recommendation results.

Description

Sina microblog event recommendation method
Technical Field
The invention belongs to the field of data mining, and particularly relates to a social network text recommendation technology.
Background
As a new communication medium, microblogs have developed rapidly. They spread information quickly, are highly interactive, and make updating information convenient; they have begun to exert great influence on social life and have become one of the main social network media in China. Because people can publish information anytime and anywhere through various channels such as the web and share it instantly, more and more people like to share information, exchange opinions, and express emotions on microblogs. Compared with traditional media, microblogs are simple to operate, and for many important news events their low barrier to entry gives them a commanding position in releasing information. This is especially evident in emergencies, because any microblog user at the scene can publish information about the event through a mobile phone. For example, when a magnitude 4.4 earthquake struck Xi'an in November 2009, microblogs reported the event only 1 minute later, whereas the national official website published its first report 15 minutes later.
However, the popularity of microblogs has also brought new problems. The first is information explosion: massive amounts of data flood the Internet and cause serious information overload. Faced with this volume of information, people often have difficulty finding the data they want, yet they want to find what matters most to them quickly and accurately. Before Web 2.0, people usually obtained information through professional search engines. Search engines have several problems, the main one being that they require users to query actively, cannot push information proactively, and have poor real-time performance, so users are likely to miss important information. With Web 2.0, people can take part in publishing, spreading, and filtering information over the network, which achieves information sharing. This mode of pushing information from targeted sources overturns the earlier mode of pulling information through a search engine and makes up for the shortcomings of search engines.
As an information acquisition method, a recommendation system starts from the user and studies the user's preferences. Even when the user's intent is vague, it can help the user discover potential needs and push information the user is interested in, which makes it a very promising way to solve the information overload problem. Its main task is to grasp the user's points of interest accurately and to push events of likely interest to the user with an efficient recommendation algorithm.
Sina microblog, the most popular microblog tool in China, has the following characteristics: each post is limited to 140 characters, and the data are large in volume, short, sparse in content, real-time, and rich in social information. Because microblog data have no fixed form and many posts may contain no useful information, processing them is difficult, and research on recommendation systems for such short texts remains challenging. To achieve good recommendation results, it is very important to develop an efficient recommendation algorithm. Most existing recommendation systems are text recommendation systems; research on recommendation systems for short text data such as microblogs is not deep enough, and the results cannot meet the requirements of practical applications.
Disclosure of Invention
To solve the above technical problem, the application provides a Sina microblog event recommendation method that corrects the user model in real time, improves the recommendation accuracy of a microblog event recommendation system, and improves user experience.
The technical scheme adopted by the invention is as follows: the Sina microblog event recommendation method comprises the following steps:
S1, calculating the similarity between the user model and the event vector with an improved cosine angle algorithm, and recommending the event to the user if the similarity is greater than a threshold; otherwise, not recommending it;
S2, updating the user model according to the recommended events that arrived in the event database within the most recent time length K;
S3, updating the user model according to the events liked by the user.
Further, the improved cosine angle algorithm is specifically as follows:
Figure BDA0001405100170000021
where sameWordNum is the number of keywords shared by the user model A and the event model B; min(|A|, |B|) is the smaller of the dimensions of the user model A and the event model B; $w_{ai}$ is the weight of the feature word ai in the user model A; and $w_{bj}$ is the weight of the feature word bj in the event model B.
Further, the user model is extracted from a user database.
Further, the event vector is extracted from an event database.
Further, step S2 is specifically:
S21, when a new recommended event arrives in the event database, extracting the recommended events that arrived within the most recent time length K;
S22, selecting the feature words whose weight is greater than a first threshold from the recommended events extracted in step S21 and adding them to the user model;
S23, selecting the high-frequency words among the feature words of the current user model as the new user model.
Further, step S3 is specifically: when a new event is liked, recording the ID of the liked event, looking up the corresponding event in the event database by that ID, and extracting the high-frequency vocabulary of the event.
The beneficial effects of the invention are as follows. In the Sina microblog event recommendation method, the similarity between a user model and an event vector is calculated with an improved cosine angle algorithm; if the similarity is higher than a set threshold, the event is pushed to the user. The user model is updated with the events that newly arrived within the most recent period, so that it tracks the latest development of the events. It is then updated again according to the user's like behavior, so that the recommendation results better match the user's expectations. The method recommends Sina microblog events with high accuracy, lets the model drift reasonably, and responds promptly to user feedback on the recommendation results.
Drawings
FIG. 1 is a schematic flow chart of the scheme of the present application;
FIG. 2 is the model drift workflow;
FIG. 3 is the user feedback update flow.
Detailed Description
In order to facilitate the understanding of the technical contents of the present invention by those skilled in the art, the present invention will be further explained with reference to the accompanying drawings.
FIG. 1 shows the flow chart of the scheme of the present application. The technical scheme is as follows: the Sina microblog event recommendation method comprises the following steps:
S1, calculating the similarity between the user model and the event vector with an improved cosine angle algorithm, and recommending the event to the user if the similarity is greater than a threshold; otherwise, not recommending it;
S2, updating the user model according to the recommended events that arrived in the event database within the most recent time length K;
S3, updating the user model according to the events liked by the user.
Step S1 specifically includes the following. The classical cosine angle formula is:
$$\cos\theta = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}$$
where A and B denote the user model vector and the event vector, respectively, which can be expressed as:
$$A = \{(a_1, w_{a1}), (a_2, w_{a2}), (a_3, w_{a3}), \ldots, (a_m, w_{am})\}$$
$$B = \{(b_1, w_{b1}), (b_2, w_{b2}), (b_3, w_{b3}), \ldots, (b_n, w_{bn})\}$$
Here $w_{a1}$ is the weight of the feature word a1 in the user model A, and likewise for the vector B. Simplifying gives:
$$\cos\theta = \frac{\sum_{ai = bj} w_{ai}\, w_{bj}}{\sqrt{\sum_{i=1}^{m} w_{ai}^{2}}\;\sqrt{\sum_{j=1}^{n} w_{bj}^{2}}}$$
where $w_{ai}$ and $w_{bj}$ are multiplied only when the feature words are identical, that is, ai = bj.
With this formula, the more words the two vectors share, the larger the cosine value. Because the dimensions of the user model and the event vector may be large, a similarity computed simply from identical words inevitably leads to low recommendation precision. One reason is that some high-weight feature words in the event vector, such as "China" or "USA", may be unable to distinguish events, whereas some lower-weight words, such as "air crash" or "gold prize", may be the focus of the event. The method therefore introduces an attenuation coefficient to improve recommendation precision. The improved cosine angle algorithm is as follows:
Figure BDA0001405100170000041
where sameWordNum is the number of keywords shared by the user model A and the event model B; min(|A|, |B|) is the smaller of the dimensions of the user model A and the event model B; $w_{ai}$ is the weight of the feature word ai in the user model A; and $w_{bj}$ is the weight of the feature word bj in the event model B.
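The formula image itself is not reproduced in this text. Read together with the definitions above and the classical cosine formula, one form consistent with them multiplies the cosine value by the ratio of shared keywords to the smaller vector dimension. This is a reconstruction offered as an assumption, not the formula confirmed from the original figure, and the name sim is introduced here only for illustration:

```latex
\mathrm{sim}(A,B) =
  \frac{\mathit{sameWordNum}}{\min(|A|,|B|)} \cdot
  \frac{\sum_{ai = bj} w_{ai}\, w_{bj}}
       {\sqrt{\sum_{i=1}^{m} w_{ai}^{2}}\;\sqrt{\sum_{j=1}^{n} w_{bj}^{2}}}
```

Under this reading the coefficient never exceeds 1, and it reaches 1 only when every keyword of the smaller vector also occurs in the other vector.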
After the attenuation coefficient is introduced, the similarity of two vectors that share only a few keywords is greatly attenuated, so recommendation accuracy improves considerably once a suitable threshold is set. The threshold is not fixed: if it yields the expected recommendation results, it is set appropriately; otherwise it is readjusted. Besides the attenuation coefficient, the method uses two further measures to improve recommendation precision. First, a recommendation is made only when the number of shared keywords exceeds a preset number. Users generally input few keywords, about 5, so the similarity is calculated only when the user model and the event vector share at least three keywords. If the number of words input by users changes substantially, this threshold can be adjusted accordingly. Second, to avoid the negative effect of different inflected forms of the same word, the stems of all keywords are extracted before the similarity is calculated.
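Step S1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the attenuation coefficient is assumed to be sameWordNum / min(|A|, |B|) multiplied onto the classical cosine value, both vectors are assumed to be dictionaries mapping already-stemmed keywords to weights, and the function names and the default threshold of 0.3 are hypothetical.

```python
import math


def improved_cosine(user_model, event_vector, min_shared=3):
    """Damped cosine similarity between two {stemmed_keyword: weight} dicts.

    Assumed form: (sameWordNum / min(|A|, |B|)) * classical cosine.
    Returns 0.0 when fewer than `min_shared` keywords are shared.
    """
    shared = user_model.keys() & event_vector.keys()
    if len(shared) < min_shared:
        return 0.0  # too few shared keywords: skip the similarity calculation

    # Weights are multiplied only for identical feature words (ai = bj).
    dot = sum(user_model[w] * event_vector[w] for w in shared)
    norm_a = math.sqrt(sum(v * v for v in user_model.values()))
    norm_b = math.sqrt(sum(v * v for v in event_vector.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0

    # Assumed attenuation coefficient: share of the smaller vector that matches.
    damping = len(shared) / min(len(user_model), len(event_vector))
    return damping * dot / (norm_a * norm_b)


def should_recommend(user_model, event_vector, threshold=0.3):
    """Push the event only if the similarity exceeds the (tunable) threshold."""
    return improved_cosine(user_model, event_vector) > threshold
```

For two four-keyword vectors with unit weights that share three keywords, the classical cosine is 0.75 and the assumed coefficient is also 0.75, so the damped similarity is 0.5625.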
After a recommended event is obtained, it is stored in the user database and a recommendation log is generated. When requirements are modest, an event can be represented by its summary, which is pushed to the user. If the user needs to read the original posts, the post most relevant to the user model must be extracted from the event.
To extract the post of greatest interest, the posts are preprocessed by word segmentation and stemming. A post that contains the highest-weight word in the list of shared keywords is probably the one the user is most interested in.
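The post selection just described might look like the sketch below. It assumes each post has already been segmented and stemmed into a token list, and it ranks the shared keywords by their weight in the user model; the text does not say whether the user-model or event weight is meant, so that choice, like the function name and the post layout, is an assumption.

```python
def pick_relevant_posts(user_model, event_vector, posts):
    """Return the posts that contain the highest-weight shared keyword.

    user_model, event_vector: {stemmed_keyword: weight} dicts.
    posts: list of dicts with a "tokens" list (already segmented and stemmed).
    """
    shared = user_model.keys() & event_vector.keys()
    if not shared:
        return []
    # Assumption: "maximum weight" refers to the weight in the user model.
    top_word = max(shared, key=lambda w: user_model[w])
    return [post for post in posts if top_word in post["tokens"]]
```

If several posts qualify, a further ranking step would be needed; the source does not describe one.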
Step S2 specifically includes the following. The main task of model drift is to correct the user model automatically over time; its purpose is to track event hot spots in real time and follow the trend of the event.
The user model represents the user's points of interest and is usually a miniature of an event, in which the user summarizes the event with a few keywords. Over time, an event may develop further and its hot words may change. The model drift module is designed to track this change automatically and ensure that the user receives the latest information.
The core of model drift is to modify the user model: the latest hot words are added to it and outdated keywords are deleted from it. The workflow is shown in FIG. 2.
As in the recommendation module, the user model is extracted from the user database and the latest events are extracted from the event database. Note that the events extracted are those that arrived within the last hour, and that extraction is triggered when a new recommended event arrives under the user model. That is, when a new recommended event exists for the current user model, all events recommended under that model within the last hour are extracted and a drift vector is generated. (The duration K in this application is thus one hour. K may take other values, which changes the drift amplitude, but a very large value is not recommended if news timeliness is to be preserved.) The purpose is to smooth the drift process and avoid drifting too fast; if the drift amplitude is too large, the model may move too far from the initial model and affect user experience. The extraction of feature vectors also requires care. The drift vector generated from all events within one hour may be very large and far exceed the dimension of the initial user model. To exclude feature words with extremely small weights and avoid excessively diluting the initial user model, the high-frequency words among the feature words of the current user model are selected as the new user model.
In this embodiment, the extracted event feature vector is limited to 20 words: only the words with the highest weights are taken and added to the user model, and the 20 highest-weight words after the update are kept as the new user model. In particular, to preserve the influence of the keywords originally input by the user, the updated user model is divided into two parts, the original input keywords and the newly added keywords, each of which accounts for a weight of 0.5. In this way the weight of the original input keywords is guaranteed, the latest event hot words are added, and outdated feature words are deleted.
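The drift update of this embodiment can be sketched as follows. Several details are assumptions because the text does not fix them: the drift vector is built by summing keyword weights over the recent events, the original input keywords are always retained, and the remaining slots up to 20 are filled with the highest-weight other words. All function and parameter names are hypothetical.

```python
def _rescale(part, total):
    """Scale weights so they sum to `total`; spread evenly if all are zero."""
    if not part:
        return {}
    s = sum(part.values())
    if s == 0:
        return {w: total / len(part) for w in part}
    return {w: total * v / s for w, v in part.items()}


def drift_user_model(user_model, original_keywords, recent_events, top_n=20):
    """Merge hot words from recent events into the user model.

    user_model: {keyword: weight}; original_keywords: words the user typed;
    recent_events: list of {keyword: weight} vectors from the last K (one hour).
    """
    # Drift vector: assumed here to be the sum of weights over recent events.
    drift = {}
    for event in recent_events:
        for word, weight in event.items():
            drift[word] = drift.get(word, 0.0) + weight
    top_drift = sorted(drift.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

    # Add the highest-weight drift words to the current model.
    merged = dict(user_model)
    for word, weight in top_drift:
        merged[word] = merged.get(word, 0.0) + weight

    # Split into the original input keywords and everything else, then truncate.
    original = {w: merged.get(w, 0.0) for w in original_keywords}
    added = {w: v for w, v in merged.items() if w not in original}
    keep = max(top_n - len(original), 0)
    added = dict(sorted(added.items(), key=lambda kv: kv[1], reverse=True)[:keep])

    # Each part accounts for a total weight of 0.5.
    new_model = _rescale(original, 0.5)
    new_model.update(_rescale(added, 0.5))
    return new_model
```

Keeping the two halves separately normalized means that a burst of hot words can reorder the newly added part without ever reducing the combined weight of the keywords the user typed.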
Similarly, the drifted user model is stored in the user database and a drift log is generated.
Step S3 specifically includes the following. The main purpose of the user feedback update is to receive user feedback promptly and modify the user model according to the user's preferences. The user's feedback on a recommendation result reflects whether the user is satisfied with it and is the most important reference for correcting recommendations. The user feedback update flow is shown in FIG. 3.
The most direct form of user feedback is the "like". When a user is interested in an event or a post, the user can like it; the system recognizes the like and stores the ID of the liked event in the user database. With this information, the user's latest points of interest can be obtained promptly and the user model updated.
When a new event is liked, the user model and the ID of the liked event are extracted from the user database, the corresponding event is looked up in the event database by that ID, and the high-frequency vocabulary of the event is extracted. The weight of the word with the highest word frequency among the high-frequency words is then set to the maximum weight in the original user model, and the weights of the remaining high-frequency words are scaled proportionally. The whole updated user model is then normalized. High-frequency vocabulary is defined as the feature words whose word frequency is greater than the number of posts in the corresponding event. These words are representative of the event, so they receive higher weights and a larger share when merged into the user model, and thus strongly reflect the user's subjective preferences. As in model drift, the first 20 keywords of the updated user model are kept and normalized so that the original input keywords still account for a weight of 0.5. Finally, the updated user model is stored in the user database and an update log is generated.
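The weighting part of this feedback update can be sketched as below. The sketch covers only the selection of high-frequency words, the proportional weighting, and the normalization; the truncation to 20 keywords and the 0.5 split are omitted. Where a high-frequency word already exists in the user model, the text does not say how the two weights combine, so keeping the larger one is an assumption, as are the function and parameter names.

```python
def update_from_like(user_model, word_freq, post_count):
    """Fold the high-frequency words of a liked event into the user model.

    user_model: {keyword: weight}; word_freq: {word: frequency in the event};
    post_count: number of posts in the liked event.
    """
    # High-frequency words: frequency greater than the event's post count.
    high_freq = {w: f for w, f in word_freq.items() if f > post_count}
    if not high_freq or not user_model:
        return dict(user_model)

    max_weight = max(user_model.values())
    max_freq = max(high_freq.values())

    updated = dict(user_model)
    for word, freq in high_freq.items():
        # The top word gets max_weight; the others are scaled proportionally.
        new_weight = max_weight * freq / max_freq
        # Assumption: keep the larger of the existing and the new weight.
        updated[word] = max(updated.get(word, 0.0), new_weight)

    # Normalize the whole updated model so the weights sum to 1.
    total = sum(updated.values())
    return {w: v / total for w, v in updated.items()}
```

After this step the most frequent word of the liked event carries the same weight as the strongest keyword already in the model, which is what lets a single like shift the recommendations noticeably.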
Those of ordinary skill in the art will appreciate that the embodiments described herein are intended to help the reader understand the principles of the invention, and that the scope of protection of the invention is not limited to these specifically recited embodiments and examples. Various modifications and alterations of the invention will be apparent to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the invention shall fall within the scope of its claims.

Claims (4)

1. A Sina microblog event recommendation method, characterized by comprising the following steps:
S1, calculating the similarity between the user model and the event vector with an improved cosine angle algorithm, and recommending the event to the user if the similarity is greater than a threshold; otherwise, not recommending it; the improved cosine angle algorithm is specifically as follows:
Figure FDA0003147790450000011
where sameWordNum is the number of keywords shared by the user model A and the event model B; min(|A|, |B|) is the smaller of the dimensions of the user model A and the event model B; $w_{ai}$ is the weight of the feature word ai in the user model A; and $w_{bj}$ is the weight of the feature word bj in the event model B;
S2, updating the user model according to the recommended events that arrived in the event database within the most recent time length K; step S2 specifically includes:
S21, when a new recommended event arrives in the event database, extracting the recommended events that arrived within the most recent time length K; extracting all events recommended under the model within the last hour to generate a drift vector; selecting the 20 highest-weight keywords in the drift vector and adding them to the user model; dividing the updated user model into two parts, the original input keywords and the newly added keywords, each accounting for a weight of 0.5;
S22, selecting the feature words whose weight is greater than a first threshold from the recommended events extracted in step S21 and adding them to the user model;
S23, selecting the high-frequency words among the feature words of the current user model as the new user model;
S3, updating the user model according to the events liked by the user.
2. The Sina microblog event recommendation method according to claim 1, wherein the user model is extracted from a user database.
3. The Sina microblog event recommendation method according to claim 2, wherein the event vector is extracted from an event database.
4. The Sina microblog event recommendation method according to claim 1, wherein step S3 is specifically: when a new event is liked, recording the ID of the liked event, looking up the corresponding event in the event database by that ID, and extracting the high-frequency vocabulary of the event.
CN201710816042.6A 2017-09-12 2017-09-12 Sina microblog event recommendation method Active CN107562912B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710816042.6A CN107562912B (en) 2017-09-12 2017-09-12 Sina microblog event recommendation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710816042.6A CN107562912B (en) 2017-09-12 2017-09-12 Sina microblog event recommendation method

Publications (2)

Publication Number Publication Date
CN107562912A CN107562912A (en) 2018-01-09
CN107562912B true CN107562912B (en) 2021-08-27

Family

ID=60980471

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710816042.6A Active CN107562912B (en) 2017-09-12 2017-09-12 Sina microblog event recommendation method

Country Status (1)

Country Link
CN (1) CN107562912B (en)

Also Published As

Publication number Publication date
CN107562912A (en) 2018-01-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant