CN106446191A - Logistic regression based multi-feature network popular tag prediction method - Google Patents
- Publication number: CN106446191A
- Application number: CN201610864860.9A
- Authority: CN (China)
- Prior art keywords: label, tag, network, popular, populartag
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/955—Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
- G06F16/9562—Bookmark management
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A multi-feature method, based on logistic regression, for predicting popular network tags, comprising the following steps: 1) construct a weighted undirected tag network from the posting data of a question-and-answer website; 2) extract a popular tag set and a non-popular tag set according to tag occurrence frequency; 3) extract, as the feature vector, the network features of each tag, the attribute features of the tag's proposer, and the attribute-change features after the tag is proposed; 4) train and build a tag classification model using multiple logistic regression. The invention takes the correlation between tags into account and classifies tags on the basis of multiple features, achieving high accuracy in predicting potentially popular tags. It helps guide users toward reasonable tags and helps website builders provide higher-quality tags.
Description
Technical Field
The invention relates to the fields of data mining and computer technology, and in particular to a multi-feature network popular tag prediction method based on logistic regression.
Background
A network tag is a way of organizing Internet content. It usually consists of keywords closely related to the content, helps people describe and classify content conveniently, and makes information easier to retrieve and share. Because tags are so convenient, tag prediction and tag recommendation have in recent years been widely applied on many platforms, such as the question-and-answer site StackExchange, the photo-sharing site Flickr, and the restaurant review site Yelp. Choosing suitable tags matters to both websites and users. For a website, suitable tags support personalized recommendation, increasing user stickiness and click-through rate; for users, tags help them quickly locate what they need and avoid wasting time browsing useless information. In tag selection, identifying potentially popular tags is a key step, because popular tags tend to represent the needs of most users.
At present, tags are selected for a piece of information mainly according to the textual relevance between the information and the tag and the attributes of the information's originator. This approach has several drawbacks: 1. it ignores a tag's potential to become popular; 2. it ignores the correlation between tags; 3. unpopular content leads to unpopular tags, so the information cannot be found effectively by search; 4. it considers only a few features, so the selection of some tags tends to be one-sided.
Therefore, to help users choose tags better when publishing content, potentially popular tags should be selected wherever possible. The logistic-regression-based multi-feature network popular tag prediction method of the present invention addresses two basic problems: (1) it predicts the future popularity trend of a tag; (2) it applies a large number of features to characterize that trend quantitatively.
Summary of the Invention
To overcome the shortcomings of existing tag selection systems, which ignore the potential popularity trend of tags and the correlation between tags and rely on a single evaluation feature, the present invention provides a multi-feature network popular tag prediction method based on logistic regression. It considers multiple features, including features relating tags to one another, and predicts the popularity trend of tags more accurately.
The technical solution adopted by the present invention to solve this technical problem is as follows:
A multi-feature network popular tag prediction method based on logistic regression, comprising the following steps:
S1: Data preprocessing: collect the website's information content and tag data and sort the content in ascending order of time. Treat the first α% of posts as transient data from before the tag network stabilized and delete them; from the remaining data, take the first preset proportion as training data;
S2: Build the tag network: for the tags appearing in the same piece of content, connect every pair with an edge. Traversing all content yields the weighted undirected tag network graph $G_{Tag}$, in which the weight of an edge is the number of times its two tags co-occur;
S3: Sort the tags in descending order of how often they appear in posts and take the top β% as the popular tag set $U_{PopularTag}$;
S4: Find the non-popular tag set $U_{UnPopularTag}$: for each popular tag $t \in U_{PopularTag}$, find the time at which $t$ first appeared and, centered on that time, find the tag whose first appearance is closest to it and which does not belong to $U_{PopularTag}$. These tags form the control set of non-popular tags $U_{UnPopularTag}$;
S5: For the training sample tag set $U = \{U_{PopularTag}, U_{UnPopularTag}\}$, extract the network features of each tag: on the weighted undirected network $G_{Tag}$, extract the node degree and the degree centrality of the neighbor nodes the sample tag is connected to when it first appears;
S6: For the training sample tag set $U = \{U_{PopularTag}, U_{UnPopularTag}\}$, extract the attribute features of each tag's proposer, specifically the number of pieces of content the proposer had already published when proposing the tag, and the length of that content;
S7: For the training sample tag set $U = \{U_{PopularTag}, U_{UnPopularTag}\}$, extract the attribute-change features of each tag, specifically the number of replies received, within 5 days of the tag being proposed, by the posts carrying the tag;
S8: Using multiple logistic regression, with the features of the tags in $U = \{U_{PopularTag}, U_{UnPopularTag}\}$ as training data, train and build the tag classifier model.
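For reference, the classifier of step S8 can be written in the standard logistic regression form. The symbols $\mathbf{x}$ (feature vector of a tag), $\mathbf{w}$, $b$ (learned parameters) and $y$ (1 for popular, 0 for non-popular) are notation introduced here for illustration, and the 0.5 decision threshold is an assumption; neither is stated in the original text.

```latex
P(y = 1 \mid \mathbf{x}) = \frac{1}{1 + e^{-(\mathbf{w}^{\top}\mathbf{x} + b)}},
\qquad
\hat{y} =
\begin{cases}
1 & \text{if } P(y = 1 \mid \mathbf{x}) \ge 0.5 \\
0 & \text{otherwise}
\end{cases}
```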
Further, in step S1, α% is determined as follows: the cut-off point is the moment at which a preset percentage of the website's total number of distinct tags has appeared. The purpose is to ensure that the tag network is not affected by staff testing the site's tags when the website was first set up;
Further still, in step S5, the node degree of neighbor $i$ is computed by formula (1):
$$d_i = \sum_{j=1,\ j \neq i}^{g} x_{ij} \tag{1}$$
where $g$ is the total number of nodes in the network, and $x_{ij} = 1$ if nodes $i$ and $j$ are connected by an edge and $x_{ij} = 0$ otherwise;
The node degree centrality of neighbor $i$ is computed by formula (2), given here in its standard normalized form:
$$C_D(i) = \frac{d_i}{g - 1} \tag{2}$$
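As a small worked illustration, consider a hypothetical network with $g = 4$ nodes in which node $i$ is connected to two of the other three nodes. Node degree is taken as the number of incident edges, and degree centrality is assumed to be the degree divided by $g - 1$ (the standard normalization):

```latex
\sum_{j \neq i} x_{ij} = 1 + 1 + 0 = 2,
\qquad
\frac{2}{g - 1} = \frac{2}{3} \approx 0.667
```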
The beneficial effects of the present invention are as follows: it takes the correlation between tags into account and classifies tags on the basis of multiple features, achieving high accuracy in predicting potentially popular tags. It helps guide users toward reasonable tags and helps website builders provide higher-quality tags.
Brief Description of the Drawings
Fig. 1 is a flow chart of the multi-feature tag classification method based on logistic regression according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of tag occurrence frequency according to an embodiment of the present invention.
Detailed Description
Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
Referring to Fig. 1 and Fig. 2, in a multi-feature network popular tag prediction method based on logistic regression, the present invention models and analyzes the tag classification system using the officially published data of Tex.Stackexchange.com, a sub-site of the question-and-answer site StackExchange. The raw data record, for each post, its posting time, the poster's ID, the post's tags, and other information. For each tag studied in this patent, we extract the time at which it first appeared, the ID of its proposer, the IDs of its neighbor tags, and other information.
In this embodiment, the specific steps of the multi-feature tag classification method based on logistic regression are:
1) Build the tag network. Process the published post data as follows:
1.1) Traverse the post data to obtain the set of all tags $T_I$, $I \in N$, where $N$ is the total number of tags. Take N×20% as the number of tags required for the site's tags to reach the stable point; this prevents the testing of site content by staff when the site was first set up from introducing noise into the model;
1.2) Sort the posts in ascending order of time and traverse the post data again. When the number of distinct tags seen reaches N×20%, record the number of posts traversed so far as $N_{InstablePosts}$, and treat the posting time at that point as the time at which the site's tags stabilized;
1.3) Determine $\alpha\% = N_{InstablePosts} / N_{Posts}$, where $N_{Posts}$ is the total number of published posts;
1.4) Build the tag network: remove the first α% of posts, then read the first 80% of the posts in the question-and-answer site data as training data. The tag network is built as follows: for the tags appearing in the same post, connect every pair with an edge. Traversing all posts yields the weighted undirected tag network graph $G_{Tag}$, in which the weight of an edge is the number of times its two tags co-occur;
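Step 1) can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the `(timestamp, tags)` post format, the function names, and the toy data are all assumptions, and the 80% training split is applied to the posts that remain after the transient ones are dropped.

```python
import math
from collections import Counter
from itertools import combinations

def stable_cutoff(posts, fraction=0.20):
    """Return (n_instable_posts, alpha) for posts given as (timestamp, tags)."""
    ordered = sorted(posts, key=lambda p: p[0])
    all_tags = {t for _, tags in ordered for t in tags}
    needed = math.ceil(len(all_tags) * fraction)  # N x 20% in the text
    seen = set()
    for n, (_, tags) in enumerate(ordered, start=1):
        seen.update(tags)
        if len(seen) >= needed:
            return n, n / len(ordered)
    return len(ordered), 1.0

def build_tag_network(posts):
    """Map each unordered tag pair to its co-occurrence count (edge weight)."""
    weights = Counter()
    for _, tags in posts:
        for a, b in combinations(sorted(set(tags)), 2):
            weights[frozenset((a, b))] += 1
    return weights

# Hypothetical toy data: (timestamp, tags of the post).
posts = [
    (1, ["tikz", "pgf"]),
    (2, ["tikz", "beamer"]),
    (3, ["tables"]),
    (4, ["tikz", "pgf", "fonts"]),
    (5, ["beamer", "fonts"]),
]
# fraction=0.5 is used only so that the tiny example has a non-trivial cutoff.
n_instable, alpha = stable_cutoff(posts, fraction=0.5)
kept = sorted(posts, key=lambda p: p[0])[n_instable:]  # drop transient posts
train = kept[: int(len(kept) * 0.8)]                   # first 80% for training
network = build_tag_network(train)
```

Storing each edge as a `frozenset` key keeps the network undirected without needing a graph library.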
2) Obtain the popular tag set $U_{PopularTag}$. Process the published post data as follows:
2.1) Traverse the post data to obtain how often each tag appears in posts;
2.2) Sort the tags in descending order of frequency and take the top β% as the popular tag set $U_{PopularTag}$; here we choose β% = 5%;
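A possible sketch of step 2) follows. The post format and toy data are assumptions, as are two details the text does not specify: ties in frequency are broken alphabetically, and at least one tag is always returned.

```python
import math
from collections import Counter

def popular_tags(posts, beta=0.05):
    """Return the top `beta` fraction of tags by number of posts they appear in."""
    freq = Counter(t for _, tags in posts for t in set(tags))
    # Descending frequency; alphabetical tie-break is an assumption.
    ranked = sorted(freq, key=lambda t: (-freq[t], t))
    k = max(1, math.ceil(len(ranked) * beta))  # keep at least one tag (assumption)
    return set(ranked[:k])

# Hypothetical toy data: (timestamp, tags of the post).
posts = [
    (1, ["tikz", "pgf"]),
    (2, ["tikz", "beamer"]),
    (3, ["tables"]),
    (4, ["tikz", "pgf", "fonts"]),
    (5, ["beamer", "fonts"]),
]
top = popular_tags(posts)  # beta = 5%, as chosen in the embodiment
```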
3) Obtain the non-popular tag set $U_{UnPopularTag}$. The specific steps are:
3.1) For every tag, traverse the posts to obtain the tag's first appearance time;
3.2) For each popular tag $t \in U_{PopularTag}$, compute, for every remaining tag $t' \notin U_{PopularTag}$, the difference ΔT between the first appearance times of $t'$ and $t$;
3.3) Sort the time differences ΔT in ascending order and take the tag $t'$ with the smallest ΔT as a non-popular tag, thereby forming the non-popular tag set $U_{UnPopularTag}$;
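The matching in step 3) could look like the sketch below. The post format and function names are hypothetical. The text does not say whether one non-popular tag may serve as the control for several popular tags; this sketch allows it, and breaks ties in ΔT alphabetically.

```python
def first_seen(posts):
    """Map each tag to the timestamp of the first post that carries it."""
    first = {}
    for ts, tags in sorted(posts, key=lambda p: p[0]):
        for t in tags:
            first.setdefault(t, ts)
    return first

def match_unpopular(posts, popular):
    """For each popular tag, pick the non-popular tag with the closest first appearance."""
    first = first_seen(posts)
    candidates = [t for t in first if t not in popular]
    unpopular = set()
    for p in popular:
        if p not in first or not candidates:
            continue
        # Smallest |delta T|; alphabetical tie-break is an assumption.
        best = min(candidates, key=lambda c: (abs(first[c] - first[p]), c))
        unpopular.add(best)
    return unpopular

# Hypothetical toy data: (timestamp, tags of the post).
posts = [
    (1, ["tikz", "pgf"]),
    (2, ["tikz", "beamer"]),
    (3, ["tables"]),
    (4, ["tikz", "pgf", "fonts"]),
    (5, ["beamer", "fonts"]),
]
controls = match_unpopular(posts, {"tikz"})
```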
4) Extract the network features of the tags. The specific steps are:
4.1) For each tag $t \in \{U_{PopularTag}, U_{UnPopularTag}\}$, compute the node degree of each of its neighbors $i$ by formula (1):
$$d_i = \sum_{j=1,\ j \neq i}^{g} x_{ij} \tag{1}$$
where $g$ is the total number of nodes in the network, and $x_{ij} = 1$ if nodes $i$ and $j$ are connected by an edge and $x_{ij} = 0$ otherwise;
4.2) Compute the node degree centrality of neighbor $i$ by formula (2), given here in its standard normalized form:
$$C_D(i) = \frac{d_i}{g - 1} \tag{2}$$
4.3) Normalize the neighbor node degree and the neighbor node degree centrality, using the number of neighbor nodes as the denominator;
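Step 4) might be sketched as follows. Several points are assumptions rather than statements from the text: degree centrality is taken as degree divided by $g - 1$, the normalization in 4.3) is read as averaging over the tag's neighbors, $g$ counts only tags that have at least one edge, and the whole network is used instead of the snapshot at the tag's first appearance.

```python
from collections import defaultdict

def neighbor_features(edges, tag):
    """Return (mean neighbor degree, mean neighbor degree centrality) for `tag`."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    g = len(adj)                      # nodes with at least one edge
    nbrs = adj.get(tag, set())
    if not nbrs or g < 2:
        return 0.0, 0.0
    degrees = [len(adj[n]) for n in nbrs]                 # node degree d_i
    mean_degree = sum(degrees) / len(nbrs)                # divide by neighbor count
    mean_centrality = sum(d / (g - 1) for d in degrees) / len(nbrs)
    return mean_degree, mean_centrality

# Hypothetical tag network given as a list of undirected edges.
edges = [
    ("tikz", "pgf"), ("tikz", "beamer"), ("tikz", "fonts"),
    ("pgf", "fonts"), ("beamer", "fonts"),
]
deg, cen = neighbor_features(edges, "tikz")
```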
5) Extract the proposer attribute features of the sample tags. The specific steps are:
5.1) For each sample tag $t \in \{U_{PopularTag}, U_{UnPopularTag}\}$, obtain the ID of the user who first proposed the tag and the time at which the tag first appeared;
5.2) Sort the posts in ascending order of time and count the total number of questions and the total number of answers posted by that proposer ID before the tag's first appearance time; these counts serve as the tag proposer attribute features;
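Step 5) can be illustrated with the sketch below. The record layout (`time`, `user`, `type`, `tags`) is a hypothetical schema chosen for the example, not the format of the StackExchange data dump.

```python
def proposer_features(posts, tag):
    """Count the proposer's questions and answers posted before the tag first appeared."""
    ordered = sorted(posts, key=lambda p: p["time"])
    first = next((p for p in ordered if tag in p.get("tags", ())), None)
    if first is None:
        return None                   # tag never appears
    user, t0 = first["user"], first["time"]
    earlier = [p for p in ordered if p["user"] == user and p["time"] < t0]
    return {
        "proposer": user,
        "first_time": t0,
        "questions": sum(p["type"] == "question" for p in earlier),
        "answers": sum(p["type"] == "answer" for p in earlier),
    }

# Hypothetical toy records.
posts = [
    {"time": 1, "user": "u1", "type": "question", "tags": ["tikz"]},
    {"time": 2, "user": "u2", "type": "answer", "tags": []},
    {"time": 3, "user": "u1", "type": "answer", "tags": []},
    {"time": 4, "user": "u1", "type": "question", "tags": ["beamer"]},
    {"time": 5, "user": "u2", "type": "question", "tags": ["fonts"]},
]
features = proposer_features(posts, "beamer")
```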
6) Extract the attribute-change feature of the sample tags: for each tag in the training sample tag set $U = \{U_{PopularTag}, U_{UnPopularTag}\}$, count the total number of answers the tag receives within 5 days of being proposed;
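One way to compute the feature of step 6) is sketched below. The text leaves the exact counting rule open; this sketch assumes that every answer to any question carrying the tag counts, provided it is posted no later than 5 days after the tag's first appearance. The record layout is hypothetical.

```python
from datetime import datetime, timedelta

def answers_within_window(questions, answers, tag, days=5):
    """Count answers to tagged questions within `days` of the tag's first appearance."""
    tagged = {q["id"]: q["time"] for q in questions if tag in q["tags"]}
    if not tagged:
        return 0
    t0 = min(tagged.values())                 # first appearance of the tag
    deadline = t0 + timedelta(days=days)
    return sum(
        1 for a in answers
        if a["question_id"] in tagged and t0 <= a["time"] <= deadline
    )

# Hypothetical toy records.
questions = [
    {"id": 1, "time": datetime(2016, 1, 1), "tags": ["tikz"]},
    {"id": 2, "time": datetime(2016, 1, 3), "tags": ["tikz"]},
    {"id": 3, "time": datetime(2016, 1, 2), "tags": ["fonts"]},
]
answers = [
    {"question_id": 1, "time": datetime(2016, 1, 2)},
    {"question_id": 2, "time": datetime(2016, 1, 5)},
    {"question_id": 1, "time": datetime(2016, 1, 10)},  # outside the window
    {"question_id": 3, "time": datetime(2016, 1, 3)},
]
n_answers = answers_within_window(questions, answers, "tikz")
```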
7) Train the classification model by multiple logistic regression: take the sample tag set $U = \{U_{PopularTag}, U_{UnPopularTag}\}$ together with five features of each tag as input, namely the neighbor node degree, the neighbor node degree centrality, the number of questions asked by the tag's proposer, the number of answers given by the tag's proposer, and the number of answers received within a set time after the tag was proposed. Using multiple logistic regression as the classifier, train and build the tag classifier model.
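To make step 7) concrete, here is a minimal pure-Python sketch of logistic regression on five features. It uses plain batch gradient descent, which is a stand-in chosen for readability; the text does not say which solver was used. The feature values are invented and assumed to be rescaled to roughly [0, 1], and the learning rate, epoch count, and 0.5 threshold are likewise assumptions.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_logistic(X, y, lr=0.5, epochs=1000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Return 1 (popular) or 0 (non-popular) using a 0.5 threshold."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0

# Invented feature rows: [neighbor degree, neighbor centrality,
# proposer questions, proposer answers, answers within 5 days], rescaled.
X = [
    [0.9, 0.8, 0.7, 0.9, 0.8], [0.8, 0.9, 0.6, 0.7, 0.9], [0.7, 0.7, 0.9, 0.8, 0.7],
    [0.1, 0.2, 0.1, 0.0, 0.1], [0.2, 0.1, 0.3, 0.2, 0.0], [0.3, 0.2, 0.2, 0.1, 0.2],
]
y = [1, 1, 1, 0, 0, 0]   # 1 = popular tag, 0 = non-popular control
w, b = train_logistic(X, y)
labels = [predict(w, b, x) for x in X]
```

In practice an established library implementation would normally be preferred; the hand-written loop is only meant to show what the training step computes.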
The above describes an embodiment of the present invention for tag classification on Tex.Stackexchange.com, a sub-site of the question-and-answer site StackExchange. Building a network brings the correlation between tags into the features, and considering the features of a tag's neighbors, of its proposer, and of its evolution over time enlarges the feature data available for tag classification. Training the model finally yields a judgment of whether a tag is popular, which provides guidance for building a website's tag recommendation system.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610864860.9A (CN106446191B) | 2016-09-30 | 2016-09-30 | Logistic regression based multi-feature network popular tag prediction method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106446191A | 2017-02-22 |
CN106446191B | 2019-11-05 |
Family
ID=58169804
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610864860.9A (Active; CN106446191B) | Logistic regression based multi-feature network popular tag prediction method | 2016-09-30 | 2016-09-30 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106446191B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||