CN107862620A - A similar-user mining method based on social data - Google Patents

A similar-user mining method based on social data

Info

Publication number
CN107862620A
Authority
CN
China
Prior art keywords
word
user
microblogging
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711311721.4A
Other languages
Chinese (zh)
Inventor
李开宇
王月超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan XW Bank Co Ltd
Original Assignee
Sichuan XW Bank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan XW Bank Co Ltd
Priority to CN201711311721.4A
Publication of CN107862620A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 - Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 - Social networking
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 - Querying
    • G06F16/245 - Query processing
    • G06F16/2458 - Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 - Query processing support for facilitating data mining operations in structured databases
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 - Details of database functions independent of the retrieved data types
    • G06F16/95 - Retrieval from the web
    • G06F16/951 - Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • General Business, Economics & Management (AREA)
  • Economics (AREA)
  • Tourism & Hospitality (AREA)
  • Primary Health Care (AREA)
  • Marketing (AREA)
  • Human Resources & Organizations (AREA)
  • General Health & Medical Sciences (AREA)
  • Strategic Management (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a similar-user mining method based on social data, relating to the field of social networks in Internet information technology. The method comprises: Step 1: crawling key data from users' microblog text; Step 2: extracting the top-N keywords from the key data; Step 3: training a general word2vec model and applying it to the top-N keywords; Step 4: computing the user interest vectors; Step 5: computing the cosine similarity between user interest vectors pairwise to obtain the interest similarity between two users; Step 6: screening out similar users according to interest similarity, completing similar-user mining. The invention addresses two problems of existing approaches. First, treating the other users that a user @-mentions in posts, comments, and reposts as similar users relies on a strong correlation that is very sparse, which hinders large-scale acquisition of new similar users and leads to low recall. Second, comparing which Big V (verified, high-follower) accounts users follow still suffers from data sparsity when computing the similarity of ordinary interests, so the final recall remains low.

Description

A similar-user mining method based on social data
Technical field
The present invention relates to the field of social networks in Internet information technology, and in particular to a similar-user mining method based on social data.
Background art
Microblogging, as a new kind of social networking service, is attracting more and more users. According to statistics, thousands of new users join microblog platforms every day, generating hundreds of millions of posts. At the same time, a growing number of businesses have opened corporate accounts to gather followers (their customer base). Targeted marketing to these customers to build brand awareness has become a mainstream means of brand promotion and marketing. How to expand from existing customers to related new users is a question every enterprise considers, and user-expansion methods based on social big data are gaining acceptance. Finding users with similar interests is an important means of user expansion. There are two existing methods for finding similar users; their steps and shortcomings are described below.
The first method treats the other users that a user @-mentions in posts, comments, and reposts as that user's similar users. This method is very simple: crawl the user's microblog posts and parse them to find the relationships. Users linked in this way share interests, which follows from the distinctive functional nature of microblogging, so the correlation is strong. However, this strong correlation is very sparse, which hinders large-scale acquisition of new similar users and results in low recall.
The second method judges the similarity between two users from the similarity of the Big V accounts they follow. Its steps are: (1) crawl the user's following list and, using each listed account's follower count or self-description, select all Big V accounts with media attributes; (2) collect the number of interactions (comments, reposts, and likes) between the user and each Big V account, along with each Big V account's follower count; (3) compute the user's interest index for each Big V account with the formula "interest index = number of interactions / Big V follower count", and assemble these values into the user's vector of interest indices; (4) compute the cosine similarity of two users' interest-index vectors to represent the interest similarity between them; (5) for a candidate user, compute the number of similar users in the set to be expanded and the average similarity, to decide whether the candidate should be added.
This method handles similarity for popular interests reasonably well, but for ordinary interests the data are still sparse, and similar-user mining hits a bottleneck. For example, take two niche Big V accounts or brands, blogger A and blogger B, that belong to the same field but have little follower overlap. User a follows blogger A and user b follows blogger B. Users a and b are similar, but the second method cannot detect this. The final recall of this mining method is therefore also low.
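The sparsity problem of the second method can be illustrated with a minimal Python sketch. The account names, follower counts, and interaction counts below are hypothetical, chosen only to mirror the blogger A / blogger B example; the source does not give an implementation.

```python
import math

def interest_index(interactions, followers):
    """Interest index = number of interactions / Big V follower count."""
    return interactions / followers if followers else 0.0

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical follower counts for two niche Big V accounts in the same field.
followers = {"blogger_A": 200_000, "blogger_B": 150_000}

# Hypothetical interaction counts: user a only interacts with A, user b only with B.
interactions_a = {"blogger_A": 40}
interactions_b = {"blogger_B": 30}

def index_vector(interactions):
    """One interest index per Big V account, in a fixed (sorted) account order."""
    return [interest_index(interactions.get(v, 0), followers[v]) for v in sorted(followers)]

vec_a = index_vector(interactions_a)
vec_b = index_vector(interactions_b)
similarity = cosine(vec_a, vec_b)  # the two vectors share no non-zero dimension
```

Because the two users never interact with the same account, their vectors are orthogonal and the similarity is zero, even though both follow bloggers in the same field.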
Summary of the invention
The object of the invention is as follows. The existing method based on the other users @-mentioned in a user's posts, comments, and reposts relies on a strong correlation that is very sparse, which hinders large-scale acquisition of new similar users and results in low recall. The method based on the similarity of the Big V accounts that users follow still suffers from data sparsity when computing the similarity of ordinary interests, so its final recall is also low. To solve these problems, the invention provides a similar-user mining method based on social data.
The technical solution of the invention is as follows:
A similar-user mining method based on social data, comprising the following steps:
Step 1: crawl key data from users' microblog text;
Step 2: extract the top-N keywords from the key data;
Step 3: train a general word2vec model on a general corpus, and use each user's top-N keywords to compute that user's interest vector;
Step 4: compute the cosine similarity between user interest vectors pairwise to obtain the interest similarity between two users;
Step 5: screen out similar users according to interest similarity, completing similar-user mining.
Specifically, in Step 1 the key data crawled from a user's microblog text comprise: (1) the text of the posts the user published and interacted with; (2) the list of Big V bloggers the user follows; (3) the text of the posts published by the Big V bloggers the user follows.
Preferably, a Big V blogger is one with at least 100,000 microblog followers, and the key data are the user's data from the most recent three months.
Specifically, in Step 2, before the top-N keywords are extracted from the key data, the text crawled from the user's microblog undergoes NLP preprocessing, which comprises word segmentation and stop-word removal.
Specifically, in Step 2 the top-N keywords are computed with the TextRank ranking method. The specific steps are as follows. Treat the set of posts published by all users as a document $D$, where each post published by user $u$ is $m_i$, with $m_i \in D$. The term frequency $TF_k$ of each candidate keyword $k$ is the frequency with which the word occurs in post $m_i$, and the inverse document frequency $IDF_k$ of keyword $k$ over the whole microblog document $D$ is:
$$IDF_k = \log\left(\frac{|du|}{1 + |\{m : k \in m\}|}\right) \tag{1}$$
where $|\{m : k \in m\}|$ is the number of posts containing keyword $k$ and $|du|$ is the total number of posts published by the user. The importance score of word $V_k$ under the TextRank ranking method is:
$$S(V_k) = (1-d) \times TF_k \times IDF_k + d \times \sum_{V_j \in E(V_k)} \frac{w_{jk}}{\sum_{V_i \in E(V_j)} w_{ji}} S(V_j) \tag{2}$$
where $S(V_k)$ is the importance of word $k$; $TF_k$ is the frequency with which word $k$ occurs in the document; $d$ is the damping coefficient; $E(V_k)$ is the set of words that co-occur with word $k$, where co-occurrence means appearing in the same sentence; $w_{jk}$ is the similarity between word $j$ and word $k$, and $w_{ji}$ is the similarity between word $j$ and word $i$; word $j$ is a word related to word $k$, and word $i$ is a word related to word $j$. The similarity of two words equals the number of times they co-occur divided by the sum of the numbers of times each occurs. The formula is then iterated to obtain the importance score of each word; the words are sorted by importance score, and the higher-scoring words are retained as the extracted top-N keywords.
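The iteration of formula (2) can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function names, the fixed iteration count, the initial score of 1.0, and the toy numbers are assumptions, and the TF × IDF values and co-occurrence similarities are taken as precomputed inputs.

```python
def weighted_textrank(tfidf, weights, d=0.85, iterations=200):
    """Iterate S(V_k) = (1 - d) * TF_k * IDF_k + d * sum_j [w_jk / sum_i w_ji] * S(V_j).

    tfidf:   {word: TF * IDF value}
    weights: {(word_a, word_b): symmetric co-occurrence similarity};
             every word used here must also be a key of tfidf.
    """
    neighbours = {word: {} for word in tfidf}
    for (a, b), w in weights.items():
        neighbours[a][b] = w
        neighbours[b][a] = w
    # Denominator of formula (2): total edge weight leaving each word.
    out_weight = {word: sum(adj.values()) for word, adj in neighbours.items()}
    scores = {word: 1.0 for word in tfidf}  # assumed initial score
    for _ in range(iterations):
        updated = {}
        for k in tfidf:
            spread = sum(
                w_jk / out_weight[j] * scores[j]
                for j, w_jk in neighbours[k].items()
                if out_weight[j] > 0
            )
            updated[k] = (1 - d) * tfidf[k] + d * spread
        scores = updated
    return scores

def top_n(scores, n):
    """Keep the n highest-scoring words as keywords."""
    return sorted(scores, key=scores.get, reverse=True)[:n]

# Hypothetical inputs for illustration only.
example_tfidf = {"basketball": 0.9, "playoffs": 0.6, "today": 0.05, "recipe": 0.4}
example_weights = {
    ("basketball", "playoffs"): 0.5,
    ("basketball", "today"): 0.2,
    ("playoffs", "today"): 0.1,
    ("recipe", "today"): 0.1,
}
example_scores = weighted_textrank(example_tfidf, example_weights)
example_keywords = top_n(example_scores, 2)
```

A word with no co-occurring neighbours receives exactly (1 - d) × TF × IDF, which shows how the term weight replaces the constant base score of plain TextRank.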
The specific steps of Step 3 comprise: (1) download a Chinese corpus; (2) segment the Chinese corpus and remove stop words; (3) train a word2vec model with an open-source tool; (4) convert each of the top-N keywords obtained in Step 2 into a word vector; (5) sum all of these word vectors to obtain the user's interest vector.
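Sub-steps (4) and (5) can be sketched as below. The toy three-dimensional embedding table stands in for a trained word2vec model and is purely hypothetical; skipping keywords that are missing from the vocabulary is also an assumption, since the text does not say how such words are handled.

```python
def interest_vector(keywords, embeddings):
    """Sum the word vectors of a user's top-N keywords into one interest vector."""
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    for word in keywords:
        vector = embeddings.get(word)
        if vector is None:
            # Assumption: out-of-vocabulary keywords are skipped.
            continue
        for i, value in enumerate(vector):
            total[i] += value
    return total

# Hypothetical stand-in for a trained word2vec lookup (word -> vector).
toy_embeddings = {
    "food": [1.0, 0.0, 0.5],
    "travel": [0.0, 1.0, 0.5],
    "hotpot": [0.5, 0.0, 1.0],
}

user_vector = interest_vector(["food", "hotpot", "unknown_word"], toy_embeddings)
```

Summation keeps the interest vector in the same space as the word vectors, so users with different keywords but related vocabulary still end up pointing in similar directions.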
With the above scheme, the beneficial effects of the invention are as follows:
(1) The invention departs from the traditional reliance on only the other users @-mentioned in a user's posts, comments, and reposts, or only the similarity of the Big V accounts users follow, and instead mines similar users from the user's comprehensive social data. A Big V blogger can be regarded as a bridge that brings in more interactive microblog data. Even when two users do not follow the same Big V blogger, if the posts they interact with are similar in content (for example, both follow entertainment-gossip bloggers, food bloggers, or fashion bloggers), they can still be identified as similar users; this is why recall improves. Compared with the mining method that considers only the similarity of followed Big V accounts, the method of the invention raises customer recall from 37.2% to 66.72%, nearly doubling it, which is a highly significant effect.
Because more similar users are mined, more information is crawled, so the amount of computation grows and some slowdown is unavoidable. This can be controlled by filtering users: for example, where similar users were originally mined from 10,000 users, a representative 1,000 can be selected from those 10,000 instead.
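As a rough worked count of what this filtering saves, assume (the text does not state this) that the dominant cost is the pairwise cosine-similarity step over $n$ users:

```latex
\[
\binom{n}{2} = \frac{n(n-1)}{2}, \qquad
\binom{10000}{2} = 49\,995\,000, \qquad
\binom{1000}{2} = 499\,500 .
\]
```

Under that assumption, reducing the user set from 10,000 to 1,000 cuts the number of pairwise comparisons by roughly a factor of 100.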
(2) The traditional TextRank method extracts keywords by taking words as candidate nodes to build an undirected graph and computing the score of each noun node in the graph, without considering the influence of a word's own weight on the node score. If word importance is ignored, unimportant words keep propagating score to further unimportant words, which seriously reduces the accuracy of the subsequent similarity computation. Therefore, the invention gathers word statistics from the documents and, in addition to word co-occurrence counts, adds the term weight TF_k × IDF_k to the score computation, selecting the higher-scoring nouns as candidate keywords. This safeguards accuracy.
Detailed description of the embodiments
The technical solution in this embodiment is described clearly and completely below with reference to an embodiment of the invention. Obviously, the described embodiment is only one of the embodiments of the invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the invention without creative effort fall within the scope of protection of the invention.
The similar-user mining method based on social data in this embodiment comprises the following steps:
Step 1: crawl key data from users' microblog text. The key data crawled from a user's microblog text comprise: (1) the text of the posts the user published and interacted with; (2) the list of Big V bloggers the user follows; (3) the text of the posts published by the Big V bloggers the user follows. A Big V blogger is one with at least 100,000 microblog followers; if a larger volume needs to be mined, 50,000 can be used as the mining threshold. The key data are the user's data from the most recent three months.
Step 2: extract the top-N keywords from the key data. Before the top-N keywords are extracted, the text crawled from the user's microblog undergoes NLP preprocessing, which comprises word segmentation and stop-word removal. The top-N keywords are then computed with the TextRank ranking method. The specific steps are as follows. Treat the set of posts published by all users, together with the set of posts published by the Big V accounts they follow, as a document $D$, where each post published by user $u$ is $m_i$, with $m_i \in D$. The term frequency $TF_k$ of each candidate keyword $k$ is the frequency with which the word occurs in post $m_i$, and the inverse document frequency $IDF_k$ of keyword $k$ over the whole microblog document $D$ is:
$$IDF_k = \log\left(\frac{|du|}{1 + |\{m : k \in m\}|}\right) \tag{1}$$
where $|\{m : k \in m\}|$ is the number of posts containing keyword $k$ and $|du|$ is the total number of posts published by the user. The importance score of a word under the TextRank ranking method is:
$$S(V_k) = (1-d) \times TF_k \times IDF_k + d \times \sum_{V_j \in E(V_k)} \frac{w_{jk}}{\sum_{V_i \in E(V_j)} w_{ji}} S(V_j) \tag{2}$$
where $S(V_k)$ is the importance of word $k$; $TF_k$ is the frequency with which word $k$ occurs in the document; $E(V_k)$ is the set of words that co-occur with word $k$, where co-occurrence means appearing in the same sentence; $w_{jk}$ is the similarity between word $j$ and word $k$, and $w_{ji}$ is the similarity between word $j$ and word $i$; word $j$ is a word related to word $k$, and word $i$ is a word related to word $j$. The similarity of two words equals the number of times they co-occur divided by the sum of the numbers of times each occurs. The formula is then iterated to obtain the importance score of each word; the words are sorted by importance score, and the higher-scoring words are retained as the extracted top-N keywords.
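The two inputs to the score, the inverse document frequency of formula (1) and the co-occurrence similarity between words, can be computed from segmented posts as in the sketch below. Several choices here are assumptions rather than details from the text: the natural logarithm is used, each post is treated as a single sentence, a word is counted once per post, and the sample posts are invented.

```python
import math
from collections import Counter
from itertools import combinations

def idf(keyword, posts):
    """Formula (1): log(total posts / (1 + posts containing the keyword))."""
    containing = sum(1 for post in posts if keyword in post)
    return math.log(len(posts) / (1 + containing))

def cooccurrence_weights(sentences):
    """Similarity of two words = co-occurrence count / (count of one + count of the other)."""
    word_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        unique_words = sorted(set(sentence))
        word_counts.update(unique_words)
        pair_counts.update(combinations(unique_words, 2))
    return {
        pair: count / (word_counts[pair[0]] + word_counts[pair[1]])
        for pair, count in pair_counts.items()
    }

# Hypothetical posts, already segmented and with stop words removed.
posts = [
    ["hotpot", "chengdu"],
    ["hotpot", "spicy"],
    ["chengdu", "panda"],
    ["hotpot", "chengdu", "spicy"],
]

idf_panda = idf("panda", posts)          # rare word: log(4 / 2)
idf_hotpot = idf("hotpot", posts)        # frequent word: log(4 / 4)
weights = cooccurrence_weights(posts)    # keys are alphabetically ordered word pairs
```

The `+ 1` in the denominator of formula (1) keeps the logarithm defined for words that appear in no post, at the cost of giving very common words an IDF of zero or below.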
The traditional TextRank method extracts keywords by taking words as candidate nodes to build an undirected graph and computing the score of each noun node in the graph. The formula of the traditional method is:
$$S(V_k) = (1-d) + d \times \sum_{V_j \in E(V_k)} \frac{w_{jk}}{\sum_{V_i \in E(V_j)} w_{ji}} S(V_j)$$
In the formula, $d$ is the damping coefficient, generally set to 0.85; $S(V_k)$ is the importance of word $k$; $E(V_k)$ is the set of words that co-occur with word $V_k$, where co-occurrence means appearing in the same sentence; $w_{jk}$ is the similarity between word $j$ and word $k$, and $w_{ji}$ is the similarity between word $j$ and word $i$. The similarity of two words equals the number of times they co-occur divided by the sum of the numbers of times each occurs. As the formula shows, the traditional TextRank method uses only the co-occurrence count of words as the edge weight and does not consider the influence of a word's own weight on the node score. Therefore, the invention gathers word statistics from the documents and, in addition to word co-occurrence counts, adds the term weight TF_k × IDF_k to the score computation, selecting the higher-scoring nouns as candidate keywords.
Step 3: train a general word2vec model on a general corpus, and compute each user's interest vector from that user's top-N keywords. word2vec is a neural-network-based word-embedding model that reduces text processing to vector operations in a K-dimensional vector space, where vector operations can express the similarity between words. (1) Download a Chinese corpus; (2) segment the Chinese corpus and remove stop words; (3) train a word2vec model with an open-source tool. Training with the open-source tool amounts to supplying the corpus as input and running the program, which yields a one-to-one mapping between words and vectors (that is, the model). Each top-N keyword can then be mapped to a vector, and the user interest vector is obtained by summing the keyword vectors.
Step 4: compute the cosine similarity between user interest vectors pairwise to obtain the interest similarity between two users. Cosine similarity evaluates the similarity of two vectors by computing the cosine of the angle between them. The vectors are plotted in a vector space according to their coordinates, most commonly a two-dimensional space.
The angle between them is then found and the corresponding cosine obtained; this cosine value characterizes the similarity of the two vectors. The smaller the angle, the closer the cosine is to 1, the more closely the directions agree, and the more similar the vectors are.
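For two interest vectors A and B, the cosine of the angle is the dot product divided by the product of the vector lengths, cos θ = (A · B) / (|A| |B|). A minimal Python sketch follows; the function name and the choice to return 0.0 for an all-zero vector are assumptions made for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: a user with an empty interest vector is similar to no one.
        return 0.0
    return dot / (norm_a * norm_b)

same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])   # parallel vectors
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])       # no shared interest
```

Because the measure depends only on direction, a user with many keywords and a user with few keywords on the same topics can still score close to 1.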
Step 5: screen out similar users according to interest similarity, completing similar-user mining.
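The text does not specify the screening rule. One simple possibility, shown here only as an assumption, is to keep the users whose similarity to a seed user reaches a threshold; the threshold value of 0.8, the function name, and the sample scores are all hypothetical.

```python
def filter_similar_users(similarities, seed_user, threshold=0.8):
    """Return users whose similarity to seed_user is at least threshold, best match first.

    similarities: {(user_x, user_y): interest similarity}, one entry per unordered pair.
    """
    matches = []
    for (user_x, user_y), score in similarities.items():
        if score < threshold:
            continue
        if user_x == seed_user:
            matches.append((user_y, score))
        elif user_y == seed_user:
            matches.append((user_x, score))
    matches.sort(key=lambda item: item[1], reverse=True)
    return [user for user, _ in matches]

# Hypothetical pairwise similarities from the previous step.
pairwise = {
    ("u1", "u2"): 0.91,
    ("u1", "u3"): 0.42,
    ("u2", "u3"): 0.88,
    ("u1", "u4"): 0.85,
}

similar_to_u1 = filter_similar_users(pairwise, "u1")
```

Raising the threshold trades recall for precision, so in practice it would be tuned against the recall figures the method is meant to improve.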
It is obvious to a person skilled in the art that the invention is not limited to the details of the above exemplary embodiment, and that the invention can be realized in other specific forms without departing from its spirit or essential characteristics. The embodiment should therefore be regarded in every respect as exemplary and non-restrictive. The scope of the invention is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and scope of equivalents of the claims are intended to be included in the invention.
Moreover, it should be understood that although this specification is described in terms of embodiments, not every embodiment contains only one independent technical solution. The specification is written this way only for clarity. Those skilled in the art should treat the specification as a whole; the technical solutions in the various embodiments may be suitably combined to form other embodiments that those skilled in the art can understand.

Claims (6)

1. A similar-user mining method based on social data, characterized by comprising the following steps:
Step 1: crawling key data from users' microblog text;
Step 2: extracting the top-N keywords from the key data;
Step 3: training a general word2vec model on a general corpus, and then computing each user's interest vector from that user's top-N keywords;
Step 4: computing the cosine similarity between user interest vectors pairwise to obtain the interest similarity between two users;
Step 5: screening out similar users according to interest similarity, completing similar-user mining.
2. The similar-user mining method based on social data according to claim 1, characterized in that, in Step 1, the key data crawled from a user's microblog text comprise:
(1) the text of the posts the user published and interacted with;
(2) the list of Big V bloggers the user follows;
(3) the text of the posts published by the Big V bloggers the user follows.
3. The similar-user mining method based on social data according to claim 2, characterized in that a Big V blogger is one with at least 100,000 microblog followers, and the key data are the user's data from the most recent three months.
4. The similar-user mining method based on social data according to claim 1, characterized in that, in Step 2, before the top-N keywords are extracted from the key data, the text crawled from the user's microblog undergoes NLP preprocessing, which comprises word segmentation and stop-word removal.
5. The similar-user mining method based on social data according to claim 1 or 4, characterized in that, in Step 2, the top-N keywords are computed with the TextRank ranking method, the specific steps being: the set of posts published by all users, together with the text of the posts published by the Big V bloggers the users follow, is treated as a document D, where each post published by user u is m_i, with m_i ∈ D; the term frequency TF_k of each candidate keyword k is the frequency with which the word occurs in post m_i; and the inverse document frequency IDF_k of keyword k over the whole microblog document D is:
$$IDF_k = \log\left(\frac{|du|}{1 + |\{m : k \in m\}|}\right) \tag{1}$$
where |{m : k ∈ m}| is the number of posts containing keyword k and |du| is the total number of posts published by the user; the importance score of word k under the TextRank ranking method is:
$$S(V_k) = (1-d) \times TF_k \times IDF_k + d \times \sum_{V_j \in E(V_k)} \frac{w_{jk}}{\sum_{V_i \in E(V_j)} w_{ji}} S(V_j) \tag{2}$$
where S(V_k) is the importance of word k; TF_k is the frequency with which word k occurs in the document; E(V_k) is the set of words that co-occur with word k, where co-occurrence means appearing in the same sentence; w_jk is the similarity between word j and word k, and w_ji is the similarity between word j and word i; word j is a word related to word k, and word i is a word related to word j; the similarity of two words equals the number of times they co-occur divided by the sum of the numbers of times each occurs; the formula is then iterated to obtain the importance score of each word, the words are sorted by importance score, and the higher-scoring words are retained as the extracted top-N keywords.
6. The similar-user mining method based on social data according to claim 1, characterized in that the specific steps of Step 3 comprise: (1) downloading a Chinese corpus; (2) segmenting the Chinese corpus and removing stop words; (3) training a word2vec model with an open-source tool; (4) converting each of the top-N keywords obtained in Step 2 into a word vector; (5) summing all of these word vectors to obtain the user's interest vector.
CN201711311721.4A 2017-12-11 2017-12-11 A similar-user mining method based on social data Pending CN107862620A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711311721.4A CN107862620A (en) 2017-12-11 2017-12-11 A kind of similar users method for digging based on social data

Publications (1)

Publication Number Publication Date
CN107862620A true CN107862620A (en) 2018-03-30

Family

ID=61705795

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711311721.4A Pending CN107862620A (en) 2017-12-11 2017-12-11 A kind of similar users method for digging based on social data

Country Status (1)

Country Link
CN (1) CN107862620A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120035912A1 (en) * 2010-07-30 2012-02-09 Ben-Gurion University Of The Negev Research And Development Authority Multilingual sentence extractor
CN104008092A (en) * 2014-06-10 2014-08-27 复旦大学 Method and system of relation characterizing, clustering and identifying based on the semanteme of semantic space mapping
CN104778158A (en) * 2015-03-04 2015-07-15 新浪网技术(中国)有限公司 Method and device for representing text
CN105740349A (en) * 2016-01-25 2016-07-06 重庆邮电大学 Sentiment classification method capable of combining Doc2vce with convolutional neural network
CN107357889A (en) * 2017-07-11 2017-11-17 北京工业大学 A kind of across social platform picture proposed algorithm based on interior perhaps emotion similitude

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
TU SHOUZHONG: ""Mining microblog user interests based on TextRank with TF-IDF factor"", 《THE JOURNAL OF CHINA UNIVERSITIES OF POSTS AND TELECOMMUNICATIONS》 *
魏赟 等: ""融合统计学和TextRank的生物医学文献关键短语抽取"", 《计算机应用与软件》 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110413956A (en) * 2018-04-28 2019-11-05 南京云问网络技术有限公司 A kind of Text similarity computing method based on bootstrapping
CN110413956B (en) * 2018-04-28 2023-08-01 南京云问网络技术有限公司 Text similarity calculation method based on bootstrapping
CN109002508A (en) * 2018-07-01 2018-12-14 东莞市华睿电子科技有限公司 A kind of text information crawling method based on web crawlers
CN109002508B (en) * 2018-07-01 2021-08-06 上海众引文化传播股份有限公司 Text information crawling method based on web crawler
CN111459964A (en) * 2020-03-24 2020-07-28 长沙理工大学 Template-oriented log anomaly detection method and device based on Word2vec
CN111459964B (en) * 2020-03-24 2023-12-01 长沙理工大学 Log anomaly detection method and device based on Word2vec for template
CN112818258A (en) * 2021-03-08 2021-05-18 珠海市蜂巢数据技术有限公司 Social network user searching method based on keywords, computer device and computer-readable storage medium
CN112818258B (en) * 2021-03-08 2024-05-10 珠海市蜂巢数据技术有限公司 Social network user searching method based on keywords, computer device and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180330
