CN107862620A - Similar-user mining method based on social data - Google Patents
Similar-user mining method based on social data
- Publication number: CN107862620A
- Application number: CN201711311721.4A
- Authority: CN (China)
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Abstract
The present invention provides a similar-user mining method based on social data, relating to the field of social networks in Internet information technology. The method comprises: Step 1: crawl key data from the user's microblog text; Step 2: extract the Top-N keywords from the key data; Step 3: train a general word2vec model for use with the Top-N keywords; Step 4: calculate the user interest vector; Step 5: calculate the pairwise cosine similarity of the user interest vectors to obtain the interest similarity between two users; Step 6: filter out similar users according to the interest similarity, completing the similar-user mining. The invention addresses two problems of existing methods. First, the strong correlation obtained from the other users @-mentioned in a user's posts, comments and reposts is very sparse, which hinders large-scale expansion to new similar users and leads to low recall. Second, the method based on how similarly users follow big-V users still suffers from data sparsity when calculating ordinary interest similarity, so its final recall is also low.
Description
Technical field
The present invention relates to the field of social networks in Internet information technology, and in particular to a similar-user mining method based on social data.
Background
As a new kind of social networking service, microblogging is attracting more and more users. According to statistics, thousands of new users join microblog platforms every day, generating hundreds of millions of microblog posts. At the same time, more and more businesses are opening enterprise accounts to gain fans (their customer base). Building brand awareness through targeted marketing to customers has become a mainstream means of brand promotion and marketing. How to reach new users by association from existing customers is a question every enterprise considers, and user-expansion methods based on social big data are increasingly recognized. Finding users with similar interests is an important means of user expansion. There are two existing methods for finding similar users; their steps and shortcomings are explained below.
In the first method, the other users that a user @-mentions in posts, comments and reposts are taken as that user's similar users. This approach is very simple: the relationships can be found directly by crawling and parsing the user's microblog posts. Users connected in this way share interests, which follows from a functional feature specific to microblogging, and the correlation is strong. However, this strong correlation is very sparse, which hinders large-scale expansion to new similar users and results in low recall.
In the second method, the similarity between two users is judged by how similarly they follow big-V users. The steps are: (1) crawl the user's following list and, using each listed account's fan count or profile, select all big-V users with media attributes; (2) collect the number of interactions (comments, reposts and likes) between the user and each big-V user, together with each big-V user's fan count; (3) calculate the user's interest index for each big-V user with the formula "interest index = number of interactions / number of fans of the big-V user", and form the vector of the user's interest indices over the big-V users; (4) calculate the cosine similarity between two users' interest-index vectors to represent the interest similarity between the two users; (5) for a candidate user, calculate the number of users in the set to be expanded that are similar to it and the average similarity, in order to judge whether the candidate is a user to be added. This method handles similarity for popular interests well, but for ordinary interest similarity the data remain sparse and the mining of similar users hits a bottleneck. For example, consider two niche big-V accounts or brands, blogger A and blogger B, that belong to the same field but whose fan bases barely overlap. User a follows blogger A and user b follows blogger B; a and b are similar users, but the second method cannot detect this. The final recall of this mining method is therefore also low.
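The sparsity problem of the second method can be illustrated with a minimal sketch. The interest-index formula is the one quoted above; the fan counts, interaction counts and the helper names `interest_index` and `cosine` are hypothetical values and names chosen only for illustration.

```python
import math

def interest_index(interactions: int, fans: int) -> float:
    """Interest index = number of interactions / fan count of the big-V user."""
    return interactions / fans if fans else 0.0

def cosine(u: dict, v: dict) -> float:
    """Cosine similarity of two sparse vectors keyed by big-V account name."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical data: user a interacts only with blogger A, user b only with blogger B.
fans = {"A": 200_000, "B": 150_000}
user_a = {"A": interest_index(40, fans["A"])}
user_b = {"B": interest_index(30, fans["B"])}

# The two vectors share no big-V account, so the similarity is zero
# even though both users are interested in the same field.
baseline_similarity = cosine(user_a, user_b)
```

Because the vectors are indexed by big-V account rather than by content, two users in the same field who follow different bloggers never overlap, which is the bottleneck described above.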
Summary of the invention
The object of the invention is to solve two problems of existing methods. First, the strong correlation obtained from the other users @-mentioned in a user's posts, comments and reposts is very sparse, which hinders large-scale expansion to new similar users and results in low recall. Second, the method based on how similarly users follow big-V users still suffers from data sparsity when calculating ordinary interest similarity, so its final recall remains low. To solve these problems, the present invention provides a similar-user mining method based on social data.
The technical solution of the invention is as follows:
A similar-user mining method based on social data, comprising the following steps:
Step 1: crawl key data from the user's microblog text;
Step 2: extract the Top-N keywords from the key data;
Step 3: train a general word2vec model on a general corpus, and use the Top-N keywords of each user to calculate that user's interest vector;
Step 4: calculate the pairwise cosine similarity of the user interest vectors to obtain the interest similarity between two users;
Step 5: filter out similar users according to the interest similarity, completing the similar-user mining.
Specifically, in Step 1, the key data crawled from the user's microblog text include: (1) the text of the microblog posts the user published and interacted with; (2) the list of big-V bloggers the user follows; (3) the text of the microblog posts published by the big-V bloggers the user follows.
Preferably, a big-V blogger is a blogger with at least 100,000 (10W) microblog fans, and the key data are the user's data from the most recent three months.
Specifically, in Step 2, before the Top-N keywords are extracted from the key data, the text data crawled from the user's microblog undergo NLP processing, which includes word segmentation and stop-word removal.
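The NLP preprocessing can be sketched as follows. This is only an illustration under stated assumptions: the regular-expression tokenizer stands in for a real Chinese word segmenter, and the stop-word list and the function names `tokenize` and `preprocess` are hypothetical.

```python
import re

# Hypothetical stop-word list; a real pipeline would load a full list.
STOP_WORDS = {"the", "a", "is", "of", "and", "to"}

def tokenize(text: str) -> list[str]:
    """Stand-in word segmenter: lowercases and splits on non-word characters."""
    return [tok for tok in re.split(r"\W+", text.lower()) if tok]

def preprocess(post: str) -> list[str]:
    """Segment a post into words, then drop stop words."""
    return [tok for tok in tokenize(post) if tok not in STOP_WORDS]

tokens = preprocess("The taste of the hotpot is great, and the service is fast!")
```

The resulting token lists are the input to the keyword-extraction stage; for Chinese text the tokenizer would be replaced by a dedicated segmentation tool.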
Specifically, in Step 2, the Top-N keywords are calculated with the TextRank ranking method. The specific steps are as follows. The set of all microblog posts published by the user is treated as one document $D$, where each post published by user $u$ is $m_i$, with $m_i \in D$. The term frequency $TF_k$ of each candidate keyword $k$ is the frequency with which the word occurs in the microblog document, and the inverse document frequency $IDF_k$ of keyword $k$ over the whole microblog document $D$ is:
$$IDF_k = \log\left(\frac{|du|}{1 + |\{m : k \in m\}|}\right) \qquad (1)$$
where $|\{m : k \in m\}|$ is the number of posts containing keyword $k$ and $|du|$ is the total number of posts published by the user.
The importance score of a word under the TextRank ranking method is:
$$S(V_k) = (1-d) \times TF_k \times IDF_k + d \times \sum_{V_j \in E(V_k)} \frac{w_{jk}}{\sum_{V_i \in E(V_j)} w_{ji}} S(V_j) \qquad (2)$$
where $d$ is the damping coefficient; $S(V_k)$ is the importance of word $k$; $TF_k$ is the frequency with which word $k$ occurs in the document; $E(V_k)$ is the set of words that co-occur with word $k$, two words co-occurring when they appear in the same sentence; $w_{jk}$ is the similarity between word $j$ and word $k$, and $w_{ji}$ is the similarity between word $j$ and word $i$, where word $j$ is a word similar to word $k$ and word $i$ is a word similar to word $j$. The similarity of two words equals the number of times they co-occur divided by the sum of the numbers of times each occurs. The formula is then iterated to obtain the importance score of each word; the words are sorted by importance score, and the highest-scoring words are retained as the extracted Top-N keywords.
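The keyword-ranking step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: each post is treated as one sentence for co-occurrence, TF is the word's share of all tokens, the damping coefficient defaults to 0.85, the iteration count is fixed, and the function name `top_n_keywords` is hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def top_n_keywords(posts, n=3, d=0.85, iterations=50):
    """Rank words by S(V_k) = (1-d)*TF_k*IDF_k + d*sum_j(w_jk / sum_i(w_ji) * S(V_j))."""
    counts = Counter(tok for post in posts for tok in post)         # occurrences per word
    doc_freq = Counter(tok for post in posts for tok in set(post))  # posts containing the word
    total = sum(counts.values())
    tf = {w: c / total for w, c in counts.items()}
    idf = {w: math.log(len(posts) / (1 + doc_freq[w])) for w in counts}

    # Co-occurrence: two words appearing in the same post (assumed to be one sentence).
    cooc = Counter()
    for post in posts:
        for a, b in combinations(sorted(set(post)), 2):
            cooc[(a, b)] += 1

    # Edge weight: co-occurrence count divided by the sum of the individual counts.
    weight = {w: {} for w in counts}
    for (a, b), c in cooc.items():
        weight[a][b] = weight[b][a] = c / (counts[a] + counts[b])
    out_sum = {w: sum(nbrs.values()) for w, nbrs in weight.items()}

    score = {w: 1.0 for w in counts}
    for _ in range(iterations):
        # Every new score is computed from the previous iteration's scores.
        score = {
            k: (1 - d) * tf[k] * idf[k]
            + d * sum(weight[j][k] / out_sum[j] * score[j] for j in weight[k])
            for k in counts
        }
    ranked = sorted(score, key=lambda w: (-score[w], w))
    return ranked[:n]

posts = [
    ["hotpot", "spicy", "chengdu"],
    ["hotpot", "beef"],
    ["travel", "chengdu", "hotpot"],
    ["travel", "photo"],
]
keywords = top_n_keywords(posts, n=3)
```

Note that with the IDF definition of formula (1), a word that appears in almost every post receives a zero or negative TF-IDF term, so very common words are pushed down the ranking.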
The specific steps of Step 3 include: (1) download a Chinese corpus; (2) perform word segmentation and stop-word removal on the Chinese corpus; (3) train a word2vec model with an open-source tool; (4) convert each of the Top-N keywords obtained in Step 2 into a word vector; (5) sum all of these word vectors to obtain the user's interest vector.
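Sub-steps (4) and (5) can be sketched as follows. The three-dimensional embedding table is a hypothetical stand-in for a trained word2vec model, and skipping keywords that are missing from the model is an assumption; the text does not say how such words are handled.

```python
# Hypothetical 3-dimensional embeddings standing in for a trained word2vec model.
EMBEDDINGS = {
    "hotpot": [0.9, 0.1, 0.0],
    "beef":   [0.8, 0.2, 0.1],
    "travel": [0.1, 0.9, 0.2],
}

def interest_vector(keywords, embeddings, dim=3):
    """Sum the word vectors of the keywords to form the user's interest vector."""
    total = [0.0] * dim
    for word in keywords:
        vec = embeddings.get(word)
        if vec is None:
            continue  # out-of-vocabulary keyword: skipped (an assumption)
        total = [t + x for t, x in zip(total, vec)]
    return total

vector = interest_vector(["hotpot", "beef", "unknown"], EMBEDDINGS)
```

Because the vectors are summed rather than averaged, a user with more keywords has a longer vector; this does not affect the later cosine comparison, which depends only on direction.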
With the above scheme, the beneficial effects of the invention are as follows:
(1) The invention breaks with the traditional reliance on only the other users @-mentioned in a user's posts, comments and reposts, or only the similarity of the big-V users that users follow, and proposes mining similar users from the user's comprehensive social data. A big-V blogger can be regarded as a bridge that brings in more interactive microblog data. Even when two users do not follow the same big-V blogger, if the microblog content they interact with is similar (for example, both interact with entertainment-gossip bloggers, food bloggers or fashion bloggers), they can still be identified as similar users. This is why recall increases. Compared with the mining method that considers only the similarity of followed big-V users, the method of the invention raises customer recall from 37.2% to 66.72%, almost doubling it, which is a highly significant effect.
Because more similar users are mined, more information is crawled and the amount of computation grows, so some slowdown is unavoidable. It can be controlled by filtering users: for example, where similar users were originally mined from 10,000 users, those 10,000 users can be filtered down to 1,000 representative ones.
(2) The traditional TextRank method extracts keywords by treating words as candidate nodes to construct an undirected graph and calculating the score of each noun node in the graph, without considering the effect of a word's own weight on the node score. If word importance is ignored, unimportant words keep propagating score to further unimportant words, which seriously reduces the accuracy of the subsequent similarity calculation. The invention therefore gathers word statistics from the document and, in addition to the word co-occurrence counts, includes the term weight $TF_k \times IDF_k$ in the score calculation, selecting the higher-scoring nouns as candidate keywords. This ensures accuracy.
Detailed description of the embodiments
The technical solution is described clearly and completely below with reference to an embodiment of the invention. Obviously, the described embodiment is only one of the embodiments of the invention, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the invention without creative effort fall within the scope of protection of the invention.
The similar-user mining method based on social data in this embodiment comprises the following steps:
Step 1: crawl key data from the user's microblog text. The key data crawled from the user's microblog text include: (1) the text of the microblog posts the user published and interacted with; (2) the list of big-V bloggers the user follows; (3) the text of the microblog posts published by the big-V bloggers the user follows. A big-V blogger is a blogger with at least 100,000 (10W) microblog fans; if a larger mining volume is required, 50,000 (5W) can be used as the mining threshold. The key data are the user's data from the most recent three months.
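The selection rules of Step 1 can be sketched as simple filters. The `Post` record, the function names and the 90-day approximation of "three months" are assumptions made for illustration; only the fan thresholds come from the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

BIG_V_MIN_FANS = 100_000     # default threshold from the text ("10W")
WINDOW = timedelta(days=90)  # "most recent three months", approximated as 90 days

@dataclass
class Post:
    author: str
    text: str
    posted_at: datetime

def is_big_v(fan_count: int, min_fans: int = BIG_V_MIN_FANS) -> bool:
    """A blogger counts as big-V when the fan count reaches the threshold."""
    return fan_count >= min_fans

def recent_posts(posts, now):
    """Keep only the posts published within the time window before `now`."""
    return [p for p in posts if now - p.posted_at <= WINDOW]

now = datetime(2017, 12, 11)
posts = [
    Post("u1", "new hotpot place", datetime(2017, 11, 20)),
    Post("u1", "old post", datetime(2017, 1, 5)),
]
kept = recent_posts(posts, now)
```

Passing `min_fans=50_000` to `is_big_v` corresponds to the relaxed threshold mentioned for larger mining volumes.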
Step 2: extract the Top-N keywords from the key data. Before the Top-N keywords are extracted, the text data crawled from the user's microblog undergo NLP processing, which includes word segmentation and stop-word removal. The Top-N keywords are then calculated with the TextRank ranking method, as follows. All posts published by the user, together with the posts published by the big-V bloggers the user follows, are treated as one document $D$, where each post published by user $u$ is $m_i$, with $m_i \in D$. The term frequency $TF_k$ of each candidate keyword $k$ is the frequency with which the word occurs in the microblog document, and the inverse document frequency $IDF_k$ of keyword $k$ over the whole microblog document $D$ is:
$$IDF_k = \log\left(\frac{|du|}{1 + |\{m : k \in m\}|}\right) \qquad (1)$$
where $|\{m : k \in m\}|$ is the number of posts containing keyword $k$ and $|du|$ is the total number of posts published by the user.
The importance score of a word under the TextRank ranking method is:
$$S(V_k) = (1-d) \times TF_k \times IDF_k + d \times \sum_{V_j \in E(V_k)} \frac{w_{jk}}{\sum_{V_i \in E(V_j)} w_{ji}} S(V_j) \qquad (2)$$
where $S(V_k)$ is the importance of word $k$; $TF_k$ is the frequency with which word $k$ occurs in the document; $E(V_k)$ is the set of words that co-occur with word $k$, two words co-occurring when they appear in the same sentence; $w_{jk}$ is the similarity between word $j$ and word $k$, and $w_{ji}$ is the similarity between word $j$ and word $i$, where word $j$ is a word similar to word $k$ and word $i$ is a word similar to word $j$. The similarity of two words equals the number of times they co-occur divided by the sum of the numbers of times each occurs. The formula is then iterated to obtain the importance score of each word; the words are sorted by importance score, and the highest-scoring words are retained as the extracted Top-N keywords.
The traditional TextRank method extracts keywords by treating words as candidate nodes to construct an undirected graph and calculating the score of each noun node in the graph. The conventional formula is:
$$S(V_k) = (1-d) + d \times \sum_{V_j \in E(V_k)} \frac{w_{jk}}{\sum_{V_i \in E(V_j)} w_{ji}} S(V_j)$$
where $d$ is the damping coefficient, generally set to 0.85; $S(V_k)$ is the importance of word $k$; $E(V_k)$ is the set of words that co-occur with word $V_k$, two words co-occurring when they appear in the same sentence; $w_{jk}$ is the similarity between word $j$ and word $k$, and $w_{ji}$ is the similarity between word $j$ and word $i$. The similarity of two words equals the number of times they co-occur divided by the sum of the numbers of times each occurs. As the formula shows, the traditional TextRank method uses only the co-occurrence counts of words as edge weights and does not consider the effect of a word's own weight on the node score. The invention therefore gathers word statistics from the document and, in addition to the word co-occurrence counts, includes the term weight $TF_k \times IDF_k$ in the score calculation, where $TF_k$ is the frequency with which the word occurs in the document, and selects the higher-scoring nouns as candidate keywords.
Step 3: train a general word2vec model on a general corpus, and use the Top-N keywords of each user to calculate that user's interest vector, from which the similarity between two users is later calculated. word2vec is a neural word-embedding model that reduces text processing to vector operations in a K-dimensional vector space, where vector operations can express the similarity between words. The sub-steps are: (1) download a Chinese corpus; (2) perform word segmentation and stop-word removal on the Chinese corpus; (3) train a word2vec model with an open-source tool. Training with an open-source tool amounts to running the program with the corpus as input, which yields a one-to-one mapping between words and vectors (that is, the model). The Top-N keywords can then be mapped to vectors, and the user interest vector is obtained by summing the keyword vectors.
Step 4: calculate the pairwise cosine similarity of the user interest vectors to obtain the interest similarity between two users. Cosine similarity evaluates how similar two vectors are by calculating the cosine of the angle between them. The vectors are plotted in a vector space according to their coordinates, most commonly a two-dimensional space; the angle between them is found, and the cosine of that angle characterizes the similarity of the two vectors. The smaller the angle, the closer the cosine is to 1, the more closely the directions agree, and the more similar the vectors are.
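For reference, the cosine similarity described above can be written out explicitly. The symbols $\mathbf{u}$ and $\mathbf{v}$ for the two user interest vectors, and the numeric examples, are notation introduced here for illustration only.

```latex
\cos\theta
  = \frac{\mathbf{u}\cdot\mathbf{v}}{\lVert\mathbf{u}\rVert\,\lVert\mathbf{v}\rVert}
  = \frac{\sum_i u_i v_i}{\sqrt{\sum_i u_i^2}\,\sqrt{\sum_i v_i^2}}

% Same direction: u = (1, 2), v = (2, 4)
\cos\theta = \frac{1\cdot 2 + 2\cdot 4}{\sqrt{5}\,\sqrt{20}} = \frac{10}{10} = 1

% Orthogonal: u = (1, 0), v = (0, 1)
\cos\theta = \frac{0}{1\cdot 1} = 0
```

The first example shows that vector length does not matter, only direction; the second shows that users whose interest vectors share no common component score 0.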
Step 5: filter out similar users according to the interest similarity, completing the similar-user mining.
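The final filtering step can be sketched as a threshold on the pairwise similarities. The threshold value 0.8, the sample vectors and the function names are hypothetical; the text does not specify how the cut-off is chosen.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine of the angle between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similar_pairs(interest_vectors, threshold=0.8):
    """Return (user, user, similarity) for every pair at or above the threshold."""
    pairs = []
    for (ua, va), (ub, vb) in combinations(sorted(interest_vectors.items()), 2):
        sim = cosine(va, vb)
        if sim >= threshold:
            pairs.append((ua, ub, sim))
    # Most similar pairs first.
    return sorted(pairs, key=lambda p: -p[2])

# Hypothetical interest vectors for three users.
vectors = {"a": [1.7, 0.3, 0.1], "b": [1.5, 0.4, 0.0], "c": [0.1, 0.9, 0.2]}
result = similar_pairs(vectors)
```

Comparing every pair costs quadratic time in the number of users, which is consistent with the earlier remark that the user set may need to be filtered down before mining.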
It will be apparent to those skilled in the art that the invention is not limited to the details of the exemplary embodiment above, and that the invention can be realized in other specific forms without departing from its spirit or essential attributes. The embodiment should therefore be regarded in all respects as exemplary and non-restrictive. The scope of the invention is defined by the appended claims rather than by the description above, and all changes that fall within the meaning and range of equivalency of the claims are intended to be embraced in the invention.
Moreover, it should be understood that although this specification is described in terms of embodiments, not every embodiment contains only one independent technical solution. The specification is written this way only for clarity; those skilled in the art should treat the specification as a whole, and the technical solutions in the embodiments may be suitably combined to form other embodiments that those skilled in the art can understand.
Claims (6)
1. A similar-user mining method based on social data, characterized by comprising the following steps:
Step 1: crawl key data from the user's microblog text;
Step 2: extract the Top-N keywords from the key data;
Step 3: train a general word2vec model on a general corpus, then use the Top-N keywords of each user to calculate that user's interest vector;
Step 4: calculate the pairwise cosine similarity of the user interest vectors to obtain the interest similarity between two users;
Step 5: filter out similar users according to the interest similarity, completing the similar-user mining.
2. The similar-user mining method based on social data according to claim 1, characterized in that, in Step 1, the key data crawled from the user's microblog text include:
(1) the text of the microblog posts the user published and interacted with;
(2) the list of big-V bloggers the user follows;
(3) the text of the microblog posts published by the big-V bloggers the user follows.
3. The similar-user mining method based on social data according to claim 2, characterized in that a big-V blogger is a blogger with at least 100,000 (10W) microblog fans, and the key data are the user's data from the most recent three months.
4. The similar-user mining method based on social data according to claim 1, characterized in that, in Step 2, before the Top-N keywords are extracted from the key data, the text data crawled from the user's microblog undergo NLP processing, which includes word segmentation and stop-word removal.
5. The similar-user mining method based on social data according to claim 1 or 4, characterized in that, in Step 2, the Top-N keywords are calculated with the TextRank ranking method, the specific steps being as follows: all posts published by the user, together with the text of the posts published by the big-V bloggers the user follows, are treated as one document $D$, where each post published by user $u$ is $m_i$, with $m_i \in D$; the term frequency $TF_k$ of each candidate keyword $k$ is the frequency with which the word occurs in the microblog document; and the inverse document frequency $IDF_k$ of keyword $k$ over the whole microblog document $D$ is:
$$IDF_k = \log\left(\frac{|du|}{1 + |\{m : k \in m\}|}\right) \qquad (1)$$
where $|\{m : k \in m\}|$ is the number of posts containing keyword $k$ and $|du|$ is the total number of posts published by the user; the importance score of word $k$ under the TextRank ranking method is:
$$S(V_k) = (1-d) \times TF_k \times IDF_k + d \times \sum_{V_j \in E(V_k)} \frac{w_{jk}}{\sum_{V_i \in E(V_j)} w_{ji}} S(V_j) \qquad (2)$$
where $S(V_k)$ is the importance of word $k$; $TF_k$ is the frequency with which word $k$ occurs in the document; $E(V_k)$ is the set of words that co-occur with word $k$, two words co-occurring when they appear in the same sentence; $w_{jk}$ is the similarity between word $j$ and word $k$, and $w_{ji}$ is the similarity between word $j$ and word $i$, where word $j$ is a word similar to word $k$ and word $i$ is a word similar to word $j$; the similarity of two words equals the number of times they co-occur divided by the sum of the numbers of times each occurs; the formula is then iterated to obtain the importance score of each word, the words are sorted by importance score, and the highest-scoring words are retained as the extracted Top-N keywords.
6. The similar-user mining method based on social data according to claim 1, characterized in that the specific steps of Step 3 include: (1) download a Chinese corpus; (2) perform word segmentation and stop-word removal on the Chinese corpus; (3) train a word2vec model with an open-source tool; (4) convert each of the Top-N keywords obtained in Step 2 into a word vector; (5) sum all of these word vectors to obtain the user's interest vector.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711311721.4A CN107862620A (en) | 2017-12-11 | 2017-12-11 | Similar-user mining method based on social data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107862620A (en) | 2018-03-30 |
Family ID: 61705795
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711311721.4A (Pending) CN107862620A (en) | 2017-12-11 | 2017-12-11 | Similar-user mining method based on social data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107862620A (en) |
Non-Patent Citations (2)
Title |
---|
TU SHOUZHONG: ""Mining microblog user interests based on TextRank with TF-IDF factor"", 《THE JOURNAL OF CHINA UNIVERSITIES OF POSTS AND TELECOMMUNICATIONS》 * |
魏赟 等: ""融合统计学和TextRank的生物医学文献关键短语抽取"", 《计算机应用与软件》 * |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110413956A (en) * | 2018-04-28 | 2019-11-05 | 南京云问网络技术有限公司 | A kind of Text similarity computing method based on bootstrapping |
CN110413956B (en) * | 2018-04-28 | 2023-08-01 | 南京云问网络技术有限公司 | Text similarity calculation method based on bootstrapping |
CN109002508A (en) * | 2018-07-01 | 2018-12-14 | 东莞市华睿电子科技有限公司 | Text information crawling method based on web crawler |
CN109002508B (en) * | 2018-07-01 | 2021-08-06 | 上海众引文化传播股份有限公司 | Text information crawling method based on web crawler |
CN111459964A (en) * | 2020-03-24 | 2020-07-28 | 长沙理工大学 | Template-oriented log anomaly detection method and device based on Word2vec |
CN111459964B (en) * | 2020-03-24 | 2023-12-01 | 长沙理工大学 | Log anomaly detection method and device based on Word2vec for template |
CN112818258A (en) * | 2021-03-08 | 2021-05-18 | 珠海市蜂巢数据技术有限公司 | Social network user searching method based on keywords, computer device and computer-readable storage medium |
CN112818258B (en) * | 2021-03-08 | 2024-05-10 | 珠海市蜂巢数据技术有限公司 | Social network user searching method based on keywords, computer device and computer readable storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104156436B (en) | Social association cloud media collaborative filtering and recommending method | |
CN103577549B (en) | Crowd portrayal system and method based on microblog label | |
CN103631929B (en) | Intelligent prompting method, module and system for search | |
Zou et al. | Sentiment classification using machine learning techniques with syntax features | |
CN107862620A (en) | A kind of similar users method for digging based on social data | |
CN105893444A (en) | Sentiment classification method and apparatus | |
CN106484764A (en) | User's similarity calculating method based on crowd portrayal technology | |
CN105095419B (en) | Information influence maximization method for specific types of microblog users | |
El-Fishawy et al. | Arabic summarization in twitter social network | |
CN110457404A (en) | Social media account-classification method based on complex heterogeneous network | |
CN107239512B (en) | Microblog spam comment identification method combining a comment relation network graph | |
CN104572797A (en) | Individual service recommendation system and method based on topic model | |
CN104978332B (en) | User-generated content label data generation method, device and correlation technique and device | |
JP2010009307A (en) | Feature word automatic learning system, content linkage type advertisement distribution computer system, retrieval linkage type advertisement distribution computer system and text classification computer system, and computer program and method for them | |
CN103577405A (en) | Interest analysis based micro-blogger community classification method | |
CN108572971A (en) | Method and apparatus for mining keywords related to a term | |
CN103631862B (en) | Event characteristic evolution excavation method and system based on microblogs | |
CN104268230A (en) | Method for detecting objective points of Chinese micro-blogs based on heterogeneous graph random walk | |
Lalji et al. | Twitter sentiment analysis using hybrid approach | |
CN104573057A (en) | Account correlation method used for UGC (User Generated Content)-spanning website platform | |
CN104281565A (en) | Semantic dictionary constructing method and device | |
Zhang et al. | Reverse attack: Black-box attacks on collaborative recommendation | |
CN102063497B (en) | Open type knowledge sharing platform and entry processing method thereof | |
CN108446333A (en) | Big data text mining processing system and method | |
Gao et al. | Topology imbalance and relation inauthenticity aware hierarchical graph attention networks for fake news detection |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180330 |