CN106021232B - A method for identifying microblog sockpuppet ("vest") accounts based on dependency syntactic relations
- Publication number
- CN106021232B (application number CN201610350203.2A)
- Authority
- CN
- China
- Prior art keywords
- microblog
- relationship
- sockpuppet (vest)
- support
- dependency
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L51/00—User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
- H04L51/04—Real-time or near real-time messaging, e.g. instant messaging [IM]
- H04L51/046—Interoperability with other network applications or services
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/08—Network architectures or network communication protocols for network security for authentication of entities
Abstract
The invention relates to a method for identifying microblog sockpuppet ("vest") accounts based on dependency syntactic relations. The steps are as follows. Step 1: obtain microblog text data. Step 2: segment the text with word-segmentation software and remove English text and punctuation. Step 3: run dependency parsing software on the segmented text, so that each microblog post yields one parse result. Step 4: apply the method of Step 3 to every post in a user's microblog list, then use the Apriori algorithm to compute the dependency structures that the user employs frequently. Step 5: for two accounts that are to be tested for a sockpuppet relationship, compare the results obtained for each from Steps 1 to 4; if they are identical the accounts are in a sockpuppet relationship, otherwise they are not. The method can be used for network security management on social networking sites and for government tracing of cybercrime, and it identifies sockpuppet accounts quickly and efficiently.
Description
Technical field
The invention relates to a method for identifying microblog sockpuppet ("vest") accounts based on dependency syntactic relations. It is applied to identifying relationships between text authors in social networks and belongs to the technical field of data mining.
Background technique
With the rapid development of science and technology, and of Internet technology in particular, the number of Internet users worldwide has exceeded 3 billion. In China, Sina Weibo had 236 million monthly active users by the end of 2015. On a social network a person registers a microblog ID; the ID that is used or logged into most often is called the main ID. Many users are not satisfied with a single microblog ID and register additional ones, which they use to post when they do not want to reveal the identity behind the main ID. These non-main IDs are called sockpuppet ("vest") accounts.
Sockpuppet accounts have a negative side, for example: spreading rumors; attacking or slandering others in comments under their posts and promoting incorrect values; promoting the main microblog ID through sockpuppet accounts. Such behavior undermines the security and fairness of the network. Real-name registration on social networks is a difficult problem: most users have not completed real-name verification, so their true identities are not easy to determine. When users post harmful content, for example spreading harmful information, insulting or defaming others, or even leaking state secrets, establishing that several sockpuppet accounts belong to the same person helps government departments combat criminal behavior.
Author identification based on writing style is receiving growing attention, and the approach is equally applicable to identifying sockpuppet accounts from short microblog texts. The invention performs dependency parsing on short microblog texts and analyzes the language style of each microblog account in order to identify microblog sockpuppet accounts.
Summary of the invention
1. Purpose:
The purpose of the invention is to provide a method for identifying microblog sockpuppet accounts based on dependency syntactic relations, so that sockpuppet relationships can be identified quickly and efficiently among the posts of a large number of users, which in turn supports the further work of other departments.
The principle of the invention is as follows. Natural language processing is performed first: all short microblog texts of a user are segmented, and the dependency structure of each segmented short text is analyzed. The dependency structure of every post is saved until all of the user's microblog texts have been analyzed. The Apriori algorithm is then used to mine the frequent patterns of dependency structures used by this user, which represent the user's language style. The dependency structures of two users are compared to determine whether the two accounts are in a sockpuppet relationship.
2. Technical solution: the technical solution provided by the invention is as follows.
The invention is a method for identifying microblog sockpuppet accounts based on dependency syntactic relations. The specific steps of the method are as follows:
Step 1: Obtain microblog text data.
Step 2: Segment the text with word-segmentation software and remove English text and punctuation.
Step 3: Run dependency parsing software on the segmented text. Each post yields one parse result; the form of the syntactic structure of a post after analysis is shown in Fig. 1.
The sentence in the figure is the result after segmentation. ROOT denotes the sentence being processed, and the letters on the arcs above the sentence denote the syntactic relations.
After dependency parsing, the result for each short microblog text is saved in the following format:
$WS_i : WS_i(R_1, R_2, \ldots, R_i, \ldots, R_n) \quad (i \in [1, n])$
where $WS_i$ denotes the dependency parse result of the i-th post in a user's microblog list, and $R_n$ denotes a dependency structure of that text.
The dependency parsing software uses 14 relation labels: subject-verb, verb-object, indirect object, fronted object, pivot (double) construction, attribute, adverbial, complement, coordination, preposition-object, left adjunct, right adjunct, independent structure, and head.
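The cleanup in Step 2 and the saved format of Step 3 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the sample sentence, the sample URL, and the label sequence are hypothetical, and the sketch assumes that "English and punctuation" can be approximated by everything outside the basic CJK ideograph range. The patent does not name its segmentation or parsing software, so the parser output is supplied by hand here.

```python
import re

# Assumption: anything that is not a CJK ideograph (Latin letters, digits,
# URLs, punctuation, whitespace) counts as "English and punctuation".
NON_CJK = re.compile(r"[^\u4e00-\u9fff]+")


def strip_english_and_punctuation(text: str) -> str:
    """Keep only CJK ideographs, as a stand-in for the Step 2 cleanup."""
    return NON_CJK.sub("", text)


def to_ws(parsed_relations):
    """Store one post's parse result WS_i as the tuple (R_1, ..., R_n)."""
    return tuple(parsed_relations)


# Hypothetical post text and URL, used only to exercise the cleanup.
cleaned = strip_english_and_punctuation("今年要各种转运, 先招桃花! http://t.cn/abc")

# Hypothetical relation labels, as a dependency parser might return them.
ws_1 = to_ws(["ADV", "HED", "ATT", "VOB", "ADV", "COO", "VOB"])
```

Storing each result as a tuple keeps the label order produced by the parser; converting it to a set later is enough for frequent-pattern mining, where only the presence of a label in a post matters.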
Step 4: Every text in a user's microblog list is analyzed with the method of Step 3, giving the following result:
$WS : WS(WS_1, WS_2, \ldots, WS_i, \ldots, WS_n)$
The Apriori algorithm is used to compute the dependency structures that the user's posts employ frequently. First, a minimum support threshold $SUP_{min}$ is given. This threshold is set manually, starting from the smaller support values observed in the experimental itemsets, and is tuned over repeated experiments to obtain the best result. The microblog list is scanned a first time and the support of each syntactic structure ($R_i$) is computed.
Support: $\mathrm{Support}(A \Rightarrow B) = P(A \cup B)$
where A and B denote syntactic structures and $P(A \cup B)$ denotes the probability that A and B occur together. Syntactic structures whose support is less than $SUP_{min}$ are deleted; the result is the set of frequent 1-itemsets. The frequent 1-itemsets are combined pairwise and a second scan is performed: supports are computed and combinations with support less than $SUP_{min}$ are removed, giving the frequent 2-itemsets. This is repeated until the supports in the K-itemsets all reach $SUP_{min}$, which gives the frequent K-itemsets; these are the dependency structures commonly used in the user's posts.
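The level-wise mining described in Step 4 can be sketched as below. This is a minimal sketch under stated assumptions rather than the patent's code: each post is treated as a set of relation labels, support is a raw count of posts, an itemset is kept when its count is at least `sup_min`, and the loop stops when no larger frequent itemset exists. The function name `apriori`, the sample posts, and the threshold of 3 are hypothetical.

```python
from itertools import combinations


def apriori(transactions, sup_min):
    """Return the largest frequent itemsets found, mapped to support counts."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # Number of posts that contain every label in the itemset.
        return sum(1 for t in transactions if itemset <= t)

    # First scan: frequent 1-itemsets.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]): support(frozenset([i])) for i in items}
    current = {s: c for s, c in current.items() if c >= sup_min}

    frequent = {}
    k = 1
    while current:
        frequent = current
        k += 1
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        current = {c: support(c) for c in candidates}
        current = {c: n for c, n in current.items() if n >= sup_min}
    return frequent


# Hypothetical label sets for four posts of one account.
posts = [
    ["HED", "SBV", "VOB"],
    ["HED", "SBV", "ADV"],
    ["HED", "SBV", "VOB", "ATT"],
    ["HED", "VOB"],
]
result = apriori(posts, sup_min=3)
```

With these four posts the only candidate 3-itemset, (HED, SBV, VOB), occurs in two posts and is dropped, so the function returns the two frequent 2-itemsets. The sketch recomputes support for every candidate instead of pruning by infrequent subsets, which is slower but gives the same frequent itemsets.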
Step 5: For two accounts that are to be tested for a sockpuppet relationship, the results obtained for each account from Steps 1 to 4 are compared. If the results are identical, the accounts are in a sockpuppet relationship; otherwise they are not.
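The comparison in Step 5 amounts to an equality test on the two accounts' frequent itemsets, which can be sketched as follows. The function names are hypothetical. The Jaccard overlap is not part of the patent; it is included only as an assumed way to see how far apart two non-identical results are.

```python
def is_sockpuppet_pair(patterns_a, patterns_b):
    """Step 5 as stated: identical frequent patterns mean a sockpuppet pair."""
    set_a = {frozenset(p) for p in patterns_a}
    set_b = {frozenset(p) for p in patterns_b}
    return set_a == set_b


def jaccard_overlap(patterns_a, patterns_b):
    """Assumed softer measure (not in the patent): shared / all patterns."""
    set_a = {frozenset(p) for p in patterns_a}
    set_b = {frozenset(p) for p in patterns_b}
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


# Label order inside a pattern does not matter, only its members.
user_1 = [("HED", "SBV", "ADV", "POB", "RAD", "ATT", "VOB")]
user_2 = [("VOB", "ATT", "RAD", "POB", "ADV", "SBV", "HED")]
user_3 = [("HED", "SBV", "VOB")]
```

Converting each pattern to a `frozenset` makes the test independent of the order in which labels or patterns were produced.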
3. Advantages and effects: the invention provides a method for identifying microblog sockpuppet accounts based on dependency syntactic relations. After training on a large amount of data, the method can be used for network security management on social networking sites and for government tracing of cybercrime, and it identifies sockpuppet accounts quickly and efficiently.
Description of the drawings
Fig. 1 shows the form of the syntactic structure of a microblog post.
Specific embodiment
The specific implementation steps are as follows, with reference to Fig. 1.
Step 1: Extract the microblog content of some Sina Weibo users. For example (taking the first ten posts of each of two users):
User 1:
I have just used #whip# to share a sound photo, and photo is listened to tell a story, very interesting. http://t.cn/Zj6H170 (comes from @whip official microblogging)
It is said that envying that envy is hated all haunting daily, it is not known that upset how the artificial of intelligence is all that appearance is pretending to be what one is not by it People.
I am very serious! Right? I will become the earth emperor person! Here monster too dish, the general appearance of my mind is it all frighten urine! Become my comrade-in-arms fastly. http://t.cn/zYilNjV is surrounded and watched in click
I am tired, and just like this, some day, I just disappeared from your life some things perhaps, I am tired!
I have just used #whip# to share a sound photo, and photo is listened to tell a story, very interesting. http://t.cn/Zjijni4 (comes from @whip official microblogging)
To hear "song of leaving the boundary" audition address >>> http://t.cn/zji5THs (singing # by # to record) that I sings
To hear "there are also me" audition address >>> http://t.cn/zjJevkB (singing # by # to record) that I sings
To hear "Bei Jiuchang" audition address >>> http://t.cn/zjJupTa (singing # by # to record) that I sings
"Farewell My Concubine" is just finished watching, I obtains slowly mood!
It wants various transhipments this year, first recruits peach blossom!
User 2:
It has seen "sheet", the inside is other all right other than the play in modest teacher is behave excellently relatively, too rotten!
It spends to see that sincerity does not worth!
To hear "elegy" audition address >>> http://t.cn/zHnCnam (singing # by # to record) that I sings
To hear "because love" audition address >>> http://t.cn/zHRnyhm (singing # by # to record) that I sings
Some program of Dragon TV is being had a dress rehearsal
I have just used #whip# to share a sound photo, and photo is listened to tell a story, very interesting. http://t.cn/ZYpnSj (comes from @whip official microblogging)
I is not desired to do this row suddenly, has a evil spirit that me is allowed not have interest to this row
To hear "human world" audition address >>> http://t.cn/zYpXLeo (singing # by # to record) that I sings
Yesterday, evacuation saw that the @Jiangsu satellite TV of this week was very terrible, and mood has been got well much immediately.
Without words do not say be we once, having nothing to speak is our final result.
Cold wind blows on one's face outside vehicle window, and it is all primary watchful for striking each time, and there is no so unhappy for I, moreover it is possible to be laughed at raw It is living!
Step 2: Segment the microblog text extracted from Sina Weibo and delete English text and punctuation:
User 1:
I has just shared a sound photo with whip and photo is listened to tell a story very interesting come from Whip official microblogging
It is said that envying that envy is hated all is haunting do not know how by its artificial of upset intelligence be all appearance people mould dog daily The people of sample
I it is very serious right I will become my mind of the monster of the earth emperor person here too dish It is general appearance they are all frightened urinated the comrade-in-arms for becoming me fastly click surround and watch
I tired out some things just like this perhaps some day I that me is just disappeared from your life is tired
I has just shared a sound photo with whip and photo is listened to tell a story very interesting come from Whip official microblogging
To hear bent audition address of leaving the boundary that I sing by singing recording
To hear that I sing, there are also my audition addresses by singing recording
To hear northern wine field audition address that I sings by singing recording
Just having finished watching Farewell My Concubine, I obtains slowly mood
Various transhipments are wanted this year first to recruit peach blossom
User 2:
Seen large stretch of the inside in addition to the play in modest teacher is opposite behave excellently other than it is other all right too rotten
It spends to see that sincerity does not worth
To hear elegy audition address that I sings by singing recording
To hear that I sings because love audition address is by singing recording
Some program of Dragon TV is being had a dress rehearsal
I has just shared a sound photo with whip and photo is listened to tell a story very interesting come from Whip official microblogging
I, which is not desired to do this row suddenly, has a evil spirit that me is allowed not have interest to this row
To hear human world audition address that I sings by singing recording
Yesterday, evacuation saw that the very terrible mood of the Jiangsu satellite TV of this week had been got well much immediately
Do not say be our once have nothing to speak to be our final result without words
Cold wind blow on one's face outside vehicle window strike each time all be it is primary it is watchful I there is no so it is not unhappy can also laugh at it is raw It is living
Step 3: Analyze the dependency structure of the posts. The results are obtained as follows.
Support: $\mathrm{Support}(A \Rightarrow B) = P(A \cup B)$ is the probability that events A and B occur together (the number of times A and B occur together divided by the total number of events). Because the total number of events is the same throughout, the number of times A and B occur together (in this example, the number of times a syntactic structure occurs) is used directly as the support for ease of calculation. The minimum support threshold is set to $SUP_{min} = 6$:
User 1:
The itemsets generated after the first scan are as follows:
Itemsets whose support is less than $SUP_{min}$ are removed: CMP.
The 2-itemsets generated after the second scan are as follows:
Itemsets whose support is less than $SUP_{min}$ are removed: (POB, COO), (RAD, COO).
Following the same method, the frequent K-itemsets are found in turn. The final result is (HED, SBV, ADV, POB, RAD, ATT, VOB).
The syntactic structure commonly used in User 1's posts is therefore (HED, SBV, ADV, POB, RAD, ATT, VOB).
User 2:
The itemsets generated after the first scan are as follows:
Itemsets whose support is less than $SUP_{min}$ are removed: CMP, DBL.
The 2-itemsets generated after the second scan are as follows:
Itemsets whose support is less than $SUP_{min}$ are removed: (POB, COO), (VOB, COO).
Following the same method, the frequent K-itemsets are found in turn. The final result is (HED, SBV, ADV, POB, RAD, ATT, VOB).
The syntactic structure commonly used in User 2's posts is therefore (HED, SBV, ADV, POB, RAD, ATT, VOB).
Step 4: The syntactic structure commonly used in User 1's posts is (HED, SBV, ADV, POB, RAD, ATT, VOB), and the syntactic structure commonly used in User 2's posts is (HED, SBV, ADV, POB, RAD, ATT, VOB). The two are identical, so User 1 and User 2 are in a sockpuppet relationship.
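The first scan in this example, with raw counts used as support and $SUP_{min} = 6$, can be sketched as follows. The per-label counts of the patent's example are not reproduced in this text, so the label sets below are hypothetical, and the sketch assumes that a label is counted once per post in which it occurs. The function name `first_scan` is also hypothetical.

```python
from collections import Counter


def first_scan(posts, sup_min):
    """Count in how many posts each label occurs and split by the threshold."""
    counts = Counter(label for labels in posts for label in set(labels))
    kept = {label: n for label, n in counts.items() if n >= sup_min}
    removed = sorted(label for label, n in counts.items() if n < sup_min)
    return kept, removed


# Hypothetical data: ten posts, so a count of 6 equals a probability of 0.6.
posts = [["HED", "SBV", "VOB"]] * 6 + [["HED", "CMP"]] * 4
kept, removed = first_scan(posts, sup_min=6)
```

Because every account in the example contributes the same number of posts, filtering by count gives the same outcome as filtering by the probability $P(A \cup B)$, which is why the example can work with counts alone.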
Claims (1)
1. A method for identifying microblog sockpuppet ("vest") accounts based on dependency syntactic relations, with the following specific steps:
Step 1: obtain microblog text data;
Step 2: segment the text with word-segmentation software and remove English text and punctuation;
Step 3: run dependency parsing software on the segmented text, so that each post yields one parse result;
after dependency parsing, the result for each short microblog text is saved in the following format:
$WS_i : WS_i(R_1, R_2, \ldots, R_i, \ldots, R_n) \quad (i \in [1, n])$
where $WS_i$ denotes the dependency parse result of the i-th post in a user's microblog list, and $R_n$ denotes a dependency structure of that text;
the relation labels used in the dependency parsing comprise 14 kinds: subject-verb, verb-object, indirect object, fronted object, pivot (double) construction, attribute, adverbial, complement, coordination, preposition-object, left adjunct, right adjunct, independent structure, and head;
Step 4: every text in a user's microblog list is analyzed with the method of Step 3, giving the following result:
$WS : WS(WS_1, WS_2, \ldots, WS_i, \ldots, WS_n)$
the Apriori algorithm is used to compute the dependency structures that the user's posts employ frequently: first, a minimum support threshold $SUP_{min}$ is given; the microblog list is scanned a first time and the support of each syntactic structure ($R_i$) is computed;
Support: $\mathrm{Support}(A \Rightarrow B) = P(A \cup B)$
where A and B denote syntactic structures and $P(A \cup B)$ denotes the probability that A and B occur together; syntactic structures whose support is less than $SUP_{min}$ are deleted, and the result is the set of frequent 1-itemsets; the frequent 1-itemsets are combined pairwise and a second scan is performed, supports are computed, and combinations with support less than $SUP_{min}$ are removed, giving the frequent 2-itemsets; this is repeated until the supports in the K-itemsets all reach $SUP_{min}$, which gives the frequent K-itemsets, these being the dependency structures commonly used in the user's posts;
Step 5: for two accounts that are to be tested for a sockpuppet relationship, the results obtained for each from Steps 1 to 4 are compared; if they are identical the accounts are in a sockpuppet relationship, otherwise they are not.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610350203.2A CN106021232B (en) | 2016-05-24 | 2016-05-24 | A method for identifying microblog sockpuppet ("vest") accounts based on dependency syntactic relations |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610350203.2A CN106021232B (en) | 2016-05-24 | 2016-05-24 | A method for identifying microblog sockpuppet ("vest") accounts based on dependency syntactic relations |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106021232A CN106021232A (en) | 2016-10-12 |
CN106021232B (en) | 2019-06-28 |
Family
ID=57094569
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610350203.2A Active CN106021232B (en) | A method for identifying microblog sockpuppet ("vest") accounts based on dependency syntactic relations | 2016-05-24 | 2016-05-24 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106021232B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106598954A (en) * | 2017-01-05 | 2017-04-26 | 北京工商大学 | Method for recognizing social network sock puppet model based on frequency sub-tree |
CN110198261B (en) * | 2018-02-27 | 2021-09-07 | 腾讯科技(深圳)有限公司 | Group communication method, terminal and storage medium in instant messaging |
CN111046894A (en) * | 2018-10-15 | 2020-04-21 | 北京京东尚科信息技术有限公司 | Method and device for identifying vest account |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0522591A2 (en) * | 1991-07-11 | 1993-01-13 | Mitsubishi Denki Kabushiki Kaisha | Database retrieval system for responding to natural language queries with corresponding tables |
CN102043851A (en) * | 2010-12-22 | 2011-05-04 | 四川大学 | Multiple-document automatic abstracting method based on frequent itemset |
CN102185788A (en) * | 2011-01-31 | 2011-09-14 | 北京开心人信息技术有限公司 | Method and system for searching vice accounts on basis of temporary mailbox |
CN102968408A (en) * | 2012-11-23 | 2013-03-13 | 西安电子科技大学 | Method for identifying substance features of customer reviews |
CN103729474A (en) * | 2014-01-23 | 2014-04-16 | 中国科学院计算技术研究所 | Method and system for identifying vest account numbers of forum users |
CN104572765A (en) * | 2013-10-25 | 2015-04-29 | 西安群丰电子信息科技有限公司 | Method and system for finding vest account based on behavior analysis of user account |
Non-Patent Citations (2)
Title |
---|
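Cluster analysis of microblog topic comments; Zhang Chao; China Master's Theses Full-text Database, Information Science and Technology; 20140415; pp. 28-30 |
A method for identifying sockpuppet relationships among social network accounts; Fan Qian et al.; Journal of Chinese Information Processing; 20141119; pp. 162-168 |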
Also Published As
Publication number | Publication date |
---|---|
CN106021232A (en) | 2016-10-12 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | Effective date of registration: 20211221. Address after: 410023 Room 101, building 3, wisdom Park, country garden, Xuehua village, bachelor street, Yuelu District, Changsha City, Hunan Province. Patentee after: HUNAN ZHONGKE YOUXIN TECHNOLOGY CO.,LTD. Address before: 100048, Fu Cheng Road, Beijing, Haidian District, No. 33. Patentee before: BEIJING TECHNOLOGY AND BUSINESS University |