CN106021232B - Method for identifying microblog vest (sockpuppet) accounts based on dependency syntactic relations - Google Patents

Method for identifying microblog vest (sockpuppet) accounts based on dependency syntactic relations

Info

Publication number
CN106021232B
Authority
CN
China
Prior art keywords
microblogging
relationship
vest
support
interdependent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610350203.2A
Other languages
Chinese (zh)
Other versions
CN106021232A (en)
Inventor
段大高
高飒
韩忠明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
HUNAN ZHONGKE YOUXIN TECHNOLOGY CO.,LTD.
Original Assignee
Beijing Technology and Business University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Technology and Business University
Priority to CN201610350203.2A
Publication of CN106021232A
Application granted
Publication of CN106021232B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/04 Real-time or near real-time messaging, e.g. instant messaging [IM]
    • H04L51/046 Interoperability with other network applications or services
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/08 Network architectures or network communication protocols for network security for authentication of entities

Abstract

The present invention relates to a method for identifying microblog vest (sockpuppet) accounts based on dependency syntactic relations. The specific steps are as follows. Step 1: obtain microblog text data. Step 2: segment the text into words using word-segmentation software, and remove English text and punctuation marks. Step 3: using dependency parsing software, perform dependency parsing on the segmented text; each microblog post yields one parsing result. Step 4: obtain a parsing result for every post in a user's microblog list using the method of Step 3, then use the Apriori algorithm to compute the dependency syntactic structures that the user's posts commonly use. Step 5: for two accounts whose vest relationship is to be judged, compare the results obtained for each account from Steps 1 to 4; if they are identical, the accounts are in a vest relationship, and otherwise they are not. The method can be used by social networking sites for network security management and by government bodies for tracing cybercrime, and it identifies vest accounts quickly and effectively.

Description

Method for identifying microblog vest (sockpuppet) accounts based on dependency syntactic relations
Technical field
The present invention relates to a method for identifying microblog vest (sockpuppet) accounts based on dependency syntactic relations. It is applied to identifying the authorship relationships of texts in social networks and belongs to the field of data mining technology.
Background art
At present, with the rapid development of science and technology, and of Internet technology in particular, the total number of Internet users worldwide has exceeded 3 billion. In China, by the end of 2015, Sina Weibo had reached 236 million monthly active users. In a social network, a person can register a microblog ID; the one that is used or logged into most often is called the main ID. Many users are no longer satisfied with a single microblog ID and register others, which they use to post when they do not want to reveal the identity of their main ID. These non-main ID accounts are called vest accounts. Vest accounts have a negative side, for example: using a vest account to spread rumors; verbally attacking or slandering others under their posts and promoting incorrect values; or using vest accounts to promote the main microblog ID. Such behavior affects the security and fairness of the network. Real-name registration on social networks is a difficult problem: most users have not completed real-name authentication, so their true identity is not easy to know. When users post harmful speech, such as spreading harmful information, insulting and slandering others, or even leaking state secrets, determining that several vest accounts belong to the same person helps government departments fight criminal behavior.
At present, research on author identification based on language style is receiving increasing attention, and this approach is equally applicable to identifying vest accounts from short microblog texts. The present invention performs dependency syntactic analysis on short microblog texts and analyzes the language style of microblog accounts in order to identify microblog vest accounts.
Summary of the invention
1. Purpose:
The purpose of the present invention is to provide a method for identifying microblog vest (sockpuppet) accounts based on dependency syntactic relations that can identify vest relationships quickly and effectively among the microblogs of a large number of users, thereby assisting the further work of other departments.
The principle of the present invention is as follows. Natural language processing is performed first: all short microblog texts of a given user are segmented into words, and the dependency syntactic structure of each short text is analyzed from the segmentation result. The dependency syntactic structure of each post is saved until all of the user's microblog texts have been analyzed. The Apriori algorithm is then used to mine the frequent patterns of dependency syntactic structures used by this user, which represent the user's language style. The dependency syntactic structures of two users are compared to judge whether they are in a vest relationship.
2. Technical solution: the technical solution provided by the present invention is as follows.
The present invention is a method for identifying microblog vest accounts based on dependency syntactic relations, with the following specific steps:
Step 1: Obtain microblog text data.
Step 2: Segment the text into words using word-segmentation software, and remove English text and punctuation marks.
Step 3: Using dependency parsing software, perform dependency parsing on the segmented text. Each microblog post yields one parsing result; after analysis, the syntactic structure of each post has the form shown in Fig. 1.
In Fig. 1, the sentence is the result after word segmentation. ROOT denotes the sentence of the text being processed, and the letters on the arrows above the sentence denote syntactic structures.
After dependency parsing, the result for each short microblog text is saved in the following format:
$WS_i: WS_i(R_1, R_2, \ldots, R_i, \ldots, R_n), \quad i \in [1, n]$
where $WS_i$ denotes the dependency parsing result of the i-th post in a user's microblog list, and $R_n$ denotes a dependency syntactic structure of that text.
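The stored format above can be sketched in code. This is only an illustration under stated assumptions: one post's result is held as a tuple of relation labels, an account's results as a list of such tuples, and the label strings are the abbreviations that appear later in the embodiment. The names `ParseResult`, `AccountResults`, `KNOWN_LABELS`, and `make_parse_result` are hypothetical, and the sample labels are made up rather than produced by a parser.

```python
from typing import List, Tuple

# One post's dependency-parse result WS_i: the relation labels R_1..R_n.
ParseResult = Tuple[str, ...]
# One account's results WS: one ParseResult per post.
AccountResults = List[ParseResult]

# Relation abbreviations that appear in the embodiment (assumed label strings).
KNOWN_LABELS = {"HED", "SBV", "VOB", "ATT", "ADV", "CMP", "COO", "POB", "RAD", "DBL"}


def make_parse_result(labels: List[str]) -> ParseResult:
    """Validate the labels produced for one post and freeze them as a tuple."""
    unknown = [label for label in labels if label not in KNOWN_LABELS]
    if unknown:
        raise ValueError(f"unexpected relation labels: {unknown}")
    return tuple(labels)


# Hypothetical labels for two posts; a real parser would supply these.
ws: AccountResults = [
    make_parse_result(["SBV", "HED", "VOB"]),
    make_parse_result(["ADV", "HED", "ATT", "VOB"]),
]
```

Keeping each result immutable makes it straightforward to turn posts into sets of labels for the frequent-pattern step.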
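The dependency parsing software uses 14 dependency relation labels, with the following meanings: subject-verb (SBV), verb-object (VOB), indirect object (IOB), fronted object (FOB), pivotal construction (DBL), attribute (ATT), adverbial (ADV), complement (CMP), coordination (COO), preposition-object (POB), left adjunct (LAD), right adjunct (RAD), independent structure (IS), and head (HED).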
Step 4: Analyze every text in a user's microblog list using the method of Step 3, giving the following result:
$WS: WS(WS_1, WS_2, \ldots, WS_i, \ldots, WS_n)$
The Apriori algorithm is used to compute the dependency syntactic structures that the user's posts commonly use. First, a minimum support threshold $SUP_{min}$ is given. This minimum support is set manually with reference to the smaller support values observed in the 1-itemset experimental results, and the parameter is tuned over repeated experiments to obtain the best experimental effect. In the first scan of the microblog list, the support of each syntactic structure ($R_i$) is calculated.
Support: $Support(A \Rightarrow B) = P(A \cup B)$
where A and B each denote a syntactic structure, and $P(A \cup B)$ denotes the probability that A and B occur together. Syntactic structures whose support is less than $SUP_{min}$ are deleted; the result is the set of frequent 1-itemsets. The items in the frequent 1-itemsets are combined in pairs, a second scan is performed, the support is calculated, and the structures whose support is less than $SUP_{min}$ are removed, giving the frequent 2-itemsets. This is repeated in turn until the K-itemsets whose supports are all greater than $SUP_{min}$, the frequent K-itemsets, are obtained; these are the dependency syntactic structures commonly used in the user's posts.
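The level-wise search described above can be sketched as follows. This is a minimal illustration, not the patented implementation. It assumes that each post is reduced to the set of relation labels it contains, that support is an absolute count of posts, and that itemsets with a count of at least the threshold are kept. The function name `apriori` and the sample posts are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set


def apriori(transactions: Iterable[Iterable[str]], min_support: int) -> List[Set[FrozenSet[str]]]:
    """Return frequent itemsets level by level (index 0 holds the 1-itemsets).

    Support is an absolute count: the number of transactions containing the itemset.
    """
    rows = [frozenset(t) for t in transactions]

    def support(itemset: FrozenSet[str]) -> int:
        return sum(1 for row in rows if itemset <= row)

    # First scan: keep the single items that meet the threshold.
    items = {item for row in rows for item in row}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    levels: List[Set[FrozenSet[str]]] = []
    k = 1
    while current:
        levels.append(current)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # Next scan: keep the candidates that meet the threshold.
        current = {c for c in candidates if support(c) >= min_support}
    return levels


# Hypothetical posts, each reduced to the set of relation labels it contains.
example_posts = [
    {"HED", "SBV", "VOB"},
    {"HED", "SBV", "ADV"},
    {"HED", "SBV", "VOB", "ATT"},
    {"HED", "VOB"},
]
levels = apriori(example_posts, min_support=3)
```

In this sketch the last entry of `levels` plays the role of the frequent K-itemsets, that is, the largest label combinations that still meet the threshold.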
Step 5: For two accounts whose vest relationship is to be judged, compare the results obtained for each account from Steps 1 to 4. If they are identical, the accounts are in a vest relationship; otherwise they are not.
3. Advantages and effects: After training on a large amount of data, the method for identifying microblog vest accounts based on dependency syntactic relations provided by the present invention can be used by social networking sites for network security management and by government bodies for tracing cybercrime, and it identifies vest accounts quickly and effectively.
Description of the drawings
Fig. 1 shows the form of the syntactic structure of a microblog post.
Detailed description of the embodiments
The specific implementation steps are as follows, with reference to Fig. 1:
Step 1: Extract the microblog content of some Sina Weibo users. For example (taking the first ten posts of two users), User 1:
I just used #whip# to share a sound photo. Listen to the photo tell its story, very interesting. http://t.cn/Zj6H170 (from the @whip official microblog)
It is said that envy, jealousy, and hatred play out every day; who knows how the people whose minds they disturb all end up as people pretending to be what they are not.
I am very serious! Right? I am going to become emperor of the earth! The monsters here are too weak; my godlike entrance scared them all into wetting themselves! Quickly become my comrade-in-arms. Click to watch: http://t.cn/zYilNjV
I am tired. Some things are just like this. Perhaps some day I will simply disappear from your life. I am tired!
I just used #whip# to share a sound photo. Listen to the photo tell its story, very interesting. http://t.cn/Zjijni4 (from the @whip official microblog)
Come hear me sing "song of leaving the boundary". Listen here >>> http://t.cn/zji5THs (recorded with #sing#)
Come hear me sing "there are also me". Listen here >>> http://t.cn/zjJevkB (recorded with #sing#)
Come hear me sing "Bei Jiuchang". Listen here >>> http://t.cn/zjJupTa (recorded with #sing#)
Just finished watching "Farewell My Concubine"; I need a moment to settle my mood!
This year I want my luck to turn in every way; first, attract peach-blossom (romance) luck!
User 2:
Watched "sheet". Apart from the relatively outstanding performance by the "modest" teacher, everything else in it was terrible! Really not worth paying to see!
Come hear me sing "elegy". Listen here >>> http://t.cn/zHnCnam (recorded with #sing#)
Come hear me sing "because love". Listen here >>> http://t.cn/zHRnyhm (recorded with #sing#)
Rehearsing a program for Dragon TV.
I just used #whip# to share a sound photo. Listen to the photo tell its story, very interesting. http://t.cn/ZYpnSj (from the @whip official microblog)
I suddenly do not want to work in this line of business; an evil spirit has made me lose interest in it.
Come hear me sing "human world". Listen here >>> http://t.cn/zYpXLeo (recorded with #sing#)
Yesterday, to unwind, I watched this week's @Jiangsu satellite TV, which was really remarkable, and my mood immediately got much better.
Telling each other everything was our past; having nothing to say is our ending.
The cold wind outside the car window blows on my face, and every gust is a wake-up call. I am not that unhappy, and I can still laugh at life!
Step 2: Segment the microblog texts extracted from Sina Weibo into words, and delete English text and punctuation marks:
User 1:
I just used whip to share a sound photo listen to the photo tell its story very interesting from the whip official microblog
It is said that envy jealousy and hatred play out every day who knows how the people whose minds they disturb all end up as people pretending to be what they are not
I am very serious right I am going to become emperor of the earth the monsters here are too weak my godlike entrance scared them all into wetting themselves quickly become my comrade in arms click to watch
I am tired some things are just like this perhaps some day I will simply disappear from your life I am tired
I just used whip to share a sound photo listen to the photo tell its story very interesting from the whip official microblog
Come hear me sing song of leaving the boundary listen here recorded with sing
Come hear me sing there are also me listen here recorded with sing
Come hear me sing Bei Jiuchang listen here recorded with sing
Just finished watching Farewell My Concubine I need a moment to settle my mood
This year I want my luck to turn in every way first attract peach blossom luck
User 2:
Watched sheet apart from the relatively outstanding performance by the modest teacher everything else in it was terrible really not worth paying to see
Come hear me sing elegy listen here recorded with sing
Come hear me sing because love listen here recorded with sing
Rehearsing a program for Dragon TV
I just used whip to share a sound photo listen to the photo tell its story very interesting from the whip official microblog
I suddenly do not want to work in this line of business an evil spirit has made me lose interest in it
Come hear me sing human world listen here recorded with sing
Yesterday to unwind I watched this week's Jiangsu satellite TV which was really remarkable and my mood immediately got much better
Telling each other everything was our past having nothing to say is our ending
The cold wind outside the car window blows on my face and every gust is a wake up call I am not that unhappy and I can still laugh at life
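The cleaning part of Step 2 can be sketched as below. This covers only the removal of links, English letters, and punctuation; word segmentation itself is left to whatever segmentation software is used. The function name `clean_post`, the regular expressions, and the sample string are assumptions made for illustration, not part of the original method.

```python
import re

# Short links such as http://t.cn/...; re.ASCII keeps \w from swallowing adjacent Chinese text.
URL_RE = re.compile(r"https?://[\w./?=&%#-]+", re.ASCII | re.IGNORECASE)
# Runs of English (ASCII) letters.
ASCII_LETTERS_RE = re.compile(r"[A-Za-z]+")
# Anything that is neither a word character nor whitespace (ASCII and CJK punctuation), plus "_".
PUNCT_RE = re.compile(r"[^\w\s]|_")


def clean_post(text: str) -> str:
    """Strip links, English letters, and punctuation, then collapse whitespace."""
    text = URL_RE.sub(" ", text)
    text = ASCII_LETTERS_RE.sub(" ", text)
    text = PUNCT_RE.sub(" ", text)
    return " ".join(text.split())


# Hypothetical post in the style of the examples above.
cleaned = clean_post("我很累！http://t.cn/zYilNjV（来自@官方微博）")
```

Links are removed before letters so that a URL is dropped as one unit rather than leaving stray path fragments behind.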
Step 3: Analyze the dependency syntactic structure of the microblog posts. The results obtained are as follows:
Support: $Support(A \Rightarrow B) = P(A \cup B)$ denotes the probability that events A and B occur together (the number of times A and B occur together divided by the total number of events). Because the total number of events is the same throughout, for ease of calculation the number of times A and B occur together (in this example, the number of times a syntactic structure occurs) is used as the support. The minimum support threshold is set to $SUP_{min} = 6$.
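A short worked step shows why thresholding the raw count is equivalent to thresholding the probability. The symbols $N$ (the fixed total number of events) and $n(A \cup B)$ (the number of times A and B occur together) are introduced here for illustration only.

```latex
Support(A \Rightarrow B) = P(A \cup B) = \frac{n(A \cup B)}{N},
\qquad
\frac{n(A \cup B)}{N} \ge \frac{SUP_{min}}{N}
\iff
n(A \cup B) \ge SUP_{min}
\quad (N > 0)
```

With $SUP_{min} = 6$, an itemset is therefore kept when it occurs at least 6 times, whatever the value of $N$.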
User 1:
The 1-itemsets generated after the first scan are:
Remove the itemsets whose value is less than $SUP_{min}$: CMP.
The 2-itemsets generated after the second scan are:
Remove the itemsets whose value is less than $SUP_{min}$: (POB, COO), (RAD, COO).
Following the above method, the frequent K-itemsets are found in turn. The final result is: (HED, SBV, ADV, POB, RAD, ATT, VOB).
Therefore, the syntactic structures commonly used in User 1's posts are (HED, SBV, ADV, POB, RAD, ATT, VOB).
User 2:
The 1-itemsets generated after the first scan are:
Remove the itemsets whose value is less than $SUP_{min}$: CMP, DBL.
The 2-itemsets generated after the second scan are:
Remove the itemsets whose value is less than $SUP_{min}$: (POB, COO), (VOB, COO).
Following the above method, the frequent K-itemsets are found in turn. The final result is: (HED, SBV, ADV, POB, RAD, ATT, VOB).
Therefore, the syntactic structures commonly used in User 2's posts are (HED, SBV, ADV, POB, RAD, ATT, VOB).
Step 4: The syntactic structures commonly used in User 1's posts are (HED, SBV, ADV, POB, RAD, ATT, VOB), and those commonly used in User 2's posts are (HED, SBV, ADV, POB, RAD, ATT, VOB). The two are identical, so User 1 and User 2 are in a vest relationship.
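The final comparison can be sketched as a set-equality check. Treating the common structures as an unordered set is an assumption made here, since the text only says the two results must be identical; the function name `is_vest_pair` is hypothetical, while the label tuples are the results reported for User 1 and User 2 above.

```python
from typing import Iterable


def is_vest_pair(structures_a: Iterable[str], structures_b: Iterable[str]) -> bool:
    """Return True when two accounts share exactly the same common structures."""
    # Order is ignored (an assumption); only the membership of labels is compared.
    return frozenset(structures_a) == frozenset(structures_b)


user1 = ("HED", "SBV", "ADV", "POB", "RAD", "ATT", "VOB")
user2 = ("HED", "SBV", "ADV", "POB", "RAD", "ATT", "VOB")
same = is_vest_pair(user1, user2)
```

A strict equality test like this is sensitive to the chosen support threshold, so a graded similarity score might be preferable in practice; that would be a departure from the comparison described here.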

Claims (1)

1. A method for identifying microblog vest (sockpuppet) accounts based on dependency syntactic relations, comprising the following specific steps:
Step 1: obtaining microblog text data;
Step 2: segmenting the text into words using word-segmentation software, and removing English text and punctuation marks;
Step 3: using dependency parsing software, performing dependency parsing on the segmented text, each microblog post yielding one parsing result;
after dependency parsing, the result for each short microblog text is saved in the following format:
$WS_i: WS_i(R_1, R_2, \ldots, R_i, \ldots, R_n), \quad i \in [1, n]$
wherein $WS_i$ denotes the dependency parsing result of the i-th post in a user's microblog list, and $R_n$ denotes a dependency syntactic structure of that text;
the relation labels used in the dependency parsing comprise 14 kinds: subject-verb relation, verb-object relation, indirect-object relation, fronted object, pivotal construction, attributive relation, adverbial structure, verb-complement structure, coordination relation, preposition-object relation, left adjunct relation, right adjunct relation, independent structure, and head relation;
Step 4: analyzing every text in a user's microblog list using the method of Step 3, giving the following result:
$WS: WS(WS_1, WS_2, \ldots, WS_i, \ldots, WS_n)$
using the Apriori algorithm to compute the dependency syntactic structures that the user's posts commonly use: first, a minimum support threshold $SUP_{min}$ is given, and in the first scan of the microblog list the support of each syntactic structure ($R_i$) is calculated;
Support: $Support(A \Rightarrow B) = P(A \cup B)$
wherein A and B each denote a syntactic structure, and $P(A \cup B)$ denotes the probability that A and B occur together; syntactic structures whose support is less than $SUP_{min}$ are deleted, the result being the frequent 1-itemsets; the items in the frequent 1-itemsets are combined in pairs, a second scan is performed, the support is calculated, and the structures whose support is less than $SUP_{min}$ are removed, giving the frequent 2-itemsets; this is repeated in turn until the K-itemsets whose supports are all greater than $SUP_{min}$, the frequent K-itemsets, are obtained, these being the dependency syntactic structures commonly used in the user's posts;
Step 5: for two accounts whose vest relationship is to be judged, comparing the results obtained for each account from Steps 1 to 4; if they are identical, the accounts are in a vest relationship, and otherwise they are not.
CN201610350203.2A 2016-05-24 2016-05-24 Method for identifying microblog vest (sockpuppet) accounts based on dependency syntactic relations Active CN106021232B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610350203.2A CN106021232B (en) 2016-05-24 2016-05-24 Method for identifying microblog vest (sockpuppet) accounts based on dependency syntactic relations

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610350203.2A CN106021232B (en) 2016-05-24 2016-05-24 Method for identifying microblog vest (sockpuppet) accounts based on dependency syntactic relations

Publications (2)

Publication Number Publication Date
CN106021232A CN106021232A (en) 2016-10-12
CN106021232B true CN106021232B (en) 2019-06-28

Family

ID=57094569

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610350203.2A Active CN106021232B (en) 2016-05-24 2016-05-24 Method for identifying microblog vest (sockpuppet) accounts based on dependency syntactic relations

Country Status (1)

Country Link
CN (1) CN106021232B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106598954A (en) * 2017-01-05 2017-04-26 北京工商大学 Method for recognizing social network sock puppet model based on frequency sub-tree
CN110198261B (en) * 2018-02-27 2021-09-07 腾讯科技(深圳)有限公司 Group communication method, terminal and storage medium in instant messaging
CN111046894A (en) * 2018-10-15 2020-04-21 北京京东尚科信息技术有限公司 Method and device for identifying vest account

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0522591A2 (en) * 1991-07-11 1993-01-13 Mitsubishi Denki Kabushiki Kaisha Database retrieval system for responding to natural language queries with corresponding tables
CN102043851A (en) * 2010-12-22 2011-05-04 四川大学 Multiple-document automatic abstracting method based on frequent itemset
CN102185788A (en) * 2011-01-31 2011-09-14 北京开心人信息技术有限公司 Method and system for searching vice accounts on basis of temporary mailbox
CN102968408A (en) * 2012-11-23 2013-03-13 西安电子科技大学 Method for identifying substance features of customer reviews
CN104572765A (en) * 2013-10-25 2015-04-29 西安群丰电子信息科技有限公司 Method and system for finding vest account based on behavior analysis of user account
CN103729474A (en) * 2014-01-23 2014-04-16 中国科学院计算技术研究所 Method and system for identifying vest account numbers of forum users

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
微博客话题评论的聚类分析;张超;《中国优秀硕士学位论文全文数据库信息科技辑》;20140415;第28-30页
社交网络账号的马甲关系辨识方法;樊茜等;《中文信息学报》;20141119;第162-168页

Also Published As

Publication number Publication date
CN106021232A (en) 2016-10-12

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20211221

Address after: 410023 Room 101, building 3, wisdom Park, country garden, Xuehua village, bachelor street, Yuelu District, Changsha City, Hunan Province

Patentee after: HUNAN ZHONGKE YOUXIN TECHNOLOGY CO.,LTD.

Address before: 100048, Fu Cheng Road, Beijing, Haidian District, No. 33

Patentee before: BEIJING TECHNOLOGY AND BUSINESS University

TR01 Transfer of patent right