CN111444402A - Analysis method for community detection based on index construction and social factor control network - Google Patents

Info

Publication number
CN111444402A
Authority
CN
China
Prior art keywords
network
constructing
community
theory
social
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911036341.3A
Other languages
Chinese (zh)
Inventor
朱海
李雪威
王文俊
武南南
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-10-29
Filing date
2019-10-29
Publication date
2020-07-24
Application filed by Tianjin University
Priority to CN201911036341.3A
Publication of CN111444402A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • Primary Health Care (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an analysis method for community detection based on index construction and a social factor control network. The method comprises two main steps: first, a generalized causal relationship network is constructed according to social factor control theory; then, an index is constructed according to FTV framework theory, and the network is queried to mine its community structure. The causal relationship network is built from social factor control theory, the constructed network is then queried on the basis of FTV framework theory to mine the community structure in the network, and a dictionary structure is constructed in the network.

Description

Analysis method for community detection based on index construction and social factor control network
Technical Field
The invention belongs to the field of network analysis and relates to a method that performs queries based on index construction and performs analysis based on social factor control theory. First, a factor control relationship network is constructed from social factor control relationships; index technology is then used to improve query speed and accuracy, and the community relationships in the network are analyzed.
Background
In recent years, with the popularization and development of social networks, a growing number of users generate large amounts of data, and analyzing the community structures latent in that data has become a challenge in the field of network analysis. With the appearance of technologies such as Hadoop, the problems posed by massive data have gradually shifted from data storage to network construction and network analysis. Analyzing possible community structures in massive data is valuable in many fields. For example, separating potential communities out of a social network makes it possible to uncover fraud groups, which is significant for network security. This section introduces the current state of research on community detection in network analysis.
Many studies have addressed community detection on massive data. Massive graph data comes in two types. The first is a single ultra-large-scale graph built from massive data, such as a social network, the World Wide Web, or an e-commerce transaction network. In a social network, for example, each node represents a person and each edge represents a relationship between two people. Querying this type of graph was identified early on as an NP-complete (NPC) problem by R. M. Karp, who proposed using the maximum complete subgraph to query massive data graphs and to model community detection. The second type is a network composed of a large number of small graphs, such as a compound network. In a network of many compounds, each node represents an atom and each edge represents the force between atoms. Graphs of this kind can be queried by approximate subgraph matching, which is also an NP-complete problem; in 1976, J. R. Ullmann first proposed a workable solution based on backtracking. The present patent addresses the first type of graph query.
According to the theory behind Karp's method, the query problem on massive data graphs is converted from an NP-complete problem into a solvable one, but the query speed is too slow. With data volumes now growing rapidly, the method is increasingly hard to adapt to current conditions.
The present method first models the network according to the correlation relationships of social factor control theory and constructs a causal relationship network. It then queries and analyzes the constructed network according to FTV framework theory to achieve better matching, and it reworks community detection around index query technology.
Disclosure of Invention
The method mines the community structure in massive data graphs. FTV framework theory is used to accelerate queries and improve query accuracy, so that community structures can be mined more quickly from large-scale static graph data. The method has considerable application value in scenarios such as fraud group detection, recommendation of groups with shared interests, and early warning of event outbreaks.
The method mainly comprises the following two steps: the causal relationships are extracted according to the dependency syntax, and a generalized causal relationship network is then constructed from the extracted causal relationships.
The method mainly comprises the following two steps: first, a generalized causal relationship network is constructed according to social factor control theory; then, an index is constructed according to FTV framework theory, and the network is queried to mine its community structure.
The causal relationship network is constructed from social factor control theory, and the implementation steps are as follows:
Step one: construct the network. Blog post content and friend relationship lists are crawled from the microblog with the pyspider framework and serve as the empirical data for the method.
The specific content of the blog posts is then processed. Each user's blog post content is segmented with the word segmenter from Fudan University, irrelevant modal particles are removed, and keywords are extracted from the input blog post data with FNLP. The keywords are tagged by part of speech and analyzed semantically, and the query is abstracted (see FIG. 1).
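As a rough illustration of the cleaning described above, the following Python sketch removes modal particles and punctuation from a post that has already been segmented and ranks the remaining tokens by frequency. It is not the Fudan segmenter or FNLP: the particle list, the function names `clean_tokens` and `top_keywords`, and the frequency-based ranking are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical list of modal particles; the source does not specify one.
MODAL_PARTICLES = {"啊", "吧", "呢", "吗", "嘛", "哦"}


def clean_tokens(tokens):
    """Drop empty tokens, modal particles, and punctuation-only tokens."""
    return [
        t for t in tokens
        if t.strip()
        and t not in MODAL_PARTICLES
        and any(ch.isalnum() for ch in t)
    ]


def top_keywords(tokens, k=3):
    """Rank the cleaned tokens by frequency (a crude stand-in for keyword extraction)."""
    counts = Counter(clean_tokens(tokens))
    return [word for word, _ in counts.most_common(k)]


# Usage: a post that some segmenter has already split into tokens.
segmented_post = ["I", "really", "dislike", "xxx", "啊", "!", "dislike"]
print(clean_tokens(segmented_post))
print(top_keywords(segmented_post, k=1))
```

A real pipeline would replace the pre-segmented list with the output of the actual segmenter and would likely use part-of-speech tags rather than raw frequency to pick keywords.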
First, an elementary SNA network is constructed from the extracted data relationships. Text semantic content is then extracted according to the semantics obtained in step b, and possible nodes and edges in the network are mined, so that the constructed network becomes denser and network sparsity is reduced. An example based on a semantic analysis model illustrates how semantic extraction is performed. Take the blog post text "I really dislike xxx", where "xxx" is a person's name: "I" denotes the ID of the blog post's author, "dislike" is a relational verb, and "xxx" is the other party, so a hidden edge that starts at the blog post author's node can be extracted.
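The hidden-edge idea in the example above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: posts are already segmented into tokens, the set of relational verbs and the set of known account names are supplied by the caller, and the first known account name after a relational verb is taken as the target. The function names and the edge representation are hypothetical and do not come from the source.

```python
def extract_hidden_edges(posts, known_users, relation_verbs):
    """Return (author, verb, target) triples found in segmented posts.

    posts: iterable of (author_id, tokens) pairs.
    """
    edges = set()
    for author, tokens in posts:
        for i, token in enumerate(tokens):
            if token not in relation_verbs:
                continue
            # Take the first known account mentioned after the verb.
            for target in tokens[i + 1:]:
                if target in known_users and target != author:
                    edges.add((author, token, target))
                    break
    return edges


def build_network(friend_pairs, posts, known_users, relation_verbs):
    """Combine explicit friend edges with hidden edges mined from text."""
    edges = {}
    for a, b in friend_pairs:
        edges[(a, b)] = "friend"
    for author, verb, target in extract_hidden_edges(posts, known_users, relation_verbs):
        # Keep an explicit friend edge if one already exists for this pair.
        edges.setdefault((author, target), verb)
    return edges


network = build_network(
    friend_pairs=[("u1", "u2")],
    posts=[("u1", ["I", "really", "dislike", "u3"])],
    known_users={"u1", "u2", "u3"},
    relation_verbs={"dislike"},
)
print(network)
```

Here the friend list alone yields one edge, while the post adds a second edge from `u1` to `u3`, which is how the text-derived edges make the network denser.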
Using the constructed network, queries are then performed based on FTV framework theory, and the community structures in the network are mined (see FIG. 2):
Step two: construct a dictionary structure in the network. This dictionary structure is the basis for building the query index later.
The dictionary structure is composed of paths in the graph. The set of paths whose length does not exceed p is used as the index, and this index is called the fingerprint.
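One way to picture a path-based fingerprint is the Python sketch below. It enumerates the label sequences of all simple paths with at most p edges and sets one bit per sequence. The source only states that paths of length at most p form the index; hashing each path to a bit position with `zlib.crc32`, the bitmap width `n_bits`, and the names `label_paths` and `fingerprint` are assumptions made for illustration.

```python
import zlib


def label_paths(adj, labels, p):
    """Collect the label sequences of all simple paths with at most p edges.

    adj: dict mapping every node to a list of its neighbours.
    labels: dict mapping every node to a string label.
    """
    found = set()

    def walk(node, visited, seq):
        found.add(seq)
        if len(seq) - 1 == p:  # the path already has p edges
            return
        for nxt in adj.get(node, ()):
            if nxt not in visited:
                walk(nxt, visited | {nxt}, seq + (labels[nxt],))

    for start in adj:
        walk(start, {start}, (labels[start],))
    return found


def fingerprint(adj, labels, p, n_bits=256):
    """Hash every path label sequence to one bit of an n_bits-wide bitmap."""
    bits = 0
    for seq in label_paths(adj, labels, p):
        position = zlib.crc32("/".join(seq).encode("utf-8")) % n_bits
        bits |= 1 << position
    return bits


# Usage: a three-node chain A - B - C.
adj = {1: [2], 2: [1, 3], 3: [2]}
labels = {1: "A", 2: "B", 3: "C"}
print(len(label_paths(adj, labels, 2)))
print(bin(fingerprint(adj, labels, 2)))
```

Because every path of a subgraph is also a path of the graph that contains it, the bits set for a query are always a subset of the bits set for any graph containing that query, which is what makes the bitmap usable as a filter.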
Bitmap inclusion is tested by first dividing the fingerprints (query FQ and database FD) into sequences of long integers (query LQ and database LD), and then testing bitmap inclusion between each pair (LQ, LD) using bitmap operations (LQ ^).
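The word-by-word inclusion test can be sketched in Python as below. The exact bitmap operation is truncated in the source, so the check used here, that ANDing a query word with the database word gives back the query word, is the usual subset test and is an assumption. The 64-bit word size and the names `to_words` and `included` are likewise hypothetical.

```python
WORD_BITS = 64
WORD_MASK = (1 << WORD_BITS) - 1


def to_words(fp, n_bits):
    """Split an n_bits-wide fingerprint (a Python int) into 64-bit words."""
    n_words = (n_bits + WORD_BITS - 1) // WORD_BITS
    return [(fp >> (i * WORD_BITS)) & WORD_MASK for i in range(n_words)]


def included(query_words, data_words):
    """True if every bit set in the query is also set in the database entry.

    Both sequences are assumed to come from fingerprints of the same width.
    """
    return all((lq & ld) == lq for lq, ld in zip(query_words, data_words))


# Usage with 128-bit fingerprints.
fq = 0b1010 | (1 << 70)
fd = 0b1110 | (1 << 70) | (1 << 100)
lq, ld = to_words(fq, 128), to_words(fd, 128)
print(included(lq, ld))  # query bits are a subset of the database bits
print(included(ld, lq))  # the reverse does not hold
```

Since `all` stops at the first failing pair, a mismatch in an early word rejects the database entry without touching the remaining words.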
The structure of the nodes in the dictionary repository is modified by adding new fields that record edge labels. Each fingerprint is constructed in exactly the same way as before, except that each bit of the fingerprint now obtained is appended to the end of the corresponding bitmap (see FIG. 4).
A parent node is constructed by performing a binary OR operation between its two child nodes. If the comparison between the query fingerprint and a given node of the tree returns false, the entire branch below that node can be discarded directly. Thus, if the database is large enough, searching the dictionary repository requires fewer comparisons than there are fingerprints in the database.
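The OR-tree and its pruning rule can be sketched in Python as follows. This is a minimal sketch, not the patent's data structure: leaves are paired in input order rather than clustered by similarity, nodes are plain dictionaries, and the names `build_tree` and `search` and the `stats` counter are assumptions introduced for the example.

```python
def build_tree(fingerprints):
    """Build a binary tree whose parents are the bitwise OR of their children.

    fingerprints: list of (graph_id, bitmap) pairs, bitmaps given as ints.
    """
    level = [{"bits": fp, "gid": gid, "children": []} for gid, fp in fingerprints]
    if not level:
        return None
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), 2):
            pair = level[i:i + 2]
            if len(pair) == 1:  # odd node out is carried up unchanged
                parents.append(pair[0])
            else:
                left, right = pair
                parents.append({
                    "bits": left["bits"] | right["bits"],
                    "gid": None,
                    "children": [left, right],
                })
        level = parents
    return level[0]


def search(node, query_bits, stats=None):
    """Return ids of leaves whose bitmap includes the query; prune failed branches."""
    if node is None:
        return []
    if stats is not None:
        stats["comparisons"] += 1
    if (node["bits"] & query_bits) != query_bits:
        return []  # nothing below this node can include the query
    if not node["children"]:
        return [node["gid"]]
    hits = []
    for child in node["children"]:
        hits.extend(search(child, query_bits, stats))
    return hits


root = build_tree([("g1", 0b0011), ("g2", 0b0101), ("g3", 0b1100), ("g4", 0b1000)])
stats = {"comparisons": 0}
print(search(root, 0b0001, stats), stats)
```

With only four leaves the tree saves nothing, which matches the source's caveat that the gain appears only when the database is large enough. Pruning also works better when sibling bitmaps share many 1s, which is why the source goes on to define a distance between bitmaps.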
In order to be able to apply a "fingerprint" to a set of bitmaps, a distance measure and a method for computing the average must be defined. The number of 1s shared by elements of the same cluster must be maximized so as to minimize the number of 1s in their binary OR. If b1 and b2 are two bitmaps, the distance between them, denoted d(b1, b2), is defined in the current proposal as follows:
[Formula image BDA0002251601970000031: definition of d(b1, b2)]
A query dictionary library is then constructed according to the method in step f, and the community structure of the massive data is queried directly against this library, which increases query speed and improves query precision.
Advantageous effects
Compared with existing methods for mining massive data graphs, the present method constructs the causal relationship network mainly with social factor control theory. Although only the theory of social relationship correlation is used, and complete causal syntactic relationships are not, the precision and speed of the query results are still satisfactory. The method has the following main benefits:
First, when the social network is constructed, the network is built not only from the group-related information in the data but also from semantic analysis, which alleviates the network sparsity problem.
Second, the FTV theoretical framework is used and also improved: a bitmap compression method effectively reduces the size of the constructed index and speeds up queries.
Finally, the method is broadly compatible: it can be applied not only to graph query but also to community detection in citation networks and communication networks.
Drawings
FIG. 1 is a query abstraction graph;
FIG. 2 is an index validation query structure;
FIG. 3 is a bitmap dictionary library calculation;
FIG. 4 is how a relationship network is built from data;
FIG. 5 is an example of utilizing text semantic relationships to reduce network sparsity;
FIG. 6 is a filtering theory framework and index construction for queries after a network is built.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
The emotion cause mining method based on dependency syntax and a generalized causal network is mainly applied to finding the causal relationships in texts and the rules by which they operate. When mining emotional causes, the following steps can be performed.
The community detection analysis method based on index construction and a social factor control network is mainly used to detect, quickly and accurately, communities with a given structure in massive data, so as to predict the corresponding relationships. The network is built and the index query is established according to the following steps.
The first step: obtain the microblog's user blog post data and user data, mainly by crawling with pyspider.
The second step: clean the crawled data, then extract and sort the users' account data and blog post data. This step removes much invalid information, such as empty accounts and blog posts.
The third step: organize the blog post data, perform semantic analysis on it, and extract the keywords in the blog posts.
The fourth step: analyze the semantics of the keywords and remove modal particles and symbols.
The fifth step: after invalid keywords and symbols have been filtered out, use the word segmenter to distinguish account names, related account names, and the relationships between names.
The sixth step: construct the factor control relationship network from the extracted valid keywords and the relationship table.
The seventh step: construct a relationship network whose main body is the user relationship network, supplemented by semantic analysis of the blog posts.
The eighth step: construct a dictionary library based on FTV filtering framework theory. First extract, for matching, a fingerprint library in which the path length over user nodes does not exceed p (the index path value), and then take the matching result as the dictionary library.
The ninth step: compress the dictionary library with a bitmap compression algorithm to facilitate queries.
The tenth step: query the community structure in the network according to the dictionary library.
The first through fifth steps implement step one of the technical scheme, through which a complex network carried by associated semantics can be constructed. Data can be crawled and then processed according to the framework of FIG. 3, and the semantic analysis framework for the associated semantic network is shown in FIG. 4. The sixth through ninth steps implement step two of the technical scheme, through which the index dictionary library for abstracted queries can be constructed. The data structure of the dictionary repository is shown in FIG. 2. Finally, in the tenth step, a candidate set, that is, a community structure of a specific structure, is obtained from the index for a given query; the way the candidate set is obtained is shown in FIG. 6.
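The tenth step, obtaining a candidate set from the index and then confirming it, can be sketched in Python as below. FTV is read here as filter-then-verify, which is an assumption, since the source does not expand the acronym. The fingerprint is simplified to node labels and edge label pairs, the verification is a brute-force search that only suits very small graphs, and all function names are hypothetical.

```python
import zlib
from itertools import permutations


def simple_fingerprint(nodes, edges, n_bits=64):
    """Set one bit per node label and per unordered pair of edge end labels."""
    features = set(nodes.values())
    for a, b in edges:
        features.add("-".join(sorted((nodes[a], nodes[b]))))
    bits = 0
    for feature in features:
        bits |= 1 << (zlib.crc32(feature.encode("utf-8")) % n_bits)
    return bits


def is_subgraph(q_nodes, q_edges, g_nodes, g_edges):
    """Brute-force check for a label- and edge-preserving injective mapping."""
    q_ids = list(q_nodes)
    g_edge_set = {frozenset(e) for e in g_edges}
    for image in permutations(list(g_nodes), len(q_ids)):
        mapping = dict(zip(q_ids, image))
        labels_ok = all(q_nodes[q] == g_nodes[mapping[q]] for q in q_ids)
        edges_ok = all(
            frozenset((mapping[a], mapping[b])) in g_edge_set for a, b in q_edges
        )
        if labels_ok and edges_ok:
            return True
    return False


def filter_then_verify(query, database):
    """Filter by bitmap inclusion, then verify each surviving candidate.

    database: dict mapping graph id to (fingerprint, (nodes, edges)).
    """
    query_fp = simple_fingerprint(*query)
    candidates = [
        gid for gid, (fp, _) in database.items() if (query_fp & fp) == query_fp
    ]
    answers = [gid for gid in candidates if is_subgraph(*query, *database[gid][1])]
    return candidates, answers


query = ({0: "A", 1: "B"}, [(0, 1)])
g1 = ({0: "A", 1: "B", 2: "C"}, [(0, 1), (1, 2)])
g2 = ({0: "A", 1: "C", 2: "B"}, [(0, 1), (1, 2)])
database = {name: (simple_fingerprint(*g), g) for name, g in [("g1", g1), ("g2", g2)]}
print(filter_then_verify(query, database))
```

The filter can admit false positives (for example through hash collisions) but never drops a true match, so the expensive verification only has to run on the candidate set.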

Claims (4)

1. An analysis method for community detection based on index construction and a social factor control network, characterized in that the method mainly comprises two steps: first, a generalized causal relationship network is constructed according to social factor control theory; then, an index is constructed according to FTV framework theory, and the network is queried to mine its community structure;
the causal relationship network is constructed from social factor control theory, and the implementation steps are as follows:
step one, constructing a network;
using the constructed network, performing queries based on FTV framework theory, and mining the community structures in the network;
step two, constructing a dictionary structure in the network;
bitmap inclusion is tested by first dividing the fingerprints (query FQ and database FD) into sequences of long integers (query LQ and database LD), and then testing bitmap inclusion between each pair (LQ, LD) using bitmap operations (LQ ^).
2. The analysis method for community detection based on index construction and a social factor control network according to claim 1, wherein the structure of the nodes in the dictionary repository is modified by adding new fields that record edge labels; each fingerprint is constructed in exactly the same way, except that each bit of the fingerprint now obtained is appended to the end of the corresponding bitmap.
3. The analysis method for community detection based on index construction and a social factor control network according to claim 1, wherein a parent node is constructed by performing a binary OR operation between two child nodes.
4. The analysis method for community detection based on index construction and a social factor control network according to claim 1, wherein a "fingerprint" is applied to a set of bitmaps, and a distance measure and a method for computing the average are defined.
CN201911036341.3A 2019-10-29 2019-10-29 Analysis method for community detection based on index construction and social factor control network Pending CN111444402A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911036341.3A CN111444402A (en) 2019-10-29 2019-10-29 Analysis method for community detection based on index construction and social factor control network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911036341.3A CN111444402A (en) 2019-10-29 2019-10-29 Analysis method for community detection based on index construction and social factor control network

Publications (1)

Publication Number Publication Date
CN111444402A true CN111444402A (en) 2020-07-24

Family

ID=71650610

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911036341.3A Pending CN111444402A (en) 2019-10-29 2019-10-29 Analysis method for community detection based on index construction and social factor control network

Country Status (1)

Country Link
CN (1) CN111444402A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116227598A (en) * 2023-05-08 2023-06-06 山东财经大学 Event prediction method, device and medium based on dual-stage attention mechanism

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101887441A (en) * 2009-05-15 2010-11-17 华为技术有限公司 Method and system for establishing social network and method and system for mining network community
CN102254012A (en) * 2011-07-19 2011-11-23 北京大学 Graph data storing method and subgraph enquiring method based on external memory
US20130080416A1 (en) * 2011-09-23 2013-03-28 The Hartford System and method of insurance database optimization using social networking
CN103455705A (en) * 2013-05-24 2013-12-18 中国科学院自动化研究所 Analysis and prediction system for cooperative correlative tracking and global situation of network social events
CN103886011A (en) * 2013-12-30 2014-06-25 安徽讯飞智元信息科技有限公司 Social-relation network creation and retrieval system and method based on index files

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
钱汉伟;袁明;吉文元;: "基于社交网络大数据线索分析平台研究及应用" *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116227598A (en) * 2023-05-08 2023-06-06 山东财经大学 Event prediction method, device and medium based on dual-stage attention mechanism
CN116227598B (en) * 2023-05-08 2023-07-11 山东财经大学 Event prediction method, device and medium based on dual-stage attention mechanism

Similar Documents

Publication Publication Date Title
CN103544255B (en) Text semantic relativity based network public opinion information analysis method
CN106649260B (en) Product characteristic structure tree construction method based on comment text mining
Rizzo et al. NERD meets NIF: Lifting NLP Extraction Results to the Linked Data Cloud.
CN102207945B (en) Knowledge network-based text indexing system and method
CN113822067A (en) Key information extraction method and device, computer equipment and storage medium
CN101593200A (en) Chinese Web page classification method based on the keyword frequency analysis
Babur et al. Hierarchical clustering of metamodels for comparative analysis and visualization
Zhao et al. Topic-centric and semantic-aware retrieval system for internet of things
CN112989831B (en) Entity extraction method applied to network security field
CN104268148A (en) Forum page information auto-extraction method and system based on time strings
CN105718585A (en) Document and label word semantic association method and device thereof
CN102654873A (en) Tourism information extraction and aggregation method based on Chinese word segmentation
CN104346382B (en) Use the text analysis system and method for language inquiry
Bhardwaj et al. Web scraping using summarization and named entity recognition (ner)
CN111190873B (en) Log mode extraction method and system for log training of cloud native system
Xu Cultural communication in double-layer coupling social network based on association rules in big data
Wang et al. Short text topic learning using heterogeneous information network
CN107784019A (en) Word treatment method and system are searched in a kind of searching service
CN111444402A (en) Analysis method for community detection based on index construction and social factor control network
Narayana et al. A novel and efficient approach for near duplicate page detection in web crawling
Yu et al. Mining hidden interests from twitter based on word similarity and social relationship for OLAP
CN116822491A (en) Log analysis method and device, equipment and storage medium
Joshi et al. Sequential pattern mining using formal language tools
Shaikh et al. Bringing shape to textual data-a feasible demonstration
Zhang et al. An improved ontology-based web information extraction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200724