CN111444402A - Analysis method for community detection based on index construction and social factor control network - Google Patents
Analysis method for community detection based on index construction and social factor control network
- Publication number
- CN111444402A
- Authority
- CN
- China
- Prior art keywords
- network
- constructing
- community
- theory
- social
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9536—Search customisation based on social or collaborative filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Strategic Management (AREA)
- Tourism & Hospitality (AREA)
- Primary Health Care (AREA)
- General Business, Economics & Management (AREA)
- Marketing (AREA)
- Human Resources & Organizations (AREA)
- Economics (AREA)
- Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses an analysis method for community detection based on index construction and a social factor control network. The method mainly comprises two steps: first, a generalized causal relationship network is constructed according to social factor control theory; then, an index is constructed according to the FTV framework theory, and the network is queried to mine its community structure. The causal relationship network is established by social factor control theory; the established network is then queried based on the FTV framework theory to mine the community structure in the network, and a dictionary structure is constructed in the network.
Description
Technical Field
The invention belongs to the field of network analysis and relates to a method that performs queries based on index construction and performs analysis based on social factor control theory. A factor control relationship network is first constructed from social factor control relationships; indexing technology is then used to improve the speed and accuracy of queries, and the community relationships in the network are analyzed.
Background
In recent years, with the popularization and development of social networks, a growing number of users have generated large amounts of data, and analyzing the possible community structures in that data has become a challenge in the field of network analysis. With the emergence of technologies such as Hadoop, the problems posed by massive data have gradually shifted from data storage to network construction and network analysis, and analyzing possible community structures in massive data is of great value in many fields. For example, separating potential communities from a social network makes it possible to uncover fraud groups, which is significant for improving network security. This section introduces the current state of research on community detection in network analysis.
Many studies have addressed community detection on massive data. Massive graph data is of two types. The first is a single ultra-large-scale graph built from massive data, such as a social network, the World Wide Web, or an e-commerce transaction network. In a social network, for example, each node in the graph represents a person and each edge represents a relationship between two people. Queries on this type of graph were initially regarded as NP-complete (NPC) problems, as proposed by Karp, who put forward a method that uses the maximum complete graph to query massive data graphs and to model community detection. The second type is a graph network composed of a large number of small graphs, such as a compound network. In a network composed of many compounds, each atom is a node and each edge represents the force between atoms. This kind of problem can be queried by approximate subgraph matching, but it is also an NPC problem; in 1976, J. R. Ullmann first proposed a workable solution using backtracking. The present patent addresses the first type of graph query.
According to the theory of R. M. Karp's method, although the query problem on massive data graphs is converted from an NPC problem into a solvable one, the query speed is too slow, and with the rapid growth of current data the method is increasingly difficult to adapt to present conditions.
The present method first models the network according to the correlation relationships of social factor control theory and constructs a causal relationship network. It then queries and analyzes the constructed network according to the FTV framework theory to achieve better matching, and reconstructs the community detection method on the basis of index query technology.
Disclosure of Invention
The method mines community structures in massive data graphs and uses the FTV framework theory to accelerate queries and improve query accuracy, so that community structures can be mined more quickly from large-scale static graph data. It has great application value in scenarios such as fraud group detection, recommendation of groups with shared interests, and early warning of event outbreaks.
The method mainly comprises two steps: extracting causal relationships according to dependency syntax, and then constructing a generalized causal relationship network from the extracted causal relationships.
The method mainly comprises two steps: first, a generalized causal relationship network is constructed according to social factor control theory; then, an index is constructed according to the FTV framework theory, and the network is queried to mine its community structure.
The causal relationship network is constructed according to social factor control theory, and the implementation steps are as follows:
Step one: construct the network. Blog post content and friend-relationship list data are crawled from the microblog with the pyspider framework and serve as the empirical data for the method.
The specific content of the blog posts is then processed: each user's post content is segmented with the word segmenter from Fudan University, irrelevant modal particles are removed, keywords are extracted from the input post data with FNLP, parts of speech are tagged, semantic analysis is performed, and the query is abstracted (see Figure 1).
First, an elementary SNA network is constructed from the extracted data relationships. Text semantic content is then extracted according to the semantics obtained in step b, and possible nodes and edges in the network are mined, so that the constructed network becomes denser and its sparsity is reduced. An example based on the semantic analysis model illustrates how semantic extraction is performed. Take the blog post text "I very much repudiate xxx", where "xxx" is a person's name. In the extraction, "I" represents the ID of the blog post's author, "repudiate" is a relational verb, and "xxx" is another object, so a hidden edge with the blog post author as one of its nodes can be extracted.
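The hidden-edge idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `build_network`, the `(author, verb, target)` triple format, and the sample IDs are hypothetical, and the triples are assumed to have already been produced by the semantic analysis step.

```python
def build_network(friend_edges, semantic_triples):
    """Merge explicit friend-list edges with hidden edges implied by blog text.

    friend_edges: iterable of (user_id, friend_id) pairs from the crawled lists.
    semantic_triples: iterable of (author_id, relation_verb, target_id) triples,
        assumed to come from the semantic analysis of blog posts.
    Returns a dict mapping (source, target) to the set of edge labels.
    """
    edges = {}
    for src, dst in friend_edges:
        edges.setdefault((src, dst), set()).add("friend")
    for author, verb, target in semantic_triples:
        # A post such as "I very much repudiate xxx" yields a hidden edge
        # from the author's node to the node for "xxx", labeled by the verb.
        edges.setdefault((author, target), set()).add(verb)
    return edges


friend_edges = [("u1", "u2")]
semantic_triples = [("u1", "repudiate", "u3")]
network = build_network(friend_edges, semantic_triples)
```

Here the pair ("u1", "u3") does not appear in any friend list, so the edge exists only because of the blog text; this is how the semantic supplement makes the network denser.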
The constructed network is then queried based on the FTV framework theory to mine the community structures in it (see Figure 2):
Step two: construct a dictionary structure in the network. This dictionary structure is the basis for building the query index later.
The dictionary structure is composed of paths in the graph; the paths whose length does not exceed p are used as the index, which is called the fingerprint.
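A minimal sketch of such a path-based fingerprint follows. It assumes, beyond what the text states, that each node carries a label, that a path is recorded as its sequence of node labels, and that each path is hashed to one bit position with CRC32; the names `label_paths`, `fingerprint`, and `n_bits` are hypothetical.

```python
import zlib


def label_paths(adj, labels, max_len):
    """Collect the label sequences of all simple paths with at most max_len edges."""
    paths = set()

    def walk(node, visited, seq):
        paths.add(tuple(seq))
        if len(seq) - 1 == max_len:  # the path already has max_len edges
            return
        for nxt in adj.get(node, ()):
            if nxt not in visited:
                walk(nxt, visited | {nxt}, seq + [labels[nxt]])

    for start in adj:
        walk(start, {start}, [labels[start]])
    return paths


def fingerprint(adj, labels, max_len, n_bits=256):
    """Hash every path of length <= max_len to one bit of an n_bits-wide bitmap."""
    bits = 0
    for path in label_paths(adj, labels, max_len):
        position = zlib.crc32("/".join(path).encode("utf-8")) % n_bits
        bits |= 1 << position
    return bits


# Database graph a-b-c and a query graph a-b that is contained in it.
db_adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
db_labels = {"a": "A", "b": "B", "c": "C"}
q_adj = {"a": ["b"], "b": ["a"]}
q_labels = {"a": "A", "b": "B"}

fd = fingerprint(db_adj, db_labels, max_len=2)
fq = fingerprint(q_adj, q_labels, max_len=2)
```

Because every path of the query graph also occurs in the database graph, every bit set in `fq` is also set in `fd`. The converse does not hold in general: hash collisions mean a fingerprint match only nominates a candidate that still has to be verified.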
Bitmap inclusion is tested by first dividing the fingerprints (query FQ and database FD) into sequences of long integers (query LQ and database LD), and then testing bitmap inclusion between each pair (LQ, LD) using bitmap operations (LQ ^).
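The word-by-word inclusion test can be sketched as follows. The bitmap expression in the text above is truncated, so the subset check used here, `(lq & ld) == lq`, is an assumption rather than the patent's own formula; the 64-bit word size and the names `to_words` and `included` are likewise hypothetical.

```python
WORD_BITS = 64
WORD_MASK = (1 << WORD_BITS) - 1


def to_words(fingerprint, n_words):
    """Split an integer bitmap into n_words 64-bit words, lowest word first."""
    return [(fingerprint >> (i * WORD_BITS)) & WORD_MASK for i in range(n_words)]


def included(lq_words, ld_words):
    """True if every bit set in the query words is also set in the database words."""
    return all((lq & ld) == lq for lq, ld in zip(lq_words, ld_words))


fq = 0b1010 | (1 << 70)               # query fingerprint FQ
fd = 0b1110 | (1 << 70) | (1 << 100)  # database fingerprint FD
lq = to_words(fq, n_words=2)
ld = to_words(fd, n_words=2)
query_may_match = included(lq, ld)
```

Comparing fixed-width words lets the test stop at the first word in which the query has a bit that the database fingerprint lacks.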
We modify the structure of the nodes in the dictionary library by adding new fields that record edge labels. Each fingerprint is constructed in exactly the same way; however, each bit of the fingerprint now obtained is appended to the end of its corresponding bitmap (see Figure 4).
A parent node is constructed by performing a binary OR operation between two child nodes. If the comparison between the query fingerprint and a given node of the tree returns false, the entire branch below that node can be discarded directly. Thus, if the database is large enough, searching the dictionary library requires fewer comparisons than scanning all the database fingerprints.
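The OR-tree and its pruning rule can be sketched as below. This is a simplified illustration: leaves are paired in input order rather than by any clustering distance, the input is assumed to be non-empty, and the names `Node`, `build_tree`, and `search` are hypothetical.

```python
class Node:
    def __init__(self, bitmap, children=(), graph_id=None):
        self.bitmap = bitmap
        self.children = list(children)
        self.graph_id = graph_id  # set only on leaves


def build_tree(fingerprints):
    """Build a binary tree whose parents are the OR of their two children.

    fingerprints: non-empty dict mapping graph_id to an integer bitmap.
    """
    level = [Node(bits, graph_id=gid) for gid, bits in fingerprints.items()]
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            pair = level[i:i + 2]
            if len(pair) == 2:
                merged = pair[0].bitmap | pair[1].bitmap
                next_level.append(Node(merged, children=pair))
            else:
                next_level.append(pair[0])  # odd node is carried up unchanged
        level = next_level
    return level[0]


def search(node, query_bits, found):
    """Collect leaf ids whose bitmap includes query_bits, pruning failed branches."""
    if (query_bits & node.bitmap) != query_bits:
        return found  # the whole branch below this node is discarded
    if not node.children:
        found.append(node.graph_id)
    for child in node.children:
        search(child, query_bits, found)
    return found


tree = build_tree({"g1": 0b0011, "g2": 0b0101, "g3": 0b1100, "g4": 0b1001})
candidates = search(tree, 0b0001, [])
```

Pruning is safe because a parent's bitmap contains every bit of every leaf beneath it: if the query has a bit the parent lacks, no leaf in that branch can include the query.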
In order to be able to apply a "fingerprint" to a set of bitmaps, a distance measure and an averaging method must be defined. The number of 1s in common between elements of the same cluster must be maximized in order to minimize the number of 1s in the binary OR. If b1 and b2 are two bitmaps, the distance between b1 and b2, denoted d(b1, b2), is defined in the current proposal as follows:
A query dictionary library is then constructed according to the method in step f, and the community structure of the massive data is queried directly against this library, which increases query speed and improves query precision.
Advantageous effects
Compared with existing methods for mining massive data graphs, the method constructs the causal relationship network mainly by means of social factor control theory. Although it uses only the theory of social relationship correlation and not complete causal syntactic relations, the precision and speed of the query results are still satisfactory. The main gains are as follows:
First, when the social network is constructed, not only the group-related information in the data but also semantic analysis is used to build the network, which mitigates the network sparsity problem.
Second, the FTV theoretical framework is used and also improved: a bitmap compression method effectively reduces the size of the constructed index and speeds up queries.
Finally, the method is highly compatible: it can be applied not only to graph query but also to community detection in citation networks and communication networks.
Drawings
FIG. 1 is a query abstraction graph;
FIG. 2 is an index validation query structure;
FIG. 3 is a bitmap dictionary library calculation;
FIG. 4 is how a relationship network is built from data;
FIG. 5 is an example of utilizing text semantic relationships to reduce network sparsity;
FIG. 6 is a filtering theory framework and index construction for queries after a network is built.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
The emotion cause mining method based on dependency syntax and a generalized causal network is mainly applied to finding the causal relationships in texts and the rules by which they operate. When mining emotional causes, the following steps can be performed.
The analysis method for community detection based on index construction and a social factor control network is mainly used to detect, quickly and accurately, communities with a given structure in massive data, so as to predict the corresponding relationships. The network is established and the index query is built according to the following steps.
The first step: obtain the users' blog post data and user data from the microblog, mainly by crawling with pyspider.
The second step: clean the crawled data, then extract and sort the users' account data and blog post data; this step removes much invalid information, such as empty accounts and blog posts.
The third step: organize the blog post data, perform semantic analysis on it, and extract the keywords in the blog posts.
The fourth step: analyze the semantics of the keywords and remove modal particles and symbols.
The fifth step: after filtering out invalid keywords and symbols, use a word segmenter to distinguish account names, related account names, and the relationships between names.
The sixth step: construct a factor control relationship network from the extracted valid keywords and the relationship table.
The seventh step: construct a relationship network whose main body is the user relationship network, supplemented by semantic analysis of the blog posts.
The eighth step: construct a dictionary library based on the FTV filtering framework theory; first extract, for matching, a fingerprint library of user-node paths with length no greater than p (the index path value), and then take the matching result as the dictionary library.
The ninth step: compress the dictionary library with a bitmap compression algorithm to facilitate querying.
The tenth step: query the community structure in the network according to the dictionary library.
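The cleaning and filtering in the second and fourth steps can be sketched as follows. The record layout (`account` and `text` keys), the small particle list, and the function names are hypothetical placeholders; a real pipeline would take its stop list and tokens from the word segmenter.

```python
import re

# Hypothetical sample of modal particles to discard; not taken from the source.
MODAL_PARTICLES = {"啊", "吧", "呢", "嘛"}


def clean_records(records):
    """Drop crawled records whose account name or blog text is empty."""
    kept = []
    for rec in records:
        account = (rec.get("account") or "").strip()
        text = (rec.get("text") or "").strip()
        if account and text:
            kept.append({"account": account, "text": text})
    return kept


def filter_keywords(keywords):
    """Remove modal particles and tokens that consist only of symbols."""
    result = []
    for kw in keywords:
        if kw in MODAL_PARTICLES:
            continue
        if not re.search(r"\w", kw):  # no letter, digit, or CJK character
            continue
        result.append(kw)
    return result


records = clean_records([
    {"account": "u1", "text": "hello"},
    {"account": "", "text": "orphan post"},
    {"account": "u2", "text": "   "},
])
keywords = filter_keywords(["讨厌", "啊", "!!", "xxx"])
```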
The first through fifth steps correspond to step one of the technical scheme; through them, a complex network with associated semantics as the carrier can be constructed. Data can be crawled and then processed according to the framework of FIG. 3, and the semantic analysis framework that associates the semantic network is shown in FIG. 4. The sixth through ninth steps correspond to step two of the technical scheme; through them, an index dictionary library for abstracted queries can be constructed. The data structure of the dictionary library is shown in FIG. 2. Finally, in the tenth step, a candidate set, that is, a community structure of a specific structure, can be obtained from the index for a given query; the way the candidate set is obtained is shown in FIG. 6.
Claims (4)
1. An analysis method for community detection based on index construction and a social factor control network, characterized in that it mainly comprises two steps: first, constructing a generalized causal relationship network according to social factor control theory; then, constructing an index according to the FTV framework theory, and querying and mining the community structure of the network;
the causal relationship network is constructed according to social factor control theory, and the implementation steps are as follows:
step one, constructing a network;
using the constructed network, performing query work based on the FTV framework theory, and mining the community structures in the network;
step two, constructing a dictionary structure in the network;
bitmap inclusion is tested by first dividing the fingerprints (query FQ and database FD) into sequences of long integers (query LQ and database LD), and then testing bitmap inclusion between each pair (LQ, LD) using bitmap operations (LQ ^).
2. The analysis method for community detection based on index construction and a social factor control network according to claim 1, wherein the structure of the nodes in the dictionary library is modified by adding new fields that record edge labels; each fingerprint is constructed in exactly the same way, but each bit of the fingerprint now obtained is appended to the end of its corresponding bitmap.
3. The analysis method for community detection based on index construction and a social factor control network according to claim 1, wherein a parent node is constructed by performing a binary OR operation between two child nodes.
4. The analysis method for community detection based on index construction and a social factor control network according to claim 1, wherein a "fingerprint" is applied to a set of bitmaps, and a distance measure and an averaging method are defined.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911036341.3A CN111444402A (en) | 2019-10-29 | 2019-10-29 | Analysis method for community detection based on index construction and social factor control network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911036341.3A CN111444402A (en) | 2019-10-29 | 2019-10-29 | Analysis method for community detection based on index construction and social factor control network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111444402A true CN111444402A (en) | 2020-07-24 |
Family
ID=71650610
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911036341.3A Pending CN111444402A (en) | 2019-10-29 | 2019-10-29 | Analysis method for community detection based on index construction and social factor control network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111444402A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101887441A (en) * | 2009-05-15 | 2010-11-17 | 华为技术有限公司 | Method and system for establishing social network and method and system for mining network community |
CN102254012A (en) * | 2011-07-19 | 2011-11-23 | 北京大学 | Graph data storing method and subgraph enquiring method based on external memory |
US20130080416A1 (en) * | 2011-09-23 | 2013-03-28 | The Hartford | System and method of insurance database optimization using social networking |
CN103455705A (en) * | 2013-05-24 | 2013-12-18 | 中国科学院自动化研究所 | Analysis and prediction system for cooperative correlative tracking and global situation of network social events |
CN103886011A (en) * | 2013-12-30 | 2014-06-25 | 安徽讯飞智元信息科技有限公司 | Social-relation network creation and retrieval system and method based on index files |
Non-Patent Citations (1)
Title |
---|
钱汉伟;袁明;吉文元;: "基于社交网络大数据线索分析平台研究及应用" * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116227598A (en) * | 2023-05-08 | 2023-06-06 | 山东财经大学 | Event prediction method, device and medium based on dual-stage attention mechanism |
CN116227598B (en) * | 2023-05-08 | 2023-07-11 | 山东财经大学 | Event prediction method, device and medium based on dual-stage attention mechanism |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20200724 |