CN109657016A - Method for mining attributes that satisfy the homogeneity requirement in an attributed graph model - Google Patents

Method for mining attributes that satisfy the homogeneity requirement in an attributed graph model

Info

Publication number
CN109657016A
Authority
CN
China
Prior art keywords
attribute
subgraph
homogeneity
requirement
meeting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811648180.9A
Other languages
Chinese (zh)
Inventor
赵子豪
杨汉玮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nupt Institute Of Big Data Research At Yancheng Co Ltd
Original Assignee
Nupt Institute Of Big Data Research At Yancheng Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nupt Institute Of Big Data Research At Yancheng Co Ltd
Priority to CN201811648180.9A
Publication of CN109657016A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00: Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03: Data mining

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention relates to a method for mining attributes that satisfy the homogeneity requirement in an attributed graph model, specifically in the technical fields of graph data and data mining algorithms. The method divides the graph into subgraphs and quantifies the attributes in each subgraph for evaluation. It calculates a weighted average for each subgraph, computes the standard deviation of the averages over the subgraphs, and from the differences among the resulting standard deviations obtains the degree to which the corresponding attribute satisfies homogeneity. The method provided by the invention simplifies the processing of large-scale graph data: it can simplify subsequent mining on large-scale graph data with many attribute categories and reduce computation and storage overhead.

Description

Method for mining attributes that satisfy the homogeneity requirement in an attributed graph model
Technical field
The present invention relates to a method for mining attributes that satisfy the homogeneity requirement in an attributed graph model, and specifically to the technical fields of graph data and data mining algorithms.
Background art
The traditional relational data model has long held a dominant position in data modeling. However, as related technologies have advanced and the range of applications has widened, scenarios have emerged for which the relational model is poorly suited, such as social networks and transportation networks, where operations on the relationships between entities are frequent. When handling relationship problems, a traditional relational database needs a large number of association tables to record a series of complex relationships. As more entities are introduced, more and more association tables are needed, which makes solutions based on relational databases cumbersome and error-prone. This defect of the data model also makes the data hard to extend and ill-suited to today's rapid data growth.
Graph databases emerged in response. Graph databases, also called graph-oriented databases, originate from Euler and graph theory. Their basic idea is to store and query data using the graph data structure. The data model is expressed mainly through nodes and edges, and its advantage is that complex relationship problems can be solved quickly.
With the flourishing of graph databases, engineers and researchers can choose a graph model to model data in many practical scenarios, so research on algorithms for graph data has become increasingly active. Among these algorithms, research on mining algorithms for attributed graphs is the most popular, for example community detection, clustering, graph partitioning, and outlier detection. These mining algorithms generally rest on the same assumption: homogeneity must hold for every attribute under study. Homogeneity (also called homophily), viewed from the perspective of node attributes, means that a node is more likely to be connected to nodes that are more similar to itself. In 2003 Newman published the article "Mixing patterns in networks" in Physical Review, in which he for the first time defined mixing patterns on networks and raised the problem of disassortative mixing, that is, cases in which nodes in a network tend to connect to nodes that are less similar to themselves.
Therefore, a prerequisite for running a mining algorithm on an attributed graph is to find the attributes that satisfy homogeneity; only then can mining be done on these attributes. Some scholars have proposed methods that can find, among the multiple attributes of an attributed graph model, those that satisfy the homogeneity requirement, but the existing methods are all based on numerical attributes. It is therefore necessary to propose a method that applies to multiple attribute types and can find the attributes that satisfy the homogeneity requirement in an attributed graph model.
Summary of the invention
In view of the above deficiencies, the present invention provides a method for mining attributes that satisfy the homogeneity requirement in an attributed graph model.
The present invention adopts the following technical solution:
The method of the present invention for mining attributes that satisfy the homogeneity requirement in an attributed graph model is as follows:
1) Subgraphs are divided according to network structure, based on the graph and its attribute set. Modularity is used as the parameter that regulates the degree of division, and the graph is divided into subgraph 1, subgraph 2, through subgraph n;
2) Ordered categorical attributes and unordered categorical attributes are distinguished in each subgraph. Each ordered categorical attribute is quantified for evaluation;
3) For the ordered categorical attributes quantified in step 2), a weighted average is calculated for each of subgraph 1, subgraph 2, through subgraph n;
4) The standard deviation of the averages calculated over the subgraphs in step 3) is computed; from the differences among the resulting standard deviations, the degree to which the corresponding attribute satisfies homogeneity is obtained.
In the method of the present invention, the "modularity" in step 1) evaluates the ratio of edges within the subgraphs obtained by division to edges between subgraphs.
In the method of the present invention, in step 2) the number of occurrences of each value of an unordered categorical attribute is counted in subgraph 1, subgraph 2, through subgraph n; the proportion and distribution of the values within each subgraph are calculated; and from the differences between the distributions in the subgraphs, the attributes that satisfy the homogeneity requirement are obtained.
In the method of the present invention, in step 2), after the weighted average of an ordered categorical attribute is taken, the values are compared only between different subgraphs.
In the method of the present invention, in step 3) whether the attribute in each subgraph satisfies homogeneity is judged according to the weighted average; if the weighted averages differ greatly between different subgraphs, the attribute is considered more likely to satisfy homogeneity.
Beneficial effects
The method provided by the present invention for mining attributes that satisfy the homogeneity requirement in an attributed graph model simplifies the processing of large-scale graph data. It can simplify subsequent mining on large-scale graph data with many attribute categories and reduce computation and storage overhead.
Brief description of the drawings
Fig. 1 is a schematic flow diagram of the processing of the present invention.
Detailed description of the embodiments
To make the purpose and technical solution of the embodiments of the present invention clearer, the technical solution is described clearly and completely below with reference to the drawing of the embodiments. The described embodiments are evidently only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of the described embodiments without creative work fall within the protection scope of the present invention.
As shown in the figure, the homogeneity attribute mining problem for a large-scale graph data set can be converted into the problem of whether a given attribute satisfies the homogeneity requirement. The present invention proposes a computer-program-based method for mining attributes that satisfy the homogeneity requirement in an attributed graph model, as follows:
Subgraphs are divided according to network structure, based on the graph and its attribute set. Modularity is used as the parameter that regulates the degree of division, and the graph is divided into subgraph 1, subgraph 2, through subgraph n;
Ordered categorical attributes and unordered categorical attributes are distinguished in each subgraph. Each ordered categorical attribute is quantified for evaluation;
For the ordered categorical attributes quantified in the preceding step, a weighted average is calculated for each of subgraph 1, subgraph 2, through subgraph n;
The standard deviation of the averages calculated over the subgraphs in the preceding step is computed; from the differences among the resulting standard deviations, the degree to which the corresponding attribute satisfies homogeneity is obtained.
1) For an unordered categorical attribute in an attributed graph model (for example travel mode, with values such as walking, bicycle, subway, bus, and driving), to quantitatively evaluate whether it satisfies the homogeneity requirement in a given network, and then to use such homogeneous attributes in further applications such as data mining, proceed as follows.
First, ignoring the attributes, the whole graph is divided according to network structure alone (that is, how nodes and edges are connected in the network). For this graph division, the method designs and uses a modularity-based division procedure, with modularity as the parameter that regulates the degree of division. Modularity here evaluates the ratio of edges within the subgraphs obtained by division to edges between subgraphs. A good division yields subgraphs that are densely connected internally and sparsely connected to each other. The procedure is similar to community detection. First, a label is assigned to every node. Then, starting from the node with the largest degree in the graph, the node's label is replaced by the label that occurs most often among its neighbor nodes. Nodes with the same label are considered to belong to the same subgraph. This process is executed iteratively until the result converges or a preset modularity value is met. In practice, the granularity of the division can be adjusted according to the specifics of the data to achieve a better result.
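The division step above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the graph is an undirected adjacency dictionary with integer node ids, nodes are visited in descending order of degree, ties between equally frequent neighbor labels are broken by taking the largest label (an arbitrary choice), and `edge_ratio`, `partition`, `target_ratio`, and `max_rounds` are hypothetical names. `edge_ratio` is the plain ratio of within-subgraph edges to between-subgraph edges described in the text.

```python
from collections import Counter

def edge_ratio(adj, labels):
    """Ratio of edges inside subgraphs to edges between subgraphs."""
    inside = between = 0
    for u, nbrs in adj.items():
        for v in nbrs:
            if u < v:  # count each undirected edge once
                if labels[u] == labels[v]:
                    inside += 1
                else:
                    between += 1
    return float("inf") if between == 0 else inside / between

def partition(adj, target_ratio=float("inf"), max_rounds=20):
    """Label-propagation style division; returns {node: subgraph label}."""
    labels = {node: node for node in adj}  # every node starts with its own label
    # Assumption: visit nodes from largest degree to smallest.
    order = sorted(adj, key=lambda n: (-len(adj[n]), n))
    for _ in range(max_rounds):
        changed = False
        for node in order:
            if not adj[node]:
                continue  # isolated node keeps its label
            counts = Counter(labels[nbr] for nbr in adj[node])
            top = max(counts.values())
            # Assumption: break ties by taking the largest label.
            best = max(lab for lab, c in counts.items() if c == top)
            if best != labels[node]:
                labels[node] = best
                changed = True
        # Stop on convergence or once the preset ratio is reached.
        if not changed or edge_ratio(adj, labels) >= target_ratio:
            break
    return labels

# Two triangles joined by the single edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
labels = partition(adj)
```

Label propagation is sensitive to visiting order and tie-breaking, so a different rule may produce a different division on the same graph.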
Within each subgraph obtained by the division, the number of occurrences of each value of the unordered categorical attribute is counted, and the proportion of each value within the whole subgraph is calculated, giving the distribution of the attribute's values on that subgraph. The distribution of the attribute's values is then also computed on the whole graph. If the distributions of the attribute's values differ greatly between different subgraphs, and the distribution in some subgraphs differs markedly from the distribution on the entire data set, the attribute is considered to satisfy the homogeneity requirement; that is, with respect to this attribute, nodes tend to connect to nodes similar to themselves.
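A short sketch of this comparison follows. The text does not name a measure of how much two distributions differ, so the total variation distance used here is an assumption, as are the function names and the toy data.

```python
from collections import Counter

def distribution(values):
    """Proportion of each attribute value in a collection of nodes."""
    counts = Counter(values)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}

def total_variation(p, q):
    """Assumed difference measure: half the L1 distance between p and q."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def subgraph_distributions(labels, attr):
    """Value distribution of one attribute on every subgraph."""
    groups = {}
    for node, lab in labels.items():
        groups.setdefault(lab, []).append(attr[node])
    return {lab: distribution(vals) for lab, vals in groups.items()}

# Hypothetical data: two subgraphs and a travel-mode attribute.
labels = {0: "s1", 1: "s1", 2: "s1", 3: "s2", 4: "s2", 5: "s2"}
mode = {0: "walk", 1: "walk", 2: "walk", 3: "subway", 4: "subway", 5: "subway"}

overall = distribution(mode.values())
per_subgraph = subgraph_distributions(labels, mode)
gaps = {lab: total_variation(d, overall) for lab, d in per_subgraph.items()}
```

Large gaps between the subgraph distributions and the whole-graph distribution point toward homogeneity; where to draw the line is left to the user, since the text gives no threshold.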
2) For an ordered categorical attribute in an attributed graph model (for example student grade levels, which from best to worst are A, B, C, D, E), the aim is again to quantitatively evaluate whether it satisfies the homogeneity requirement in a given network.
Following the same idea as above, the attribute values are ignored at first and the data set is divided according to network structure using the Partition (modularity) method. The occurrences of the attribute's different values within the resulting subgraphs are then analyzed statistically to understand the distribution of the attribute's values.
Unlike the method in 1), which counts how often each value occurs in a subgraph and derives its proportion, an ordered categorical attribute allows an integer to be assigned to each category, in order, to represent it. Taking student grade levels as an example, 1 represents A, 2 represents B, 3 represents C, 4 represents D, and 5 represents E. The numbers need not start at 1 and need not increase in steps of 1; for categories that differ more markedly, the step can be enlarged, for example 2, 3, 4, 9. The numbers must, however, increase or decrease monotonically in category order. The number of occurrences of each value is counted in the subgraph, and the weighted average of the attribute on that subgraph is calculated from these counts.
The weighted average is calculated in this way on each subgraph obtained by the division, and whether the attribute satisfies homogeneity is judged from these weighted averages. If the weighted averages differ greatly between different subgraphs, the attribute is considered more likely to satisfy homogeneity.
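The encoding and the per-subgraph weighted average can be sketched as below. The function names and the small data set are hypothetical; the grade codes follow the 1 to 5 example in the text.

```python
from collections import Counter

def is_monotonic(codes):
    """True if the codes strictly increase or strictly decrease in category order."""
    increasing = all(a < b for a, b in zip(codes, codes[1:]))
    decreasing = all(a > b for a, b in zip(codes, codes[1:]))
    return increasing or decreasing

def weighted_mean(values, code):
    """Average of the numeric codes, weighted by how often each value occurs."""
    counts = Counter(values)
    total = sum(counts.values())
    return sum(code[v] * c for v, c in counts.items()) / total

def subgraph_means(labels, attr, code):
    """Weighted average of one ordered attribute on every subgraph."""
    groups = {}
    for node, lab in labels.items():
        groups.setdefault(lab, []).append(attr[node])
    return {lab: weighted_mean(vals, code) for lab, vals in groups.items()}

code = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}
assert is_monotonic([code[g] for g in "ABCDE"])

# Hypothetical data: two subgraphs of two students each.
labels = {0: "s1", 1: "s1", 2: "s2", 3: "s2"}
grade = {0: "A", 1: "B", 2: "D", 3: "E"}
means = subgraph_means(labels, grade, code)  # {"s1": 1.5, "s2": 4.5}
```

Checking monotonicity before averaging guards against a code table that accidentally reorders the categories.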
In particular, when the above method is used to judge whether an ordered categorical attribute satisfies homogeneity, that is, by first converting the ordered categories into numbers and then comparing the weighted averages on different subgraphs, the final comparison differs from the distribution comparison in 1): only the values of different subgraphs need to be compared with each other, and no comparison with the weighted average on the whole graph is needed.
In particular, for ordered categorical data, homogeneity can also be judged by comparing the distributions of the attribute values across different subgraphs as in 1); the procedure is the same as in 1).
3) For quantitatively evaluating, within an attributed graph, the degree to which attributes satisfy homogeneity, the present invention proposes a method of quantifying this degree.
As described in 1) and 2), the distribution of each attribute's values across the different subgraphs is obtained. The averages obtained on the subgraphs can be regarded as a sequence, and the standard deviation of this sequence is computed (the standard deviation is a common statistical operation and is calculated by program). Different attributes will yield different standard deviations. These standard deviations are then normalized (normalization is a common statistical operation that scales data of different magnitudes into the interval from 0 to 1 by suitable means). The resulting value is the degree to which the corresponding attribute satisfies homogeneity. It serves as an index for quantitatively evaluating that degree, and how it is judged is decided by the user according to actual needs.
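A compact sketch of this scoring step is given below. The text specifies neither the kind of standard deviation nor the normalization; the population standard deviation and min-max scaling are assumptions made here, and `homogeneity_scores` and the sample numbers are hypothetical.

```python
from statistics import pstdev

def homogeneity_scores(means_by_attr):
    """Map each attribute to a score in [0, 1].

    means_by_attr: {attribute name: list of per-subgraph averages}.
    Assumptions: population standard deviation, min-max normalization.
    """
    spread = {name: pstdev(means) for name, means in means_by_attr.items()}
    lo, hi = min(spread.values()), max(spread.values())
    if hi == lo:  # all attributes vary equally; nothing to rank
        return {name: 0.0 for name in spread}
    return {name: (s - lo) / (hi - lo) for name, s in spread.items()}

# Hypothetical per-subgraph averages for three attributes on one graph.
scores = homogeneity_scores({
    "grade": [1.0, 3.0, 5.0],
    "age": [2.0, 3.0, 4.0],
    "height": [2.0, 2.0, 2.0],
})
```

With min-max scaling the attribute whose subgraph averages vary most gets 1 and the one that varies least gets 0, so the scores only rank attributes of the same graph against each other.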
In particular, this method applies only to comparisons between different attributes within a single attributed graph and is not suitable for comparisons across graphs.
4) Given the particular nature of this method, a traditional distributed partitioning scheme for graph data, which first cuts the data and distributes the shards to different nodes for computation, would impair the method's effectiveness. For the method proposed here, the present invention therefore designs a new two-stage distributed scheme.
In the first stage, the division method is executed on the cluster to divide the graph, and the result of the division is stored on the master node.
In the second stage, copies of the divided graph are distributed to the different slave nodes. Each slave node is responsible for judging one or more attributes. When distributing a copy to a particular slave node, the master node removes the attributes that this slave node does not need to process. After receiving its copy, each node runs the proposed method for mining attributes that satisfy homogeneity locally.
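The second stage can be illustrated with the sketch below, which only shows how a master might decide which attributes each slave handles and strip the rest from that slave's copy. The round-robin assignment, the in-memory data layout, and the names `assign_attributes` and `make_copy` are assumptions; the text does not say how attributes are assigned or how copies are transferred.

```python
def assign_attributes(attributes, n_workers):
    """Assumed policy: hand out attributes to slave nodes round-robin."""
    plan = [[] for _ in range(n_workers)]
    for i, name in enumerate(attributes):
        plan[i % n_workers].append(name)
    return plan

def make_copy(labels, node_attrs, keep):
    """Copy of the divided graph's node data with only the attributes in `keep`."""
    return {
        node: {
            "subgraph": labels[node],
            "attrs": {k: v for k, v in attrs.items() if k in keep},
        }
        for node, attrs in node_attrs.items()
    }

# Hypothetical division result held by the master node.
labels = {0: "s1", 1: "s2"}
node_attrs = {
    0: {"grade": "A", "travel": "walk", "age": 9},
    1: {"grade": "C", "travel": "subway", "age": 8},
}

plan = assign_attributes(["grade", "travel", "age"], n_workers=2)
copies = [make_copy(labels, node_attrs, set(names)) for names in plan]
```

Each slave keeps the full division result but only its own attributes, so the data sent to it shrinks with the number of attributes it is not responsible for.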
Based on the above, an example follows:
Student grade levels, from best to worst, are A, B, C, D, E. The process is illustrated by applying the method to such data.
Take as an example the social network formed by the third grade of a primary school. Each node in this social network represents a single student, and each edge represents a friendship. The nodes carry many attributes, one of which is the final examination grade for the current term, which from best to worst is A, B, C, D, E. We now analyze whether the grade attribute satisfies homogeneity on this social network. The steps are as follows:
1. Set the modularity and divide the graph into subgraphs using the division method described above (suppose 10 subgraphs are obtained).
2. Set the numeric value for each grade level: A-1, B-2, C-3, D-7, E-10.
3. Calculate the average grade on each subgraph (suppose the averages on the 10 subgraphs are a1, a2, ..., a10).
4. Calculate the standard deviation of the sequence a1 to a10.
5. Scale this standard deviation, together with the standard deviations obtained for the other attributes (calculated in the same way as described in steps 1 to 4), into the interval from 0 to 1. The larger the final value, the more strongly the corresponding attribute satisfies homogeneity.
The foregoing is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to it. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall be covered by the protection scope of the present invention. The protection scope of the present invention shall therefore be determined by the protection scope of the claims.

Claims (5)

1. A method for mining attributes that satisfy the homogeneity requirement in an attributed graph model, characterized in that the method is as follows:
1) subgraphs are divided according to network structure, based on the graph and its attribute set; modularity is used as the parameter that regulates the degree of division, and the graph is divided into subgraph 1, subgraph 2, through subgraph n;
2) ordered categorical attributes and unordered categorical attributes are distinguished in each subgraph; each ordered categorical attribute is quantified for evaluation;
3) for the ordered categorical attributes quantified in step 2), a weighted average is calculated for each of subgraph 1, subgraph 2, through subgraph n;
4) the standard deviation of the averages calculated over the subgraphs in step 3) is computed; from the differences among the resulting standard deviations, the degree to which the corresponding attribute satisfies homogeneity is obtained.
2. The method for mining attributes that satisfy the homogeneity requirement in an attributed graph model according to claim 1, characterized in that: the "modularity" in step 1) evaluates the ratio of edges within the subgraphs obtained by division to edges between subgraphs.
3. The method for mining attributes that satisfy the homogeneity requirement in an attributed graph model according to claim 1, characterized in that: in step 2), the number of occurrences of each value of an unordered categorical attribute is counted in subgraph 1, subgraph 2, through subgraph n; the proportion and distribution of the values within each subgraph are calculated; and from the differences between the distributions in the subgraphs, the attributes that satisfy the homogeneity requirement are obtained.
4. The method for mining attributes that satisfy the homogeneity requirement in an attributed graph model according to claim 1, characterized in that: in step 2), after the weighted average of an ordered categorical attribute is taken, the values are compared only between different subgraphs.
5. The method for mining attributes that satisfy the homogeneity requirement in an attributed graph model according to claim 1, characterized in that: in step 3), whether the attribute in each subgraph satisfies homogeneity is judged according to the weighted average; if the weighted averages differ greatly between different subgraphs, the attribute is considered more likely to satisfy homogeneity.
CN201811648180.9A 2018-12-30 2018-12-30 Method for mining attributes that satisfy the homogeneity requirement in an attributed graph model Pending CN109657016A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811648180.9A CN109657016A (en) 2018-12-30 2018-12-30 Method for mining attributes that satisfy the homogeneity requirement in an attributed graph model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811648180.9A CN109657016A (en) 2018-12-30 2018-12-30 Method for mining attributes that satisfy the homogeneity requirement in an attributed graph model

Publications (1)

Publication Number Publication Date
CN109657016A (en) 2019-04-19

Family

ID=66118562

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811648180.9A Pending CN109657016A (en) 2018-12-30 2018-12-30 Method for mining attributes that satisfy the homogeneity requirement in an attributed graph model

Country Status (1)

Country Link
CN (1) CN109657016A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118377939A (en) * 2024-06-21 2024-07-23 杭州海亮铭优在线教育科技有限公司 Educational data processing method, equipment and medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103325061A (en) * 2012-11-02 2013-09-25 中国人民解放军国防科学技术大学 Community discovery method and system
CN104008165A (en) * 2014-05-29 2014-08-27 华东师范大学 Community detection method based on network topology and node attributes
CN106777065A (en) * 2016-12-12 2017-05-31 郑州云海信息技术有限公司 Method and system for frequent subgraph mining
US20170185910A1 (en) * 2015-12-28 2017-06-29 International Business Machines Corporation Steering graph mining algorithms applied to complex networks
CN107273934A (en) * 2017-06-28 2017-10-20 电子科技大学 Graph clustering method based on attribute fusion

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
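吴烨 et al.: "An efficient attributed graph clustering method", 《计算机学报》 (Chinese Journal of Computers) *
吴钟刚 et al.: "A community discovery algorithm based on local similarity", 《计算机工程》 (Computer Engineering) *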

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20190419