CN109657016A - Method for mining attributes that satisfy the homophily requirement in an attributed graph model - Google Patents
Method for mining attributes that satisfy the homophily requirement in an attributed graph model
- Publication number
- CN109657016A (application number CN201811648180.9A)
- Authority
- CN
- China
- Prior art keywords
- attribute
- subgraph
- homophily
- requirement
- meeting
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2216/00—Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
- G06F2216/03—Data mining
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The present invention relates to a method for mining attributes that satisfy the homophily requirement in an attributed graph model, and specifically to the technical fields of graph data and data mining algorithms. The method partitions the graph into subgraphs and quantifies the attributes within each subgraph; it computes a weighted average on each subgraph, then the standard deviation of those averages, and from the differences among the resulting standard deviations obtains the degree to which the corresponding attribute satisfies homophily. The method provided by the invention simplifies the processing of large-scale graph data: it can simplify subsequent mining on large-scale graph data with many attribute types and reduce computation and storage overhead.
Description
Technical field
The present invention relates to a method for mining attributes that satisfy the homophily requirement in an attributed graph model, and specifically to the technical fields of graph data and data mining algorithms.
Background art
The traditional relational data model has long dominated the field of data modeling. However, as technology has advanced and the relational model has been applied ever more widely, scenarios have emerged for which it is poorly suited, such as social networks and transportation networks, where operations on the relationships between entities are frequent. To handle relationships, a traditional relational database needs a large number of association tables to record a series of complex relations. As more entities are introduced, more and more association tables are required, which makes solutions based on relational databases cumbersome and error-prone. This weakness of the data model also makes the data hard to extend and ill-suited to today's rapid data growth.
Against this background, graph databases emerged. Graph databases, also called graph-oriented databases, trace their origins to Euler and graph theory. Their basic idea is to store and query data using the graph data structure. The data model is expressed mainly through nodes and edges, and its advantage is that complex relationship problems can be resolved quickly.
With the rise of graph databases, engineers and researchers can model data as graphs in many practical scenarios, and research on algorithms for graph data has grown correspondingly active. Among these algorithms, mining algorithms on attributed graphs are the most widely studied, including community detection, clustering, graph partitioning and outlier detection. Mining algorithms on attributed graphs all rest on essentially the same assumption: every attribute under study must satisfy homophily. Homophily, from the point of view of node attributes, means that a node is more likely to be connected to nodes that are similar to itself. In 2003 Newman published the article "Mixing patterns in networks" in Physical Review E, in which he defined mixing patterns on networks for the first time and raised the problem of disassortative mixing, that is, networks in which nodes tend to connect to nodes that are less similar to themselves.
A prerequisite for mining an attributed graph is therefore to identify the attributes that satisfy homophily; only then can mining be carried out on those attributes. Some scholars have proposed methods that find, among the many attributes in an attributed graph model, those that satisfy the homophily requirement, but the existing methods all assume numeric attributes. A method that works for multiple attribute types and can find the attributes satisfying the homophily requirement in an attributed graph model is therefore needed.
Summary of the invention
In view of the above deficiencies, the present invention provides a method for mining attributes that satisfy the homophily requirement in an attributed graph model.
The present invention adopts the following technical scheme:
The method of the present invention for mining attributes that satisfy the homophily requirement in an attributed graph model is as follows:
1) Given the graph and its attribute set, partition the graph into subgraphs according to its network structure; using modularity as the parameter that controls the degree of partitioning, divide the graph into subgraph 1, subgraph 2, ..., subgraph n;
2) Distinguish the ordered categorical attributes from the unordered categorical attributes in each subgraph, and quantify each ordered categorical attribute;
3) For the ordered categorical attributes quantified in step 2), compute the weighted average in each of subgraph 1, subgraph 2, ..., subgraph n;
4) Compute the standard deviation of the subgraph averages obtained in step 3); from the differences among the resulting standard deviations, obtain the degree to which the corresponding attribute satisfies homophily.
In the method of the present invention, the modularity in step 1) evaluates the ratio of edges within the subgraphs obtained by the partition to edges between subgraphs.
In the method of the present invention, step 2) counts the number of occurrences of each value of an unordered categorical attribute in subgraph 1, subgraph 2, ..., subgraph n, and computes the proportion and distribution of each value within the subgraph; from the differences among the distributions across subgraphs, the nodes whose attribute satisfies the homophily requirement are obtained.
In the method of the present invention, after the weighted average of an ordered categorical attribute is computed in step 2), the values are compared only between different subgraphs.
In the method of the present invention, step 3) judges from the weighted averages whether the attribute in each subgraph satisfies homophily; if the weighted averages differ greatly between subgraphs, the attribute is considered more likely to satisfy homophily.
Beneficial effect
The method provided by the invention for mining attributes that satisfy the homophily requirement in an attributed graph model simplifies the processing of large-scale graph data. It can simplify subsequent mining on large-scale graph data with many attribute types and reduce computation and storage overhead.
Brief description of the drawings
Fig. 1 is a schematic diagram of the processing flow of the invention.
Detailed description of the embodiments
To make the purpose and technical solution of the embodiments of the invention clearer, the technical solution is described clearly and completely below with reference to the accompanying drawing. The described embodiments are some, not all, of the embodiments of the invention. All other embodiments that a person of ordinary skill in the art obtains from the described embodiments without creative work fall within the protection scope of the invention.
As shown in the figure, the problem of mining homophily attributes in a large-scale graph data set can be converted into the problem of whether a given attribute satisfies the homophily requirement. The invention proposes a computer-implemented method for mining attributes that satisfy the homophily requirement in an attributed graph model, as follows:
Given the graph and its attribute set, partition the graph into subgraphs according to its network structure; using modularity as the parameter that controls the degree of partitioning, divide the graph into subgraph 1, subgraph 2, ..., subgraph n;
Distinguish the ordered categorical attributes from the unordered categorical attributes in each subgraph, and quantify each ordered categorical attribute;
For the ordered categorical attributes quantified in the previous step, compute the weighted average in each of subgraph 1, subgraph 2, ..., subgraph n;
Compute the standard deviation of the subgraph averages obtained in the previous step; from the differences among the resulting standard deviations, obtain the degree to which the corresponding attribute satisfies homophily.
1) For an unordered categorical attribute in the attributed graph model (for example travel mode, with values such as walking, bicycle, subway, bus and private car), quantitatively evaluating whether it satisfies the homophily requirement in a given network, so that further data mining and other applications can build on the homophily attributes, requires the following operations.
First, ignoring the attributes, the whole graph is partitioned according to its network structure, that is, according to how the nodes and edges of the network are connected. For this partitioning the method designs and uses a modularity-based graph partitioning procedure, in which modularity is the parameter that controls the degree of partitioning. Modularity here evaluates the ratio of edges within the resulting subgraphs to edges between subgraphs; a good partition has dense edges within subgraphs and sparse edges between them. The procedure resembles community detection. Every node is first assigned a label; then, for the node with the largest degree in the graph, its label is replaced by the label that occurs most often among its neighbours. Nodes with the same label are considered to belong to the same subgraph. This process is iterated until the result converges or a preset modularity value is reached. In practice, the granularity of the partition can be adjusted to the data at hand to obtain a better result.
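The partitioning procedure described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the integer node ids, the choice to visit nodes in descending order of degree, and the tie-break (largest label wins) are all assumptions. The quality measure follows the text's own description (edges inside subgraphs divided by edges between subgraphs) rather than any standard modularity formula.

```python
from collections import Counter


def intra_inter_ratio(adj, labels):
    """Edges inside subgraphs divided by edges between subgraphs."""
    intra = inter = 0
    for u, nbrs in adj.items():
        for v in nbrs:
            if u < v:  # count each undirected edge once (ids must be comparable)
                if labels[u] == labels[v]:
                    intra += 1
                else:
                    inter += 1
    return float("inf") if inter == 0 else intra / inter


def partition(adj, max_iter=100, target_ratio=None):
    """Label-propagation style partition; returns {node: subgraph label}."""
    labels = {u: u for u in adj}  # every node starts with its own label
    # Assumption: visit nodes from highest to lowest degree.
    order = sorted(adj, key=lambda u: (-len(adj[u]), u))
    for _ in range(max_iter):
        changed = False
        for u in order:
            if not adj[u]:
                continue  # isolated node keeps its own label
            counts = Counter(labels[v] for v in adj[u])
            top = max(counts.values())
            # Assumption: ties go to the largest label, for determinism.
            new_label = max(lab for lab, c in counts.items() if c == top)
            if new_label != labels[u]:
                labels[u] = new_label
                changed = True
        if not changed:
            break  # converged
        if target_ratio is not None and intra_inter_ratio(adj, labels) >= target_ratio:
            break  # preset quality value reached
    return labels


def two_cliques():
    """Hypothetical test graph: two 4-node cliques joined by one edge."""
    edges = [(a, b) for a in range(4) for b in range(a + 1, 4)]
    edges += [(a, b) for a in range(4, 8) for b in range(a + 1, 8)]
    edges.append((3, 4))
    adj = {n: [] for n in range(8)}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    return adj
```

Calling `partition(two_cliques())` separates the two cliques. The outcome of label propagation depends on visiting order and tie-breaking, so other choices may give a different partition on other graphs.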
Within each resulting subgraph, the number of occurrences of each value of the unordered categorical attribute is counted, and the proportion of each value within the subgraph is computed, giving the distribution of the attribute's values on that subgraph. The distribution of the attribute's values is then computed on the whole graph. If the distributions differ greatly between subgraphs, and the distribution in some subgraphs differs markedly from the distribution on the entire data set, the attribute is considered to satisfy the homophily requirement; that is, with respect to this attribute, nodes tend to connect to nodes similar to themselves.
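A minimal sketch of this distribution comparison follows. The text does not say how the difference between two distributions is measured, so the total variation distance used here is an assumption, as are the function names and the toy data in the usage note.

```python
from collections import Counter


def distribution(values):
    """Proportion of each category among the given values."""
    counts = Counter(values)
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}


def total_variation(p, q):
    """Assumed distance between two categorical distributions (0 to 1)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def compare_subgraphs(attr, labels):
    """attr: {node: category}; labels: {node: subgraph label}.

    Returns the per-subgraph distributions, the whole-graph distribution,
    and each subgraph's distance from the whole-graph distribution.
    """
    groups = {}
    for node, lab in labels.items():
        groups.setdefault(lab, []).append(attr[node])
    overall = distribution(attr[node] for node in labels)
    per_subgraph = {lab: distribution(vals) for lab, vals in groups.items()}
    gaps = {lab: total_variation(d, overall) for lab, d in per_subgraph.items()}
    return per_subgraph, overall, gaps
```

For example, with two subgraphs in which every node of one walks and every node of the other takes the bus, each subgraph is at distance 0.5 from the whole-graph distribution, which would point towards homophily for the travel-mode attribute. What counts as a "large" gap is left to the user, as in the text.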
2) For an ordered categorical attribute in the attributed graph model (for example a student's grade level, which from best to worst is A, B, C, D, E), the goal is likewise to quantitatively evaluate whether it satisfies the homophily requirement in a given network.
Following the same approach as above, the attribute values are ignored at first and the data set is partitioned according to network structure using the partition (modularity) method. The occurrences of the attribute's values are then analysed statistically within each resulting subgraph to learn how the values are distributed.
Unlike 1), which counts how often each value occurs in a subgraph and derives its proportion, an ordered categorical attribute allows each category to be assigned an integer that represents it, in category order. Taking student grade levels as an example, 1 represents A, 2 represents B, 3 represents C, 4 represents D and 5 represents E. The values need not start at 1 or increase in steps of 1; where the gap between categories is more pronounced, the step can be enlarged, for example 2, 3, 4, 9, but the values must increase or decrease monotonically in category order. The number of occurrences of each value is counted in a subgraph, and the weighted average of the attribute in that subgraph is computed from these counts. Weighted averages are computed in this way on every subgraph, and whether the attribute satisfies homophily is judged from them: if the weighted averages differ greatly between subgraphs, the attribute is considered more likely to satisfy homophily.
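The encoding and the per-subgraph weighted average can be sketched as below. The grade codes are the ones given in the text; the function names and the small data sets in the usage note are hypothetical.

```python
from collections import Counter, defaultdict

# Codes from the text: 1 represents A, ..., 5 represents E.
GRADE_CODES = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}


def is_monotonic(codes, order):
    """Check that codes strictly increase or strictly decrease in category order."""
    seq = [codes[c] for c in order]
    increasing = all(a < b for a, b in zip(seq, seq[1:]))
    decreasing = all(a > b for a, b in zip(seq, seq[1:]))
    return increasing or decreasing


def weighted_mean(values, codes):
    """Average of the codes, weighted by how often each category occurs."""
    counts = Counter(values)
    total = sum(counts.values())
    return sum(codes[v] * n for v, n in counts.items()) / total


def subgraph_means(attr, labels, codes):
    """attr: {node: category}; labels: {node: subgraph label}."""
    groups = defaultdict(list)
    for node, lab in labels.items():
        groups[lab].append(attr[node])
    return {lab: weighted_mean(vals, codes) for lab, vals in groups.items()}
```

A subgraph holding grades A, A, B and E has weighted average (1·2 + 2 + 5) / 4 = 2.25. The irregular coding 2, 3, 4, 9 from the text passes `is_monotonic`, while a coding such as 1, 3, 2 does not.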
Note that when the ordered categories are converted to numbers and homophily is judged from the weighted averages on different subgraphs, the final comparison differs from the distribution comparison in 1): only the values of different subgraphs need to be compared with one another, and no comparison with the weighted average on the whole graph is required.
Ordered categorical data can also be judged by the method in 1), comparing how the attribute values are distributed across subgraphs; the procedure is the same as in 1).
3) To quantitatively evaluate, within an attributed graph, the degree to which attributes satisfy homophily, the invention proposes a method for quantifying that degree.
Using the methods in 1) and 2), the distribution of each attribute's values across the subgraphs is obtained. The averages obtained on the subgraphs can be treated as a sequence, and the standard deviation of this sequence is computed (the standard deviation is a common statistical operation and is computed by program). Different attributes yield different standard deviations. These standard deviations are normalized (normalization is a common statistical operation that scales data of different magnitudes into the interval from 0 to 1 by suitable means), and the resulting value is the degree to which the corresponding attribute satisfies homophily. This value is an index that quantifies how well the attribute satisfies homophily; how it is judged is decided manually according to actual needs.
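A short sketch of this scoring step follows. The text only asks for "suitable means" of scaling to the 0 to 1 interval, so min-max scaling is an assumption here, as is the use of the population standard deviation (`statistics.pstdev`); the attribute names in the usage note are hypothetical.

```python
from statistics import pstdev


def homophily_scores(means_by_attr):
    """means_by_attr: {attribute: [average on subgraph 1, ..., subgraph n]}.

    Returns {attribute: score in [0, 1]}; a larger score means the
    subgraph averages are more spread out for that attribute.
    """
    # Standard deviation of each attribute's sequence of subgraph averages.
    stds = {attr: pstdev(means) for attr, means in means_by_attr.items()}
    lo, hi = min(stds.values()), max(stds.values())
    if hi == lo:
        # All attributes vary equally; min-max scaling is undefined, return 0.
        return {attr: 0.0 for attr in stds}
    # Assumed normalization: min-max scaling across attributes.
    return {attr: (s - lo) / (hi - lo) for attr, s in stds.items()}
```

With subgraph averages `[1.0, 3.0]` for a hypothetical `grade` attribute, `[2.0, 2.0]` for `age` and `[1.0, 2.0]` for `x`, the standard deviations are 1.0, 0.0 and 0.5, which min-max scaling maps to scores 1.0, 0.0 and 0.5. Under min-max scaling the lowest-ranked attribute always scores 0, so the scores are relative within one graph only.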
Note that this method applies only to comparisons between different attributes within a single attributed graph and is not suitable for comparisons across graphs.
4) Because of the particular nature of this method, a conventional distributed scheme for graph data, which first splits the data and distributes the shards to different nodes for computation, would degrade its results. For the proposed method of mining homophily attributes in an attributed graph model, the invention therefore designs a new two-stage distributed scheme.
In the first stage, the partitioning method is executed on the cluster to partition the graph, and the result of the partition is stored on the master node.
In the second stage, copies of the partitioned graph are distributed to the slave nodes. Each slave node is responsible for judging one or more attributes. When distributing a copy to a particular slave node, the master node removes the attributes that this slave node does not need to process. After receiving its copy, each node runs the proposed method for mining homophily attributes locally.
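The second stage can be illustrated with a small single-process sketch of what the master node prepares for each slave node. The round-robin assignment, the function names and the shape of the copy are assumptions; the text does not specify how attributes are allocated or how copies are transmitted, and no real cluster framework is used here.

```python
def assign_attributes(attribute_names, n_slaves):
    """Assumed round-robin allocation of attributes to slave nodes."""
    buckets = [[] for _ in range(n_slaves)]
    for i, name in enumerate(attribute_names):
        buckets[i % n_slaves].append(name)
    return buckets


def build_copy(labels, node_attrs, keep):
    """Copy of the partitioned graph data for one slave node.

    labels: {node: subgraph label} produced in the first stage.
    node_attrs: {node: {attribute: value}} for the full graph.
    keep: attributes this slave node is responsible for; all others are dropped.
    """
    keep = set(keep)
    return {
        node: {
            "subgraph": labels[node],
            "attrs": {k: v for k, v in node_attrs[node].items() if k in keep},
        }
        for node in labels
    }
```

With attributes `grade`, `age` and `travel` and two slave nodes, the first receives `grade` and `travel` and the second receives `age`; each copy still carries every node's subgraph label, so the per-attribute analysis can run locally without access to the other attributes.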
Based on the above, an example follows.
Take student grade levels, which from best to worst are A, B, C, D, E; the procedure is illustrated on these data.
Consider the social network formed by the third grade of a primary school. Each node in this network represents a student, each edge represents a friendship, and each node has many attributes, one of which is the final examination grade for the current term, from best to worst A, B, C, D, E. The question is whether the grade attribute satisfies homophily on this social network. The steps are as follows:
1. Set the modularity and partition the graph into subgraphs using the partitioning method described above (suppose 10 subgraphs are obtained).
2. Set the numeric value for each grade level: A-1, B-2, C-3, D-7, E-10.
3. Compute the average grade on each subgraph (suppose the averages on the 10 subgraphs are a1, a2, ..., a10).
4. Compute the standard deviation of the sequence a1 to a10.
5. Scale this standard deviation, together with the standard deviations obtained for the other attributes (computed in the same way as in steps 1 to 4), into the interval from 0 to 1. The larger the final value, the better the corresponding attribute satisfies homophily.
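Written as formulas, steps 2 to 5 of this example read as follows. The symbols are introduced here for illustration only: c(g) is the numeric value of grade g, n_{k,g} is the number of students with grade g in subgraph k, and sigma_j is the standard deviation obtained for attribute j. The population form of the standard deviation and min-max scaling are assumptions, since the text does not fix either choice.

```latex
% Step 2: grade coding
c(A)=1,\quad c(B)=2,\quad c(C)=3,\quad c(D)=7,\quad c(E)=10

% Step 3: weighted average on subgraph k
a_k = \frac{\sum_{g} c(g)\, n_{k,g}}{\sum_{g} n_{k,g}}, \qquad k = 1,\dots,10

% Step 4: standard deviation of the sequence a_1,\dots,a_{10}
\bar{a} = \frac{1}{10}\sum_{k=1}^{10} a_k, \qquad
\sigma = \sqrt{\frac{1}{10}\sum_{k=1}^{10}\left(a_k - \bar{a}\right)^2}

% Step 5: scaling across attributes j (assumed min-max form)
s_j = \frac{\sigma_j - \min_i \sigma_i}{\max_i \sigma_i - \min_i \sigma_i} \in [0, 1]
```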
The foregoing is only a preferred embodiment of the invention, but the protection scope of the invention is not limited to it. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall be covered by the protection scope of the invention. The protection scope of the invention shall therefore be determined by the protection scope of the claims.
Claims (5)
1. A method for mining attributes that satisfy the homophily requirement in an attributed graph model, characterized in that the method is as follows:
1) Given the graph and its attribute set, partition the graph into subgraphs according to its network structure; using modularity as the parameter that controls the degree of partitioning, divide the graph into subgraph 1, subgraph 2, ..., subgraph n;
2) Distinguish the ordered categorical attributes from the unordered categorical attributes in each subgraph, and quantify each ordered categorical attribute;
3) For the ordered categorical attributes quantified in step 2), compute the weighted average in each of subgraph 1, subgraph 2, ..., subgraph n;
4) Compute the standard deviation of the subgraph averages obtained in step 3); from the differences among the resulting standard deviations, obtain the degree to which the corresponding attribute satisfies homophily.
2. The method for mining attributes that satisfy the homophily requirement in an attributed graph model according to claim 1, characterized in that the modularity in step 1) evaluates the ratio of edges within the subgraphs obtained by the partition to edges between subgraphs.
3. The method for mining attributes that satisfy the homophily requirement in an attributed graph model according to claim 1, characterized in that step 2) counts the number of occurrences of each value of an unordered categorical attribute in subgraph 1, subgraph 2, ..., subgraph n, and computes the proportion and distribution of each value within the subgraph; from the differences among the distributions across subgraphs, the nodes whose attribute satisfies the homophily requirement are obtained.
4. The method for mining attributes that satisfy the homophily requirement in an attributed graph model according to claim 1, characterized in that, after the weighted average of an ordered categorical attribute is computed in step 2), the values are compared only between different subgraphs.
5. The method for mining attributes that satisfy the homophily requirement in an attributed graph model according to claim 1, characterized in that step 3) judges from the weighted averages whether the attribute in each subgraph satisfies homophily; if the weighted averages differ greatly between subgraphs, the attribute is considered more likely to satisfy homophily.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811648180.9A CN109657016A (en) | 2018-12-30 | 2018-12-30 | Method for mining attributes that satisfy the homophily requirement in an attributed graph model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811648180.9A CN109657016A (en) | 2018-12-30 | 2018-12-30 | Method for mining attributes that satisfy the homophily requirement in an attributed graph model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109657016A (en) | 2019-04-19 |
Family
ID=66118562
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811648180.9A Pending CN109657016A (en) | 2018-12-30 | 2018-12-30 | Method for mining attributes that satisfy the homophily requirement in an attributed graph model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109657016A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118377939A (en) * | 2024-06-21 | 2024-07-23 | 杭州海亮铭优在线教育科技有限公司 | Educational data processing method, equipment and medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103325061A (en) * | 2012-11-02 | 2013-09-25 | 中国人民解放军国防科学技术大学 | Community discovery method and system |
CN104008165A (en) * | 2014-05-29 | 2014-08-27 | 华东师范大学 | Club detecting method based on network topology and node attribute |
CN106777065A (en) * | 2016-12-12 | 2017-05-31 | 郑州云海信息技术有限公司 | The method and system that a kind of Frequent tree mining is excavated |
US20170185910A1 (en) * | 2015-12-28 | 2017-06-29 | International Business Machines Corporation | Steering graph mining algorithms applied to complex networks |
CN107273934A (en) * | 2017-06-28 | 2017-10-20 | 电子科技大学 | A kind of figure clustering method merged based on attribute |
-
2018
- 2018-12-30 CN CN201811648180.9A patent/CN109657016A/en active Pending
Non-Patent Citations (2)
Title |
---|
吴烨等: "一种高效的属性图聚类方法", 《计算机学报》 * |
吴钟刚等: "一种基于局部相似性的社区发现算法", 《计算机工程》 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Cao et al. | Detecting prosumer-community groups in smart grids from the multiagent perspective | |
CN106709037B (en) | A kind of film recommended method based on Heterogeneous Information network | |
CN104346481B (en) | A kind of community detection method based on dynamic synchronization model | |
CN106326637A (en) | Link prediction method based on local effective path degree | |
Ding et al. | A new hierarchical ranking aggregation method | |
CN106951524A (en) | Overlapping community discovery method based on node influence power | |
Xia et al. | CHIEF: Clustering with higher-order motifs in big networks | |
CN106528804B (en) | A kind of tenant group method based on fuzzy clustering | |
Xu et al. | Finding overlapping community from social networks based on community forest model | |
CN108984830A (en) | A kind of building efficiency evaluation method and device based on FUZZY NETWORK analysis | |
CN108765180A (en) | The overlapping community discovery method extended with seed based on influence power | |
Xing et al. | Overlapping Community Detection by Local Community Expansion. | |
Li et al. | A community clustering algorithm based on genetic algorithm with novel coding scheme | |
CN107276093B (en) | Power system probability load flow calculation method based on scene reduction | |
CN116415199B (en) | Business data outlier analysis method based on audit intermediate table | |
CN108400889A (en) | A kind of community discovery method based on suboptimization | |
CN112257950A (en) | Trade path configuration method applied to power market and computer-readable storage medium | |
CN109657016A (en) | The method for meeting the attribute of homogeney requirement is excavated in a kind of attribute graph model | |
Hao et al. | The research and analysis in decision tree algorithm based on C4. 5 algorithm | |
CN105631751A (en) | Directional local group discovery method | |
Zhou et al. | Identifying technology evolution pathways by integrating citation network and text mining | |
CN108509531A (en) | A kind of uncertain data collection frequent-item method based on Spark platforms | |
CN114943019A (en) | Top k non-overlapping diversified community discovery method based on double-layer weight network random walk | |
Watts | Rules versus hierarchy: an application of fuzzy set theory to the assessment of spatial grouping techniques | |
CN109886313A (en) | A kind of Dynamic Graph clustering method based on density peak |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20190419 |