CN107562854B - Modeling method for quantitatively analyzing party building data
- Publication number
- CN107562854B (application CN201710751678.7A)
- Authority
- CN
- China
- Prior art keywords
- party building
- party
- work
- keywords
- correlation
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Abstract
The invention discloses a modeling method for quantitatively analyzing party building data. The method first extracts party building work keywords from party building work texts and quantitatively measures the correlation between every two keywords. It then abstracts the keywords into the nodes of a polytree model, determines the directed edges of the model from the pairwise correlations, and finally determines the conditional probability parameters of the model. The invention thus provides a quantitative measure of the correlation between party building work keywords and uses a polytree model to represent the correlations among all keywords visually, which supports further analysis of party building big data.
Description
Technical Field
The invention belongs to the technical field of data mining, and relates to a modeling method for quantitatively analyzing party building data.
Background
Party building work is a fundamental guarantee for managing the party's ranks and for doing all other work well, and raising the scientific level of party building work is an important current task. As party building work develops, it generates massive party building data, including thought building data, organization building data, composition building data, system building data and the like, and intelligent management and effective analysis of these data have become an urgent need. Quantitative modeling and association analysis of party building big data, together with research into effective analysis and mining methods, are the key to analyzing such data effectively and the basis for raising the scientific level of party building. The polytree model is a simple probabilistic graphical model for representing and reasoning about uncertain knowledge; it can capture quantitative uncertainty relations among data and provides an efficient reasoning mechanism for the quantitative analysis of party building work. The invention models party building work quantitatively with a polytree model: by quantitatively measuring the correlation between party building work keywords, it provides a modeling means for mining global correlations, supports party building text analysis and party building work analysis, and gives technical support for raising the scientific level of party building.
Disclosure of Invention
Aiming at the massive data generated in party building work, the invention provides an effective modeling method for mining the global correlations in party building data, which supports the analysis of party building big data. The method mainly comprises the following steps:
First step, quantify each party building work text, specifically:
1.2, define a document frequency function f(x), where x denotes a combination of keywords that appear and do not appear in a document. For each party building work keyword, one symbol denotes that the keyword appears in the document and a second symbol denotes that it does not appear; a combination may, for example, state that one keyword appears and another does not, in which case f gives the frequency of documents in which the first keyword appears but the second does not;
Second step, for any two party building work keywords in the keyword set, define […] and determine by a chi-square test whether the two keywords are independent of each other. A correlation degree quantitatively measures the direct correlation between the two keywords: if they are independent of each other, the correlation degree is 0; otherwise it is […];
Third step, establish a maximum weight spanning tree T of m nodes:
3.2, examine the pairwise correlation degrees between keywords from largest to smallest; if no loop is generated, add an undirected edge between the two keywords to T, otherwise discard the pair, until T has m - 1 edges or […];
Fourth step, for each subgraph of T in which two keywords are joined through a third, compute […] and determine by a chi-square test whether the two end keywords are conditionally independent given the third; if they are not conditionally independent given the third, and […], orient the subgraph as a convergent structure, until no subgraph satisfies the condition and a graph G' is obtained;
Fifth step, without generating any new convergent structure, set all undirected edges in G' as directed edges to obtain the graph structure G of the polytree;
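The first two steps can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the toy corpus `texts`, the helper names `f`, `chi_square` and `independent`, and the 0.05 significance level (critical value 3.841 for one degree of freedom). The statistic is the standard Pearson chi-square for a 2x2 presence/absence table; the patent's own correlation-degree formula is not reproduced, so the sketch stops at the independence decision.

```python
# Hypothetical toy corpus: each text is reduced to the set of keywords it contains.
texts = [
    {"organization", "discipline"},
    {"organization", "discipline"},
    {"organization", "discipline", "study"},
    {"study"},
    {"study"},
    set(),
    {"organization", "discipline"},
    {"study"},
]


def f(texts, present=(), absent=()):
    """Document frequency: number of texts that contain every keyword in
    `present` and none of the keywords in `absent`."""
    return sum(
        1
        for t in texts
        if all(k in t for k in present) and not any(k in t for k in absent)
    )


def chi_square(texts, a, b):
    """Pearson chi-square statistic of the 2x2 presence/absence table of a and b."""
    n11 = f(texts, present=(a, b))               # both appear
    n10 = f(texts, present=(a,), absent=(b,))    # a appears, b does not
    n01 = f(texts, present=(b,), absent=(a,))    # b appears, a does not
    n00 = f(texts, absent=(a, b))                # neither appears
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        # A keyword occurs in every text or in none: no evidence of dependence.
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


# Assumed significance level 0.05 with 1 degree of freedom.
CRITICAL_VALUE = 3.841


def independent(texts, a, b):
    """True when the chi-square test does not reject independence of a and b."""
    return chi_square(texts, a, b) < CRITICAL_VALUE
```

With this corpus, "organization" and "discipline" always occur together and are judged dependent, while "organization" and "study" are judged independent. On a corpus this small the chi-square approximation is unreliable, so the numbers only show the mechanics.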
Drawings
FIG. 1. Process for building a polytree model of party building data.
Detailed Description
The embodiments of the present invention are described in detail below with reference to FIG. 1.
First step, quantify each party building work text, specifically:
1.2, define a document frequency function f(x), where x denotes a combination of keywords that appear and do not appear in a document. For each party building work keyword, one symbol denotes that the keyword appears in the document and a second symbol denotes that it does not appear; a combination may, for example, state that one keyword appears and another does not, in which case f gives the frequency of documents in which the first keyword appears but the second does not;
Second step, for any two party building work keywords in the keyword set, define […] and determine by a chi-square test whether the two keywords are independent of each other. A correlation degree quantitatively measures the direct correlation between the two keywords: if they are independent of each other, the correlation degree is 0; otherwise it is […].
For example, if the document frequencies of the two words α and β are those calculated in the first step, then […];
Third step, establish a maximum weight spanning tree T of m nodes, specifically:
3.2, examine the pairwise correlation degrees between keywords from largest to smallest; if no loop is generated, add an undirected edge between the two keywords to T, otherwise discard the pair, until T has m - 1 edges or […]. FIG. 1 (left) shows a maximum weight spanning tree T;
Fourth step, for each subgraph […] of T: if […], the directed edges cannot be determined. Examine the subgraph […]: if […] and […] are not conditionally independent given […], orient […] as a convergent structure […]. Other subgraphs that satisfy the condition are checked in the same way. One possible result is shown in FIG. 1 (middle) as graph G';
Fifth step, without generating any new convergent structure, set all undirected edges in G' as directed edges. For example, the undirected edge […] is set as the directed edge […], and […] is likewise set as […]; however, […] cannot be set as […], because this would create a new convergent structure […]. Following this principle, the graph structure G of the polytree is finally obtained;
Sixth step, calculate the conditional probability P(v | pa(v)) of each node v in G given its parent nodes pa(v) to obtain the conditional probability set P, and finally obtain the complete polytree model (G, P), as shown in FIG. 1 (right).
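The edge selection of the third step follows the pattern of Kruskal's algorithm run on weights sorted in descending order. The sketch below is an illustration under assumptions, not the patent's implementation: the function name, the `weights` dictionary (pair of keywords mapped to a correlation degree) and the toy values are hypothetical, and stopping at a zero weight is an assumed reading of the second stopping condition.

```python
def max_weight_spanning_tree(nodes, weights):
    """Select undirected edges greedily from largest to smallest weight,
    skipping any edge that would close a loop.

    nodes:   iterable of node names
    weights: dict mapping (u, v) tuples to non-negative weights
    """
    nodes = list(nodes)
    parent = {v: v for v in nodes}  # union-find forest for loop detection

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    tree = []
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    for (u, v), w in ranked:
        if len(tree) == len(nodes) - 1:
            break  # a tree on m nodes has m - 1 edges
        if w <= 0:
            break  # assumption: independent pairs (weight 0) are never linked
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # different components, so no loop is created
            parent[root_u] = root_v
            tree.append((u, v))
    return tree


# Hypothetical correlation degrees between four keywords.
weights = {
    ("a", "b"): 8.0,
    ("b", "c"): 5.0,
    ("a", "c"): 4.0,
    ("c", "d"): 3.0,
    ("a", "d"): 0.0,
}
tree = max_weight_spanning_tree(["a", "b", "c", "d"], weights)
```

Here the pair ("a", "c") is discarded because "a" and "c" are already connected through "b", which is the loop check described in step 3.2.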
FIG. 1. Process for building a polytree model of party building data: maximum weight spanning tree T (left), partially directed graph G' (middle), complete polytree model (G, P) (right).
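The sixth step, estimating the conditional probability of each node given its parents, can be sketched by relative-frequency counting over the texts. This is an assumed estimator; the text above does not say how the probabilities are computed. The function name, the toy corpus and the `parents` mapping are hypothetical, and each keyword is treated as a binary variable (present or absent in a text).

```python
from itertools import product


def conditional_probabilities(texts, parents):
    """Estimate P(v present | parent configuration) by relative frequency.

    texts:   list of sets, each holding the keywords of one text
    parents: dict mapping each node to a tuple of its parent nodes
    Returns {node: {parent_config: probability}}, where parent_config is a
    tuple of booleans aligned with the node's parents. A configuration that
    never occurs in the corpus maps to None.
    """
    cpt = {}
    for v, pa in parents.items():
        table = {}
        for config in product((True, False), repeat=len(pa)):
            matching = [
                t for t in texts
                if all((p in t) == state for p, state in zip(pa, config))
            ]
            if matching:
                table[config] = sum(v in t for t in matching) / len(matching)
            else:
                table[config] = None
        cpt[v] = table
    return cpt


# Hypothetical corpus and structure: a and b both point to c.
texts = [
    {"a", "c"}, {"a", "c"}, {"a"}, {"b", "c"},
    {"b"}, set(), {"a", "b", "c"}, {"c"},
]
parents = {"a": (), "b": (), "c": ("a", "b")}
cpt = conditional_probabilities(texts, parents)
```

A root node has the single empty configuration `()`, so `cpt["a"][()]` is simply the fraction of texts containing "a". In practice some smoothing would likely be needed for configurations with few or no matching texts.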
Claims (1)
1. A modeling method for quantitatively analyzing party building data, characterized by comprising the following steps:
Step one: quantify each party building work text:
1.1, for a set of n party building work texts, extract a set of m party building work keywords;
1.2, define a document frequency function f(x), which gives the number of party building work texts that satisfy the keyword condition x, where x is a combination of conditions on keywords that appear in a text and keywords that do not appear; for each party building work keyword, one symbol denotes that the keyword appears in the party building work text and a second symbol denotes that it does not appear in the party building work text;
Step two: for any two party building work keywords in the keyword set, define […] and determine by a chi-square test whether the two keywords are independent of each other; a correlation degree quantitatively measures the direct correlation between the two keywords: if they are independent of each other, the correlation degree is 0, otherwise it is […];
Step three: establish a maximum weight spanning tree T of m nodes:
3.2, examine the correlation degrees from largest to smallest; if no loop is generated, add an undirected edge between the two keywords to T, otherwise discard the pair, until T has m - 1 edges or […];
Step four: for each subgraph of T in which two keywords are joined through a third, compute […] and determine by a chi-square test whether the two end keywords are conditionally independent given the third; if they are not conditionally independent given the third, and […], orient the subgraph as a convergent structure, until no subgraph satisfies the condition and a graph G' is obtained;
Step five: without generating any new convergent structure, set all undirected edges in G' as directed edges to obtain the graph structure G of the polytree;
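One way to carry out step five is sketched below. It is an illustration under assumptions, not the claimed procedure: the function name and edge lists are hypothetical, the input is assumed to be a tree, and the rule used is that a node which already has a parent must point away along its undirected edges, since a second incoming edge would form a new convergent structure. Any undirected subtree that remains is oriented away from an arbitrarily chosen root.

```python
from collections import deque


def orient_remaining(directed, undirected):
    """Orient every undirected edge without creating a node with two parents
    beyond those already present in `directed`.

    directed:   list of (parent, child) edges fixed in the previous step
    undirected: list of (u, v) edges still to be oriented
    Raises ValueError if no such orientation exists for the given input.
    """
    directed = list(directed)
    has_parent = {child for _, child in directed}
    remaining = {frozenset(edge) for edge in undirected}
    neighbours = {}
    for u, v in undirected:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    def push_away_from(starts):
        # Breadth-first: orient each undirected edge away from the visited node.
        queue = deque(starts)
        while queue:
            u = queue.popleft()
            for v in sorted(neighbours.get(u, ())):
                edge = frozenset((u, v))
                if edge not in remaining:
                    continue
                if v in has_parent:
                    raise ValueError(
                        f"orienting {u}->{v} would create a new convergent structure"
                    )
                remaining.discard(edge)
                directed.append((u, v))
                has_parent.add(v)
                queue.append(v)

    # Nodes that already have a parent force the direction of their other edges.
    push_away_from(sorted(has_parent))
    # What is left are undirected subtrees whose nodes have no parent yet.
    while remaining:
        root = min(min(edge) for edge in remaining)
        push_away_from([root])
    return directed


# Hypothetical G': a convergent structure a -> c <- b plus three undirected edges.
g = orient_remaining(
    directed=[("a", "c"), ("b", "c")],
    undirected=[("c", "d"), ("d", "e"), ("a", "f")],
)
```

In the example, "c" already has parents, so the edge between "c" and "d" can only become c -> d, and the same reasoning then forces d -> e. The edge between "a" and "f" is unconstrained; the sketch picks a -> f, and f -> a would be equally valid.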
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710751678.7A (published as CN107562854B) | 2017-08-28 | 2017-08-28 | Modeling method for quantitatively analyzing party building data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107562854A | 2018-01-09 |
CN107562854B | 2020-09-22 |
Family
ID=60977304
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN201710751678.7A (published as CN107562854B) | Active | 2017-08-28 | 2017-08-28 | Modeling method for quantitatively analyzing party building data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107562854B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103049569A (en) * | 2012-12-31 | 2013-04-17 | 武汉传神信息技术有限公司 | Text similarity matching method on basis of vector space model |
CN106598999A (en) * | 2015-10-19 | 2017-04-26 | 北京国双科技有限公司 | Method and device for calculating text theme membership degree |
CN106844328A (en) * | 2016-08-23 | 2017-06-13 | 华南师范大学 | A kind of new extensive document subject matter semantic analysis and system |
CN106874695A (en) * | 2017-03-22 | 2017-06-20 | 北京大数医达科技有限公司 | The construction method and device of medical knowledge collection of illustrative plates |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104516904B (en) * | 2013-09-29 | 2018-04-03 | 北大方正集团有限公司 | A kind of Key Points recommend method and its system |
Also Published As
Publication number | Publication date |
---|---|
CN107562854A (en) | 2018-01-09 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||