CN107562854B - Modeling method for quantitatively analyzing party building data - Google Patents


Info

Publication number: CN107562854B
Authority: CN (China)
Prior art keywords: party building, party, work, keywords, correlation
Legal status: Active
Application number: CN201710751678.7A
Other languages: Chinese (zh)
Other versions: CN107562854A (en)
Inventors: 李维华, 王兵益, 郭延哺, 王顺芳, 何敏
Current assignee: Yunnan University YNU
Original assignee: Yunnan University YNU
Priority date: 2017-08-28
Filing date: 2017-08-28
Publication date: 2020-09-22
Application filed by Yunnan University YNU
Priority to CN201710751678.7A
Publication of CN107562854A
Application granted
Publication of CN107562854B


Abstract

The invention discloses a modeling method for quantitatively analyzing party building data. The method first extracts party building work keywords from party building work texts and quantifies the correlation between every pair of keywords. It then abstracts the keywords as the nodes of a polytree model, determines the model's directed edges from the pairwise correlations, and finally determines the model's conditional probability parameters. The invention thus provides a quantitative measure of the correlation between party building work keywords and uses a polytree model to visually model the correlations among all of them, supporting further analysis of party building big data.

Description

Modeling method for quantitatively analyzing party building data
Technical Field
The invention belongs to the technical field of data mining, and relates to a modeling method for quantitatively analyzing party building data.
Background
Party building work is the fundamental guarantee for managing and governing the Party and the foundation for all other work, and raising its scientific level is an important current task. As party building work develops, it generates massive party building data, including data on ideological building, organizational building, work-style building and institutional building. Intelligent management and effective analysis of these data have therefore become an urgent need. Quantitative modeling and association analysis of party building big data, together with research into effective analysis and mining methods, are key to analyzing these data effectively. They also form a basis for raising the scientific level of party building. The polytree model is a simple probabilistic graphical model for representing and reasoning with uncertain knowledge. It can capture quantitative uncertain relationships among data and provides an efficient inference mechanism for the quantitative analysis of party building work. The invention uses a polytree model to model party building work quantitatively. By measuring the correlation between party building work keywords, it provides a means of mining global correlations, supports the analysis of party building texts and party building work, and offers technical support for raising the scientific level of party building.
Disclosure of Invention
To handle the massive data generated by party building work, the invention provides an effective modeling method for mining the global correlations in party building data, supporting the analysis of party building big data. The method mainly comprises the following steps:
Step one: quantify each party building work text, as follows:
1.1 From the collection of n party building work texts, extract a set of m party building work keywords.
1.2 Define a document frequency function f(x) as the number of texts that satisfy the keyword condition x, where x combines keywords that appear in a text with keywords that do not. For a party building work keyword k_i, k_i denotes that the keyword appears in the text and ¬k_i denotes that it does not. For example, k_i¬k_j denotes the combination in which k_i appears and k_j does not, and f(k_i¬k_j) is the number of texts in which k_i appears but k_j does not.
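The following Python sketch illustrates the document frequency function of step 1.2. It assumes, hypothetically, that each text has already been reduced to the set of keywords it contains. The keyword condition x is encoded as a mapping from each keyword to True (must appear) or False (must not appear). The function and keyword names are illustrative and do not come from the patent.

```python
from typing import Iterable, Mapping


def doc_frequency(docs: Iterable[set], condition: Mapping[str, bool]) -> int:
    """Return f(x): the number of texts that satisfy keyword condition x.

    condition maps each keyword to True (it must appear) or False (it must not),
    so {"alpha": True, "beta": False} encodes "alpha appears, beta does not".
    """
    return sum(
        all((keyword in doc) == present for keyword, present in condition.items())
        for doc in docs
    )


# Hypothetical corpus: each text reduced to its set of extracted keywords.
docs = [{"alpha", "beta"}, {"alpha"}, {"beta"}, {"alpha", "gamma"}, set()]
print(doc_frequency(docs, {"alpha": True, "beta": False}))  # 2
```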
Step two: for any two party building work keywords k_i and k_j in the keyword set, define [formula] and [formula]. Use the chi-square test to determine whether k_i and k_j are independent of each other, and quantify the direct correlation between them by the correlation degree r(k_i, k_j): if k_i and k_j are independent, r(k_i, k_j) = 0; otherwise r(k_i, k_j) = [formula].
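The sketch below illustrates step two under stated assumptions. The patent's formulas for the intermediate quantities and for a nonzero r(k_i, k_j) are not recoverable here. As a stand-in, this sketch uses mutual information, the edge weight of the Chow-Liu tree method, as the correlation degree. It also assumes a 0.05 significance level for the chi-square test on the 2x2 table of document frequencies. Both choices are assumptions rather than the patent's own definitions.

```python
import math

CHI2_CRIT_DF1 = 3.841  # chi-square critical value, 1 degree of freedom, level 0.05 (assumed)


def contingency(docs, a, b):
    """2x2 table of document frequencies: table[x][y] counts texts with (a present == x, b present == y)."""
    table = [[0, 0], [0, 0]]
    for doc in docs:
        table[int(a in doc)][int(b in doc)] += 1
    return table


def margins(table):
    rows = [table[0][0] + table[0][1], table[1][0] + table[1][1]]
    cols = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    return rows, cols, rows[0] + rows[1]


def chi_square(table):
    """Pearson chi-square statistic of a 2x2 table (cells with zero expected count are skipped)."""
    rows, cols, n = margins(table)
    stat = 0.0
    for x in range(2):
        for y in range(2):
            expected = rows[x] * cols[y] / n if n else 0.0
            if expected > 0:
                stat += (table[x][y] - expected) ** 2 / expected
    return stat


def mutual_information(table):
    """Mutual information (in nats) between the two presence indicators."""
    rows, cols, n = margins(table)
    return sum(
        table[x][y] / n * math.log(table[x][y] * n / (rows[x] * cols[y]))
        for x in range(2)
        for y in range(2)
        if table[x][y] > 0
    )


def correlation(docs, a, b):
    """r(a, b): 0 if the chi-square test accepts independence, else mutual information (assumed measure)."""
    table = contingency(docs, a, b)
    if chi_square(table) < CHI2_CRIT_DF1:
        return 0.0
    return mutual_information(table)
```

The significance level and the mutual-information weight can be swapped for the patent's own definitions where those are available.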
Step three: build a maximum weight spanning tree T over the m keyword nodes:
3.1 Abstract each party building work keyword in the keyword set as a node of T.
3.2 Examine the pairwise correlation degrees r(k_i, k_j) from largest to smallest. If adding the undirected edge (k_i, k_j) to T creates no cycle, add it; otherwise discard it. Continue until T has m - 1 edges or [formula].
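Step three is essentially Kruskal's algorithm run in descending order of weight. The minimal sketch below uses a union-find structure to detect cycles. The patent's second stopping condition is not recoverable here. As an assumption, the sketch stops once only pairs with r = 0, judged independent in step two, remain.

```python
def max_weight_spanning_tree(nodes, weights):
    """Kruskal-style construction of T.

    nodes: iterable of keywords; weights: {(u, v): r(u, v)} for unordered pairs.
    Returns the undirected edges of T in the order they were added.
    """
    nodes = list(nodes)
    parent = {v: v for v in nodes}  # union-find forest

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    tree = []
    for (u, v), w in sorted(weights.items(), key=lambda item: item[1], reverse=True):
        if len(tree) == len(nodes) - 1 or w <= 0:
            break  # T is complete, or only pairs judged independent (r = 0) remain (assumption)
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # adding (u, v) creates no cycle
            parent[root_u] = root_v
            tree.append((u, v))
        # otherwise the edge would close a loop and is discarded
    return tree
```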
Step four: for each subgraph of T of the form k_i - k_l - k_j, compute [formula] and use the chi-square test to determine whether k_i and k_j are conditionally independent given k_l. If k_i and k_j are not conditionally independent given k_l, and [formula], orient the subgraph as the convergent structure k_i → k_l ← k_j. Repeat until no subgraph satisfies the condition, obtaining the graph G'.
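The sketch below illustrates step four, assuming binary presence/absence variables. The conditional test sums the 2x2 chi-square statistics over the two states of the middle keyword, giving 2 degrees of freedom. The patent's extra condition is not recoverable here. It is assumed to be marginal independence of k_i and k_j, a common criterion for detecting head-to-head (convergent) nodes in polytree recovery. The critical values assume a 0.05 level.

```python
from itertools import combinations

CHI2_CRIT_DF1 = 3.841  # 1 degree of freedom, level 0.05 (assumed)
CHI2_CRIT_DF2 = 5.991  # 2 degrees of freedom, level 0.05 (assumed)


def chi_square(t):
    """Pearson chi-square statistic of a 2x2 table t[x][y]."""
    n = t[0][0] + t[0][1] + t[1][0] + t[1][1]
    rows = [t[0][0] + t[0][1], t[1][0] + t[1][1]]
    cols = [t[0][0] + t[1][0], t[0][1] + t[1][1]]
    stat = 0.0
    for x in range(2):
        for y in range(2):
            expected = rows[x] * cols[y] / n if n else 0.0
            if expected > 0:
                stat += (t[x][y] - expected) ** 2 / expected
    return stat


def table(docs, a, b, given=None, given_present=None):
    """2x2 presence table of a and b, optionally restricted to texts where `given` has a fixed state."""
    t = [[0, 0], [0, 0]]
    for doc in docs:
        if given is None or (given in doc) == given_present:
            t[int(a in doc)][int(b in doc)] += 1
    return t


def find_convergent(docs, tree_edges):
    """Return arcs (parent, child) of every convergent structure k_i -> k_l <- k_j found in T."""
    neighbours = {}
    for u, v in tree_edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    arcs = set()
    for middle, adjacent in neighbours.items():
        for i, j in combinations(sorted(adjacent), 2):  # subgraph i - middle - j
            conditional = chi_square(table(docs, i, j, middle, True)) + chi_square(
                table(docs, i, j, middle, False)
            )
            marginal = chi_square(table(docs, i, j))
            if conditional >= CHI2_CRIT_DF2 and marginal < CHI2_CRIT_DF1:
                arcs |= {(i, middle), (j, middle)}
    return arcs
```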
Step five: without creating any new convergent structure, turn every remaining undirected edge in G' into a directed edge, obtaining the polytree graph structure G.
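One way to carry out step five is sketched below. It assumes that G' is a tree whose only directed edges come from step four. A node that already has a parent must point all of its remaining undirected edges away from itself. Each remaining undirected part is then rooted at an arbitrary endpoint, chosen here as the alphabetically first edge. If both ends of an undirected edge already have parents, no orientation avoids a new convergent structure, so the sketch raises an error.

```python
from collections import deque


def orient_polytree(tree_edges, fixed_arcs):
    """Orient all undirected edges of G' without creating a new convergent structure.

    tree_edges: undirected edges of T; fixed_arcs: (parent, child) pairs fixed in step four.
    Returns the full set of (parent, child) arcs of G.
    """
    neighbours = {}
    for u, v in tree_edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    arcs = set(fixed_arcs)
    undirected = {frozenset(e) for e in tree_edges} - {frozenset(a) for a in arcs}
    has_parent = {child for _, child in arcs}

    def propagate(queue):
        # A node that already has a parent must point its undirected edges away from itself.
        while queue:
            v = queue.popleft()
            for w in sorted(neighbours[v]):
                edge = frozenset((v, w))
                if edge not in undirected:
                    continue
                if w in has_parent:
                    raise ValueError(f"edge {v}-{w}: every orientation creates a convergent node")
                undirected.discard(edge)
                arcs.add((v, w))
                has_parent.add(w)
                queue.append(w)

    propagate(deque(sorted(has_parent)))
    while undirected:
        # Both ends of a remaining edge are parentless: root it at its first endpoint.
        u, v = sorted(min(undirected, key=sorted))
        undirected.discard(frozenset((u, v)))
        arcs.add((u, v))
        has_parent.add(v)
        propagate(deque([v]))
    return arcs
```

In the resulting arcs, every node added in this step gets at most one parent. Only the convergent nodes fixed in step four keep several parents.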
Step six: for each node v in G, compute the conditional probability P(v | pa(v)) of v given its parent nodes pa(v), obtaining the set of conditional probabilities P. This finally yields the complete polytree model (G, P) of party building big data.
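Step six amounts to estimating a conditional probability table for each node from document counts. The sketch below assumes binary keyword nodes and maximum-likelihood estimates. It adds optional add-alpha (Laplace) smoothing, which the patent does not describe. The function name and the table layout are hypothetical.

```python
from itertools import product


def conditional_probabilities(docs, arcs, nodes, alpha=1.0):
    """Estimate P(v present | pa(v)) for every binary keyword node v of G.

    docs: texts as keyword sets; arcs: (parent, child) pairs of G.
    alpha: add-alpha (Laplace) smoothing, an assumption; alpha=0 gives raw frequencies.
    Returns {v: {parent_states: probability}}, parent_states ordered as sorted(pa(v)).
    """
    cpts = {}
    for v in nodes:
        pa = sorted(p for p, child in arcs if child == v)
        table = {}
        for states in product((False, True), repeat=len(pa)):
            matching = [d for d in docs if all((p in d) == s for p, s in zip(pa, states))]
            present = sum(v in d for d in matching)
            denominator = len(matching) + 2 * alpha
            # nan marks a parent configuration never seen when no smoothing is used
            table[states] = (present + alpha) / denominator if denominator else float("nan")
        cpts[v] = table
    return cpts
```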
Drawings
FIG. 1. Process for building a party data polytree model;
Detailed Description
Embodiments of the invention are described in detail below with reference to FIG. 1.
Step one: quantify each party building work text, as follows:
1.1 From the collection of n party building work texts, extract a set of m party building work keywords.
1.2 Define a document frequency function f(x) as the number of texts that satisfy the keyword condition x, where x combines keywords that appear in a text with keywords that do not. For a party building work keyword k_i, k_i denotes that the keyword appears in the text and ¬k_i denotes that it does not. For example, k_i¬k_j denotes the combination in which k_i appears and k_j does not, and f(k_i¬k_j) is the number of texts in which k_i appears but k_j does not.
Suppose n = 100, and the document frequencies counted for two words α and β are as follows: [formula] [formula]
Step two: for any two party building work keywords k_i and k_j in the keyword set, define [formula] and [formula]. Use the chi-square test to determine whether k_i and k_j are independent of each other, and quantify the direct correlation between them by the correlation degree r(k_i, k_j): if k_i and k_j are independent, r(k_i, k_j) = 0; otherwise r(k_i, k_j) = [formula].
For example, using the document frequencies of the two words α and β computed in Step one gives [formula].
Step three: build a maximum weight spanning tree T over the m keyword nodes, specifically:
3.1 Abstract each party building work keyword in the keyword set as a node of T.
3.2 Examine the pairwise correlation degrees r(k_i, k_j) from largest to smallest. If adding the undirected edge (k_i, k_j) to T creates no cycle, add it; otherwise discard it. Continue until T has m - 1 edges or [formula].
The left panel of FIG. 1 shows a resulting maximum weight spanning tree T.
Step four: examine the subgraphs of T. For the subgraph [formula], if [formula], the direction of its edges cannot be determined. Next, examine the subgraph [formula]: if [formula] and [formula] are not conditionally independent given [formula], and [formula] [formula], orient [formula] as the convergent structure [formula]. Check the other subgraphs that satisfy the condition in the same way. One possible result is the graph G' shown in the middle panel of FIG. 1.
Step five: without creating any new convergent structure, turn every undirected edge in G' into a directed edge. For example, the undirected edge [formula] can be oriented as [formula], and [formula] can also be oriented. However, [formula] cannot be oriented that way, because doing so would create the new convergent structure [formula] [formula]. Following this principle, the polytree graph structure G is finally obtained.
Step six: for each node v in G, compute the conditional probability P(v | pa(v)) of v given its parent nodes pa(v), obtaining the set of conditional probabilities P. This finally yields the complete polytree model (G, P), shown in the right panel of FIG. 1.
[FIG. 1 panels: maximum weight spanning tree T (left); partially directed graph G' (middle); complete polytree model (G, P) (right)]
FIG. 1. Process for building a party building data polytree model

Claims (1)

1. A modeling method for quantitatively analyzing party building data is characterized by comprising the following steps:
Step one: quantify each party building work text:
1.1 From the collection of n party building work texts, extract a set of m party building work keywords;
1.2 Define a document frequency function f(x) as the number of party building work texts that satisfy the keyword condition x, where x combines conditions on keywords that appear in a text and keywords that do not. For a party building work keyword k_i, use k_i to denote that k_i appears in the party building work text, and ¬k_i to denote that k_i does not appear in it;
Step two: for any two party building work keywords k_i and k_j in the keyword set, define [formula] and [formula]. Determine by the chi-square test whether k_i and k_j are independent of each other, and quantitatively measure the direct correlation between k_i and k_j by the correlation degree r(k_i, k_j). If k_i and k_j are independent, r(k_i, k_j) = 0; otherwise r(k_i, k_j) = [formula];
Step three: build a maximum weight spanning tree T of m nodes:
3.1 Abstract each party building work keyword in the keyword set as a node of T;
3.2 Examine the correlation degrees r(k_i, k_j) from largest to smallest. If no cycle is created, add the undirected edge (k_i, k_j) to T; otherwise discard it. Continue until T has m - 1 edges or [formula];
Step four: for each subgraph of T of the form k_i - k_l - k_j, compute [formula] and determine by the chi-square test whether k_i and k_j are conditionally independent given k_l. If k_i and k_j are not conditionally independent given k_l, and [formula], orient the subgraph as the convergent structure k_i → k_l ← k_j. Repeat until no subgraph satisfies the condition, obtaining the graph G′;
Step five: without creating any new convergent structure, turn all undirected edges in G′ into directed edges, obtaining the polytree graph structure G;
Step six: for each node v in G, compute the conditional probability P(v | pa(v)) of v given its parent nodes pa(v), obtaining the conditional probability set P, and finally obtain the complete party building data polytree model (G, P).
CN201710751678.7A 2017-08-28 2017-08-28 Modeling method for quantitatively analyzing party building data Active CN107562854B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710751678.7A CN107562854B (en) 2017-08-28 2017-08-28 Modeling method for quantitatively analyzing party building data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710751678.7A CN107562854B (en) 2017-08-28 2017-08-28 Modeling method for quantitatively analyzing party building data

Publications (2)

Publication Number Publication Date
CN107562854A CN107562854A (en) 2018-01-09
CN107562854B true CN107562854B (en) 2020-09-22

Family

ID=60977304

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710751678.7A Active CN107562854B (en) 2017-08-28 2017-08-28 Modeling method for quantitatively analyzing party building data

Country Status (1)

Country Link
CN (1) CN107562854B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103049569A (en) * 2012-12-31 2013-04-17 武汉传神信息技术有限公司 Text similarity matching method on basis of vector space model
CN106598999A (en) * 2015-10-19 2017-04-26 北京国双科技有限公司 Method and device for calculating text theme membership degree
CN106844328A (en) * 2016-08-23 2017-06-13 华南师范大学 A kind of new extensive document subject matter semantic analysis and system
CN106874695A (en) * 2017-03-22 2017-06-20 北京大数医达科技有限公司 The construction method and device of medical knowledge collection of illustrative plates

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104516904B (en) * 2013-09-29 2018-04-03 北大方正集团有限公司 A kind of Key Points recommend method and its system


Also Published As

Publication number Publication date
CN107562854A (en) 2018-01-09

Similar Documents

Publication Publication Date Title
Rauber et al. Foolbox native: Fast adversarial attacks to benchmark the robustness of machine learning models in pytorch, tensorflow, and jax
Dai et al. Attribute selection based on a new conditional entropy for incomplete decision systems
Ahmed et al. Network sampling: From static to streaming graphs
Zhang et al. Parallel rough set based knowledge acquisition using MapReduce from big data
JP7103496B2 (en) Related score calculation system, method and program
CN107220902A (en) The cascade scale forecast method of online community network
CN104317794B (en) Chinese Feature Words association mode method for digging and its system based on dynamic item weights
Dobra et al. Loglinear model selection and human mobility
Sun et al. Mining software repositories for automatic interface recommendation
CN103440308B (en) A kind of digital thesis search method based on form concept analysis
Hu Medical data mining based on decision tree algorithm
Lu et al. Predicting viral news events in online media
Löhnertz et al. Steinmetz: Toward Automatic Decomposition of Monolithic Software Into Microservices.
Tahir et al. Big data—an evolving concern for forensic investigators
CN107562854B (en) Modeling method for quantitatively analyzing party building data
Molik et al. Combining natural language processing and metabarcoding to reveal pathogen-environment associations
Okubo et al. Structural change pattern mining based on constrained maximal k-plex search
CN112750047B (en) Behavior relation information extraction method and device, storage medium and electronic equipment
Nguyen et al. An efficient approach for mining weighted uncertain interesting patterns
Tuchowski et al. OBCAS-An Ontology-Based Cluster Analysis System
Li et al. Special section on big data and service computing
Ishikawa et al. A data model for integrating data management and data mining in social big data
CN117634894B (en) Ecological environment risk assessment method and device, electronic equipment and storage medium
Hejazy et al. An approach for deriving semantically related category hierarchies from Wikipedia category graphs
Vanetik Sufficient instance networks for computing support in graph databases

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant