CN107562854B - Modeling method for quantitatively analyzing party building data

Info

Publication number: CN107562854B
Application number: CN201710751678.7A
Authority: CN
Inventors: 李维华; 王兵益; 郭延哺; 王顺芳; 何敏
Original assignee: Yunnan University YNU
Current assignee: Yunnan University YNU
Priority date: 2017-08-28
Filing date: 2017-08-28
Publication date: 2020-09-22
Anticipated expiration: 2037-08-28
Also published as: CN107562854A

Abstract

The invention discloses a modeling method for quantitatively analyzing party building data. The method first extracts party building work keywords from party building work texts and quantitatively measures the correlation between every pair of keywords. It then abstracts the keywords as nodes of a polytree model, determines the directed edges of the model from the pairwise correlations, and finally determines the conditional probability parameters of the model. The invention thus provides a quantitative measure of the correlation between party building work keywords and a visual polytree model of the correlations among all keywords, in support of further analysis of party building big data.

Description

Modeling method for quantitatively analyzing party building data

Technical Field

The invention belongs to the technical field of data mining, and relates to a modeling method for quantitatively analyzing party building data.

Background

Party building work is a fundamental guarantee for managing the party's ranks and for doing all other work well, and raising the scientific level of party building is an important current task. As party building work develops, it generates massive data, including data on ideological building, organizational building, work-style building, institutional building and the like, so intelligent management and effective analysis of these data have become an urgent need. Quantitative modeling and association analysis of party building big data, together with effective analysis and mining methods, are the key to analyzing the data effectively and the basis for raising the scientific level of party building. The polytree model is a simple probabilistic graphical model for representing and reasoning about uncertain knowledge; it can capture quantitative uncertain relations among data and provides an efficient inference mechanism for the quantitative analysis of party building work. The invention models party building work quantitatively with a polytree model: by quantitatively measuring the correlation between party building work keywords, it provides a modeling means for mining global correlations, supports party building text analysis and work analysis, and provides technical support for raising the scientific level of party building.

Disclosure of Invention

To handle the massive data generated in party building work, the invention provides an effective modeling method for mining the global correlations in party building data, in support of party building big data analysis. The method mainly comprises the following steps:

The first step: quantify each party building work text, specifically:

1.1, from the set of n party building work texts, extract a set of m party building work keywords;

1.2, define a document frequency function f(x), where x is a combination of keywords that appear and keywords that do not appear in a document. For each party building work keyword, one symbol denotes that the keyword appears in the document and another denotes that it does not. For example, for two keywords α and β, the combination "α appears and β does not appear" is a valid argument x, and f(x) is then the number of documents in which α appears but β does not.
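
As a minimal sketch of the document frequency function f(x) from step 1.2, the Python snippet below counts the documents that satisfy a combination of keyword-present and keyword-absent conditions. The function name doc_freq, the dictionary encoding of x, and the toy documents are illustrative assumptions and do not come from the patent.

```python
from typing import Iterable, Mapping, Set


def doc_freq(docs: Iterable[Set[str]], condition: Mapping[str, bool]) -> int:
    """Count documents that satisfy every keyword condition.

    condition maps a keyword to True (must appear) or False (must not appear).
    """
    return sum(
        all((kw in doc) == present for kw, present in condition.items())
        for doc in docs
    )


# Hypothetical toy data: each document is reduced to the set of keywords it contains.
docs = [
    {"organization", "discipline"},
    {"organization"},
    {"discipline"},
    set(),
]

# Documents in which both keywords appear.
both = doc_freq(docs, {"organization": True, "discipline": True})
# Documents in which "organization" appears but "discipline" does not.
only_org = doc_freq(docs, {"organization": True, "discipline": False})
```

Representing each document as a keyword set keeps the count independent of how keywords were extracted, which the text does not specify.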

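The second step: for every pair of party building work keywords α and β in the keyword set, define a correlation degree between them [formula not reproduced]. A chi-square test determines whether α and β are independent of each other, and the correlation degree quantitatively measures whether a direct correlation exists between them: if α and β are independent of each other, the correlation degree is 0; otherwise it takes the value given by its defining formula [formula not reproduced].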
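
The independence check in the second step can be sketched with a Pearson chi-square test on the 2x2 table of document frequencies for two keywords. The patent's own correlation formula is not reproduced in this text, so the sketch substitutes the phi coefficient sqrt(chi2 / n) as a stand-in for the non-independent case. That substitution, the 0.05 significance level, and all function names are assumptions.

```python
import math

# Chi-square critical value for 1 degree of freedom at significance level 0.05.
CHI2_CRIT_DF1_005 = 3.841


def chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]].

    a: both keywords appear, b: only the first, c: only the second, d: neither.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a keyword is always or never present: treat as independent
    return n * (a * d - b * c) ** 2 / denom


def correlation(a: int, b: int, c: int, d: int,
                crit: float = CHI2_CRIT_DF1_005) -> float:
    """Return 0 for independent keywords, else a stand-in strength measure."""
    chi2 = chi2_2x2(a, b, c, d)
    if chi2 <= crit:
        return 0.0  # independence is not rejected
    # Assumption: phi coefficient replaces the patent's unreproduced formula.
    return math.sqrt(chi2 / (a + b + c + d))


independent_pair = correlation(25, 25, 25, 25)
dependent_pair = correlation(40, 10, 10, 40)
```

Any other strength measure could be swapped in for the last line of correlation without changing the rest of the pipeline, since later steps only need a weight that is 0 for independent pairs.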

The third step: build a maximum weight spanning tree T of m nodes:

3.1, abstract each party building work keyword in the keyword set as a node in T;

3.2, examine the pairwise correlation degrees in descending order. If adding the corresponding pair to T creates no loop, add an undirected edge between the two keywords to T; otherwise discard the pair. Continue until T has m - 1 edges or the second stopping condition [formula not reproduced] is met.
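
Step 3.2 resembles Kruskal's algorithm applied to edge weights in descending order. The sketch below uses a union-find structure to detect loops. Stopping at a non-positive weight is an assumption standing in for the second stopping condition, which is not reproduced in this text; under that assumption the result may be a forest rather than a single tree. The function name and the toy weights are hypothetical.

```python
from typing import Dict, List, Tuple


def max_weight_spanning_tree(
    nodes: List[str], weights: Dict[Tuple[str, str], float]
) -> List[Tuple[str, str]]:
    """Greedy maximum weight spanning tree over keyword pairs."""
    parent = {v: v for v in nodes}  # union-find forest for loop detection

    def find(v: str) -> str:
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    tree: List[Tuple[str, str]] = []
    # Examine pairs from the largest correlation degree to the smallest.
    for (u, v), w in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        # Assumed stopping rule: tree complete, or only independent pairs remain.
        if len(tree) == len(nodes) - 1 or w <= 0:
            break
        ru, rv = find(u), find(v)
        if ru != rv:  # different components, so the edge closes no loop
            parent[ru] = rv
            tree.append((u, v))
    return tree


# Hypothetical correlation degrees between four keywords.
weights = {
    ("a", "b"): 0.9,
    ("b", "c"): 0.8,
    ("a", "c"): 0.7,
    ("c", "d"): 0.5,
    ("a", "d"): 0.0,
}
tree = max_weight_spanning_tree(["a", "b", "c", "d"], weights)
```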

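The fourth step: for each subgraph of T in which two keywords are both adjacent to a third keyword, compute [formula not reproduced] and use a chi-square test to determine whether the two keywords are conditionally independent given the third. If they are not conditionally independent given the third keyword and [condition not reproduced] holds, orient the subgraph as a convergent structure, in which both edges point to the third keyword. Repeat until no subgraph satisfies the condition, which yields a graph G'.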
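
The orientation of convergent structures in the fourth step can be sketched as a scan over every pair of neighbors of each node in T. The second condition of the test is not reproduced in this text; the sketch assumes it is marginal independence of the two outer keywords, a common choice in polytree structure recovery. The two predicates independent and cond_independent are hypothetical callbacks that would wrap the chi-square tests, and the function name is illustrative.

```python
from itertools import combinations


def orient_convergent(edges, independent, cond_independent):
    """Return (directed, undirected) after orienting convergent structures.

    edges: undirected tree edges as (u, v) tuples.
    independent(x, y): True if x and y are marginally independent (assumed test).
    cond_independent(x, y, z): True if x and y are independent given z.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    directed = set()  # (parent, child) pairs
    for z in sorted(adj):
        for x, y in combinations(sorted(adj[z]), 2):
            if (z, x) in directed or (z, y) in directed:
                continue  # would reverse an edge that is already oriented
            if independent(x, y) and not cond_independent(x, y, z):
                directed.add((x, z))  # x -> z
                directed.add((y, z))  # y -> z
    oriented = {frozenset(e) for e in directed}
    undirected = {frozenset(e) for e in edges} - oriented
    return directed, undirected


# Toy predicates: only "a" and "b" are marginally independent,
# and no pair is conditionally independent.
tree_edges = [("a", "c"), ("b", "c"), ("c", "d")]
indep = lambda x, y: {x, y} == {"a", "b"}
cond_indep = lambda x, y, z: False
directed, undirected = orient_convergent(tree_edges, indep, cond_indep)
```

In this toy run the edges a-c and b-c become a convergent structure at c, while c-d stays undirected for the next step.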

The fifth step: without generating any new convergent structure, orient all remaining undirected edges in G' as directed edges, which yields the polytree graph structure G.
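
One way to orient the remaining edges without creating a new convergent structure is to push directions outward: any node that already has a parent sends all of its undirected edges away from itself, and a component with no oriented edge is directed away from an arbitrarily chosen root. The sketch below follows that idea. It assumes a consistent orientation exists and does not check for conflicts; the function name and data layout are hypothetical.

```python
from collections import deque


def orient_remaining(directed, undirected):
    """Orient undirected edges so that no node gains an extra parent.

    directed: set of (parent, child) tuples from the previous step.
    undirected: set of frozenset({u, v}) edges still without direction.
    """
    directed = set(directed)
    remaining = set(undirected)
    adj = {}
    for e in remaining:
        u, v = tuple(e)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    def spread(start):
        # Breadth-first pass that directs every reachable undirected edge
        # away from the start node.
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in sorted(adj.get(u, ())):
                e = frozenset((u, v))
                if e in remaining:
                    remaining.discard(e)
                    directed.add((u, v))
                    queue.append(v)

    # Nodes that already have a parent must not receive another one.
    for child in sorted({c for _, c in directed}):
        spread(child)
    # Components without any oriented edge: choose a root arbitrarily.
    while remaining:
        spread(min(min(e) for e in remaining))
    return directed


g = orient_remaining(
    {("a", "c"), ("b", "c")},
    {frozenset({"c", "d"}), frozenset({"d", "e"})},
)
```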

The sixth step: compute, for each node v in G, the conditional probability P(v | pa(v)) of v given its parent nodes pa(v), which yields a conditional probability set P and, finally, a complete party building big data polytree model (G, P).
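
The conditional probabilities of the sixth step could be estimated from document counts. The sketch below uses plain relative frequencies without smoothing, which is an assumption because the text does not state the estimator. Each node is treated as a binary variable (keyword present or absent), and the function name and toy data are hypothetical.

```python
from collections import Counter


def estimate_cpts(docs, parents):
    """Estimate P(v present | parent configuration) for every node.

    docs: list of keyword sets, one per document.
    parents: dict mapping each node to a tuple of its parent nodes.
    Returns {node: {parent_config: probability}} for observed configurations.
    """
    cpts = {}
    for v, pa in parents.items():
        joint, marginal = Counter(), Counter()
        for doc in docs:
            cfg = tuple(p in doc for p in pa)  # presence pattern of the parents
            marginal[cfg] += 1
            if v in doc:
                joint[cfg] += 1
        cpts[v] = {cfg: joint[cfg] / marginal[cfg] for cfg in marginal}
    return cpts


# Toy corpus and a two-node structure a -> c.
docs = [{"a", "c"}, {"a", "c"}, {"a"}, {"c"}, set()]
cpts = estimate_cpts(docs, {"a": (), "c": ("a",)})
```

Parent configurations that never occur in the corpus are simply absent from the table; a real implementation would need a policy for them.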

Drawings

FIG. 1. Process for building a party building data polytree model.

Detailed Description

The embodiments of the invention are described in detail below with reference to FIG. 1.

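The first step: quantify each party building work text, specifically:

1.1, from the set of n party building work texts, extract a set of m party building work keywords;

1.2, define the document frequency function f(x) over combinations of keywords that appear and keywords that do not appear in a document. For each keyword, one symbol denotes that it appears in the document and another denotes that it does not. For example, for two keywords α and β, f applied to "α appears and β does not appear" is the number of documents in which α appears but β does not.

Assuming n = 100, the document frequencies of two words α and β are counted as [values not reproduced].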

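The second step: for every pair of party building work keywords α and β in the keyword set, define a correlation degree between them [formula not reproduced]. A chi-square test determines whether α and β are independent of each other, and the correlation degree quantitatively measures whether a direct correlation exists between them: if α and β are independent of each other, the correlation degree is 0; otherwise it takes the value given by its defining formula [formula not reproduced].

For example, given the document frequencies of the two words α and β calculated in the first step, the result is [result not reproduced].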

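The third step: build a maximum weight spanning tree T of m nodes, specifically:

3.1, abstract each party building work keyword in the keyword set as a node in T;

3.2, examine the pairwise correlation degrees in descending order. If adding the corresponding pair to T creates no loop, add an undirected edge between the two keywords to T; otherwise discard the pair. Continue until T has m - 1 edges or the second stopping condition [formula not reproduced] is met.

FIG. 1 (left) shows a maximum weight spanning tree T.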

The fourth step: examine the three-node subgraphs of T. For one such subgraph [not reproduced], if [condition not reproduced], its edge directions cannot be determined. For another such subgraph [not reproduced], if the two outer keywords are not conditionally independent given the middle keyword and [condition not reproduced] holds, the subgraph is oriented as a convergent structure. Other subgraphs that meet the condition are checked in the same way; one possible result is the graph G' shown in FIG. 1 (middle).

The fifth step: without generating any new convergent structure, orient all undirected edges in G' as directed edges. For example, one undirected edge [not reproduced] is given a direction, and a second edge [not reproduced] is oriented likewise; however, an edge cannot be given a direction [not reproduced] that would create a new convergent structure. Following this principle, the polytree graph structure G is finally obtained.

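The sixth step: compute the conditional probability P(v | pa(v)) of each node v in G given its parent nodes pa(v), which yields the conditional probability set P and, finally, the complete polytree model (G, P), as shown in FIG. 1 (right).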

FIG. 1. Process for building a party building data polytree model. Panels: maximum weight spanning tree T (left); partially directed graph G' (middle); complete polytree model (G, P) (right).

Claims

1. A modeling method for quantitatively analyzing party building data is characterized by comprising the following steps:

Step one: quantify each party building work text:

1.1, from the set of n party building work texts, extract a set of m party building work keywords;

1.2, define a document frequency function f(x) that gives the number of party building work texts satisfying the keyword condition x, where x is a combination of conditions that keywords appear and conditions that keywords do not appear in a text; for each party building work keyword, one symbol denotes that the keyword appears in the party building work text and another denotes that it does not;

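Step two: for every pair of party building work keywords α and β in the keyword set, define a correlation degree between them [formula not reproduced]; determine by a chi-square test whether α and β are independent of each other; use the correlation degree to quantitatively measure whether a direct correlation exists between them: if α and β are independent of each other, the correlation degree is 0, otherwise it takes the value given by its defining formula [formula not reproduced];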

Step three: build a maximum weight spanning tree T of m nodes:

3.1, abstract each party building work keyword in the keyword set as a node in T;

3.2, examine the pairwise correlation degrees in descending order; if no loop is generated, add an undirected edge between the two keywords to T, otherwise discard the pair; continue until T has m - 1 edges or the second stopping condition [formula not reproduced] is met;

Step four: for each subgraph of T in which two keywords are both adjacent to a third keyword, compute [formula not reproduced] and determine by a chi-square test whether the two keywords are conditionally independent given the third; if they are not conditionally independent given the third keyword and [condition not reproduced] holds, orient the subgraph as a convergent structure; repeat until no subgraph satisfies the condition, obtaining a graph G';

Step five: without generating any new convergent structure, orient all remaining undirected edges in G' as directed edges, obtaining the polytree graph structure G;

Step six: compute, for each node v in G, the conditional probability P(v | pa(v)) of v given its parent nodes pa(v), obtaining a conditional probability set P and, finally, a complete party building data polytree model (G, P).