CN107562854A

CN107562854A - A modeling method for the quantitative analysis of Party-building data

Info

Publication number: CN107562854A
Application number: CN201710751678.7A
Authority: CN
Inventors: 李维华; 王兵益; 郭延哺; 王顺芳; 何敏
Original assignee: Yunnan University YNU
Current assignee: Yunnan University YNU
Priority date: 2017-08-28
Filing date: 2017-08-28
Publication date: 2018-01-09
Anticipated expiration: 2037-08-28
Also published as: CN107562854B

Abstract

The present invention discloses a modeling method for the quantitative analysis of Party-building data. The method first extracts Party-building keywords from Party-building texts and quantitatively measures the correlation between each pair of keywords. It then abstracts the keywords as the nodes of a polytree model, determines the directed edges of the model from the pairwise correlations between keywords, and finally determines the conditional probability parameters of the model. The invention provides a quantitative measure of the correlation between Party-building keywords and uses the polytree model to model the correlations among all Party-building keywords intuitively, providing support for further analysis of Party-building big data.

Description

A modeling method for the quantitative analysis of Party-building data

Technical field

The invention belongs to the field of data mining technology and relates to a modeling method for the quantitative analysis of Party-building data.

Background art

Party building is the basic guarantee for building a sound body of Party members and for carrying out all other work, and raising the scientific level of Party building is an important task of current Party-building work. Party-building work produces massive amounts of data, including data on ideological building, organizational building, work-style improvement and institutional improvement, so intelligent management and effective analysis of Party-building big data has become an urgent need. Quantitative modeling and association analysis of Party-building big data, together with research into effective analysis and mining methods, is the key to analyzing such data effectively and the basis for raising the scientific level of Party building. The polytree model is a simple probabilistic graphical model for representing and reasoning about uncertain knowledge. It captures the quantitative uncertain relationships between data and also provides an efficient inference mechanism for the quantitative analysis of Party-building work. The present invention uses the polytree model to model Party-building work quantitatively. By quantitatively measuring the correlations between Party-building keywords, it provides a modeling means for mining global correlations, supports the analysis of Party-building texts and Party-building work, and provides technical support for raising the scientific level of Party building.

Summary of the invention

For the massive data produced in Party-building work, the present invention provides an effective modeling method for mining the global correlations in Party-building data, supporting the analysis of Party-building big data. The method mainly comprises the following steps:

Step 1: quantify each Party-building text, specifically:

1.1 For a collection of n Party-building texts D = {d₁, d₂, …, dₙ}, extract a set of m Party-building keywords W = {w₁, w₂, …, wₘ};

1.2 Define a document frequency function f(x), where x denotes a combination of keywords that occur or do not occur in a document. For a Party-building keyword α ∈ W, α₁ denotes that keyword α occurs in a document and α₀ denotes that α does not occur in a document. For example, (α₁, β₀) denotes the keyword combination in which α occurs and β does not, and f(α₁, β₀) denotes the document frequency of texts in which α occurs but β does not;

Step 2: for any Party-building keywords α, β and γ in W, define [formula not reproduced], and use the chi-square test to determine whether α and β are mutually independent. The degree of correlation quantitatively measures the direct correlation between α and β: if α and β are mutually independent, then [formula not reproduced]; otherwise [formula not reproduced];

Step 3: build the maximum weight spanning tree T over m nodes:

3.1 Abstract each Party-building keyword in W = {w₁, w₂, …, wₘ} as a node of T;

3.2 Examine the degrees of correlation between keyword pairs in descending order. If adding the pair does not produce a cycle, add an undirected edge α―β to T; otherwise discard the edge. Continue until T has m − 1 edges or [condition not reproduced];

Step 4: for each subgraph α―γ―β in T, calculate [formula not reproduced] and use the chi-square test to determine whether α and β are conditionally independent given γ. If α and β are not conditionally independent given γ, and [condition not reproduced], set the subgraph α―γ―β to the converging structure α→γ←β. Repeat until no subgraph satisfies the condition, obtaining a graph G′;

Step 5: orient all remaining undirected edges in G′ as directed edges, provided that no new converging structure is produced, obtaining the polytree graph structure G;

Step 6: for each node v in G, compute the conditional probability of v given its parent nodes pa(v), obtain the set of conditional probabilities P, and finally obtain the complete polytree model (G, P) of Party-building big data.

Brief description of the drawings

Fig. 1 shows the process of building the polytree model of Party-building data.

Embodiment

An embodiment of the invention is described in detail below with reference to Fig. 1.

Assume that n = 100 and that the document frequencies counted for two keywords α and β are f(α₁, β₁) = 20, f(α₁, β₀) = 20, f(α₀, β₁) = 10, f(α₀, β₀) = 50, f(α₁) = 40, f(α₀) = 60, f(β₁) = 30 and f(β₀) = 70.
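The counting behind these document frequencies can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `document_frequency`, the representation of each text as a set of already extracted keywords, and the toy corpus are all hypothetical and are not taken from the patent.

```python
from itertools import product

def document_frequency(docs, keywords):
    """Count documents for each presence/absence pattern of `keywords`.

    docs: list of sets, each holding the keywords found in one text
          (keyword extraction is assumed to have been done already).
    Returns a dict mapping a tuple of 0/1 flags (1 = occurs) to a count.
    """
    counts = {pattern: 0 for pattern in product((1, 0), repeat=len(keywords))}
    for doc in docs:
        pattern = tuple(1 if k in doc else 0 for k in keywords)
        counts[pattern] += 1
    return counts

# Hypothetical corpus built to reproduce the counts in the example above.
docs = (
    [{"alpha", "beta"}] * 20   # alpha and beta both occur
    + [{"alpha"}] * 20         # alpha occurs, beta does not
    + [{"beta"}] * 10          # beta occurs, alpha does not
    + [set()] * 50             # neither occurs
)
f = document_frequency(docs, ["alpha", "beta"])
f_alpha1 = f[(1, 1)] + f[(1, 0)]   # marginal f(alpha_1)
f_beta1 = f[(1, 1)] + f[(0, 1)]    # marginal f(beta_1)
```

Here `f[(1, 0)]` plays the role of f(α₁, β₀), and the marginal frequencies are obtained by summing over the other keyword.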

Step 2: for any Party-building keywords α, β and γ in W, define [formula not reproduced], and use the chi-square test to determine whether α and β are mutually independent. The degree of correlation [formula not reproduced] quantitatively measures the direct correlation between α and β: if α and β are mutually independent, then [formula not reproduced]; otherwise [formula not reproduced];

For example, if the document frequencies of the two keywords α and β are those counted in the first step, then the degree of correlation [formula not reproduced] = 0.063;
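The formula for the degree of correlation is not reproduced in this text, so the following Python sketch rests on an assumption: mutual information computed with the natural logarithm happens to give 0.063 for the counts in the first step, which suggests, but does not confirm, that this is the measure intended. The sketch also computes the Pearson chi-square statistic for the same 2×2 table. The function names and the 0.05 significance level are illustrative choices, not taken from the patent.

```python
import math

def _margins(n11, n10, n01, n00):
    """Return (total, row sums, column sums, cells) of a 2x2 table."""
    cells = ((n11, n10), (n01, n00))
    rows = (n11 + n10, n01 + n00)
    cols = (n11 + n01, n10 + n00)
    return sum(rows), rows, cols, cells

def chi_square_2x2(n11, n10, n01, n00):
    """Pearson chi-square statistic; assumes all margins are positive."""
    n, rows, cols, cells = _margins(n11, n10, n01, n00)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (cells[i][j] - expected) ** 2 / expected
    return stat

def mutual_information(n11, n10, n01, n00):
    """Mutual information in nats (ASSUMED correlation measure)."""
    n, rows, cols, cells = _margins(n11, n10, n01, n00)
    mi = 0.0
    for i in range(2):
        for j in range(2):
            if cells[i][j] > 0:
                # p(i,j) * ln( p(i,j) / (p(i) * p(j)) )
                mi += cells[i][j] / n * math.log(cells[i][j] * n / (rows[i] * cols[j]))
    return mi

# f(a1,b1)=20, f(a1,b0)=20, f(a0,b1)=10, f(a0,b0)=50 from the first step.
chi2 = chi_square_2x2(20, 20, 10, 50)
mi = mutual_information(20, 20, 10, 50)
CRITICAL_DF1 = 3.841           # chi-square critical value, 1 d.o.f., 0.05 level (assumed level)
dependent = chi2 > CRITICAL_DF1
```

With these counts the chi-square statistic is about 12.70, above the critical value, so α and β would be judged dependent, and the mutual information rounds to 0.063.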

Step 3: build the maximum weight spanning tree T over m nodes, specifically:

3.2 Examine the degrees of correlation between keyword pairs in descending order. If adding the pair does not produce a cycle, add an undirected edge α―β to T; otherwise discard the edge. Continue until T has m − 1 edges or [condition not reproduced]. Fig. 1 (left) shows a maximum weight spanning tree T;
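The edge-by-edge construction described in step 3.2 matches Kruskal's algorithm applied in descending weight order. A minimal Python sketch follows, assuming the degrees of correlation are given as a dictionary keyed by keyword pairs. The weights are invented for illustration, and stopping when only zero-weight pairs remain is an assumed reading of the second stopping condition, which is not reproduced in this text.

```python
def max_weight_spanning_tree(nodes, weights):
    """Kruskal-style maximum weight spanning tree.

    nodes:   iterable of keyword names.
    weights: dict mapping (u, v) to the degree of correlation of u and v.
    Returns the list of undirected edges (u, v) added to the tree.
    """
    nodes = list(nodes)
    parent = {v: v for v in nodes}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    for (u, v), w in ranked:
        if w <= 0:
            break                  # ASSUMED stopping rule: only uncorrelated pairs remain
        root_u, root_v = find(u), find(v)
        if root_u != root_v:       # adding u-v does not close a cycle
            parent[root_u] = root_v
            tree.append((u, v))
            if len(tree) == len(nodes) - 1:
                break
    return tree

# Hypothetical degrees of correlation between four keywords.
weights = {
    ("w1", "w2"): 0.30, ("w2", "w4"): 0.25, ("w1", "w4"): 0.20,
    ("w3", "w4"): 0.10, ("w2", "w3"): 0.05, ("w1", "w3"): 0.0,
}
T = max_weight_spanning_tree(["w1", "w2", "w3", "w4"], weights)
```

In this toy input the pair (w1, w4) is discarded because w1 and w4 are already connected through w2.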

Step 4: for the subgraph w₁―w₂―w₄ in T, if [condition not reproduced], the edge directions cannot be determined. Check the subgraph w₃―w₄―w₂: if w₂ and w₃ are not conditionally independent given w₄ and [condition not reproduced], set w₃―w₄―w₂ to the converging structure w₃→w₄←w₂. Check the other subgraphs that satisfy the condition in the same way. The graph G′ shown in Fig. 1 (middle) is one possible structure;
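One common way to run a chi-square test of conditional independence on binary keywords is to compute the statistic separately for the documents where γ occurs and where it does not, then add the two. The Python sketch below takes that approach as an assumption, since the quantity the patent computes in this step is not reproduced in this text. The counts, the function names and the 0.05 level are illustrative, and the second condition required for a converging structure is not modeled.

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 table of counts."""
    n = sum(map(sum, table))
    if n == 0:
        return 0.0                 # empty stratum contributes nothing
    rows = [sum(r) for r in table]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            if expected > 0:
                stat += (table[i][j] - expected) ** 2 / expected
    return stat

def conditionally_independent(counts, critical=5.991):
    """Test alpha and beta for independence given gamma.

    counts[g][a][b]: number of documents with gamma = g, alpha = a, beta = b
                     (each index is 0 or 1).
    The statistic is summed over both values of gamma, giving 2 degrees of
    freedom; 5.991 is the 0.05 critical value (ASSUMED significance level).
    """
    stat = sum(chi_square_2x2(counts[g]) for g in (0, 1))
    return stat < critical, stat

# Hypothetical counts: no association when gamma is absent,
# strong association when gamma is present.
counts = [
    [[10, 10], [10, 10]],   # gamma = 0
    [[5, 20], [20, 5]],     # gamma = 1
]
independent, stat = conditionally_independent(counts)
```

For these counts the summed statistic is 18.0, so the two keywords would be judged not conditionally independent given γ, which is the first of the two requirements for a converging structure.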

Step 5: orient all undirected edges in G′ as directed edges, provided that no new converging structure is produced. For example, w₁―w₂ may be set to w₁→w₂ or w₁←w₂, and w₄―w₆ is set to w₄→w₆; it cannot be set to w₄←w₆, because that would produce the new converging structure w₂→w₄←w₆. Following this principle, the polytree graph structure G is finally obtained;
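The orientation rule can be read as a propagation: once a node has a parent, every undirected edge still attached to it must point away from it, and the remaining free edges can be oriented away from an arbitrarily chosen root. The Python sketch below implements that reading. It is an interpretation rather than the patent's stated procedure, the function name is hypothetical, and the edge lists contain only the edges named in the example above because Fig. 1 is not reproduced in this text.

```python
from collections import deque

def orient_remaining(undirected, directed):
    """Orient undirected tree edges without creating new converging structures.

    undirected: list of (u, v) edges of G' that are not yet oriented.
    directed:   list of (parent, child) edges fixed by converging structures.
    Returns the complete list of directed edges.
    """
    directed = list(directed)
    has_parent = {child for _, child in directed}
    neighbours = {}
    for u, v in undirected:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    def push_outward(start):
        # Orient every undirected edge away from `start`, then continue
        # from each newly reached node.
        queue = deque([start])
        while queue:
            x = queue.popleft()
            for y in sorted(neighbours.get(x, ())):
                neighbours[y].discard(x)
                if y in has_parent:
                    raise ValueError(f"{x}->{y} would create a new converging structure")
                directed.append((x, y))
                has_parent.add(y)
                queue.append(y)
            neighbours[x] = set()

    for node in sorted(has_parent):        # forced orientations first
        push_outward(node)
    for node in sorted(neighbours):        # free choice: treat node as a root
        if neighbours[node]:
            push_outward(node)
    return directed

oriented = orient_remaining(
    undirected=[("w1", "w2"), ("w4", "w6")],
    directed=[("w3", "w4"), ("w2", "w4")],
)
```

Because w₄ already has parents, the edge w₄―w₆ is forced to w₄→w₆, while w₁―w₂ is a free choice that this sketch resolves as w₁→w₂.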

Step 6: for each node v in G, compute the conditional probability of v given its parent nodes pa(v), obtain the set of conditional probabilities P, and finally obtain the complete polytree model (G, P), as shown in Fig. 1 (right).
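The text does not say how the conditional probabilities are estimated. A simple option, assumed here, is the relative document frequency of each keyword under each configuration of its parents. In the Python sketch below, the function name, the dictionary representation of documents and the two-node structure α→β are hypothetical and chosen only to reuse the counts from the first step.

```python
from collections import Counter

def conditional_probabilities(docs, parents):
    """Estimate P(v = 1 | pa(v)) by relative document frequency.

    docs:    list of dicts mapping keyword -> 0/1 occurrence flag.
    parents: dict mapping each node to a tuple of its parent nodes in G.
    Returns {node: {parent values: P(node = 1 | parent values)}}.
    Parent configurations never seen in `docs` are left out.
    """
    cpts = {}
    for node, pa in parents.items():
        joint = Counter()     # (parent values, node value) -> count
        margin = Counter()    # parent values -> count
        for doc in docs:
            key = tuple(doc[p] for p in pa)
            joint[key, doc[node]] += 1
            margin[key] += 1
        cpts[node] = {key: joint[key, 1] / margin[key] for key in margin}
    return cpts

# Counts from the first step, arranged as per-document occurrence flags.
docs = (
    [{"alpha": 1, "beta": 1}] * 20
    + [{"alpha": 1, "beta": 0}] * 20
    + [{"alpha": 0, "beta": 1}] * 10
    + [{"alpha": 0, "beta": 0}] * 50
)
# Hypothetical structure: alpha is a root and the only parent of beta.
P = conditional_probabilities(docs, {"alpha": (), "beta": ("alpha",)})
```

Under these assumptions the estimate of P(β₁ | α₁) is 20/40 = 0.5 and the estimate of P(β₁ | α₀) is 10/60.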

Claims

1. A modeling method for the quantitative analysis of Party-building data, characterized in that the method comprises the following steps:

Step 1: quantify each Party-building text:

1.1 For a collection of n Party-building texts D = {d₁, d₂, …, dₙ}, extract a set of m Party-building keywords W = {w₁, w₂, …, wₘ};

1.2 Define a document frequency function f(x), where x denotes a combination of keywords that occur or do not occur in a document. For a Party-building keyword α ∈ W, α₁ denotes that keyword α occurs in a document and α₀ denotes that α does not occur in a document;

Step 2: for any Party-building keywords α, β and γ in W, define [formula not reproduced], and use the chi-square test to determine whether α and β are mutually independent. The degree of correlation [formula not reproduced] quantitatively measures the direct correlation between α and β: if α and β are mutually independent, then [formula not reproduced]; otherwise [formula not reproduced];

Step 3: build the maximum weight spanning tree T over m nodes:

3.1 Abstract each Party-building keyword in W = {w₁, w₂, …, wₘ} as a node of T;

3.2 Examine the degrees of correlation in descending order. If adding the pair does not produce a cycle, add an undirected edge α―β to T; otherwise discard the edge. Continue until T has m − 1 edges or [condition not reproduced];

Step 4: for each subgraph α―γ―β in T, calculate [formula not reproduced] and use the chi-square test to determine whether α and β are conditionally independent given γ. If α and β are not conditionally independent given γ, and [condition not reproduced], set the subgraph α―γ―β to the converging structure α→γ←β. Repeat until no subgraph satisfies the condition, obtaining a graph G′;

Step 5: orient all remaining undirected edges in G′ as directed edges, provided that no new converging structure is produced, obtaining the polytree graph structure G;

Step 6: for each node v in G, compute the conditional probability of v given its parent nodes pa(v), obtain the set of conditional probabilities P, and finally obtain the complete polytree model (G, P) of Party-building data.