CN110968651A - Data processing method and system based on grey fuzzy clustering

Info

Publication number: CN110968651A
Application number: CN201911129127.2A
Authority: CN
Inventors: 肖炯恩
Original assignee: GUANGZHOU SAIBAO LIANRUI INFORMATION TECHNOLOGY CO LTD; Guangdong University of Business Studies
Current assignee: GUANGZHOU SAIBAO LIANRUI INFORMATION TECHNOLOGY CO LTD; Guangdong University of Business Studies
Priority date: 2019-11-18
Filing date: 2019-11-18
Publication date: 2020-04-07

Abstract

The invention discloses a data processing method based on gray fuzzy clustering. Government affair data sharing is an important and complex undertaking that involves a large number of influencing factors. To identify the key factors, the method applies a model combining gray correlation analysis and fuzzy clustering analysis to 43 influencing factors and performs an empirical study to identify the important and the generally important influencing factors. Based on the gray fuzzy clustering result, and in combination with the attitude curves of the evaluators, countermeasures or suggestions for promoting government affair data sharing are proposed. The method provides a reference for the data sharing work of provincial and municipal governments and offers an overall solution approach.

Description

Data processing method and system based on grey fuzzy clustering

Technical Field

The invention relates to the field of data processing, in particular to a data processing method and system based on gray fuzzy clustering.

Background

In recent years, driven by technological progress and policy guidance, several pressures have pushed governments at all levels to actively promote cross-department government affair data sharing: digital government reform calls for breaking down government "data islands"; optimizing the business environment requires government units to interconnect their data; increasingly severe security problems require that the data of each unit be shared effectively; and upgrading convenience services requires government departments to cooperate so that the public needs to make "zero errands". Cross-department government affair data sharing is a complex undertaking constrained by many influencing factors, and the core factors can only be identified through an in-depth and comprehensive analysis of them. However, traditional analysis methods have certain shortcomings, so analyzing the influencing factors of cross-department government affair data sharing from a new angle or with a new method has become a research focus for many scholars.

Many scholars have explored government cross-department data sharing through theoretical and empirical research. In 1996, Dawes conducted the first systematic study of government cross-department data sharing, carrying out field investigations in New York State and analyzing the opinions of 173 government personnel on the benefits and barriers of information sharing, in order to present a theoretical framework for government cross-department data sharing based on organizational and policy aspects. However, investigations conducted in the 1990s did not consider the technical factors that facilitate electronic sharing of cross-department data. Building on the theoretical model proposed by Dawes, Landsbergen and Wolken surveyed federal and state government officials in five states, collected data on two cases (environmental reports and geographic information positioning services), proposed an extended model of government cross-department data sharing, and studied in depth the influence of factors such as technical infrastructure, laws, management and policies. Previous research has mainly started from a conceptual model of the influencing factors, searched for those factors, focused on the factors that promote or hinder data sharing, adopted qualitative analysis methods, and proposed corresponding countermeasures.

Scholars have mainly used questionnaire surveys to analyze the factors influencing government affair data sharing, but questionnaire results are fuzzy, and traditional research methods cannot reflect the fuzzy relationships among the factors. For example, one influencing factor may receive high scores from some raters and low scores from others, so the calculated average or total score reflects neither the actual attitude of the raters nor the relative strength of the factors.

Disclosure of Invention

In order to overcome the defects of the prior art, the invention provides a data processing method and a data processing system based on gray fuzzy clustering: the correlations among the factors are identified through gray correlation analysis, and a fuzzy clustering method is used to cluster the key factors. An attitude curve is then established from the clustering result, the attitude curves of the different factors are analyzed, and a strategy that can effectively promote government data sharing is formulated.

The purpose of the invention is realized by adopting the following technical scheme:

A data processing method based on gray fuzzy clustering comprises the following steps. Data acquisition: collecting data from each government department, wherein the collected data comprises evaluation data and fact data. Data preprocessing: synthesizing the evaluation data to obtain comprehensive evaluation data; converting the fact data into fact evaluation data; the comprehensive evaluation data and the fact evaluation data constitute n evaluation factors, and an evaluation vector X_k = (x_{k1}, x_{k2}, …, x_{kj}, …, x_{km}) is generated, where x_{km} is the influence degree of the factor; the n evaluation vectors constitute an object matrix X_{n×m}; the matrix is normalized and the correlation coefficients and correlation degrees are calculated. Fuzzy clustering: obtaining a fuzzy similarity matrix R from the correlation degrees; obtaining the transitive closure t(R) by the transitive closure method; sorting the elements of t(R) from large to small; and letting λ ∈ [0,1] take values in that descending order to obtain the cut sets at different λ levels.

Further, the data preprocessing step adopts the formula

[formula omitted]

to normalize the matrix, where x_{ij} is the element in row i and column j of the matrix X_{n×m}.

Further, in the data preprocessing step, the correlation coefficient ξ_{ij} is obtained by

[formula omitted]

where x_0 = (x_{01}, x_{02}, …, x_{0j}, …, x_{0m}) is selected according to the specific practical problem and ρ is the resolution.

Further, the value of the resolution ρ is 0.5.

Further, the correlation degree is

[formula omitted]

where i = 1, 2, …, n, and p_j is the weight of the j-th influence grade in the evaluation object.

Further, the fuzzy similarity matrix is R = (r_{ij})_{n×n}, with r_{ij} = 1 − |r_i − r_j| (i, j = 1, 2, …, n).

Further, the transitive closure is t(R) = R^k, where R^k ∘ R^k = R^k.

Further, the influence degree of the factors is divided into f grades 1, 2, …, f, with the degree increasing from weak to strong from grade 1 to grade f, and the value of the influence degree is obtained from the statistical frequency.

The invention also provides a data processing system based on gray fuzzy clustering, which comprises a data acquisition module, a data preprocessing module, a fuzzy clustering module and a result output module. The data acquisition module acquires data from each government department, and the acquired data comprises evaluation data and fact data. The data preprocessing module receives the evaluation data and the fact data output by the data acquisition module; synthesizes the evaluation data to obtain comprehensive evaluation data; and converts the fact data into fact evaluation data. The comprehensive evaluation data and the fact evaluation data constitute n evaluation factors, and an evaluation vector X_k = (x_{k1}, x_{k2}, …, x_{kj}, …, x_{km}) is generated, where x_{km} is the influence degree of the factor; the n evaluation vectors constitute an object matrix X_{n×m}; the matrix is normalized and the correlation coefficients and correlation degrees are calculated. The fuzzy clustering module obtains a fuzzy similarity matrix R from the correlation degrees, obtains the transitive closure t(R) by the transitive closure method, sorts the elements of t(R) from large to small, and lets λ ∈ [0,1] take values in that descending order to obtain the cut sets at different λ levels. The result output module outputs the clusters according to the cut sets at different λ levels.

The method breaks down the many influencing factors gathered from each department and accurately identifies the key factors among them. Clustering the influencing factors according to the criteria achieves the purpose of qualitative analysis, and analyzing the influence degrees of the factors to identify the influence degree of each cluster achieves the purpose of quantitative analysis.

Drawings

FIG. 1 is a system for evaluating cross-department data influencing factors according to the present invention.

FIG. 2 is a comprehensive evaluation chart of grey correlation of government affair data sharing influence factors according to the invention;

FIG. 3 is a dynamic clustering chart of government data sharing influencing factors of the present invention;

FIG. 4 is an attitude curve for the emphasis class of government data sharing influencing factors of the present invention;

FIG. 5 is an attitude curve for a general emphasis class of government data sharing influencing factors of the present invention.

Detailed Description

The invention will be further described with reference to the accompanying drawings and the detailed description below:

A data processing method based on gray fuzzy clustering comprises the processes of data acquisition, data preprocessing and fuzzy clustering.

In the data acquisition process, data is collected from each government department, and the collected data comprises evaluation data and fact data. The evaluation data is each department's evaluation of the degree to which a factor influences a given matter. Such evaluations usually cannot be derived from objective fact data; they must be obtained by summarizing the work of each department or by subjective assessment according to actual needs. Examples include the influence of national legislation or the promotion of a specific project, whether the decision process of the project is democratic, and the relationships between the cooperating participants of the project and other members.

The fact data refers to business data held in the databases of each department and in government databases, from which an evaluation can be obtained by calculation. For example, the factor "having sufficient financial investment" may be evaluated from the funding requested by the department, the funding actually provided, and the final completion of the business. Similarly, the factor "key project participants with leading technical ability" may be evaluated from quantitative indicators such as the academic degrees of the participants, the level and number of their papers, and the final implementation effect.

(I) Data preprocessing (gray correlation analysis)

In the data preprocessing stage, the evaluation data can be integrated to obtain integrated evaluation data; the fact data is converted into fact evaluation data. The comprehensive evaluation data and the factual evaluation data constitute n evaluation factors.

A study object matrix may be established from the evaluation factors. Let the number of government affair data sharing influencing factors be n, with each factor evaluated on m grades, so that the set of influencing factors to be classified is U = {u_1, u_2, …, u_k, …, u_n}. Each u_k has a factor evaluation vector X_k = (x_{k1}, x_{k2}, …, x_{kj}, …, x_{km}), which represents the set of influence degree grades of the k-th factor. The grade set is obtained by processing the evaluation data or the fact data as described above, that is, either by statistics or by calculation from objective data. The study object matrix X = (x_{ij})_{n×m}, whose k-th row is X_k, is thereby obtained.
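As an illustration of how a row of the study object matrix might be derived from survey scores, the following sketch counts how often each grade was chosen for a factor. The function name `build_evaluation_matrix`, the sample scores, and the use of relative frequencies rather than raw counts are assumptions made for this example; the text only states that the grade values are obtained by statistics or by calculation from objective data.

```python
from collections import Counter

def build_evaluation_matrix(ratings_per_factor, m):
    """Return an n x m matrix; row k holds, for factor k, the share of
    raters who chose each grade 1..m (relative frequency is an assumption)."""
    matrix = []
    for ratings in ratings_per_factor:
        counts = Counter(ratings)
        total = len(ratings)  # each factor is assumed to have at least one rating
        matrix.append([counts.get(grade, 0) / total for grade in range(1, m + 1)])
    return matrix

# Hypothetical scores from five raters for two factors, on grades 1..7.
ratings = [
    [7, 7, 6, 5, 7],  # factor u_1: mostly high grades
    [1, 2, 7, 7, 1],  # factor u_2: split opinions
]
X = build_evaluation_matrix(ratings, m=7)
```

Keeping the full distribution over grades, rather than an average score, preserves the split opinions that the Background identifies as the weakness of average or total scores.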

Data normalization. To reduce the problem of "large numbers swallowing small numbers" caused by comparing data of different orders of magnitude, the original data must be normalized to the interval [0,1]. The j-th column of the matrix X is normalized as follows:

[formula omitted]
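The normalization formula is not reproduced in this text. A minimal sketch, assuming column-wise min-max scaling (one common way to map each column to [0,1]), could look like the following; the function name `normalize_columns` and the handling of constant columns are assumptions, not the formula of the patent.

```python
def normalize_columns(X):
    """Scale every column of X to [0, 1] by min-max normalization.

    Assumption: a constant column (max == min) is mapped to 0.0 to avoid
    division by zero.
    """
    n, m = len(X), len(X[0])
    out = [[0.0] * m for _ in range(n)]
    for j in range(m):
        column = [X[i][j] for i in range(n)]
        low, high = min(column), max(column)
        span = high - low
        for i in range(n):
            out[i][j] = (X[i][j] - low) / span if span else 0.0
    return out

# Two columns with very different magnitudes end up on the same scale.
X_norm = normalize_columns([[1, 1000], [3, 3000], [5, 2000]])
```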

Calculating the correlation coefficient and the correlation degree. After the reference sequence and the evaluation objects have been normalized, the correlation coefficient between the j-th grade of the i-th factor and the corresponding index of the reference sequence is calculated by formula (2).

Here the optimal reference sequence of the evaluation indexes is x_0 = (x_{01}, x_{02}, …, x_{0j}, …, x_{0m}), which is generally selected according to the practical problem, and ρ is the resolution, with value range (0,1), generally taken as 0.5.
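Formula (2) is not reproduced in this text. The sketch below assumes it is the standard grey relational coefficient, ξ_ij = (Δmin + ρ·Δmax) / (Δ_ij + ρ·Δmax) with Δ_ij = |x_0j − x_ij| and Δmin, Δmax taken over all i and j. The function name and the sample data are hypothetical.

```python
def grey_relational_coefficients(X_norm, x0, rho=0.5):
    """Return xi[i][j] for every factor i and grade j.

    Assumes the standard grey relational coefficient; X_norm and x0 are
    expected to be normalized already, and rho is the resolution.
    """
    deltas = [[abs(x0[j] - row[j]) for j in range(len(x0))] for row in X_norm]
    d_min = min(min(row) for row in deltas)
    d_max = max(max(row) for row in deltas)
    if d_max == 0:
        # Every sequence coincides with the reference sequence.
        return [[1.0] * len(x0) for _ in X_norm]
    return [[(d_min + rho * d_max) / (d + rho * d_max) for d in row]
            for row in deltas]

X_norm = [[1.0, 0.0], [0.5, 0.5]]
x0 = [1.0, 1.0]  # hypothetical reference sequence
xi = grey_relational_coefficients(X_norm, x0, rho=0.5)
```

A smaller ρ widens the gap between coefficients of near and distant points, which is why it is called the resolution.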

When calculating the correlation degree, the influence of the different indexes of the study object must be considered together. An index weight, namely the weight of each influence grade of the influencing factors, is therefore introduced so that the correlation coefficients of all indexes of each factor are integrated into a single weighted correlation degree, calculated by formula (3).

Here p_j is the weight of the j-th influence grade in the evaluation object, and r_i is the weighted correlation degree between the i-th influencing factor and the reference sequence. The larger r_i is, the more similar the sequence is to the reference sequence, that is, the stronger the influence of the factor.
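Formula (3) is likewise not reproduced here. Assuming it is the weighted sum r_i = Σ_j p_j·ξ_ij, which matches the description of p_j and r_i above, a sketch could be written as follows; the function name and the two-grade example are hypothetical.

```python
def correlation_degrees(xi, p):
    """Return r[i] = sum over j of p[j] * xi[i][j] (assumed form of formula (3))."""
    if abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("weights are expected to sum to 1")
    return [sum(p_j * value for p_j, value in zip(p, row)) for row in xi]

xi = [[1.0, 0.5], [0.5, 0.5]]   # correlation coefficients of two factors
p = [0.4, 0.6]                  # hypothetical grade weights
r = correlation_degrees(xi, p)  # the first factor is closer to the reference
```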

(II) fuzzy clustering

Cluster analysis is a mathematical method for classifying objects objectively according to the relationships among them, such as their differing characteristics, closeness and similarity; its mathematical basis is multivariate analysis in mathematical statistics. The boundaries between the objects to be clustered are usually fuzzy, which makes the fuzzy clustering method well suited to the task. The research results on the factors influencing government affair data sharing are also fuzzy in nature. Cluster analysis therefore measures the closeness among the factors by their similarity and clusters them accordingly. A similarity matrix must first be established, which in essence records the similarity between every two objects in the set of factors to be evaluated. Here the correlation degree results are used to establish the similarity matrix among the influencing factors.

Establishing the fuzzy similarity matrix. A similarity matrix among the influencing factors is established from the correlation degrees calculated by formula (3). The difference between influencing factors is expressed by the Euclidean distance, and the similarity coefficient of the influencing factors is given by formula (4), which yields the fuzzy similarity matrix R = (r_{ij})_{n×n}.

r_{ij} = 1 − |r_i − r_j|, (i, j = 1, 2, …, n) (4)

Here 0 ≤ r_{ij} ≤ 1, and the closer r_{ij} is to 1, the more similar the two factors are.
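Formula (4) translates directly into code. The sketch below builds R from a list of correlation degrees; the function name and the sample values are chosen only for illustration.

```python
def fuzzy_similarity_matrix(r):
    """Return R with R[i][j] = 1 - |r[i] - r[j]|, as in formula (4)."""
    return [[1 - abs(r_i - r_j) for r_j in r] for r_i in r]

r = [0.9, 0.85, 0.4]  # hypothetical correlation degrees of three factors
R = fuzzy_similarity_matrix(r)
```

The resulting matrix is reflexive (ones on the diagonal) and symmetric, which is the starting point assumed by the transitive closure step.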

Establishing the fuzzy equivalence matrix. The fuzzy similarity matrix obtained by formula (4) generally has only reflexivity and symmetry but not transitivity, so it must be transformed into a fuzzy equivalence matrix by the transitive closure method to give it transitivity. The transitive closure method computes R^2, R^4, R^8, … in turn until a k is found such that R^k ∘ R^k = R^k; the transitive closure of R is then t(R) = R^k.
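The repeated squaring can be sketched as follows, assuming that ∘ denotes the usual max-min composition of fuzzy relations. The function names, the round limit, and the sample matrix are assumptions for this example.

```python
def max_min_compose(A, B):
    """Max-min composition: C[i][j] = max over k of min(A[i][k], B[k][j])."""
    n = len(A)
    return [[max(min(A[i][k], B[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]

def transitive_closure(R):
    """Square R repeatedly (R^2, R^4, ...) until squaring no longer changes it.

    Assumes R is reflexive and symmetric; the round limit only guards
    against inputs that violate this and might never stabilize.
    """
    current = R
    for _ in range(len(R) + 1):
        squared = max_min_compose(current, current)
        if squared == current:
            return current
        current = squared
    raise ValueError("no fixed point found; is R reflexive?")

R = [[1.0, 0.8, 0.3],
     [0.8, 1.0, 0.5],
     [0.3, 0.5, 1.0]]
T = transitive_closure(R)
```

Because max and min only ever select existing entries, the exact equality test is safe even with floating-point values. In this sample the similarity 0.3 between the first and third factors rises to 0.5, since both are linked through the second factor.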

Dynamic fuzzy clustering. The elements of t(R) are sorted from large to small, and λ ∈ [0,1] takes values in that descending order to obtain the cut sets at different λ levels, thereby realizing dynamic clustering.
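A minimal sketch of the λ-cut step, assuming the input is already a fuzzy equivalence matrix so that "t(R)[i][j] ≥ λ" defines an equivalence relation. The function names and the sample matrix are hypothetical; indices are zero-based.

```python
def lambda_cut_clusters(T, lam):
    """Group indices i, j into one cluster whenever T[i][j] >= lam."""
    n = len(T)
    assigned = [False] * n
    clusters = []
    for i in range(n):
        if assigned[i]:
            continue
        group = [j for j in range(n) if T[i][j] >= lam]
        for j in group:
            assigned[j] = True
        clusters.append(group)
    return clusters

def dynamic_clustering(T):
    """Return (lambda, clusters) pairs for every distinct value in T, descending."""
    levels = sorted({value for row in T for value in row}, reverse=True)
    return [(lam, lambda_cut_clusters(T, lam)) for lam in levels]

T = [[1.0, 0.8, 0.5],
     [0.8, 1.0, 0.5],
     [0.5, 0.5, 1.0]]
result = dynamic_clustering(T)
```

For this sample, λ = 1.0 leaves every factor in its own class, λ = 0.8 merges the first two, and λ = 0.5 merges all three, so lowering λ coarsens the classification.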

Based on the data processing method of gray fuzzy clustering, the invention constructs a data processing system based on gray fuzzy clustering, which comprises a data acquisition module, a data preprocessing module, a fuzzy clustering module and a result output module. The data acquisition module acquires data from each government department, and the acquired data comprises evaluation data and fact data. The data preprocessing module receives the evaluation data and the fact data output by the data acquisition module; synthesizes the evaluation data to obtain comprehensive evaluation data; and converts the fact data into fact evaluation data. The comprehensive evaluation data and the fact evaluation data constitute n evaluation factors, and an evaluation vector X_k = (x_{k1}, x_{k2}, …, x_{kj}, …, x_{km}) is generated, where x_{km} is the influence degree of the factor; the n evaluation vectors constitute an object matrix X_{n×m}; the matrix is normalized and the correlation coefficients and correlation degrees are calculated. The fuzzy clustering module obtains a fuzzy similarity matrix R from the correlation degrees, obtains the transitive closure t(R) by the transitive closure method, sorts the elements of t(R) from large to small, and lets λ ∈ [0,1] take values in that descending order to obtain the cut sets at different λ levels. The result output module outputs the clusters according to the cut sets at different λ levels. The system of the invention can apply all of the methods of the invention described above.

Of course, the data processing method of gray fuzzy clustering according to the present invention can also be stored in a system or a memory, that is, the system or the memory contains executable code that performs the data processing method of gray fuzzy clustering according to the present invention.

The invention also provides a specific embodiment of a data processing method and a system for gray fuzzy clustering.

The embodiment uses the business data of the departments of a certain city government together with the evaluation data fed back by each department, and arranges a total of 43 factors in six categories. The influence degree of each factor is divided into seven grades 1, 2, 3, 4, 5, 6 and 7, increasing from weak to strong from grade 1 to grade 7. The evaluation data can be scored directly on the 7 influence grades during evaluation, while the fact data is assigned the evaluation grade whose value range contains the calculated data influence value, according to an expected range.
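The text does not say how the value ranges of the seven grades are chosen. One simple possibility, shown purely as an assumption, is to split the expected range into seven equal-width intervals; the function name `value_to_grade` and the example range are hypothetical.

```python
def value_to_grade(value, low, high, grades=7):
    """Map a calculated influence value to a grade 1..grades.

    Assumption: the expected range [low, high] is split into equal-width
    intervals, and values outside the range are clipped to its ends.
    """
    if high <= low:
        raise ValueError("expected range must satisfy low < high")
    clipped = min(max(value, low), high)
    width = (high - low) / grades
    index = int((clipped - low) / width)
    return min(index, grades - 1) + 1

# Hypothetical indicator with an expected range of 0 to 70.
grade_low = value_to_grade(0, 0, 70)
grade_mid = value_to_grade(35, 0, 70)
grade_high = value_to_grade(70, 0, 70)
```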

The influence factor set is U = {u_1, u_2, …, u_k, …, u_43}. The evaluation vector X_k = (x_{k1}, x_{k2}, x_{k3}, x_{k4}, x_{k5}, x_{k6}, x_{k7}) of each influencing factor u_k is calculated from the survey statistical table, giving the influence factor evaluation matrix X = [x_{ij}]_{43×7}. The evaluation matrix X is then normalized to obtain the normalized matrix of the influencing factors, as shown in Table 1. For the more influential grades 5, 6 and 7, the more the better; for the less influential grades 1, 2 and 3, the fewer the better; grade 4 is a transition stage and is conveniently taken as the median value. The reference sequence x_0 is constructed from the normalized data of Table 1, the factors to be evaluated are compared with the reference sequence, and the correlation coefficients are calculated by formula (2). The weights of the evaluation grades, determined by expert consultation and the analytic hierarchy process, are p = (0.01, 0.02, 0.03, 0.04, 0.25, 0.30, 0.35).
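One possible reading of this rule, offered as an assumption rather than as the exact construction used in the embodiment, takes the column minimum for grades 1 to 3, the column median for grade 4, and the column maximum for grades 5 to 7. The function name and the three-row sample matrix are hypothetical; the weights are the ones stated above.

```python
from statistics import median

# Grade weights stated in the embodiment.
P = [0.01, 0.02, 0.03, 0.04, 0.25, 0.30, 0.35]

def build_reference_sequence(X_norm):
    """Build x0 for a seven-grade matrix: min for grades 1-3, median for
    grade 4, max for grades 5-7 (an assumed reading of the rule)."""
    if len(X_norm[0]) != 7:
        raise ValueError("this sketch assumes exactly seven grades")
    reference = []
    for j in range(7):
        column = [row[j] for row in X_norm]
        grade = j + 1
        if grade <= 3:
            reference.append(min(column))
        elif grade == 4:
            reference.append(median(column))
        else:
            reference.append(max(column))
    return reference

# Hypothetical normalized data for three factors.
X_norm = [
    [0.0, 0.2, 0.1, 0.5, 1.0, 0.4, 0.0],
    [1.0, 0.0, 0.3, 0.1, 0.0, 1.0, 1.0],
    [0.5, 1.0, 0.0, 0.3, 0.5, 0.0, 0.6],
]
x0 = build_reference_sequence(X_norm)
```

The weights concentrate 90% of the total on grades 5 to 7, so a factor's correlation degree is dominated by how closely its high-grade values track the reference sequence.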

TABLE 1 influence factor normalized data for government affairs data sharing

The correlation degree was calculated by formula (3), and some of the results are shown in Table 2. The correlation degrees are sorted by size; the larger the value, the higher the influence degree, and the corresponding factor can be regarded as a key factor, as shown in FIG. 2.

TABLE 2 correlation coefficient and degree of correlation (partial data only)

The similarity between every two influencing factors is calculated from the obtained correlation degrees by formula (4) to construct the fuzzy similarity matrix R shown in Table 3. R is then transformed into the fuzzy equivalence matrix t(R) by the transitive closure method, as shown in Table 4.

TABLE 3 fuzzy similarity matrix

TABLE 4 fuzzy equivalence matrix

Dynamic clustering of the influencing factors is realized by choosing λ values at different levels, and the dynamic clustering result is shown in FIG. 3. As the figure shows, different λ values produce clustering results of different granularity: the larger the λ value, the finer the classification. Combining the correlation degree calculation with FIG. 2: when λ = 1, each of the 43 factors forms its own class, the influence strengths of the factors are ranked as shown in FIG. 2, factor 3 has the strongest influence and factor 18 the weakest. When λ = 0.9849, the influencing factors fall into three classes, from strong to weak: the first class is {3, 11, 14, 2, 12}, corresponding to the influencing factors {D3, O2, O5, D2, O3}; the second class is {32, 1, 13}, corresponding to the influencing factors {OM3, D1, O4}; and the third class contains the others. When λ = 0.9758, the influencing factors fall into two classes: the key factors {3, 11, 14, 2, 12} and the others, which are non-key factors. The dynamic clustering result is consistent with the correlation analysis result, reflects the influence degree of each factor well, and is of great value for guiding practice.

In the construction of government affair data sharing, every factor exerts some influence. Dividing the influencing factors into three classes makes it easier to formulate measures and policies that promote this construction. According to the gray fuzzy clustering result, the three classes are, from strong to weak: the emphasis class {D3, O2, O5, D2, O3}, the general emphasis class {OM3, D1, O4}, and the others. An attitude curve is constructed from the scores of the influencing factors in each class, and corresponding strategy suggestions are proposed according to the attitude curves, as shown in FIGS. 4 and 5.

Various other modifications and changes may be made by those skilled in the art based on the above-described technical solutions and concepts, and all such modifications and changes should fall within the scope of the claims of the present invention.

Claims

1. A data processing method based on gray fuzzy clustering is characterized by comprising the following steps:

data acquisition: collecting data from each government department, wherein the collected data comprises evaluation data and fact data;

data preprocessing: synthesizing the evaluation data to obtain comprehensive evaluation data; converting the fact data into fact evaluation data; the comprehensive evaluation data and the fact evaluation data constitute n evaluation factors, and an evaluation vector X_k = (x_{k1}, x_{k2}, …, x_{kj}, …, x_{km}) is generated, where x_{km} is the influence degree of the factor; the n evaluation vectors constitute an object matrix X_{n×m}; normalizing the matrix and calculating a correlation coefficient and a correlation degree;

fuzzy clustering: obtaining a fuzzy similarity matrix R according to the correlation degree; obtaining the transitive closure t(R) by the transitive closure method, sorting the elements of t(R) from large to small, and letting λ ∈ [0,1] take values in that descending order to obtain the cut sets at different λ levels.

2. The data processing method based on gray fuzzy clustering of claim 1, characterized in that: the data preprocessing step adopts the formula

[formula omitted]

to normalize the matrix, where x_{ij} is the element in row i and column j of the matrix X_{n×m}.

3. The data processing method based on gray fuzzy clustering of claim 1, characterized in that: in the data preprocessing step, by [formula omitted]

4. The data processing method based on gray fuzzy clustering of claim 3, characterized in that: the resolution ρ is 0.5.

5. The data processing method based on gray fuzzy clustering of claim 4, characterized in that: the correlation degree [formula omitted]

6. The data processing method based on gray fuzzy clustering of claim 5, characterized in that: the fuzzy similarity matrix R = (r_{ij})_{n×n}, r_{ij} = 1 − |r_i − r_j|, (i, j = 1, 2, …, n).

7. The data processing method based on gray fuzzy clustering of claim 6, characterized in that: the fuzzy similarity matrix R = (r_{ij})_{n×n}, r_{ij} = 1 − |r_i − r_j|, (i, j = 1, 2, …, n).

8. The data processing method based on gray fuzzy clustering of claim 7, characterized in that: the transitive closure t(R) = R^k, where R^k ∘ R^k = R^k.

9. The data processing method based on gray fuzzy clustering according to any one of claims 1-8, characterized in that: the influence degree of the factors is divided into f grades 1, 2, …, f, with the degree increasing from weak to strong from grade 1 to grade f, and the value of the influence degree is obtained from the statistical frequency.

10. A data processing system based on gray fuzzy clustering, comprising a data acquisition module, a data preprocessing module, a fuzzy clustering module and a result output module, characterized in that:

the data acquisition module acquires data from each government department, and the acquired data comprises evaluation data and fact data;

the data preprocessing module receives the evaluation data and the fact data output by the data acquisition module, and synthesizes the evaluation data to obtain comprehensive evaluation data; converts the fact data into fact evaluation data; the comprehensive evaluation data and the fact evaluation data constitute n evaluation factors, and an evaluation vector X_k = (x_{k1}, x_{k2}, …, x_{kj}, …, x_{km}) is generated, where x_{km} is the influence degree of the factor; the n evaluation vectors constitute an object matrix X_{n×m}; the matrix is normalized and a correlation coefficient and a correlation degree are calculated;

the fuzzy clustering module obtains a fuzzy similarity matrix R according to the correlation degree; obtains the transitive closure t(R) by the transitive closure method, sorts the elements of t(R) from large to small, and lets λ ∈ [0,1] take values in that descending order to obtain the cut sets at different λ levels;

and the result output module outputs clusters according to the cut sets of different lambda levels.