CN111723136A

CN111723136A - Single-dimensional clustering analysis method for classified and graded treatment of grid events

Info

Publication number: CN111723136A
Application number: CN201911143455.8A
Authority: CN
Inventors: 钱华; 姜永华; 钱建华; 王巧荣; 房查; 张宏斌
Original assignee: Jiangsu Fablesoft Co ltd; Political And Legal Committee Of Nantong Municipal Committee Of Communist Party Of China
Current assignee: Jiangsu Fablesoft Co ltd; Political And Legal Committee Of Nantong Municipal Committee Of Communist Party Of China
Priority date: 2019-11-20
Filing date: 2019-11-20
Publication date: 2020-09-29

Abstract

The invention relates to a single-dimensional clustering analysis method for classified and graded treatment of grid events, comprising the following steps: (1) acquiring grid event records; (2) cleaning the acquired grid event records and keeping only the grid event situation text; (3) extracting semantic features of the grid event situation text with a BERT model to generate multi-dimensional feature vectors; (4) clustering the grid event situation text data with a single-dimensional clustering algorithm; (5) generating K grid event type clusters of different types and storing them in HBase; and (6) identifying the high-occurrence grid event types. The scheme can efficiently perform fusion analysis on scattered grid event records, supports accurate key monitoring and early warning for high-occurrence grid events, and improves the efficiency of classified and graded treatment of grid events.

Description

Single-dimensional clustering analysis method for classified and graded treatment of grid events

Technical Field

The invention relates to an analysis method, in particular to a single-dimensional clustering analysis method for classified and graded treatment of grid events, and belongs to the technical field of grid event analysis.

Background

At present, society faces many types of events. To analyze the various events and disputes arising within a grid under the grid-based social governance model, grassroots grid workers are generally required to record and report the events occurring in the grid, such as public security incidents, conflicts and disputes, fire safety issues, and food hygiene and safety issues.

However, reporting that relies only on manual records lacks a standard for judging the accuracy of grid event situations and lacks fusion analysis of scattered event records. It is therefore difficult to carry out targeted key monitoring and early warning for the high-occurrence grid event types in a grid area, which seriously affects the efficiency of grid event handling. A new solution to these technical problems is therefore urgently needed.

Disclosure of Invention

Aiming at the problems in the prior art, the invention provides a single-dimensional clustering analysis method for classified and graded treatment of grid events. The scheme can efficiently perform fusion analysis on scattered grid event records, supports accurate key monitoring and early warning for high-occurrence grid events, and improves the efficiency of classified and graded treatment of grid events.

To achieve the above object, the technical solution of the invention is a single-dimensional clustering analysis method for classified and graded treatment of grid events, comprising the following steps:

(1) acquiring grid event records;

(2) cleaning the acquired grid event records and keeping only the grid event situation text;

(3) extracting semantic features of the grid event situation text with a BERT model to generate multi-dimensional feature vectors;

(4) clustering the grid event situation text data with a single-dimensional clustering algorithm;

(5) generating K grid event type clusters of different types and storing them in HBase;

(6) identifying the high-occurrence grid event types.

As an improvement of the present invention, the step (1) specifically comprises: grid event records are extracted from a multi-source database.

As an improvement of the present invention, step (2) specifically comprises: cleaning the extracted grid event records with an ETL tool, removing fields such as the event occurrence place and the event occurrence time, and keeping only the grid event situation text.
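The cleaning in step (2) can be sketched as follows. This is only an illustration in plain Python rather than an ETL tool; the record layout and the field names `place`, `time` and `situation_text` are hypothetical, since the source only states that fields such as the occurrence place and occurrence time are removed.

```python
from typing import Dict, Iterable, List

# Hypothetical field name; the source does not name the record fields.
TEXT_FIELD = "situation_text"


def keep_situation_text(records: Iterable[Dict[str, str]]) -> List[str]:
    """Keep only the non-empty situation text of each record, dropping all other fields."""
    texts = []
    for record in records:
        text = (record.get(TEXT_FIELD) or "").strip()
        if text:  # skip records whose situation text is missing or blank
            texts.append(text)
    return texts


# Made-up records, for illustration only.
records = [
    {"place": "Grid 12", "time": "2019-11-01 09:30", "situation_text": " Neighbors dispute over parking "},
    {"place": "Grid 07", "time": "2019-11-02 14:10", "situation_text": ""},
    {"place": "Grid 03", "time": "2019-11-03 18:45", "situation_text": "Blocked fire exit in market"},
]
cleaned = keep_situation_text(records)
```

Dropping blank texts is an added assumption of this sketch; the source does not say how empty records are treated.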

As an improvement of the present invention, step (3) specifically comprises: inputting the grid event situation text, calculating the weight values of the text with a BERT model, and outputting a multi-dimensional semantic feature vector of the text.

As a refinement of the invention, said step (4) comprises the following sub-steps:

(41) calculating the similarity value simVal between any one preselected grid event situation text and each of the remaining grid event situation texts, using the formula simVal = c·X1 + d·X2, where:

c: weight parameter, preset value range 0.8 to 0.9;

d: weight parameter, preset value range 0.1 to 0.2;

X1: cosine of the angle between the two vectors;

X2: normalized Euclidean distance between the two vectors;

(42) generating, according to the similarity calculation results, N clusters, each containing all grid event situation texts similar to one preselected grid event situation text;

(43) selecting the N preselected grid event situation texts corresponding to the N grid event situation text clusters as central texts;

(44) calculating the similarity value simVal between any two of the N central texts, using the formula simVal = c·X1 + d·X2, where:

c = 0.9, a weight parameter;

d = 0.1, a weight parameter;

X1: cosine of the angle between the two vectors;

X2: normalized Euclidean distance between the two vectors;

(45) if two central texts are similar, deleting the smaller of the two clusters containing them; if two central texts are not similar, retaining both clusters containing them;

(46) outputting the M clusters generated by the initial clustering;

(47) carrying out secondary de-duplication on the M output clusters, wherein M is smaller than N.
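Sub-steps (41) to (46) can be sketched in Python as follows. This is a minimal illustration, not the patented implementation. Two points are assumptions, because the source does not specify them: the normalization of the Euclidean distance (taken here as 1 / (1 + distance), so that identical vectors score 1) and the similarity threshold above which two texts count as similar (the hypothetical parameter `threshold`). The toy two-dimensional vectors stand in for BERT feature vectors.

```python
import math
from typing import Dict, List, Sequence, Set

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """X1: cosine of the angle between u and v (0.0 if either is a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def normalized_euclidean(u: Vector, v: Vector) -> float:
    """X2 under an ASSUMED normalization: 1 / (1 + distance), which lies in (0, 1]."""
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return 1.0 / (1.0 + distance)


def sim_val(u: Vector, v: Vector, c: float = 0.9, d: float = 0.1) -> float:
    """simVal = c*X1 + d*X2 with the weights given in the text."""
    return c * cosine(u, v) + d * normalized_euclidean(u, v)


def initial_clusters(vectors: List[Vector], threshold: float) -> Dict[int, Set[int]]:
    """Steps (41)-(43): cluster i holds every text similar to central text i."""
    n = len(vectors)
    return {
        i: {j for j in range(n) if sim_val(vectors[i], vectors[j]) >= threshold}
        for i in range(n)
    }


def merge_similar_centers(
    vectors: List[Vector], clusters: Dict[int, Set[int]], threshold: float
) -> Dict[int, Set[int]]:
    """Steps (44)-(46): when two central texts are similar, drop the smaller cluster."""
    kept = dict(clusters)
    # Visit larger clusters first, so the cluster deleted is never the larger one.
    order = sorted(kept, key=lambda i: (-len(kept[i]), i))
    for pos, i in enumerate(order):
        if i not in kept:
            continue
        for j in order[pos + 1:]:
            if j in kept and sim_val(vectors[i], vectors[j]) >= threshold:
                del kept[j]
    return kept


# Toy feature vectors (illustrative only): three texts near one direction, two near another.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [1.0, 0.05]]
clusters = initial_clusters(vectors, threshold=0.9)          # N = 5 clusters
merged = merge_similar_centers(vectors, clusters, threshold=0.9)  # M = 2 clusters
```

In this toy run, N = 5 initial clusters collapse to M = 2, consistent with the statement that M is smaller than N. Because every text is compared with every other text, the sketch is quadratic in the number of texts.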

As an improvement of the present invention, the step (6) is specifically to sort the K grid event type clusters of different types in a descending order, and identify the grid event type with high occurrence.
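A minimal sketch of step (6) is given below, assuming that the descending order is by cluster size, that is, by the number of grid event records in each cluster. The cluster labels, the record identifiers and the `top_n` cut-off are hypothetical; the source does not state how many of the top-ranked types count as high-occurrence.

```python
from typing import Dict, List, Tuple


def rank_event_types(clusters: Dict[str, List[str]], top_n: int) -> List[Tuple[str, int]]:
    """Sort clusters by size in descending order and return the top_n (label, size) pairs."""
    sizes = [(label, len(members)) for label, members in clusters.items()]
    sizes.sort(key=lambda item: item[1], reverse=True)  # stable: ties keep input order
    return sizes[:top_n]


# Made-up clusters of record identifiers, for illustration only.
event_clusters = {
    "fire safety": ["e1", "e7"],
    "neighborhood dispute": ["e2", "e3", "e5", "e8"],
    "food hygiene": ["e4"],
    "public security": ["e6", "e9", "e10"],
}
high_occurrence = rank_event_types(event_clusters, top_n=2)
```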

Compared with the prior art, the invention has the following technical effects. The method clusters a large number of scattered grid event records with an improved clustering algorithm, obtains clusters of similar grid event types from the grid event situation texts, and accurately identifies high-occurrence grid event types through statistical analysis, so that these types can be subjected to key monitoring and early warning and comprehensive treatment center personnel can carry out efficient classified and graded treatment of grid events. Compared with existing clustering algorithms, the algorithm of this scheme is novel and reduces the big data computing resources occupied by running the clustering algorithm.

Description of the drawings:

FIG. 1 is a flow chart of a single-dimensional clustering analysis method for classified and graded treatment of grid events according to the present invention;

FIG. 2 is a flow chart of the single-dimensional clustering algorithm of the present invention.

The specific implementation mode is as follows:

To aid understanding of the present invention, it is described in detail below with reference to the following example.

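Example 1: referring to FIG. 1 and FIG. 2, a single-dimensional clustering analysis method for classified and graded treatment of grid events includes the following steps:

(1) acquiring grid event records;

(2) cleaning the acquired grid event records and keeping only the grid event situation text;

(3) extracting semantic features of the grid event situation text with a BERT model to generate multi-dimensional feature vectors;

(4) clustering the grid event situation text data with a single-dimensional clustering algorithm;

(5) generating K grid event type clusters of different types and storing them in HBase;

(6) identifying the high-occurrence grid event types.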

Step (1) is specifically: extracting grid event records from a multi-source database.

Step (2) is specifically: cleaning the extracted grid event records with an ETL tool, removing fields such as the event occurrence place and the event occurrence time, and keeping only the grid event situation text.

Step (3) is specifically: inputting the grid event situation text, calculating the weight values of the text with a BERT model, and outputting a multi-dimensional semantic feature vector of the text.

The step (4) comprises the following substeps:

(41) calculating the similarity value simVal between any one preselected grid event situation text and each of the remaining grid event situation texts, using the formula simVal = c·X1 + d·X2, where:

c = 0.9, a weight parameter;

d = 0.1, a weight parameter;

X1: cosine of the angle between the two vectors;

X2: normalized Euclidean distance between the two vectors;

(43) selecting the N preselected grid event situation texts corresponding to the N grid event situation text clusters as central texts, wherein M, the number of clusters output in step (46), is less than N;

(44) calculating the similarity value between any two of the N central texts using the formula simVal = c·X1 + d·X2;

(46) outputting the M clusters generated by the initial clustering, and arranging the M clusters in descending order;

(47) selecting the largest of the M clusters as the initial cluster, and calculating, in descending order, the overlap degree (coincidence value) between each remaining cluster and the initial cluster;

(48) manually setting the overlap threshold to 0.1; if the overlap degree is greater than 0.1, the cluster is considered similar to the initial cluster and is deleted, and the process returns to step (47);

(49) if the overlap degree is smaller than the overlap threshold, the cluster is considered dissimilar to the initial cluster, the cluster is retained, and the initial cluster is changed;

repeating steps (47), (48) and (49) until all remaining cluster classes are traversed.
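The secondary de-duplication of steps (47) to (49) can be sketched as follows. The sketch rests on two assumptions that the source leaves open. First, the coincidence value (overlap degree, also rendered as contact ratio) is taken as the number of shared members divided by the size of the smaller cluster. Second, "the initial cluster is changed" is read as: the retained cluster becomes the new reference for later comparisons. Other readings are possible.

```python
from typing import List, Set


def overlap(a: Set[int], b: Set[int]) -> float:
    """ASSUMED overlap measure: shared members divided by the size of the smaller cluster."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def secondary_dedup(clusters: List[Set[int]], threshold: float = 0.1) -> List[Set[int]]:
    """Keep the largest cluster, then drop each later cluster that overlaps the reference."""
    ordered = sorted(clusters, key=len, reverse=True)  # descending by cluster size
    if not ordered:
        return []
    reference = ordered[0]  # step (47): the largest cluster is the initial cluster
    kept = [reference]
    for cluster in ordered[1:]:
        if overlap(cluster, reference) > threshold:
            continue  # step (48): similar to the reference, so the cluster is deleted
        kept.append(cluster)  # step (49): dissimilar, so the cluster is retained
        reference = cluster   # assumed reading of "the initial cluster is changed"
    return kept


# Made-up clusters of text indices, for illustration only.
candidate_clusters = [{1, 2, 3, 4, 5}, {4, 5, 6}, {7, 8}, {8, 9}]
deduplicated = secondary_dedup(candidate_clusters)
```

With the 0.1 threshold from the text, `{4, 5, 6}` is dropped because it shares two of its three members with the largest cluster, and `{8, 9}` is dropped because it shares one of its two members with `{7, 8}`, which had become the reference.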

Specifically, the step (6) is to arrange the grid event type clusters of the K different types in a descending order and identify the grid event type with high occurrence.

It should be noted that the above-mentioned embodiments are not intended to limit the scope of the present invention, and all equivalent modifications and substitutions based on the above-mentioned technical solutions are within the scope of the present invention as defined in the claims.

Claims

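1. A single-dimensional clustering analysis method for classified and graded treatment of grid events, characterized by comprising the following steps:

(1) acquiring grid event records;

(2) cleaning the acquired grid event records and keeping only the grid event situation text;

(3) extracting semantic features of the grid event situation text with a BERT model to generate multi-dimensional feature vectors;

(4) clustering the grid event situation text data with a single-dimensional clustering algorithm;

(5) generating K grid event type clusters of different types and storing them in HBase;

(6) identifying the high-occurrence grid event types.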

2. The single-dimensional clustering analysis method for classified and graded treatment of grid events according to claim 1, wherein step (1) is specifically: extracting grid event records from a multi-source database.

3. The single-dimensional clustering analysis method for classified and graded treatment of grid events according to claim 1, wherein step (2) is specifically: cleaning the extracted grid event records with an ETL tool, removing fields such as the event occurrence place and the event occurrence time, and keeping only the event situation text.

4. The single-dimensional clustering analysis method for classified and graded treatment of grid events according to claim 1, wherein step (3) is specifically: inputting the grid event situation text, calculating the weight values of the text with a BERT model, and outputting a multi-dimensional semantic feature vector of the text.

5. The single-dimensional clustering analysis method for classified and graded treatment of grid events according to claim 1, wherein step (4) comprises the following sub-steps:

(41) calculating the similarity value simVal between any one preselected grid event situation text and each of the remaining grid event situation texts, using the formula simVal = c·X1 + d·X2, where:

c: weight parameter, preset value range 0.8 to 0.9;

d: weight parameter, preset value range 0.1 to 0.2;

X1: cosine of the angle between the two vectors;

X2: normalized Euclidean distance between the two vectors;

calculation procedure: using the formula simVal = c·X1 + d·X2, where:

c = 0.9, a weight parameter;

d = 0.1, a weight parameter;

X1: cosine of the angle between the two vectors;

X2: normalized Euclidean distance between the two vectors;

(46) outputting M clusters generated by the initial clustering;

6. The single-dimensional clustering analysis method for classified and graded treatment of grid events according to claim 1, wherein step (6) is specifically: sorting the K grid event type clusters of different types in descending order and identifying the high-occurrence grid event types.