CN106294542B

CN106294542B - Method and system for mining and scoring petition ("letters and calls") data

Info

Publication number: CN106294542B
Application number: CN201610585288.2A
Authority: CN
Inventors: 张宗林
Original assignee: Beijing Contradiction Analysis And Research Center
Current assignee: Beijing Contradiction Analysis And Research Center
Priority date: 2016-07-25
Filing date: 2016-07-25
Publication date: 2018-03-30
Anticipated expiration: 2036-07-25
Also published as: CN106294542A

Abstract

The present invention relates to a method and system for mining and scoring petition ("letters and calls") data. The method includes: Step 1: extracting qualifying petition data from a large database and processing it to obtain mining data suitable for data mining, which is stored in a mining database, the large database holding all historical petition data; Step 2: extracting at least one keyword from the mining data in the mining database and performing feature extraction on the mining data based on each keyword to obtain an analysis table for each keyword; Step 3: performing statistical analysis on the mining data in the at least one analysis table to obtain a weight value for each keyword, and establishing a comprehensive scoring standard based on the weight values corresponding to the different keywords. The invention integrates all petition data that was previously scattered across separate systems and isolated from one another. By establishing a scoring standard, petition data can be classified and counted, which facilitates its subsequent handling.

Description

Method and system for mining and scoring petition data

Technical field

The present invention relates to a method and system for mining and scoring petition data, and belongs to the field of computer technology.

Background art

Petitioning (xinfang, literally "letters and visits", rendered elsewhere as "letters and calls") refers to the activity in which citizens, legal persons, or other organizations use letters, e-mail, fax, telephone calls, visits, or similar means to report situations, make suggestions, express opinions, or lodge complaints with people's governments at all levels or with departments of people's governments at or above the county level, and in which the relevant administrative organs handle these matters in accordance with the law.

Apart from legal proceedings, petitioning is another way of resolving problems and a relatively direct form of interest articulation. The surge in petition volume in recent years has led to a large accumulation of petition data. How to turn this data into multi-level, multi-dimensional information and knowledge and to reveal the logical associations behind it, so that the government can effectively resolve at the policy level the prominent conflicts reflected in petitions, is a major issue facing petition research. In-depth analysis of petition data is a prerequisite for solving this problem.

At present, the use of petition data remains at a superficial level, such as data entry, querying, and simple statistical summaries, and the deeper logical associations hidden in the data cannot be discovered. Yet these associations are precisely the crux of social conflicts and an important basis for guiding policy making.

Summary of the invention

The technical problem to be solved by the present invention is to address the deficiencies of the prior art, in which there is no unified large database, petition data cannot be retrieved as needed, and problems present in the petition data cannot be resolved in time, by providing a method and system for mining and scoring petition data.

The technical solution by which the present invention solves the above technical problem is as follows: a method for mining and scoring petition data, comprising the following steps:

Step 1: extracting qualifying petition data from a large database and processing it to obtain mining data suitable for data mining, which is stored in a mining database, the large database holding all historical petition data;

Step 2: extracting at least one keyword from the mining data in the mining database, and performing feature extraction on the mining data based on each keyword to obtain an analysis table for each keyword;

Step 3: performing statistical analysis on the mining data in the at least one analysis table to obtain a weight value for each keyword, and establishing a comprehensive scoring standard based on the weight values corresponding to the different keywords.
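As an illustration of Step 2 only, the following Python sketch builds one analysis table per keyword by counting how many mining records take each value of that keyword. The record layout, the field names `extreme_count` and `visit_count`, and the choice of a simple frequency table are assumptions made for this example; the source does not specify the structure of the analysis table.

```python
from collections import Counter

# Hypothetical mining records; the field names are illustrative assumptions.
RECORDS = [
    {"matter_id": "A", "extreme_count": 0, "visit_count": 1},
    {"matter_id": "B", "extreme_count": 2, "visit_count": 5},
    {"matter_id": "C", "extreme_count": 0, "visit_count": 5},
]


def build_analysis_tables(records, keywords):
    """Return {keyword: Counter(value -> number of records with that value)}."""
    tables = {}
    for keyword in keywords:
        tables[keyword] = Counter(record[keyword] for record in records)
    return tables


tables = build_analysis_tables(RECORDS, ["extreme_count", "visit_count"])
```

Each resulting table can then be passed to the statistical analysis of Step 3 independently of the others.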

The beneficial effects of the present invention are as follows. The invention integrates all petition data that was previously scattered across separate systems and isolated from one another, and automatically extracts patterns, associations, changes, anomalies, and significant structures from it. Valuable knowledge is thereby mined from the ever-growing body of petition data, so that figures reflect the patterns of conflict and those patterns support scientific decision-making. The comprehensive scoring system for petition matters can predict extreme petition matters and extreme petitioners that may arise in the near term and bring them to the attention of the relevant departments, which is highly beneficial for preventing and defusing social conflicts.

On the basis of the above technical solution, the present invention may also be improved as follows.

Further, the petition data pre-stored in the large database includes letters, e-mail, voice, video, in-person visit records, and similar data obtained through data acquisition.

Further, the process of extracting petition data from the large database in Step 1 includes:

When data in the large database changes, the changed data is extracted from the large database by means of a timestamp condition or an update log; the data so obtained is the qualifying petition data.
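A minimal Python sketch of the two incremental extraction modes described above, given for illustration only. The field names `updated_at` and `id`, the in-memory row lists, and both function names are assumptions; a real deployment would typically express the same conditions as database queries.

```python
from datetime import datetime


def extract_by_timestamp(rows, last_run):
    """Timestamp condition: keep rows modified strictly after the last extraction run."""
    return [row for row in rows if row["updated_at"] > last_run]


def extract_by_update_log(rows, update_log):
    """Update log: keep rows whose id appears in the log of changed records."""
    changed_ids = {entry["id"] for entry in update_log}
    return [row for row in rows if row["id"] in changed_ids]


rows = [
    {"id": 1, "updated_at": datetime(2016, 7, 1)},
    {"id": 2, "updated_at": datetime(2016, 7, 20)},
    {"id": 3, "updated_at": datetime(2016, 7, 22)},
]

changed_since = extract_by_timestamp(rows, datetime(2016, 7, 10))
changed_logged = extract_by_update_log(rows, [{"id": 3}, {"id": 1}])
```

The timestamp variant needs only a last-modified column, whereas the update-log variant can also capture changes that do not touch that column, provided the source system writes such a log.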

Further, the processing of the petition data in Step 1 includes data cleaning and data transformation;

The data cleaning cleans the extracted petition data to obtain standardized petition data free of duplicates;

The data transformation converts the standardized petition data from transactional data into mining data suitable for data mining.

Further, the data cleaning includes deduplication, data item standardization, and denoising. Deduplication removes data that was entered more than once; data item standardization records petition data entered in different forms in a single, uniform order, so that the processed data is easier to count; denoising removes noisy data from the petition data.
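The three cleaning operations can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the record fields (`name`, `channel`, `summary`), the whitespace and case normalization used as "standardization", and the rule that a record with an empty name or summary counts as noise are all hypothetical choices, since the source does not define them.

```python
def standardize(record):
    """Standardize data items: collapse whitespace and lower-case the channel."""
    return {
        "name": " ".join(record["name"].split()),
        "channel": record["channel"].strip().lower(),
        "summary": " ".join(record["summary"].split()),
    }


def clean(records):
    """Standardize, drop noisy records, and remove duplicates, preserving order."""
    seen = set()
    cleaned = []
    for record in map(standardize, records):
        # Denoising (assumed rule): drop records missing essential fields.
        if not record["name"] or not record["summary"]:
            continue
        # Deduplication: identical standardized records are entered only once.
        key = (record["name"], record["channel"], record["summary"])
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(record)
    return cleaned


raw = [
    {"name": " Zhang  San ", "channel": "Letter", "summary": "road repair"},
    {"name": "Zhang San", "channel": "letter ", "summary": "road  repair"},
    {"name": "", "channel": "visit", "summary": "???"},
]
cleaned = clean(raw)
```

Standardizing before deduplicating matters here: the first two raw records differ only in spacing and letter case, so they are recognized as duplicates only after standardization.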

Further, the data transformation includes operations such as smoothing, aggregation, data generalization, normalization, concept hierarchy generation, and discretization.
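Two of the listed transformations, normalization and discretization, are sketched below in Python for illustration. Min-max scaling is used as one common form of normalization, and the cut points and level labels are hypothetical; the source names the operations but does not prescribe particular formulas or thresholds.

```python
def min_max(values):
    """Normalization: rescale values linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(value - low) / (high - low) for value in values]


def discretize(value, cut_points, labels):
    """Discretization: map a number to a label.

    cut_points are ascending exclusive upper bounds; labels must contain
    exactly one more entry than cut_points.
    """
    if len(labels) != len(cut_points) + 1:
        raise ValueError("labels must have len(cut_points) + 1 entries")
    for cut, label in zip(cut_points, labels):
        if value < cut:
            return label
    return labels[-1]


# Hypothetical grading of a petition by the number of petitioners involved.
LEVELS = ["individual", "small group", "large group"]
scaled = min_max([1, 3, 5])
level = discretize(3, [2, 10], LEVELS)
```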

Further, the keywords in Step 2 include the number of extreme incidents, the number of petitioners, the number of petitions, the number of petition channels, the time consumed by the petition, and so on.

Further, in Step 3 each keyword is assigned, according to its weight value, a percentage of the overall score, and the percentages of all keywords are sorted in descending order to establish the comprehensive scoring standard; the larger the weight value, the larger the percentage.
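The conversion from weight values to percentages of the overall score can be sketched as follows. The weight values and keyword names are made up for illustration, and proportional scaling is assumed as the simplest rule satisfying "the larger the weight value, the larger the percentage"; the source does not give the exact formula.

```python
def weights_to_percentages(weights):
    """Convert keyword weights to percentage shares, sorted from largest to smallest."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    shares = {keyword: 100.0 * weight / total for keyword, weight in weights.items()}
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)


# Hypothetical weights obtained from the statistical analysis in Step 3.
weights = {
    "visit_count": 3,
    "extreme_count": 4,
    "channel_count": 1,
    "petitioner_count": 2,
}
ranked = weights_to_percentages(weights)
```

Under this assumption the shares always sum to 100, which matches a comprehensive score whose maximum is 100 points.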

A further technical solution by which the present invention solves the above technical problem is as follows: a system for mining and scoring petition data, comprising:

an extraction module, which extracts qualifying petition data from a large database and processes it to obtain mining data suitable for data mining, which is stored in a mining database, the large database holding all historical petition data;

a mining module, which extracts at least one keyword from the mining data in the mining database and performs feature extraction on the mining data based on each keyword to obtain an analysis table for each keyword;

a standard establishment module, which performs statistical analysis on the mining data in the at least one analysis table to obtain a weight value for each keyword, and establishes a comprehensive scoring standard based on the weight values corresponding to the different keywords.

Further, the process by which the extraction module extracts petition data from the large database includes: when data in the large database changes, extracting the changed data by means of a timestamp condition or an update log, the data so obtained being the qualifying petition data.

Further, the processing of the petition data by the extraction module includes data cleaning and data transformation;

Further, the keywords in the mining module include the number of extreme incidents, the number of petitioners, the number of petitions, the number of petition channels, the time consumed by the petition, and so on.

Further, in the standard establishment module each keyword is assigned, according to its weight value, a percentage of the overall score, and the percentages of all keywords are sorted in descending order to establish the comprehensive scoring standard; the larger the weight value, the larger the percentage.

Brief description of the drawings

Fig. 1 is a flowchart of the method for mining and scoring petition data according to Embodiment 1 of the present invention;

Fig. 2 is a structural diagram of the system for mining and scoring petition data according to Embodiment 2 of the present invention.

In the drawings, the components denoted by the reference numerals are as follows:

1. extraction module; 2. mining module; 3. standard establishment module.

Embodiments

The principles and features of the present invention are described below with reference to the drawings. The examples given serve only to explain the invention and are not intended to limit its scope.

As shown in Fig. 1, Embodiment 1 of the present invention is a method for mining and scoring petition data, comprising Steps 1 to 3 as set out above and further specified as follows.

The petition data pre-stored in the large database includes letters, e-mail, voice, video, in-person visit records, and similar data obtained through data acquisition.

The process of extracting petition data from the large database in Step 1 includes: when data in the large database changes, extracting the changed data by means of a timestamp condition or an update log, the data so obtained being the qualifying petition data.

The processing of the petition data in Step 1 includes data cleaning and data transformation;

The data cleaning includes deduplication, data item standardization, and denoising. Deduplication removes data that was entered more than once; data item standardization records petition data entered in different forms in a single, uniform order, so that the processed data is easier to count; denoising removes noisy data from the petition data.

The data transformation includes operations such as smoothing, aggregation, data generalization, normalization, concept hierarchy generation, and discretization.

The keywords in Step 2 include the number of extreme incidents, the number of petitioners, the number of petitions, the number of petition channels, the time consumed by the petition, and so on.

In Step 3, each keyword is assigned, according to its weight value, a percentage of the overall score, and the percentages of all keywords are sorted in descending order to establish the comprehensive scoring standard; the larger the weight value, the larger the percentage.

As shown in Fig. 2, Embodiment 2 of the present invention is a system for mining and scoring petition data, comprising:

an extraction module 1, which extracts qualifying petition data from a large database and processes it to obtain mining data suitable for data mining, which is stored in a mining database, the large database holding all historical petition data;

a mining module 2, which extracts at least one keyword from the mining data in the mining database and performs feature extraction on the mining data based on each keyword to obtain an analysis table for each keyword;

a standard establishment module 3, which performs statistical analysis on the mining data in the at least one analysis table to obtain a weight value for each keyword, and establishes a comprehensive scoring standard based on the weight values corresponding to the different keywords.

The process by which the extraction module 1 extracts petition data from the large database includes: when data in the large database changes, extracting the changed data by means of a timestamp condition or an update log, the data so obtained being the qualifying petition data.

The processing of the petition data by the extraction module 1 includes data cleaning and data transformation;

The keywords in the mining module 2 include the number of extreme incidents, the number of petitioners, the number of petitions, the number of petition channels, the time consumed by the petition, and so on.

In the standard establishment module 3, each keyword is assigned, according to its weight value, a percentage of the overall score, and the percentages of all keywords are sorted in descending order to establish the comprehensive scoring standard; the larger the weight value, the larger the percentage.

Through the proposed system for mining and scoring petition data, the present invention integrates into the large database all petition data that was previously scattered across separate systems and lines of business and isolated from one another. This includes the letters, visits to the municipal authorities, irregular visits, and visits to the State Bureau recorded in the Beijing municipal petition integrated office system, as well as the e-mails of the mayor's mailbox. A data acquisition platform extracts these petition case records from the Beijing municipal petition integrated office system and the mayor's mailbox system; the platform provides functions for extracting petition data, cleaning it, and loading it into the data mining database.

By integrating all of the petition data, a series of new petition concepts has been derived, including: petition matter and petitioner, extreme petition matter, extreme petitioner, first extreme behavior, repeated extreme behavior, and so on.

Through data mining and intelligent analysis, associations are established among all of the petition data, and identical petition matters and identical petitioners are identified in this disordered and complex multi-system data. The key features for identifying the same petitioner are name, address, and ID card number (which may be absent); the key features for identifying the same petition matter are the duplicate-case flag of the petition record, the reference identifier of the petition record, the petitioner, and the summary information.
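The identification of the same petitioner across systems can be sketched in Python as below. This is a simplified illustration, not the matching logic of the source: it assumes that an ID card number, when present, is decisive, and that otherwise an exact match on name and address is required. The field names are hypothetical. A known limitation of this sketch is that a record carrying an ID number is not merged with a record of the same person that lacks one.

```python
def petitioner_key(record):
    """Prefer the ID card number when present; otherwise fall back to (name, address)."""
    id_number = (record.get("id_number") or "").strip()
    if id_number:
        return ("id", id_number)
    return ("name_address", record["name"].strip(), record["address"].strip())


def group_by_petitioner(records):
    """Group records from multiple systems that appear to refer to the same petitioner."""
    groups = {}
    for record in records:
        groups.setdefault(petitioner_key(record), []).append(record)
    return groups


records = [
    {"name": "Li Si", "address": "1 East Rd", "id_number": "X1", "source": "office system"},
    {"name": "Li Si", "address": "9 West Rd", "id_number": "X1", "source": "mayor's mailbox"},
    {"name": "Wang Wu", "address": "2 North Rd", "id_number": None, "source": "office system"},
    {"name": "Wang Wu", "address": "2 North Rd", "source": "mayor's mailbox"},
]
groups = group_by_petitioner(records)
```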

Key features are extracted for each petition matter: number of petitions, average number of petitioners, petition time, time at which extreme behavior occurred, whether extreme behavior exists, content category, petition purpose, region, average age, and so on. Data feature analysis is first carried out on the key features of petitioners and petition matters. The analysis mainly combines dimensions such as content category, hot issue, region, average age, income bracket, whether extreme behavior occurred, whether the petition is a group petition, group petition level (graded by number of petitioners), and repeat petition level (graded by number of petitions). The main analysis indicators are petition volume, timely acceptance rate, timely completion rate, and timely reply rate. Analyzing several dimensions together reveals the data features. The data mining also performs in-depth feature analysis on the group petitions and extreme-behavior petition events that are of particular concern. This feature analysis gives us a grasp of the basic characteristics of the petition data and of the related in-depth statistical results.

With a basic understanding of the data features in hand, we carried out targeted correlation analysis on the categories of data of greatest concern: total petition volume, group petition volume, repeat petition volume, and extreme-behavior petition volume. This established the correlations between these volumes and the timely acceptance rate, timely reply rate, timely completion rate, average age, and income bracket (annual income).
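The source does not say which correlation measure was used. As one possibility, the sketch below computes the Pearson correlation coefficient in plain Python; the function name and the sample figures in the usage lines are assumptions made for illustration and are not data from the source.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / sqrt(var_x * var_y)


# Made-up example: repeat petition volume versus timely completion rate by region.
repeat_volume = [10, 20, 30, 40]
completion_rate = [0.9, 0.8, 0.7, 0.6]
r = pearson(repeat_volume, completion_rate)
```

A coefficient near +1 or -1 indicates a strong positive or negative linear relationship, and a value near 0 indicates little linear relationship.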

Through repeated comparison, sampling, and experiments on the petition data, a comprehensive scoring standard system for petition matters was established, giving a comprehensive score for each petition matter and petitioner. Petition matters and petitioners requiring particular attention are then singled out according to features such as the severity and urgency of the petition matter.

Based on the above data mining and intelligent analysis and on the core business requirements of petition work, we have grasped the data features and correlation statistics of petition matters and petitioners. The correlation analysis shows which features, such as whether extreme behavior occurred, group petition level, repeat petition level, and hot issue, are positively or negatively correlated, and thereby yields the core features that are highly correlated with the severity and urgency of a petition matter. The weight of each feature is computed from the correlation analysis of these features, finally giving a standard for calculating the comprehensive score of a petition matter.

Table 1 shows the resulting comprehensive scoring standard by way of a concrete example. The full comprehensive score of each petition matter is 100 points; an additive (bonus-point) algorithm is used, with a base score of 0 and the specific bonus items as listed in the table.
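The additive scheme (base score 0, full score 100) can be illustrated with the following Python sketch. The bonus items, their thresholds, and their point values are entirely hypothetical placeholders, because the contents of Table 1 are not reproduced in this text; only the structure of the calculation follows the description.

```python
# Hypothetical bonus rules: (name, condition, points). These are NOT the items of Table 1.
BONUS_RULES = [
    ("extreme behavior occurred", lambda m: m["extreme_count"] > 0, 40),
    ("group petition", lambda m: m["petitioner_count"] >= 5, 30),
    ("repeat petition", lambda m: m["petition_count"] >= 3, 30),
]

FULL_SCORE = 100


def comprehensive_score(matter):
    """Additive scoring: start from a base score of 0 and add points per matching rule."""
    total = 0
    for _name, applies, points in BONUS_RULES:
        if applies(matter):
            total += points
    return min(total, FULL_SCORE)  # the full score is capped at 100 points


calm = {"extreme_count": 0, "petitioner_count": 1, "petition_count": 1}
severe = {"extreme_count": 2, "petitioner_count": 12, "petition_count": 4}
calm_score = comprehensive_score(calm)
severe_score = comprehensive_score(severe)
```

Matters can then be ranked by score so that those with the highest scores receive attention first.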

The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A method for mining and scoring petition data, characterized by comprising the following steps:

Step 1: extracting qualifying petition data from a large database and processing it to obtain mining data suitable for data mining, which is stored in a mining database, the large database holding all historical petition data;

Step 2: extracting at least one keyword from the mining data in the mining database, and performing feature extraction on the mining data based on each keyword to obtain an analysis table for each keyword;

Step 3: performing statistical analysis on the mining data in the at least one analysis table to obtain a weight value for each keyword, and establishing a comprehensive scoring standard based on the weight values corresponding to the different keywords;

the petition data pre-stored in the large database includes letters, e-mail, voice, video, and in-person visit data obtained through data acquisition;

the process of extracting petition data from the large database in Step 1 includes: when data in the large database changes, extracting the changed data from the large database by means of a timestamp condition or an update log, the data so obtained being the qualifying petition data;

the processing of the petition data in Step 1 includes data cleaning and data transformation; the data cleaning cleans the extracted petition data to obtain standardized petition data free of duplicates; the data transformation converts the standardized petition data from transactional data into mining data suitable for data mining;

the data cleaning includes deduplication, data item standardization, and denoising; the deduplication removes data that was entered more than once; the data item standardization records petition data entered in different forms in a single, uniform order, so that the processed data is easier to count; the denoising removes noisy data from the petition data;

the data transformation includes smoothing, aggregation, data generalization, normalization, concept hierarchy generation, and discretization operations;

the keywords in Step 2 include the number of extreme incidents, the number of petitioners, the number of petitions, the number of petition channels, and the time consumed by the petition;

in Step 3, each keyword is assigned, according to its weight value, a percentage of the overall score, and the percentages of all keywords are sorted in descending order to establish the comprehensive scoring standard, wherein the larger the weight value, the larger the percentage.

2. A system for mining and scoring petition data, characterized by comprising:

an extraction module, which extracts qualifying petition data from a large database and processes it to obtain mining data suitable for data mining, which is stored in a mining database, the large database holding all historical petition data;

a mining module, which extracts at least one keyword from the mining data in the mining database and performs feature extraction on the mining data based on each keyword to obtain an analysis table for each keyword;

a standard establishment module, which performs statistical analysis on the mining data in the at least one analysis table to obtain a weight value for each keyword, and establishes a comprehensive scoring standard based on the weight values corresponding to the different keywords;

the petition data pre-stored in the large database includes letters, e-mail, voice, video, and in-person visit data obtained through data acquisition;

the process by which the extraction module extracts petition data from the large database includes:

when data in the large database changes, extracting the changed data from the large database by means of a timestamp condition or an update log, the data so obtained being the qualifying petition data;

the processing of the petition data by the extraction module includes data cleaning and data transformation; the data cleaning cleans the extracted petition data to obtain standardized petition data free of duplicates; the data transformation converts the standardized petition data from transactional data into mining data suitable for data mining;

the data cleaning includes deduplication, data item standardization, and denoising; the deduplication removes data that was entered more than once; the data item standardization records petition data entered in different forms in a single, uniform order, so that the processed data is easier to count; the denoising removes noisy data from the petition data;

the data transformation includes smoothing, aggregation, data generalization, normalization, concept hierarchy generation, and discretization operations;

the keywords in the mining module include the number of extreme incidents, the number of petitioners, the number of petitions, the number of petition channels, and the time consumed by the petition;

in the standard establishment module, each keyword is assigned, according to its weight value, a percentage of the overall score, and the percentages of all keywords are sorted in descending order to establish the comprehensive scoring standard, wherein the larger the weight value, the larger the percentage.