CN106294542B - Method and system for mining and scoring letters and calls (petition) data - Google Patents

Method and system for mining and scoring letters and calls (petition) data

Info

Publication number
CN106294542B
Authority
CN
China
Legal status
Active
Application number
CN201610585288.2A
Other languages
Chinese (zh)
Other versions
CN106294542A
Inventor
张宗林
Current Assignee
Beijing Contradiction Analysis And Research Center
Original Assignee
Beijing Contradiction Analysis And Research Center
Priority date
2016-07-25
Filing date
2016-07-25
Publication date
2018-03-30
Application filed by Beijing Contradiction Analysis And Research Center
Priority to CN201610585288.2A
Publication of CN106294542A
Application granted
Publication of CN106294542B
Status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/245: Query processing
    • G06F16/2458: Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465: Query processing support for facilitating data mining operations in structured databases

Abstract

The present invention relates to a method and system for mining and scoring letters and calls (petition) data. The method comprises: Step 1: extracting qualifying letters and calls data from a large database, which stores all historical letters and calls data, and processing it to obtain mining data suitable for data mining, which is stored in a mining database; Step 2: extracting at least one keyword from the mining data in the mining database and performing feature extraction on the mining data based on each keyword, to obtain an analysis table for each keyword; Step 3: performing statistical analysis on the mining data in the at least one analysis table to obtain a weight value for each keyword, and establishing a composite scoring standard based on the weight values corresponding to the different keywords. The invention integrates all letters and calls data previously scattered across separate, mutually isolated systems. By establishing a scoring standard, it allows letters and calls data to be classified and counted, which makes the next stage of handling the data easier.

Description

Method and system for mining and scoring letters and calls data
Technical field
The present invention relates to a method and system for mining and scoring letters and calls data, and belongs to the field of computer technology.
Background art
Letters and calls (xinfang, i.e. petitioning) refers to activities in which citizens, legal persons or other organizations use letters, e-mail, fax, telephone, visits and similar means to report situations, make suggestions, or submit opinions or complaints to people's governments at all levels and to departments of people's governments at or above the county level, which are then handled by the relevant administrative bodies in accordance with the law.
Letters and calls are a way of resolving problems outside legal proceedings and a relatively direct form of expressing interests. The surge in petition volume in recent years has produced a large accumulation of letters and calls data. A major issue in letters and calls research is how to turn these data into multi-level, multi-dimensional information and knowledge that reveals the logical associations behind the data, so that the government can effectively address, at the policy level, the prominent contradictions reflected in petitions. In-depth analysis of letters and calls data is a prerequisite for solving this problem.
At present, letters and calls data are used only at a superficial level, such as data entry, querying and simple statistical summaries, and the deep logical associations hidden in the data cannot be discovered. Yet these underlying associations are the very crux of social contradictions and an important basis for guiding policy-making.
Summary of the invention
The technical problem to be solved by the invention is that the prior art lacks a unified large database: letters and calls data cannot be retrieved as needed, and problems present in the data cannot be addressed in time. To overcome these deficiencies, the invention provides a method and system for mining and scoring letters and calls data.
The technical solution by which the invention solves the above technical problem is as follows: a method for mining and scoring letters and calls data, comprising the following steps:
Step 1: extracting qualifying letters and calls data from a large database and processing it to obtain mining data suitable for data mining, which is stored in a mining database, wherein the large database stores all historical letters and calls data;
Step 2: extracting at least one keyword from the mining data in the mining database and performing feature extraction on the mining data based on each keyword, to obtain an analysis table for each keyword;
Step 3: performing statistical analysis on the mining data in the at least one analysis table to obtain a weight value for each keyword, and establishing a composite scoring standard based on the weight values corresponding to the different keywords.
The beneficial effects of the invention are as follows: the invention integrates all letters and calls data previously scattered across separate, mutually isolated systems; automatically extracts patterns, associations, changes, anomalies and significant structures from the data; and mines valuable knowledge from the ever-growing volume of letters and calls data, thereby using figures to reveal the patterns behind social contradictions and supporting scientific, rule-based decision-making. The composite scoring system for letters and calls matters can predict which matters and which petitioners are likely to involve extreme behavior in the near future, drawing the attention of the relevant departments and greatly helping to prevent and defuse social contradictions.
On the basis of the above technical solution, the invention may be further improved as follows.
Further, the letters and calls data pre-stored in the large database includes letters, e-mail, voice, video, visit records and other data obtained through data acquisition.
Further, the process of extracting letters and calls data from the large database in Step 1 comprises:
When data in the large database changes, the changed data is extracted from the large database by means of a timestamp condition or an update log; the data so obtained is the qualifying letters and calls data.
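The timestamp-condition variant of this incremental extraction can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes a hypothetical `petitions` table with an ISO-8601 `updated_at` column, uses Python's built-in `sqlite3` module as a stand-in for the large database, and keeps a watermark (the latest timestamp already extracted) between runs. The update-log variant is not shown.

```python
import sqlite3

def extract_changed(conn, watermark):
    """Return rows changed after `watermark` plus the new watermark.

    Assumes a hypothetical table petitions(id, content, updated_at) whose
    updated_at values are ISO-8601 strings, so string comparison orders them.
    """
    rows = conn.execute(
        "SELECT id, content, updated_at FROM petitions "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    # Advance the watermark to the newest change seen; keep it if nothing changed.
    new_watermark = rows[-1][2] if rows else watermark
    return rows, new_watermark

def make_demo_db():
    """Build a small in-memory stand-in for the large database (illustrative data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE petitions (id INTEGER PRIMARY KEY, content TEXT, updated_at TEXT)")
    conn.executemany(
        "INSERT INTO petitions VALUES (?, ?, ?)",
        [
            (1, "letter", "2016-07-01T09:00:00"),
            (2, "e-mail", "2016-07-20T10:30:00"),
            (3, "visit", "2016-07-24T16:45:00"),
        ],
    )
    return conn
```

Calling `extract_changed(make_demo_db(), "2016-07-15T00:00:00")` returns rows 2 and 3 and advances the watermark to `"2016-07-24T16:45:00"`; a second call with that watermark returns no rows, so each run picks up only what changed since the previous one.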
Further, the processing of the letters and calls data in Step 1 comprises data cleaning and data transformation;
the data cleaning cleans the extracted letters and calls data to obtain standard letters and calls data free of duplicates;
the data transformation converts the standard letters and calls data from transactional data into mining data suitable for data mining.
Further, the data cleaning comprises de-duplication, data item standardization and de-noising: de-duplication removes records entered more than once; data item standardization records letters and calls data entered in different formats according to a unified standard, making the processed data easier to count; and de-noising removes noisy data from the letters and calls data.
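The three cleaning operations can be illustrated with a short sketch. The field names (`name`, `id_no`, `channel`, `summary`), the channel alias table and the noise rule (discarding records with an empty summary) are all hypothetical choices made for illustration; the patent does not specify them. Standardization runs first so that duplicates entered in different formats collapse to the same key.

```python
CHANNEL_ALIASES = {  # hypothetical mapping of free-form channel entries to one code
    "letter": "letter", "mail": "letter",
    "email": "email", "e-mail": "email",
    "visit": "visit", "phone": "phone", "fax": "fax",
}

def _text(raw, field):
    """Fetch a field as a string, treating missing or None as empty."""
    return raw.get(field) or ""

def standardize(raw):
    """Record differently entered fields according to one (hypothetical) standard."""
    return {
        "name": " ".join(_text(raw, "name").split()),        # collapse stray whitespace
        "id_no": _text(raw, "id_no").strip().upper(),         # e.g. 'x' check digit -> 'X'
        "channel": CHANNEL_ALIASES.get(_text(raw, "channel").strip().lower(), "other"),
        "summary": " ".join(_text(raw, "summary").split()),
    }

def is_noise(rec):
    """Hypothetical noise rule: a record without any summary text carries no content."""
    return not rec["summary"]

def clean(records):
    """Standardize, de-noise and de-duplicate, keeping the first copy of each record."""
    seen, out = set(), []
    for raw in records:
        rec = standardize(raw)
        if is_noise(rec):
            continue
        key = (rec["name"], rec["id_no"], rec["channel"], rec["summary"])
        if key not in seen:
            seen.add(key)
            out.append(rec)
    return out
```

Because de-duplication compares standardized keys, an "E-mail" record and an "email" record with the same content count as one entry.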
Further, the data transformation comprises operations such as smoothing, aggregation, data generalization, normalization, concept hierarchy construction and discretization.
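Several of the listed transformation operations are standard data-preparation techniques. The sketch below shows smoothing (a moving average), aggregation (counting by a field), min-max normalization, and discretization used to build a simple concept hierarchy from raw age to age band. The age-band boundaries and field names are hypothetical, and data generalization beyond that one hierarchy is not shown.

```python
from collections import Counter

def moving_average(series, window=3):
    """Smoothing: average of each full window of consecutive values (e.g. monthly counts)."""
    return [sum(series[i - window + 1:i + 1]) / window for i in range(window - 1, len(series))]

def aggregate(records, field):
    """Aggregation: count records per value of `field` (e.g. region)."""
    return Counter(r[field] for r in records)

def min_max(values):
    """Normalization: rescale values to [0, 1]; a constant column maps to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

AGE_EDGES = (29, 44, 59)                       # hypothetical upper bounds of the first bands
AGE_LABELS = ("<30", "30-44", "45-59", "60+")  # one more label than edges

def age_band(age):
    """Discretization / concept hierarchy: raw age -> age band."""
    for edge, label in zip(AGE_EDGES, AGE_LABELS):
        if age <= edge:
            return label
    return AGE_LABELS[-1]
```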
Further, the keywords in Step 2 include the number of extreme incidents, the number of petitions, the number of petitioners, the number of petition channels and the petition duration, among others.
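One way to read Step 2 is that each keyword becomes a value computed per letters and calls matter, giving one analysis table (matter → value) per keyword. The sketch below assumes hypothetical event records with `matter_id`, `petitioner_id`, `channel`, `extreme` and `day` fields, and takes petition duration to be the span in days between a matter's first and last event; both the schema and that definition are assumptions.

```python
from collections import defaultdict

KEYWORDS = ("extreme_count", "petition_count", "petitioner_count", "channel_count", "duration_days")

def analysis_tables(events):
    """Return {keyword: {matter_id: value}} built from hypothetical event records."""
    by_matter = defaultdict(list)
    for e in events:
        by_matter[e["matter_id"]].append(e)

    tables = {k: {} for k in KEYWORDS}
    for mid, evs in by_matter.items():
        days = [e["day"] for e in evs]
        tables["extreme_count"][mid] = sum(1 for e in evs if e["extreme"])
        tables["petition_count"][mid] = len(evs)
        tables["petitioner_count"][mid] = len({e["petitioner_id"] for e in evs})
        tables["channel_count"][mid] = len({e["channel"] for e in evs})
        tables["duration_days"][mid] = max(days) - min(days)  # assumed definition
    return tables
```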
Further, in Step 3, each keyword is assigned a percentage of the overall score according to its weight value, and the composite scoring standard is established after sorting the percentages of all keywords in descending order; the larger the weight value, the larger the percentage.
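The conversion from weight values to shares of the overall score can be sketched as a proportional allocation. This assumes each keyword's share is its weight divided by the sum of all weights; the patent states only that a larger weight yields a larger percentage. The result is sorted in descending order, as described.

```python
def score_shares(weights, total=100.0):
    """Allocate `total` points in proportion to keyword weights, largest share first."""
    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive number")
    shares = {k: total * w / weight_sum for k, w in weights.items()}
    return sorted(shares.items(), key=lambda kv: kv[1], reverse=True)
```

With hypothetical weights of 4, 3, 2 and 1 for four keywords, the shares come out as 40, 30, 20 and 10 points respectively.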
The technical solution by which the invention solves the above technical problem further provides a system for mining and scoring letters and calls data, comprising:
an extraction module, which extracts qualifying letters and calls data from a large database and processes it to obtain mining data suitable for data mining, which is stored in a mining database, wherein the large database stores all historical letters and calls data;
a mining module, which extracts at least one keyword from the mining data in the mining database and performs feature extraction on the mining data based on each keyword, to obtain an analysis table for each keyword;
a standard establishment module, which performs statistical analysis on the mining data in the at least one analysis table to obtain a weight value for each keyword, and establishes a composite scoring standard based on the weight values corresponding to the different keywords.
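The three modules form a linear pipeline in which each module's output feeds the next. The skeleton below is only a structural sketch: the class names mirror the module names, the large database is represented by a plain list, a hypothetical `qualifies` flag stands in for the extraction condition, and the weighting rule is injected as a function because the patent does not fix one.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Table = Dict[int, float]  # record id -> keyword value

@dataclass
class ExtractionModule:
    """Module 1: select qualifying records from the large database (a list here)."""
    large_db: List[dict]

    def run(self) -> List[dict]:
        return [r for r in self.large_db if r.get("qualifies", True)]

@dataclass
class MiningModule:
    """Module 2: build one analysis table per keyword."""
    keywords: List[str]

    def run(self, mining_db: List[dict]) -> Dict[str, Table]:
        return {k: {r["id"]: r[k] for r in mining_db} for k in self.keywords}

@dataclass
class StandardEstablishmentModule:
    """Module 3: weight each keyword, then express the weights as shares of 100 points."""
    weight_fn: Callable[[Table], float]

    def run(self, tables: Dict[str, Table]) -> Dict[str, float]:
        weights = {k: self.weight_fn(t) for k, t in tables.items()}
        total = sum(weights.values())
        return {k: 100.0 * w / total for k, w in weights.items()}

def run_system(large_db, keywords, weight_fn):
    """Chain the modules: extraction -> mining -> standard establishment."""
    mining_db = ExtractionModule(large_db).run()
    tables = MiningModule(keywords).run(mining_db)
    return StandardEstablishmentModule(weight_fn).run(tables)
```

Keeping the weighting rule as a parameter lets the statistical analysis of module 3 be swapped without touching the other two modules.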
On the basis of the above technical solution, the invention may be further improved as follows.
Further, the letters and calls data pre-stored in the large database includes letters, e-mail, voice, video, visit records and other data obtained through data acquisition.
Further, the process by which the extraction module extracts letters and calls data from the large database comprises:
When data in the large database changes, the changed data is extracted from the large database by means of a timestamp condition or an update log; the data so obtained is the qualifying letters and calls data.
Further, the processing of the letters and calls data by the extraction module comprises data cleaning and data transformation;
the data cleaning cleans the extracted letters and calls data to obtain standard letters and calls data free of duplicates;
the data transformation converts the standard letters and calls data from transactional data into mining data suitable for data mining.
Further, the data cleaning comprises de-duplication, data item standardization and de-noising: de-duplication removes records entered more than once; data item standardization records letters and calls data entered in different formats according to a unified standard, making the processed data easier to count; and de-noising removes noisy data from the letters and calls data.
Further, the data transformation comprises operations such as smoothing, aggregation, data generalization, normalization, concept hierarchy construction and discretization.
Further, the keywords in the mining module include the number of extreme incidents, the number of petitions, the number of petitioners, the number of petition channels and the petition duration, among others.
Further, in the standard establishment module, each keyword is assigned a percentage of the overall score according to its weight value, and the composite scoring standard is established after sorting the percentages of all keywords in descending order; the larger the weight value, the larger the percentage.
Brief description of the drawings
Fig. 1 is a flow chart of the method for mining and scoring letters and calls data according to Embodiment 1 of the invention;
Fig. 2 is a structural diagram of the system for mining and scoring letters and calls data according to Embodiment 2 of the invention.
In the drawings, the components represented by the reference numerals are as follows:
1, extraction module; 2, mining module; 3, standard establishment module.
Detailed description of the embodiments
The principles and features of the invention are described below with reference to the drawings. The examples given serve only to explain the invention and are not intended to limit its scope.
As shown in Fig. 1, the method for mining and scoring letters and calls data according to Embodiment 1 of the invention comprises the following steps:
Step 1: extracting qualifying letters and calls data from a large database and processing it to obtain mining data suitable for data mining, which is stored in a mining database, wherein the large database stores all historical letters and calls data;
Step 2: extracting at least one keyword from the mining data in the mining database and performing feature extraction on the mining data based on each keyword, to obtain an analysis table for each keyword;
Step 3: performing statistical analysis on the mining data in the at least one analysis table to obtain a weight value for each keyword, and establishing a composite scoring standard based on the weight values corresponding to the different keywords.
The letters and calls data pre-stored in the large database includes letters, e-mail, voice, video, visit records and other data obtained through data acquisition.
The process of extracting letters and calls data from the large database in Step 1 comprises:
When data in the large database changes, the changed data is extracted from the large database by means of a timestamp condition or an update log; the data so obtained is the qualifying letters and calls data.
The processing of the letters and calls data in Step 1 comprises data cleaning and data transformation;
the data cleaning cleans the extracted letters and calls data to obtain standard letters and calls data free of duplicates;
the data transformation converts the standard letters and calls data from transactional data into mining data suitable for data mining.
The data cleaning comprises de-duplication, data item standardization and de-noising: de-duplication removes records entered more than once; data item standardization records letters and calls data entered in different formats according to a unified standard, making the processed data easier to count; and de-noising removes noisy data from the letters and calls data.
The data transformation comprises operations such as smoothing, aggregation, data generalization, normalization, concept hierarchy construction and discretization.
The keywords in Step 2 include the number of extreme incidents, the number of petitions, the number of petitioners, the number of petition channels and the petition duration, among others.
In Step 3, each keyword is assigned a percentage of the overall score according to its weight value, and the composite scoring standard is established after sorting the percentages of all keywords in descending order; the larger the weight value, the larger the percentage.
As shown in Fig. 2, the system for mining and scoring letters and calls data according to Embodiment 2 of the invention comprises:
an extraction module 1, which extracts qualifying letters and calls data from a large database and processes it to obtain mining data suitable for data mining, which is stored in a mining database, wherein the large database stores all historical letters and calls data;
a mining module 2, which extracts at least one keyword from the mining data in the mining database and performs feature extraction on the mining data based on each keyword, to obtain an analysis table for each keyword;
a standard establishment module 3, which performs statistical analysis on the mining data in the at least one analysis table to obtain a weight value for each keyword, and establishes a composite scoring standard based on the weight values corresponding to the different keywords.
The letters and calls data pre-stored in the large database includes letters, e-mail, voice, video, visit records and other data obtained through data acquisition.
The process by which the extraction module 1 extracts letters and calls data from the large database comprises:
When data in the large database changes, the changed data is extracted from the large database by means of a timestamp condition or an update log; the data so obtained is the qualifying letters and calls data.
The processing of the letters and calls data by the extraction module 1 comprises data cleaning and data transformation;
the data cleaning cleans the extracted letters and calls data to obtain standard letters and calls data free of duplicates;
the data transformation converts the standard letters and calls data from transactional data into mining data suitable for data mining.
The data cleaning comprises de-duplication, data item standardization and de-noising: de-duplication removes records entered more than once; data item standardization records letters and calls data entered in different formats according to a unified standard, making the processed data easier to count; and de-noising removes noisy data from the letters and calls data.
The data transformation comprises operations such as smoothing, aggregation, data generalization, normalization, concept hierarchy construction and discretization.
The keywords in the mining module 2 include the number of extreme incidents, the number of petitions, the number of petitioners, the number of petition channels and the petition duration, among others.
In the standard establishment module 3, each keyword is assigned a percentage of the overall score according to its weight value, and the composite scoring standard is established after sorting the percentages of all keywords in descending order; the larger the weight value, the larger the percentage.
Through the proposed system for mining and scoring letters and calls data, the invention integrates into the large database all letters and calls data that were previously scattered across different systems and business lines and isolated from one another, including letters, visits to the city, abnormal visits and State Bureau visits from the Beijing letters and calls integrated office system, and e-mails from the Mayor's Mailbox. A data acquisition platform extracts the records of letters, city visits, abnormal visits, State Bureau visits and Mayor's Mailbox messages from the Beijing letters and calls integrated office system and the Mayor's Mailbox system; the platform provides the functions of extracting letters and calls data, cleaning it, and loading it into the data mining database.
Through the integration of all letters and calls data, a series of new letters and calls concepts was derived, including letters and calls matter and petitioner, extreme letters and calls matter, extreme petitioner, first extreme behavior and repeated extreme behavior.
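The text does not define how first and repeated extreme behavior are distinguished. A plausible minimal reading, sketched below under that assumption, is to sort each petitioner's extreme-behavior incidents by date and label the earliest one "first" and every later one "repeat"; any petitioner with at least one such incident is then an extreme petitioner. The input format (petitioner ID, ISO date string) is hypothetical.

```python
def label_extreme_incidents(incidents):
    """Label (petitioner_id, iso_date) incidents as 'first' or 'repeat' per petitioner."""
    seen = set()
    labelled = []
    for pid, date in sorted(incidents, key=lambda inc: inc[1]):  # ISO dates sort chronologically
        labelled.append((pid, date, "repeat" if pid in seen else "first"))
        seen.add(pid)
    return labelled

def extreme_petitioners(incidents):
    """Petitioners with at least one extreme-behavior incident."""
    return {pid for pid, _ in incidents}
```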
Data mining and intelligent analysis establish associations among all letters and calls data, so that identical letters and calls matters and identical petitioners can be identified in the data from numerous, complex business systems. The key features for identifying the same petitioner are name, address and ID card number (which may be absent); the key features for identifying the same letters and calls matter are the case's repeat-judgment mark, the case reference identifier, the petitioner and the summary information.
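Matching records that refer to the same petitioner or the same matter amounts to computing a matching key from the features listed above. The sketch below assumes hypothetical field names and one particular precedence rule: use the ID card number when present, otherwise fall back to normalized name plus address; for matters, use the reference identifier when present, otherwise the petitioner key plus normalized summary. The repeat-judgment mark is omitted, and real data would likely need fuzzy rather than exact matching.

```python
def _norm(text):
    """Collapse whitespace; treat None as empty."""
    return " ".join((text or "").split())

def petitioner_key(rec):
    """Identity key for a petitioner: ID card number if present, else name + address."""
    id_no = _norm(rec.get("id_no")).upper()
    if id_no:                        # the ID card number may be missing
        return ("id", id_no)
    return ("name_addr", _norm(rec.get("name")), _norm(rec.get("address")))

def matter_key(rec):
    """Identity key for a matter: reference identifier if present, else petitioner + summary."""
    ref = _norm(rec.get("reference_id"))
    if ref:
        return ("ref", ref)
    return ("petitioner_summary", petitioner_key(rec), _norm(rec.get("summary")))

def group_by(records, key_fn):
    """Group records that share the same identity key."""
    groups = {}
    for rec in records:
        groups.setdefault(key_fn(rec), []).append(rec)
    return groups
```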
Key features extracted for letters and calls matters include the number of petitions, the average number of petitioners, petition time, the time at which extreme behavior occurred, whether extreme behavior occurred, content classification, petition purpose, region and average age, among others. Data feature analysis is first performed on the key features of petitioners and matters, mainly by combining dimensions such as content classification, hot-spot issues, region, average age, income bracket, whether extreme behavior occurred, whether the petition is a group petition, group petition grade (graded by number of petitioners) and repeat petition grade (graded by number of petitions). The main analysis indicators are petition volume, timely acceptance rate, timely completion rate and timely reply rate, and data characteristics are found by analyzing multiple dimensions together. Data mining also performs in-depth feature analysis on group petitions and petition events involving extreme behavior, which are of particular concern. This feature analysis gave us a grasp of the essential characteristics of letters and calls data and the related in-depth statistical results.
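Two of the derived dimensions are ordinal grades (group petition grade by number of petitioners, repeat petition grade by number of petitions), and the analysis indicators are ratios. A minimal sketch follows; the threshold values and the boolean field name are hypothetical, since the patent does not list them.

```python
GROUP_THRESHOLDS = (5, 20, 50)   # hypothetical petitioner counts for grades 1, 2, 3
REPEAT_THRESHOLDS = (2, 5, 10)   # hypothetical petition counts for grades 1, 2, 3

def grade(count, thresholds):
    """Ordinal grade = number of thresholds the count reaches (0 = below all)."""
    return sum(1 for t in thresholds if count >= t)

def rate(cases, flag):
    """Indicator such as the timely acceptance rate: share of cases whose flag is True."""
    if not cases:
        return 0.0
    return sum(1 for c in cases if c[flag]) / len(cases)
```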
With a basic understanding of the characteristics of letters and calls data, we performed targeted correlation analysis on several key categories of data: total petition volume, group petition volume, repeat petition volume and the volume of petitions involving extreme behavior. This established the correlations between these volumes and the timely acceptance rate, timely reply rate, completion rate, average age and income bracket (annual income).
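The text does not name the correlation measure used. Assuming the Pearson correlation coefficient, the core calculation for one pair of series (for example, monthly repeat-petition volume against the timely reply rate) is sketched below; Python 3.10 and later also provide `statistics.correlation` for the same quantity.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)
```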
Through repeated comparison, sampling and experiments on these data, a composite scoring standard system for letters and calls matters was established. It produces a composite score for each letters and calls matter and petitioner and, based on features such as the severity and urgency of each matter, identifies the matters and petitioners that require close attention.
Based on the above data mining and intelligent analysis and on the core business requirements of letters and calls work, we obtained the data characteristics and correlation statistics of letters and calls matters and petitioners. The correlation analysis showed which features (whether extreme behavior occurred, group petition level, repeat petition count level, and which hot-spot issues) are positively or negatively correlated with the severity of a matter, so that core features highly correlated with the severity and urgency of letters and calls matters could be mined. The weight of each feature in the composite calculation was derived from its degree of correlation, finally yielding a composite scoring standard for calculating scores for letters and calls matters.
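The step from correlation strength to feature weights is described only qualitatively. One simple rule consistent with that description, and an assumption here, is to make each feature's weight proportional to the absolute value of its correlation with severity, while recording the sign so that negatively correlated features can be handled separately (for example, scored inversely).

```python
def weights_from_correlations(corrs):
    """Map feature -> (weight, direction), with weight proportional to |r| and summing to 1."""
    total = sum(abs(r) for r in corrs.values())
    if total == 0:
        raise ValueError("at least one correlation must be non-zero")
    return {f: (abs(r) / total, 1 if r >= 0 else -1) for f, r in corrs.items()}
```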
Table 1 gives a specific example of the resulting composite scoring standard. The full composite score for each letters and calls matter is 100 points, calculated with an additive (bonus-point) algorithm that starts from a base score of 0 and adds points for each specific bonus item.
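Table 1's actual bonus items are not reproduced in this text, so the sketch below uses invented rules purely to show the mechanics of an additive score: start at 0, add the points for every rule a matter satisfies, and cap the result at the 100-point full score. The rule conditions and point values are hypothetical, as are the field names (which follow the keywords of Step 2).

```python
def composite_score(matter, rules, full_score=100):
    """Additive (bonus-point) scoring from a base of 0, capped at the full score."""
    score = 0
    for condition, points in rules:
        if condition(matter):
            score += points
    return min(score, full_score)

HYPOTHETICAL_RULES = [
    (lambda m: m["extreme_count"] > 0, 30),
    (lambda m: m["extreme_count"] > 1, 15),
    (lambda m: m["petitioner_count"] >= 5, 20),
    (lambda m: m["petition_count"] >= 5, 20),
    (lambda m: m["channel_count"] >= 3, 10),
    (lambda m: m["duration_days"] > 180, 10),
]
```

Sorting matters by this score in descending order is one straightforward way to surface those that need close attention.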
The foregoing are only preferred embodiments of the invention and are not intended to limit it. Any modification, equivalent substitution, improvement and the like made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (2)

  1. A method for mining and scoring letters and calls data, characterized by comprising the following steps:
    Step 1: extracting qualifying letters and calls data from a large database and processing it to obtain mining data suitable for data mining, which is stored in a mining database, wherein the large database stores all historical letters and calls data;
    Step 2: extracting at least one keyword from the mining data in the mining database and performing feature extraction on the mining data based on each keyword, to obtain an analysis table for each keyword;
    Step 3: performing statistical analysis on the mining data in the at least one analysis table to obtain a weight value for each keyword, and establishing a composite scoring standard based on the weight values corresponding to the different keywords;
    the letters and calls data pre-stored in the large database includes letters, e-mail, voice, video and visit data obtained through data acquisition;
    the process of extracting letters and calls data from the large database in Step 1 comprises: when data in the large database changes, extracting the changed data from the large database by means of a timestamp condition or an update log, the data so obtained being the qualifying letters and calls data;
    the processing of the letters and calls data in Step 1 comprises data cleaning and data transformation; the data cleaning cleans the extracted letters and calls data to obtain standard letters and calls data free of duplicates; the data transformation converts the standard letters and calls data from transactional data into mining data suitable for data mining;
    the data cleaning comprises de-duplication, data item standardization and de-noising; de-duplication removes records entered more than once; data item standardization records letters and calls data entered in different formats according to a unified standard, making the processed data easier to count; de-noising removes noisy data from the letters and calls data;
    the data transformation comprises smoothing, aggregation, data generalization, normalization, concept hierarchy construction and discretization;
    the keywords in Step 2 include the number of extreme incidents, the number of petitions, the number of petitioners, the number of petition channels and the petition duration;
    in Step 3, each keyword is assigned a percentage of the overall score according to its weight value, and the composite scoring standard is established after sorting the percentages of all keywords in descending order, wherein the larger the weight value, the larger the percentage.
  2. A system for mining and scoring letters and calls data, characterized by comprising: an extraction module, which extracts qualifying letters and calls data from a large database and processes it to obtain mining data suitable for data mining, which is stored in a mining database, wherein the large database stores all historical letters and calls data; a mining module, which extracts at least one keyword from the mining data in the mining database and performs feature extraction on the mining data based on each keyword, to obtain an analysis table for each keyword; and a standard establishment module, which performs statistical analysis on the mining data in the at least one analysis table to obtain a weight value for each keyword, and establishes a composite scoring standard based on the weight values corresponding to the different keywords;
    the letters and calls data pre-stored in the large database includes letters, e-mail, voice, video and visit data obtained through data acquisition;
    the process by which the extraction module extracts letters and calls data from the large database comprises:
    when data in the large database changes, extracting the changed data from the large database by means of a timestamp condition or an update log, the data so obtained being the qualifying letters and calls data;
    the processing of the letters and calls data by the extraction module comprises data cleaning and data transformation; the data cleaning cleans the extracted letters and calls data to obtain standard letters and calls data free of duplicates; the data transformation converts the standard letters and calls data from transactional data into mining data suitable for data mining;
    the data cleaning comprises de-duplication, data item standardization and de-noising; de-duplication removes records entered more than once; data item standardization records letters and calls data entered in different formats according to a unified standard, making the processed data easier to count; de-noising removes noisy data from the letters and calls data;
    the data transformation comprises smoothing, aggregation, data generalization, normalization, concept hierarchy construction and discretization;
    the keywords in the mining module include the number of extreme incidents, the number of petitions, the number of petitioners, the number of petition channels and the petition duration;
    in the standard establishment module, each keyword is assigned a percentage of the overall score according to its weight value, and the composite scoring standard is established after sorting the percentages of all keywords in descending order, wherein the larger the weight value, the larger the percentage.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610585288.2A CN106294542B 2016-07-25 2016-07-25 A kind of letters and calls data mining methods of marking and system

Publications (2)

Publication Number Publication Date
CN106294542A 2017-01-04
CN106294542B 2018-03-30

Family

ID=57652139

Country Status (1)

Country Link
CN (1) CN106294542B

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107527289B (en) * 2017-08-25 2021-08-06 上海优扬新媒信息技术有限公司 Investment portfolio industry configuration method, device, server and storage medium
CN110717045A (en) * 2019-10-15 2020-01-21 同方知网(北京)技术有限公司 Letter element automatic extraction method based on letter overview
CN112819352A (en) * 2021-02-07 2021-05-18 神彩科技股份有限公司 Environment data processing method and device, electronic equipment and medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103177009A (en) * 2011-12-22 2013-06-26 苏州威世博知识产权服务有限公司 Method and system of supporting automatic update of patent information
CN103544255A (en) * 2013-10-15 2014-01-29 常州大学 Text semantic relativity based network public opinion information analysis method
CN105138558A (en) * 2015-07-22 2015-12-09 山东大学 User access content-based real-time personalized information collection method
CN105701084A (en) * 2015-12-28 2016-06-22 广东顺德中山大学卡内基梅隆大学国际联合研究院 Characteristic extraction method of text classification on the basis of mutual information

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant