CN102819576A - Data mining method and system based on microblog - Google Patents
- Publication number
- CN102819576A
- Authority
- CN
- China
- Prior art keywords
- data
- microblogging
- module
- machine learning
- knowledge base
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention provides a microblog-based data mining method and system for the food service industry. The method comprises two processes, a training process and a judgment process. In the training process, microblog sample data undergoes text preprocessing, including word segmentation and feature extraction, with the support of a knowledge base system; the preprocessed data serves as the input of a machine learning algorithm, which creates a classifier and gives it the criteria for classification. In the judgment process, microblog data undergoes the same text preprocessing, including word segmentation and feature extraction, with the support of the knowledge base system; the preprocessed microblog data is sent to the classifier, and the positive or negative evaluation result returned by the classifier is received. The invention also provides the microblog-based data mining system.
Description
Technical field
The present invention relates to the field of data mining technology, and in particular to a microblog-based data mining method and system.
Background art
In the catering industry, there is currently no product that mines microblog data to provide data support for enterprise management decisions or consumer spending decisions.
Summary of the invention
The technical problem solved by the present invention is how to provide evaluation information on a product or service.
To solve the above problem, an embodiment of the invention provides a microblog-based data mining method comprising the following processes:
Training process: relying on the knowledge base system, perform text preprocessing such as word segmentation and feature extraction on microblog sample data; use the result as the input of a machine learning algorithm; then, through the machine learning algorithm, create a classifier and give the classifier the criteria for classification;
Judgment process: relying on the knowledge base system, perform text preprocessing such as word segmentation and feature extraction on the microblog data; send the preprocessed microblog data to the classifier; and receive the positive or negative evaluation result returned by the classifier.
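The two processes can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes whitespace-separated English text in place of Chinese word segmentation, a small hand-written feature vocabulary in place of the knowledge base system, and a plain multinomial naive Bayes learner in place of the unspecified machine learning algorithm. The names `preprocess`, `train`, `judge`, `VOCAB`, and `SAMPLES` are hypothetical.

```python
import math
from collections import Counter, defaultdict

def preprocess(text, vocabulary):
    """Toy preprocessing: split on whitespace, keep only known feature words."""
    return [tok for tok in text.lower().split() if tok in vocabulary]

def train(samples, vocabulary):
    """Training process: count classes and feature words from labeled samples."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    for text, label in samples:
        class_counts[label] += 1
        word_counts[label].update(preprocess(text, vocabulary))
    return {"classes": class_counts, "words": word_counts, "vocab": vocabulary}

def judge(model, text):
    """Judgment process: return the label with the highest log-posterior score."""
    tokens = preprocess(text, model["vocab"])
    total = sum(model["classes"].values())
    best_label, best_score = None, -math.inf
    for label, count in model["classes"].items():
        n_words = sum(model["words"][label].values())
        score = math.log(count / total)
        for tok in tokens:
            # Add-one (Laplace) smoothing avoids zero probabilities.
            numerator = model["words"][label][tok] + 1
            score += math.log(numerator / (n_words + len(model["vocab"])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical feature vocabulary and labeled sample data (assumptions).
VOCAB = {"delicious", "fresh", "friendly", "cold", "slow", "rude"}
SAMPLES = [
    ("the noodles were delicious and fresh", "positive"),
    ("friendly staff and delicious soup", "positive"),
    ("slow service and cold food", "negative"),
    ("rude waiter and slow kitchen", "negative"),
]

model = train(SAMPLES, VOCAB)
print(judge(model, "fresh and delicious dumplings"))              # positive
print(judge(model, "the soup was cold and the waiter was rude"))  # negative
```

The classifier here is simply the dictionary of counts returned by `train`; the "criteria for classification" correspond to the class and word statistics it holds.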
Further, in a preferred scheme, the knowledge base system is a multi-level, tree-structured knowledge base system.
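The patent does not describe the layout of the tree. One way to picture a multi-level, tree-structured knowledge base is the following Python sketch, in which every node name, level, and term is a hypothetical example chosen for a catering scenario.

```python
from dataclasses import dataclass, field

@dataclass
class KnowledgeNode:
    """One node of a hypothetical multi-level knowledge tree."""
    name: str
    terms: set = field(default_factory=set)
    children: list = field(default_factory=list)

    def add(self, child):
        """Attach a child node and return it so calls can be chained."""
        self.children.append(child)
        return child

    def all_terms(self):
        """Collect the terms of this node and of every descendant."""
        collected = set(self.terms)
        for child in self.children:
            collected |= child.all_terms()
        return collected

    def path_to(self, term, prefix=()):
        """Return the node names from the root to the node holding `term`."""
        here = prefix + (self.name,)
        if term in self.terms:
            return here
        for child in self.children:
            found = child.path_to(term, here)
            if found:
                return found
        return None

# Assumed example hierarchy: domain -> aspect -> attribute.
root = KnowledgeNode("catering")
food = root.add(KnowledgeNode("food"))
service = root.add(KnowledgeNode("service"))
food.add(KnowledgeNode("taste", {"delicious", "bland", "salty"}))
food.add(KnowledgeNode("freshness", {"fresh", "stale"}))
service.add(KnowledgeNode("speed", {"slow", "prompt"}))

print(root.path_to("stale"))        # ('catering', 'food', 'freshness')
print(sorted(service.all_terms()))  # ['prompt', 'slow']
```

A tree of this kind lets a preprocessing step look up which aspect of a product or service a feature word belongs to, which is one plausible reason to prefer it over a flat word list.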
Further, in a preferred scheme, the machine learning algorithm is an extended Bayesian algorithm.
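The source does not define what the extension adds to the Bayesian algorithm. As a reference point only, the standard multinomial naive Bayes decision rule with add-one smoothing, from which such an extension would presumably start, can be written as below. The symbols are notation introduced here, not taken from the patent: c is a class, w_i is the i-th feature word of a microblog post, N_{w,c} is the count of word w in training samples of class c, N_c is the total feature-word count in class c, and V is the feature vocabulary.

```latex
\hat{c} \;=\; \arg\max_{c \,\in\, \{\text{positive},\ \text{negative}\}}
\left[ \log P(c) + \sum_{i=1}^{n} \log P(w_i \mid c) \right],
\qquad
P(w \mid c) \;=\; \frac{N_{w,c} + 1}{N_c + |V|}
```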
An embodiment of the invention also provides a microblog-based data mining system, comprising:
Training module: relying on the knowledge base system module, first performs text preprocessing such as word segmentation and feature extraction on the microblog sample data through a first text preprocessing module; the result then serves as the input of a machine learning module, which, through the machine learning algorithm, creates a classifier module and gives the classifier module the criteria for classification;
Judgment module: relying on the knowledge base system, first performs text preprocessing such as word segmentation and feature extraction on the microblog data through a second text preprocessing module; sends the preprocessed microblog data to the classifier module; and receives the positive or negative evaluation result returned by the classifier module and shows it on a display terminal;
Knowledge base system module: provides data to the first text preprocessing module, the second text preprocessing module, and the machine learning module.
Because microblog data mining is used to provide evaluation information on products or services, the invention helps catering enterprises discover the strengths and weaknesses of their own products or services and provides data support for enterprise management decisions.
Brief description of the drawings
The invention and many of its attendant advantages can be understood more completely by referring to the following detailed description in conjunction with the accompanying drawings. The drawings described here provide a further understanding of the invention and form part of it; the illustrative embodiments and their descriptions explain the invention and do not unduly limit it. In the drawings:
Fig. 1 is a flowchart of an embodiment of the mining method of the present invention;
Fig. 2 is a block diagram of an embodiment of the mining system of the present invention.
Detailed description of the embodiments
Embodiments of the invention are described below with reference to Figs. 1 and 2.
As shown in Fig. 1, a microblog-based data mining method comprises the following processes:
S1, training process: relying on the knowledge base system, perform text preprocessing such as word segmentation and feature extraction on microblog sample data; use the result as the input of a machine learning algorithm; then, through the machine learning algorithm, create a classifier and give the classifier the criteria for classification;
S2, judgment process: relying on the knowledge base system, perform text preprocessing such as word segmentation and feature extraction on the microblog data; send the preprocessed microblog data to the classifier; and receive the positive or negative evaluation result returned by the classifier.
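The text preprocessing shared by S1 and S2 is not specified further in the source. A minimal sketch of the two named steps follows, assuming dictionary-based forward maximum matching for Chinese word segmentation and simple term counting for feature extraction. Both techniques are common choices substituted here for illustration, and the lexicon and feature words are hypothetical stand-ins for data drawn from the knowledge base system.

```python
def segment(text, lexicon, max_len=4):
    """Forward maximum matching: at each position take the longest lexicon word.

    Characters that start no lexicon word are emitted as single-character tokens.
    """
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens

def extract_features(tokens, feature_words):
    """Count how often each feature word occurs among the tokens."""
    return {w: tokens.count(w) for w in feature_words if w in tokens}

# Hypothetical lexicon and feature words (assumptions, not from the patent).
LEXICON = {"这家", "餐厅", "服务", "很好", "味道", "不错"}
FEATURE_WORDS = {"服务", "很好", "味道", "不错"}

tokens = segment("这家餐厅服务很好", LEXICON)
print(tokens)                                   # ['这家', '餐厅', '服务', '很好']
print(extract_features(tokens, FEATURE_WORDS))  # counts for '服务' and '很好'
```

Forward maximum matching is greedy and can mis-segment ambiguous strings; it is used here only because it shows clearly how a lexicon drives segmentation.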
The knowledge base system is a multi-level, tree-structured knowledge base system. The machine learning algorithm is an extended Bayesian algorithm.
As shown in Fig. 2, a microblog-based data mining system comprises:
Training module 1: relying on the knowledge base system module, first performs text preprocessing such as word segmentation and feature extraction on the microblog sample data through the first text preprocessing module 11; the result then serves as the input of the machine learning module 12, which, through the machine learning algorithm, creates the classifier module 13 and gives the classifier module 13 the criteria for classification;
Judgment module 2: relying on the knowledge base system, first performs text preprocessing such as word segmentation and feature extraction on the microblog data through the second text preprocessing module 21; sends the preprocessed microblog data to the classifier module 13; and receives the positive or negative evaluation result returned by the classifier module 13 and shows it on the display terminal 22;
Knowledge base system module 3: provides data to the first text preprocessing module 11, the second text preprocessing module 21, and the machine learning module 12.
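The module relationships of Fig. 2 can be sketched in Python as follows. The class names are hypothetical, the reference numerals in the comments map each class to the module it stands in for, and the learner is a deliberately trivial word-vote stand-in rather than the extended Bayesian algorithm; the point of the sketch is only the wiring between modules.

```python
from collections import Counter

class KnowledgeBase:                       # knowledge base system module 3
    def __init__(self, feature_words):
        self.feature_words = set(feature_words)

class TextPreprocessor:                    # text preprocessing modules 11 and 21
    def __init__(self, kb):
        self.kb = kb

    def run(self, text):
        # Assumption: whitespace splitting stands in for word segmentation.
        return [t for t in text.lower().split() if t in self.kb.feature_words]

class Classifier:                          # classifier module 13
    def __init__(self, word_labels):
        self.word_labels = word_labels     # the learned classification criteria

    def classify(self, tokens):
        known = (self.word_labels[t] for t in tokens if t in self.word_labels)
        votes = Counter(known)
        return votes.most_common(1)[0][0] if votes else "unknown"

class MachineLearning:                     # machine learning module 12
    def __init__(self, kb):
        self.kb = kb

    def fit(self, token_lists, labels):
        counts = {}
        for tokens, label in zip(token_lists, labels):
            for t in tokens:
                if t in self.kb.feature_words:
                    counts.setdefault(t, Counter())[label] += 1
        # Each word votes for the label it co-occurred with most often.
        learned = {t: c.most_common(1)[0][0] for t, c in counts.items()}
        return Classifier(learned)

class TrainingModule:                      # training module 1
    def __init__(self, kb):
        self.pre = TextPreprocessor(kb)
        self.learner = MachineLearning(kb)

    def train(self, samples):
        texts, labels = zip(*samples)
        return self.learner.fit([self.pre.run(t) for t in texts], labels)

class JudgmentModule:                      # judgment module 2
    def __init__(self, kb, classifier, display=print):
        self.pre = TextPreprocessor(kb)
        self.classifier = classifier
        self.display = display             # stands in for display terminal 22

    def judge(self, text):
        result = self.classifier.classify(self.pre.run(text))
        self.display(result)
        return result

kb = KnowledgeBase(["tasty", "fresh", "slow", "rude"])
classifier = TrainingModule(kb).train(
    [("tasty and fresh", "positive"), ("slow and rude", "negative")]
)
JudgmentModule(kb, classifier).judge("the fish was fresh")  # prints: positive
```

Passing the same `KnowledgeBase` instance to both preprocessors and to the learner mirrors the statement that module 3 provides data to modules 11, 21, and 12.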
The embodiments of the invention have been described in detail above. It will be readily apparent to those skilled in the art that many variations are possible without substantially departing from the inventive point and effects of the invention. Such variations are therefore also included within the protection scope of the invention.
Claims (4)
1. A microblog-based data mining method, characterized in that it comprises the following processes:
Training process: relying on the knowledge base system, perform text preprocessing such as word segmentation and feature extraction on microblog sample data; use the result as the input of a machine learning algorithm; then, through the machine learning algorithm, create a classifier and give the classifier the criteria for classification;
Judgment process: relying on the knowledge base system, perform text preprocessing such as word segmentation and feature extraction on the microblog data; send the preprocessed microblog data to the classifier; and receive the positive or negative evaluation result returned by the classifier.
2. The microblog-based data mining method according to claim 1, characterized in that the knowledge base system is a multi-level, tree-structured knowledge base system.
3. The microblog-based data mining method according to claim 1, characterized in that the machine learning algorithm is an extended Bayesian algorithm.
4. A microblog-based data mining system, characterized in that it comprises:
Training module: relying on the knowledge base system module, first performs text preprocessing such as word segmentation and feature extraction on the microblog sample data through a first text preprocessing module; the result then serves as the input of a machine learning module, which, through the machine learning algorithm, creates a classifier module and gives the classifier module the criteria for classification;
Judgment module: relying on the knowledge base system, first performs text preprocessing such as word segmentation and feature extraction on the microblog data through a second text preprocessing module; sends the preprocessed microblog data to the classifier module; and receives the positive or negative evaluation result returned by the classifier module and shows it on a display terminal;
Knowledge base system module: provides data to the first text preprocessing module, the second text preprocessing module, and the machine learning module.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2012102546853A CN102819576A (en) | 2012-07-23 | 2012-07-23 | Data mining method and system based on microblog |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2012102546853A CN102819576A (en) | 2012-07-23 | 2012-07-23 | Data mining method and system based on microblog |
Publications (1)
Publication Number | Publication Date |
---|---|
CN102819576A (en) | 2012-12-12 |
Family
ID=47303687
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2012102546853A Pending CN102819576A (en) | 2012-07-23 | 2012-07-23 | Data mining method and system based on microblog |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102819576A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103530402A (en) * | 2013-10-23 | 2014-01-22 | 北京航空航天大学 | Method for identifying microblog key users based on improved Page Rank |
CN104615718A (en) * | 2015-02-05 | 2015-05-13 | 北京航空航天大学 | Hierarchical analysis method for social network emergency |
CN104616198A (en) * | 2015-02-12 | 2015-05-13 | 哈尔滨工业大学 | P2P (peer-to-peer) network lending risk prediction system based on text analysis |
CN105868193A (en) * | 2015-01-19 | 2016-08-17 | 富士通株式会社 | Device and method used to detect product relevant information in electronic text |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101556582A (en) * | 2008-04-09 | 2009-10-14 | 上海复旦光华信息科技股份有限公司 | System for analyzing and predicting netizen interest in forum |
CN102012985A (en) * | 2010-11-19 | 2011-04-13 | 国网电力科学研究院 | Sensitive data dynamic identification method based on data mining |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101556582A (en) * | 2008-04-09 | 2009-10-14 | 上海复旦光华信息科技股份有限公司 | System for analyzing and predicting netizen interest in forum |
CN102012985A (en) * | 2010-11-19 | 2011-04-13 | 国网电力科学研究院 | Sensitive data dynamic identification method based on data mining |
Non-Patent Citations (1)
Title |
---|
徐禾芳等: "《基于搜索引擎和数据挖掘的博客营销》", 《商场现代化》, no. 527, 31 January 2008 (2008-01-31) * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103530402A (en) * | 2013-10-23 | 2014-01-22 | 北京航空航天大学 | Method for identifying microblog key users based on improved Page Rank |
CN105868193A (en) * | 2015-01-19 | 2016-08-17 | 富士通株式会社 | Device and method used to detect product relevant information in electronic text |
CN104615718A (en) * | 2015-02-05 | 2015-05-13 | 北京航空航天大学 | Hierarchical analysis method for social network emergency |
CN104615718B (en) * | 2015-02-05 | 2017-12-15 | 北京航空航天大学 | The Hierarchy Analysis Method of social networks accident |
CN104616198A (en) * | 2015-02-12 | 2015-05-13 | 哈尔滨工业大学 | P2P (peer-to-peer) network lending risk prediction system based on text analysis |
CN104616198B (en) * | 2015-02-12 | 2018-01-26 | 哈尔滨工业大学 | A kind of P2P network loan Risk Forecast Systems based on text analyzing |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109255565B (en) | Address attribution identification and logistics task distribution method and device | |
CN109685052A (en) | Method for processing text images, device, electronic equipment and computer-readable medium | |
CN104536953B (en) | A kind of recognition methods of text emotional valence and device | |
CN104464733A (en) | Multi-scene managing method and device of voice conversation | |
WO2017000716A3 (en) | Image management method and device, and terminal device | |
WO2013014667A3 (en) | System and methods for computerized machine-learning based authentication of electronic documents including use of linear programming for classification | |
WO2018045241A3 (en) | Detection of anomalies in multivariate data | |
CN103955660A (en) | Method for recognizing batch two-dimension code images | |
CN102819576A (en) | Data mining method and system based on microblog | |
CN106886873A (en) | The conjunction folk prescription method and conjunction single system of a kind of e-commerce order | |
CN105205081A (en) | Article recommendation method and device | |
CN105404540A (en) | Robot remote upgrading method, system and remote server | |
IN2015DE02745A (en) | ||
CN104268134A (en) | Subjective and objective classifier building method and system | |
CN104166725A (en) | Phishing website detection method | |
CN105550253A (en) | Method and device for obtaining type relation | |
CN105704691A (en) | Scam short message recognition method and device | |
CN103093218B (en) | The method of automatic identification form types and device | |
Ando et al. | Globalization and domestic operations: Applying the JC/JD method to Japanese manufacturing firms | |
CN205038674U (en) | Logistics management system based on computer | |
CN102662962B (en) | Dynamic display method based on webpage elements | |
CN107992508B (en) | Chinese mail signature extraction method and system based on machine learning | |
CN103218420A (en) | Method and device for extracting page titles | |
CN103177374A (en) | Service recommending method and service recommending system | |
CN103853536B (en) | The method and apparatus that Service tracing is realized based on state transition diagram |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C12 | Rejection of a patent application after its publication | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20121212 |