CN103853821B - Method for constructing multiuser collaboration oriented data mining platform - Google Patents
- Publication number
- CN103853821B
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a method for constructing a multiuser collaboration oriented data mining platform. The method integrates a flexible workflow with a multiuser collaboration mechanism and provides a workspace in which three user roles, namely data acquisition staff, data analysis staff and result auditing staff, collaborate on data mining. The whole workflow is implemented with components: a data acquisition component, a data preprocessing component, a data modeling component, a result visualization display component and a model evaluation component. The flexible workflow, made up of components and arrows, can be created and manipulated by drag and drop by different user roles in different user views. Data mining is complex because it involves continual repetition, revision and iteration; the method greatly simplifies this work and helps prevent data leakage, safeguarding data security.
Description
Technical field
The present invention relates to a method for constructing a data mining platform that integrates flexible workflows and is oriented to multiuser collaboration, and belongs to the technical field of data mining.
Background art
Data mining is the process of extracting patterns, and the potentially useful information they contain, from massive historical business data by means of mathematical analysis. It is a process of continual repetition, revision and iteration, and mainly comprises data acquisition, data preprocessing, data analysis, result visualization and model evaluation. At present, data mining is widely used in commercial fields such as banking, telecommunications, insurance, transportation and retail.
Existing data mining platforms have the following problems. They lack a flexible user workspace that supports undo, redo and save, so users must get everything right in a single pass, which is inconvenient. They lack process components that can be modified, iterated and made to output intermediate results, so users cannot fully understand and control their own data analysis process. Their mining mechanism targets a single user, so one user takes on all three roles of data acquisition staff, data analysis staff and result auditing staff; the roles cannot collaborate across the analysis process, and data and analysis results can easily leak, causing data security problems.
Summary of the invention
Object of the invention: in view of the problems in the prior art, the present invention provides a method for constructing a data mining platform that features flexible workflows and multiuser collaboration.
The data mining platform built by the method of the invention provides a Web-based flexible user workspace that supports undo, redo and save. In the user workspace, data acquisition staff can upload, update and delete data sets; data analysis staff can create and manipulate their own data analysis flows; and result auditing staff can review and reply to mining results.
Technical solution: a method for constructing a multiuser collaboration oriented data mining platform provides a workspace in which three user roles, namely data acquisition staff, data analysis staff and result auditing staff, collaborate on data mining. The whole workflow is implemented with components: a data acquisition component, a data preprocessing component, a data modeling component, a result visualization display component and a model evaluation component. Different user roles use different user views and can create and operate their own data analysis flows by drag and drop. The data acquisition staff upload, update and delete data through the data acquisition component. The data analysis staff use the data preprocessing component, the data modeling component, the result visualization component and the model evaluation component in flow order to carry out data analysis operations such as data acquisition, data preprocessing, modeling and model evaluation. The result auditing staff review and reply to mining results through the result visualization component in the user workspace.
The user workspace is a drag-and-drop graphical interface with two parts: a component selection area and a flow creation area. The component selection area is a region that displays a series of extensible data mining flow components; the flow creation area is the region where users create and manipulate data analysis flows.
The data analysis flow is a flexible workflow made up of components and arrows. In any data analysis flow, the user can at any time adjust the execution parameters on a component node, change the execution direction of the flow, and export intermediate computation results.
The method for constructing the data mining platform comprises the following steps:
Step 1: Design and implement the data acquisition component. Data is acquired in two ways: from a database and by web upload.
Acquisition from a database is implemented through Java Database Connectivity: data accesses on the data mining platform are translated in real time into the corresponding data queries in the database.
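The translation of a platform-level data access into a database query can be sketched as follows. This is a minimal illustration only: the patent implements this step with Java Database Connectivity, whereas the sketch uses Python's built-in `sqlite3` module as a stand-in, and the names `build_query`, `fetch_rows` and the `sales` table are hypothetical.

```python
import sqlite3

def build_query(table, columns, limit=None):
    """Translate a platform-level data access into a SQL SELECT statement."""
    # Table and column names cannot be bound as parameters, so validate them.
    for name in [table, *columns]:
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    params = []
    if limit is not None:
        sql += " LIMIT ?"
        params.append(int(limit))
    return sql, params

def fetch_rows(conn, table, columns, limit=None):
    """Run the translated query and return all matching rows."""
    sql, params = build_query(table, columns, limit)
    return conn.execute(sql, params).fetchall()

# Hypothetical data source, created in memory for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("north", 10.0), ("south", 7.5)])
rows = fetch_rows(conn, "sales", ["region", "amount"], limit=1)
```

Values are passed as bound parameters, while identifiers are checked before being placed in the statement; a real deployment would likely validate them against the stored data set metadata instead.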
Acquisition by web upload listens for data upload requests from the web client, establishes a socket connection between the client and the data storage server, and then uses Java I/O streams to write the data set into the file system of the data storage server.
In both implementations of the data acquisition component, the metadata of the data set must be saved in the system database, and a unified access interface must be exposed externally.
Step 2: Design and implement the data preprocessing component. Statistical analysis is performed on the data set in R, and basic descriptive information about the data set is presented to the user graphically. Mathematical methods for data correction, such as filling by interpolation and removing records, are encapsulated, and user interfaces are provided for preprocessing stages such as handling missing values, duplicate data, noisy data and abnormal data.
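Two of the correction methods named in this step, filling by interpolation and removing duplicate records, can be sketched as follows. The patent encapsulates these methods in R; the Python below is only an illustration, and the function names and the choice of linear interpolation are assumptions rather than details from the patent.

```python
def fill_missing_linear(values):
    """Fill None gaps by linear interpolation between the nearest known values.

    Gaps at either end are filled with the nearest known value.
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        return filled  # nothing to interpolate from
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            t = (i - left) / (right - left)
            filled[i] = values[left] + t * (values[right] - values[left])
    return filled

def drop_duplicates(records):
    """Remove repeated records while keeping the first occurrence in order."""
    seen, unique = set(), []
    for record in records:
        key = tuple(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```

For example, `fill_missing_linear([1.0, None, 3.0])` fills the gap with the midpoint of its neighbours, and `drop_duplicates` keeps only the first copy of each identical row.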
Step 3: Design and implement the data modeling component. Data mining models such as classification, clustering, association and time series are encapsulated in R, and a graphical interface is provided for the user to set the corresponding model analysis parameters.
Step 4: Design and implement the result visualization display component. Data mining results and model evaluation results are presented to the user in R as charts, lists and similar forms, and results are pushed to the result auditing staff in real time by Ajax polling.
Step 5: Design and implement the model evaluation component. Multiple model evaluation methods, such as accuracy, error rate and the confusion matrix, are provided in R, together with a user interface for saving the model analysis parameters and model metadata in the system database.
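The three evaluation measures named in step 5 can be sketched as follows for a classification model. The patent provides them through R; this Python version is illustrative only, and the function names and sample labels are hypothetical.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Return (labels, matrix) where matrix[i][j] counts items whose actual
    label is labels[i] and whose predicted label is labels[j]."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    labels = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    matrix = [[counts[(a, p)] for p in labels] for a in labels]
    return labels, matrix

def accuracy(actual, predicted):
    """Fraction of items whose predicted label equals the actual label."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

def error_rate(actual, predicted):
    """Fraction of items that were misclassified."""
    return 1.0 - accuracy(actual, predicted)

actual = ["yes", "yes", "no", "no"]
predicted = ["yes", "no", "no", "no"]
labels, matrix = confusion_matrix(actual, predicted)
```

The diagonal of the matrix holds the correct predictions, so accuracy is the diagonal sum divided by the number of items.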
Step 6: Design and implement the user workspace. A drag-and-drop graphical interface with two parts, the component selection area and the flow creation area, is implemented with jQuery. The user operation log is stored in a stack data structure, and user interfaces are provided to undo, redo and save the workspace.
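Undo and redo on top of a stack-based operation log can be sketched as follows. This is one common arrangement (two stacks of workspace snapshots), offered as an assumption; the patent states only that the operation log is kept in a stack data structure, and the class name `WorkspaceHistory` is hypothetical.

```python
class WorkspaceHistory:
    """Undo/redo for workspace snapshots, kept on two stacks."""

    def __init__(self):
        self.state = ()    # current canvas: a tuple of component names
        self._undo = []    # earlier states, most recent on top
        self._redo = []    # undone states, most recent on top

    def apply(self, new_state):
        """Record a user edit and make new_state the current canvas."""
        self._undo.append(self.state)
        self.state = tuple(new_state)
        self._redo.clear()  # a fresh edit invalidates the redo branch

    def undo(self):
        """Step back one edit; return False if there is nothing to undo."""
        if not self._undo:
            return False
        self._redo.append(self.state)
        self.state = self._undo.pop()
        return True

    def redo(self):
        """Reapply the last undone edit; return False if there is none."""
        if not self._redo:
            return False
        self._undo.append(self.state)
        self.state = self._redo.pop()
        return True
```

Storing whole snapshots keeps the sketch short; a log of reversible operations would use less memory on large canvases. Saving the workspace is not covered here.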
Step 7: Define and implement the data mining flow. Taking the data mining components designed in steps 1 to 5 as nodes, a workflow made up of several nodes and arrows is defined. User interfaces are provided for adjusting node execution parameters, changing the flow execution direction, and exporting intermediate computation results.
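A workflow of nodes and arrows can be modeled as a directed graph and executed in dependency order, keeping every node's output so that intermediate results can be exported. The sketch below assumes this model; the patent does not specify an execution algorithm, and `run_workflow` and the three toy nodes are hypothetical.

```python
from graphlib import TopologicalSorter

def run_workflow(nodes, arrows, source):
    """Execute a workflow and return the output of every node.

    nodes:  mapping of node name -> function
    arrows: iterable of (from_node, to_node) pairs
    source: input handed to nodes that have no incoming arrow
    """
    predecessors = {name: [] for name in nodes}
    for src, dst in arrows:
        predecessors[dst].append(src)
    results = {}
    # static_order() yields each node after all of its predecessors and
    # raises graphlib.CycleError if the arrows form a loop.
    for name in TopologicalSorter(predecessors).static_order():
        inputs = [results[p] for p in predecessors[name]] or [source]
        results[name] = nodes[name](*inputs)
    return results

nodes = {
    "acquire": lambda rows: [r for r in rows if r is not None],
    "scale": lambda rows: [r * 2 for r in rows],
    "total": lambda rows: sum(rows),
}
full = run_workflow(nodes, [("acquire", "scale"), ("scale", "total")],
                    [1, None, 3])
# Changing the flow direction: drop the "scale" node and rewire the arrow.
rerouted = run_workflow({k: nodes[k] for k in ("acquire", "total")},
                        [("acquire", "total")], [1, None, 3])
```

Because `results` keeps the output of every node, exporting an intermediate result amounts to reading one entry, for example `full["scale"]`.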
Step 8: Integrate and deploy the mining platform. A JSON-format configuration interface is provided for the data mining components designed in steps 1 to 5, together with a user interface for customizing the functions of the mining platform by editing a configuration file.
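Customizing the platform by editing a JSON configuration file might look like the sketch below. The patent does not publish a configuration schema, so the `components`, `enabled` and `params` keys, and the sample values, are assumptions made for illustration.

```python
import json

# Hypothetical configuration file content.
CONFIG_TEXT = """
{
  "components": [
    {"name": "preprocess", "enabled": true,
     "params": {"fill": "interpolate"}},
    {"name": "modeling", "enabled": true,
     "params": {"model": "clustering", "k": 3}},
    {"name": "evaluation", "enabled": false, "params": {}}
  ]
}
"""

def load_enabled_components(text):
    """Parse the configuration and return {component name: parameters}
    for every component that is not switched off."""
    config = json.loads(text)
    enabled = {}
    for entry in config["components"]:
        if entry.get("enabled", True):
            enabled[entry["name"]] = entry.get("params", {})
    return enabled

components = load_enabled_components(CONFIG_TEXT)
```

Setting `"enabled": false` on an entry removes that component from the deployed platform without any code change.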
By adopting the above technical solution, the present invention has the following beneficial effects: to address the complexity of data mining, with its continual repetition, revision and iteration, it provides a flexible data mining workspace oriented to multiuser collaboration. It not only greatly simplifies data mining work but also helps prevent data leakage, safeguarding data security.
Brief description of the drawings
Fig. 1 is a structural block diagram of the multiuser collaboration oriented data mining platform according to an embodiment of the present invention.
Detailed description of the embodiments
The present invention is further illustrated below with reference to specific embodiments. It should be understood that these embodiments are intended only to illustrate the invention and not to limit its scope; after reading the present disclosure, modifications of various equivalent forms made by those skilled in the art all fall within the scope defined by the appended claims of this application.
In an embodiment of the present invention, the method for constructing the data mining platform comprises the following steps:
Step 1: Design and implement the data acquisition component. In view of complex characteristics of data sets such as large volume, variety and velocity, the implementation covers two cases: acquiring data from a database and acquiring data by web upload.
Acquisition from a database is implemented through Java Database Connectivity (JDBC): data accesses on the data mining platform are translated in real time into the corresponding SQL queries in the database.
Acquisition by web upload listens for data upload requests from the web client, establishes a socket connection between the client and the data storage server, and then uses Java I/O streams to write the data set into the file system of the data storage server.
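Writing an uploaded stream into the storage server's file system can be sketched as a chunked copy. The embodiment uses Java I/O streams over a socket connection; the Python below is an illustrative stand-in in which an in-memory `io.BytesIO` object plays the role of the incoming stream, and `save_upload` is a hypothetical name.

```python
import io
import os
import tempfile

def save_upload(stream, dest_dir, filename, chunk_size=64 * 1024):
    """Copy an uploaded byte stream into dest_dir in fixed-size chunks.

    Returns the path written and the number of bytes stored.
    """
    # Keep only the final path component so a client-supplied name
    # cannot point outside the storage directory.
    safe_name = os.path.basename(filename)
    if safe_name in ("", ".", ".."):
        raise ValueError(f"invalid file name: {filename!r}")
    path = os.path.join(dest_dir, safe_name)
    written = 0
    with open(path, "wb") as out:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break  # end of stream
            out.write(chunk)
            written += len(chunk)
    return path, written
```

Reading in chunks keeps memory use bounded regardless of data set size, which matters for the large-volume data the embodiment has in mind.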
In both implementations of the data acquisition component, the metadata of the data set must be saved in the system database, and a unified access interface must be exposed externally.
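A metadata catalog with a single lookup interface for both acquisition paths can be sketched as follows. The table layout, the class name `DatasetCatalog` and the sample entries are assumptions; the patent states only that metadata is stored in the system database behind a unified access interface. Python's built-in `sqlite3` stands in for the system database.

```python
import sqlite3

class DatasetCatalog:
    """Stores data set metadata and answers lookups the same way
    whether the data came from a database or from a web upload."""

    SOURCES = ("database", "upload")

    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS dataset_meta ("
            "name TEXT PRIMARY KEY, source TEXT NOT NULL, "
            "location TEXT NOT NULL)"
        )

    def register(self, name, source, location):
        """Save or update the metadata of one data set."""
        if source not in self.SOURCES:
            raise ValueError(f"unknown source: {source!r}")
        self.conn.execute(
            "INSERT OR REPLACE INTO dataset_meta VALUES (?, ?, ?)",
            (name, source, location),
        )

    def describe(self, name):
        """Return the stored metadata, or raise KeyError if unknown."""
        row = self.conn.execute(
            "SELECT source, location FROM dataset_meta WHERE name = ?",
            (name,),
        ).fetchone()
        if row is None:
            raise KeyError(name)
        return {"name": name, "source": row[0], "location": row[1]}

catalog = DatasetCatalog(sqlite3.connect(":memory:"))
catalog.register("sales", "database", "sales_table")
catalog.register("survey", "upload", "/data/uploads/survey.csv")
```

Downstream components can then call `describe` without knowing how a data set was acquired.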
Step 2: Design and implement the data preprocessing component. Statistical analysis is performed on the data set in R, and basic descriptive information about the data set is presented to the user graphically. Mathematical methods for data correction, such as filling by interpolation and removing records, are encapsulated, and user interfaces are provided for preprocessing stages such as handling missing values, duplicate data, noisy data and abnormal data.
Step 3: Design and implement the data modeling component. Data mining models such as classification, clustering, association and time series are encapsulated in R, and a graphical interface is provided for the user to set the corresponding model analysis parameters.
Step 4: Design and implement the result visualization display component. Data mining results and model evaluation results are presented to the user in R as charts, lists and similar forms, and results are pushed to the result auditing staff in real time by Ajax polling.
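The server side of a polling scheme can be sketched as a feed that hands out only the results a client has not yet seen. This is an assumption about how the polling endpoint could behave; the patent names Ajax polling but gives no protocol details, and `ResultFeed` and its cursor convention are hypothetical.

```python
class ResultFeed:
    """Holds published mining results and serves them to polling clients."""

    def __init__(self):
        self._items = []  # list of (sequence number, payload)

    def publish(self, payload):
        """Add a result and return its sequence number."""
        seq = len(self._items) + 1
        self._items.append((seq, payload))
        return seq

    def poll(self, after=0):
        """Return results newer than `after` and the cursor to send next."""
        fresh = [(s, p) for s, p in self._items if s > after]
        cursor = fresh[-1][0] if fresh else after
        return fresh, cursor
```

Each browser request would carry the last cursor it received, so repeated polls return an empty list until a new result is published.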
Step 5: Design and implement the model evaluation component. The previously built model is evaluated in R, and a user interface is provided for saving the model analysis parameters and model metadata in the system database.
Step 6: Design and implement the user workspace. A drag-and-drop graphical interface with two parts, the component selection area and the flow creation area, is implemented with jQuery. The user operation log is stored in a stack data structure, and user interfaces are provided to undo, redo and save the workspace.
Step 7: Define and implement the data mining flow. Taking the data mining components designed in steps 1 to 5 as nodes, a workflow made up of several nodes and arrows is defined. User interfaces are provided for adjusting node execution parameters, changing the flow execution direction, and exporting intermediate computation results.
Step 8: Integrate and deploy the mining platform. A JSON-format configuration interface is provided for the data mining components designed in steps 1 to 5, together with a user interface for customizing the functions of the mining platform by editing a configuration file.
As shown in Fig. 1, the data mining platform according to the present invention is oriented to collaborative data mining by three user roles, namely data acquisition staff, data analysis staff and result auditing staff, and provides a component-based user workspace comprising a data acquisition component, a data preprocessing component, a data modeling component, a result visualization display component and a model evaluation component.
Different user roles use different user views and can create and operate their own data analysis flows by drag and drop. The data acquisition staff upload, update and delete data through the data acquisition component. The data analysis staff use the data preprocessing component, the data modeling component, the result visualization component and the model evaluation component in flow order to carry out data analysis operations such as data acquisition, data preprocessing, modeling and model evaluation. The result auditing staff review and reply to mining results through the result visualization component in the user workspace.
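The role-specific views described above amount to a mapping from each role to the components it may use. The sketch below encodes the mapping stated in this paragraph; the identifier strings and function names are hypothetical, and the patent does not describe how the check is enforced.

```python
# Which components each role's view exposes, per the paragraph above.
ROLE_COMPONENTS = {
    "data_acquisition": {"acquisition"},
    "data_analysis": {"preprocessing", "modeling",
                      "visualization", "evaluation"},
    "result_auditing": {"visualization"},
}

def visible_components(role):
    """Return the sorted component list for a role's view."""
    if role not in ROLE_COMPONENTS:
        raise KeyError(f"unknown role: {role!r}")
    return sorted(ROLE_COMPONENTS[role])

def can_use(role, component):
    """True if the role's view includes the component."""
    return component in ROLE_COMPONENTS.get(role, set())
```

Restricting each view in this way is one plausible route to the leakage prevention the patent claims, since auditing staff never reach raw data sets and acquisition staff never reach analysis results.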
The data analysis flow is a flexible workflow made up of components and arrows. In any data analysis flow, the user can at any time adjust the execution parameters on a component node, change the execution direction of the flow, and export intermediate computation results.
Claims (1)
1. A method for constructing a multiuser collaboration oriented data mining platform, characterized in that it provides a workspace in which three user roles, namely data acquisition staff, data analysis staff and result auditing staff, collaborate on data mining, the method specifically comprising the following steps:
Step 1: Design and implement the data acquisition component: data is acquired in two ways, from a database and by web upload;
acquisition from a database is implemented through Java Database Connectivity, whereby data accesses on the data mining platform are translated in real time into the corresponding data queries in the database;
acquisition by web upload listens for data upload requests from the web client, establishes a socket connection between the client and the data storage server, and then uses Java I/O streams to write the data set into the file system of the data storage server;
in both implementations of the data acquisition component, the metadata of the data set must be saved in the system database, and a unified access interface must be exposed externally;
Step 2: Design and implement the data preprocessing component: statistical analysis is performed on the data set in R, and basic descriptive information about the data set is presented to the user graphically; mathematical methods for data correction, namely filling by interpolation and removing records, are encapsulated, and user interfaces are provided for the preprocessing stages of handling missing values, duplicate data, noisy data and abnormal data;
Step 3: Design and implement the data modeling component: classification, clustering, association and time series data mining models are encapsulated in R; a graphical interface is provided for the user to set the corresponding model analysis parameters;
Step 4: Design and implement the result visualization display component: data mining results and model evaluation results are presented to the user in R as charts and lists; results are pushed to the result auditing staff in real time by Ajax polling;
Step 5: Design and implement the model evaluation component: multiple model evaluation methods, namely accuracy, error rate and the confusion matrix, are provided in R; a user interface is provided for saving the model analysis parameters and model metadata in the system database;
Step 6: Design and implement the user workspace: a drag-and-drop graphical interface with two parts, the component selection area and the flow creation area, is implemented with jQuery; the user operation log is stored in a stack data structure, and user interfaces are provided to undo, redo and save the workspace;
Step 7: Define and implement the data mining flow: taking the data mining components designed in steps 1 to 5 as nodes, a workflow made up of several nodes and arrows is defined; user interfaces are provided for adjusting node execution parameters, changing the flow execution direction, and exporting intermediate computation results;
Step 8: Integrate and deploy the mining platform: a JSON-format configuration interface is provided for the data mining components designed in steps 1 to 5, together with a user interface for customizing the functions of the mining platform by editing a configuration file.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410059806.8A CN103853821B (en) | 2014-02-21 | 2014-02-21 | Method for constructing multiuser collaboration oriented data mining platform |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410059806.8A CN103853821B (en) | 2014-02-21 | 2014-02-21 | Method for constructing multiuser collaboration oriented data mining platform |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103853821A (en) | 2014-06-11 |
CN103853821B (en) | 2017-02-22 |
Family
ID=50861476
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410059806.8A (Active, CN103853821B) | Method for constructing multiuser collaboration oriented data mining platform | 2014-02-21 | 2014-02-21 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103853821B (en) |
Families Citing this family (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104572929A (en) * | 2014-12-26 | 2015-04-29 | 深圳市科漫达智能管理科技有限公司 | Data mining method and device |
CN104731953A (en) * | 2015-03-31 | 2015-06-24 | 河海大学 | R-based building method of data preprocessing system |
CN105159688A (en) * | 2015-10-14 | 2015-12-16 | 浙江大学 | Programmable information visualization interaction type design method |
CN105468736A (en) * | 2015-11-23 | 2016-04-06 | 国云科技股份有限公司 | Plug-in and component based data preprocessing system and realization method therefor |
CN105550365A (en) * | 2016-01-15 | 2016-05-04 | 中国科学院自动化研究所 | Visualization analysis system based on text topic model |
CN106446238A (en) * | 2016-10-10 | 2017-02-22 | 合肥红珊瑚软件服务有限公司 | Web data mining system based on XML |
CN108228359B (en) * | 2016-12-15 | 2020-11-03 | 北京京东尚科信息技术有限公司 | Method and system for integrating web program and R program to process data |
CN106599325A (en) * | 2017-01-18 | 2017-04-26 | 河海大学 | Method for constructing data mining visualization platform based on R and HighCharts |
WO2019033401A1 (en) * | 2017-08-18 | 2019-02-21 | 深圳怡化电脑股份有限公司 | Software development method and device |
CN107944146A (en) * | 2017-11-28 | 2018-04-20 | 河海大学 | Polynary hydrology Time Series Matching model building method based on principal component analysis |
CN108304557A (en) * | 2018-02-07 | 2018-07-20 | 霍尔果斯智融未来信息科技有限公司 | A kind of multiple person cooperational data digging method |
CN108563706A (en) * | 2018-03-27 | 2018-09-21 | 昆山和君纵达数据科技有限公司 | A kind of collection big data intelligent service system and its operation method |
CN108694448A (en) * | 2018-05-08 | 2018-10-23 | 成都卡莱博尔信息技术股份有限公司 | PHM platforms |
CN109558395A (en) * | 2018-10-17 | 2019-04-02 | 中国光大银行股份有限公司 | Data processing system and data digging method |
CN109491289A (en) * | 2018-11-15 | 2019-03-19 | 国家计算机网络与信息安全管理中心 | A kind of dynamic early-warning method and device for data center's dynamic environment monitoring |
CN110909039A (en) * | 2019-10-25 | 2020-03-24 | 北京华如科技股份有限公司 | Big data mining tool and method based on drag type process |
CN112069244B (en) * | 2020-08-28 | 2022-07-29 | 福建博思软件股份有限公司 | Method and storage device based on visualization web page data mining |
CN112148747A (en) * | 2020-09-08 | 2020-12-29 | 银清科技有限公司 | Transaction system log analysis method and device based on R language |
CN112632146B (en) * | 2020-12-03 | 2023-04-07 | 成都大数据产业技术研究院有限公司 | Multi-person collaborative visual data mining system |
CN114597890A (en) * | 2022-01-27 | 2022-06-07 | 国网冀北电力有限公司经济技术研究院 | Construction method of holographic data system of power transmission line |
CN116737803B (en) * | 2023-08-10 | 2023-11-17 | 天津神舟通用数据技术有限公司 | Visual data mining arrangement method based on directed acyclic graph |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101324901A (en) * | 2008-08-06 | 2008-12-17 | 中国电信股份有限公司 | Method, platform and system for excavating data |
CN100476819C (en) * | 2006-12-27 | 2009-04-08 | 章毅 | Data mining system based on Web and control method thereof |
Non-Patent Citations (2)
Title |
---|
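WEKA data mining platform and its secondary development; Chen Huiping et al.; Computer Engineering and Applications; 2008-10-15; Vol. 44 (No. 19); full text * |
Big data mining platform based on cloud computing; He Qing et al.; ZTE Technology Journal; 2013-08-31; Vol. 19 (No. 4); full text * |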
Also Published As
Publication number | Publication date |
---|---|
CN103853821A (en) | 2014-06-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103853821B (en) | Method for constructing multiuser collaboration oriented data mining platform | |
US20180357343A1 (en) | Optimization methods for physical models | |
US11816555B2 (en) | System and method for chaining discrete models | |
US20140157417A1 (en) | Methods and systems for architecture-centric threat modeling, analysis and visualization | |
CN103679384A (en) | Method for workflow cooperative office work | |
CN105069025A (en) | Intelligent aggregation visualization and management and control system for big data | |
WO2012074516A1 (en) | Systems and methods for reducing reservoir simulator model run time | |
CN106709017A (en) | Big data-based aid decision making method | |
CN104281525B (en) | A kind of defect data analysis method and the method utilizing its reduction Software Testing Project | |
CN109858823B (en) | Main and distribution network power failure plan selection method and device | |
CN102646137A (en) | Automatic entity basic information generation system and method based on Markov model | |
CN109740872A (en) | The diagnostic method and system of a kind of area's operating status | |
JP2008544407A (en) | Technical methods and tools for capability-based multiple family of systems planning | |
CN105631612A (en) | System and method of evaluating individual performance and capability of public servant based on big data | |
CN113821538B (en) | Stream data processing system based on metadata | |
El‐Ghandour et al. | Survey of information technology applications in construction | |
CN113852204A (en) | Three-dimensional panoramic monitoring system and method for transformer substation with digital twin | |
CN106802928A (en) | Power network historical data management method and its system | |
CN106991516A (en) | A kind of investment planning method and system based on power network resources | |
CN104200338A (en) | Line loss statistics and decision analysis system | |
Laksmiwati et al. | Modeling unpredictable data and moving object in disaster management information system based on spatio-temporal data model | |
CN105373996A (en) | Modeling system based on approval data | |
Seah et al. | Flux. Land: A Data-driven Toolkit for Urban Flood Adaptation | |
Kataeva et al. | Applying graph grammars for the generation of process models and their logs | |
CN118035223A (en) | Multi-scheme optimization method for complex system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |