CN111339375A - Universal big data model configuration and analysis method - Google Patents

Publication number
CN111339375A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010198405.6A
Other languages
Chinese (zh)
Inventor
李明江
万欢
刘敏
辛国安
郑毅
黄小非
刘欢
杜宏伟
闫宾
李新建
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China National Offshore Oil Corp CNOOC
CNOOC Energy Technology and Services Ltd
Original Assignee
China National Offshore Oil Corp CNOOC
CNOOC Energy Technology and Services Ltd
Application filed by China National Offshore Oil Corp CNOOC and CNOOC Energy Technology and Services Ltd
Priority to CN202010198405.6A
Publication of CN111339375A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/904 Browsing; Visualisation therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a general big data model configuration and analysis method. By constructing a data set and an algorithm library, configuring model templates, and providing background scheduling and early warning pushing, the method uniformly manages data sets, cleaning rules, the algorithm library (algorithms and parameters), potential factors and target factors, and algorithm models, and executes tasks automatically (or on a schedule) according to a scheduling execution scheme configured for each training/prediction model. A scheduling center executes big data analysis tasks and processes historical and real-time data. An early warning pushing center pushes early warning information, and a visualization configuration center provides visual display of big data analysis results. The method also realizes two-way sharing of analysis models: an analysis model can be exported together with a model interface specification for use by external systems, and an analysis model from an external system can be applied in the platform after being imported and configured.

Description

Universal big data model configuration and analysis method
Technical Field
The invention relates to big data analysis, and in particular to a general big data model configuration and analysis method.
Background
In conventional big data analysis, the usual process for a specific application is data preparation, manual data cleaning, writing and calling the corresponding algorithm code, parameter selection, training, and so on. Every step has to be redone from scratch each time such work is carried out. As a result, the data preparation and cleaning workload is heavy and code development is highly repetitive, which leads to long development cycles, high cost, and low efficiency. In addition, the gap between business personnel and software developers makes big data analysis difficult to carry out.
Some existing methods have the following functional defects:
First, there is no integrated configuration, analysis, and presentation function.
Second, there is no early warning pushing function. Some big data analysis results are time-sensitive and must be handled immediately, but without a pushing function the relevant personnel cannot be informed in time.
Third, there is no analysis model sharing function.
Therefore, a general big data model configuration and analysis method is developed that standardizes the big data analysis process, simplifies the development of big data analysis code, greatly shortens the development cycle, and saves development cost.
Disclosure of Invention
The invention provides a general big data model configuration and analysis method. By configuring an analysis model template, the data set, cleaning rules, analysis method and parameters, potential factors, target factors, and other elements of the analysis model are managed uniformly. A scheduling execution scheme is configured for the training/prediction model so that tasks are executed automatically (or on a schedule) and the analysis model is obtained by automatic training. The scheduling center processes big data analysis tasks: it analyzes massive data, predicts trends, mines valuable information, and finds abnormal conditions. The early warning pushing center pushes early warning information, and the analysis result visualization configuration center is used to build the presentation of analysis results, realizing visual display of big data analysis results. A two-way sharing function is provided for analysis models: an analysis model can be exported together with a model interface specification for use by external systems, and an analysis model from an external system can be imported and managed so that it is applied as an analysis model within the platform.
The technical scheme of the invention is as follows:
A general big data model configuration and analysis method comprises the following:
1. Building the analysis model configuration
The first step: determine the name of the analysis model, select an analysis algorithm from the algorithm library, and configure the algorithm parameters.
The second step: select one or more metadata tables from the data set, and select the data columns to be analyzed as the data set for data analysis. Configure data processing such as screening, grouping, and sorting; the resulting data serve as the basic data for model analysis.
The third step: fill in the data cleaning rules, which re-examine and verify the basic analysis data, handle invalid and missing values, delete duplicate information, compute and process data columns, and perform simple screening, grouping, and sorting of the data.
The fourth step: select the potential factor columns as the data sample; except for unsupervised learning such as clustering, designate the feature columns and select a target factor.
The fifth step: train and validate the analysis model.
According to the data set configuration, select a training data set from the massive data, train, analyze and compare the evaluation indexes, select the optimal algorithm parameters, and finally generate the analysis model.
2. Big data analysis
The scheduling center processes big data analysis tasks and performs big data analysis on massive data; model parameters can be adjusted to achieve the best effect. The trained model is converted into an actual prediction model, and real-time data are designated for prediction and early warning analysis.
3. Pushing early warning information
The early warning pushing center monitors the execution of each model, processes abnormal big data analysis results in real time, pushes the corresponding early warning information, and reminds and notifies the relevant personnel.
4. Previewing analysis results
For each model execution, the visualized analysis results can be viewed in a customized interface. The model version number, batch number, and detailed analysis results can also be viewed.
5. Sharing analysis models
A two-way sharing function is provided for analysis models. Well-performing analysis models from other systems can be referenced and imported, after which they become analysis models within the platform. Conversely, analysis models in the platform can be exported, and the analysis model and its software interface specification are provided for other systems to use.
The advantages and beneficial effects of the invention are as follows. By constructing a data set and an algorithm library, configuring model templates, and providing background scheduling and early warning pushing, the method uniformly manages data sets, cleaning rules, the algorithm library (algorithms and parameters), potential factors and target factors, and algorithm models, and executes tasks automatically (or on a schedule) according to a scheduling execution scheme configured for each training/prediction model. The scheduling center executes big data analysis tasks and processes historical and real-time data. The early warning pushing center pushes early warning information, and the visualization configuration center provides visual display of big data analysis results. The method also realizes two-way sharing of analysis models: an analysis model can be exported together with a model interface specification for use by external systems, and an analysis model from an external system can be applied in the platform after being imported and configured.
Drawings
FIG. 1 is a big data model configuration and analysis flow diagram.
FIG. 2 is a flow diagram of an analytical model configuration.
For a person skilled in the art, other relevant figures can be obtained from the above figures without inventive effort.
Detailed Description
In order to make the technical solution of the present invention better understood, the technical solution of the present invention is further described below with reference to specific examples.
Examples
A general big data model configuration and analysis method is disclosed, and the big data model configuration and analysis flow is shown in figure 1.
First, data set construction
For enterprise data (structured and unstructured), an enterprise data set is constructed. Business data sets are established according to business classification, and the master data standard is unified, so that data applications share a single data source and data standard. During big data model training, verification, and application analysis, the corresponding data are selected through these data sets.
Second, algorithm library construction
1. Method for establishing the algorithm library
A unified processing class is developed for common big data algorithms such as linear regression, logistic regression, random forest classification, clustering, decision tree classification, decision tree regression, and neural networks. The calling parameters, description, and so on of each method are specified, so that users do not need to care about how the processing class is implemented or develop it repeatedly. A big data algorithm library is thus established, from which an algorithm is selected when a big data model is constructed.
2. Algorithm management method
Algorithms in the algorithm library are managed through operations such as adding and modifying.
An external new algorithm is imported into the algorithm library according to the library's requirements for naming, calling conditions, and so on, and then becomes available for selection when a big data model is constructed.
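As an illustration only, the algorithm library described above could be sketched as a small registry in Python. The patent does not specify an implementation; the names `AlgorithmSpec`, `AlgorithmLibrary`, and `mean_baseline` are hypothetical, and the unique-name check merely stands in for the naming requirement mentioned in the text.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AlgorithmSpec:
    """One library entry: a callable plus its documented default parameters."""
    name: str
    description: str
    run: Callable[..., object]
    default_params: Dict[str, object] = field(default_factory=dict)


class AlgorithmLibrary:
    """Hypothetical registry; entry names must be unique (assumed naming rule)."""

    def __init__(self) -> None:
        self._specs: Dict[str, AlgorithmSpec] = {}

    def register(self, spec: AlgorithmSpec) -> None:
        if spec.name in self._specs:
            raise ValueError(f"algorithm already registered: {spec.name}")
        self._specs[spec.name] = spec

    def names(self) -> List[str]:
        return sorted(self._specs)

    def call(self, name: str, data, **overrides):
        # Merge the documented defaults with the user's configured parameters.
        spec = self._specs[name]
        params = {**spec.default_params, **overrides}
        return spec.run(data, **params)


def mean_baseline(values, scale=1.0):
    """Toy stand-in for a real algorithm: returns the scaled mean."""
    return scale * sum(values) / len(values)


library = AlgorithmLibrary()
library.register(AlgorithmSpec("mean_baseline", "Scaled mean of a column",
                               mean_baseline, {"scale": 1.0}))
result = library.call("mean_baseline", [1, 2, 3], scale=2.0)
```

Callers only see the entry name and its parameters, which is the point of the unified processing class: the implementation behind `run` can change without affecting model configurations.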
Third, establishing an analysis model
1. Configuring analytical models
The analytical model configuration flow is shown in fig. 2.
(1) Model name
Enter the name of the analysis model according to the business analysis scenario.
(2) Selecting an analysis algorithm
According to the requirements of the analysis model, select a big data algorithm such as linear regression, logistic regression, random forest classification, clustering, decision tree classification, decision tree regression, or a neural network from the algorithm library.
After selecting an analysis algorithm, configure its processing parameters; parameter descriptions are provided to help the user configure the model sensibly. For individual requirements that the algorithm library cannot meet, a custom analysis algorithm program can be added through algorithm library management, or uploaded with its parameters configured, in which case it is automatically imported into the algorithm library.
(3) Selecting a data set
One or more metadata tables are selected from the enterprise data set, and the data columns to be analyzed are checked to form the data set for data analysis. When several data sets are used, configuration options such as SQL association (join) conditions are provided to meet complex requirements.
(4) Setting data cleansing rules
To eliminate dirty data introduced during metadata acquisition, only the data records that need to be analyzed are retained, records containing illegal column values are removed, and aggregation such as averaging and summarizing is applied to key columns. One or more cleaning SQL statements can be configured to process the selected data set in several passes until it meets the requirements of the big data analysis.
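The multi-pass cleaning described above can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not the patent's implementation: the table `readings`, its columns, and the rule that negative pressure is illegal are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (device TEXT, pressure REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("pump_a", 10.0), ("pump_a", 10.0), ("pump_a", 14.0),
     ("pump_b", None), ("pump_b", -5.0), ("pump_b", 8.0)],
)

# Configured cleaning SQL, applied in order (hypothetical rules).
cleaning_sql = [
    # Pass 1: drop records with missing or illegal (negative) values.
    "DELETE FROM readings WHERE pressure IS NULL OR pressure < 0",
    # Pass 2: remove exact duplicate records, keeping the first of each.
    "DELETE FROM readings WHERE rowid NOT IN "
    "(SELECT MIN(rowid) FROM readings GROUP BY device, pressure)",
]
for statement in cleaning_sql:
    conn.execute(statement)

# Aggregate the key column after cleaning.
averages = conn.execute(
    "SELECT device, AVG(pressure) FROM readings GROUP BY device ORDER BY device"
).fetchall()
remaining = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
```

Keeping the rules as an ordered list of statements matches the idea of configuring one or more cleaning SQL statements rather than hard-coding the cleaning logic.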
(5) Selecting potential factors and target factors
The data items of the data set are listed, and the user can select one or more potential factors as well as a target factor. The analysis algorithm then finds the association between the potential factors and the target factor.
Clustering does not require a target factor; the algorithm is used to group the data by the selected potential factors.
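The five configuration items above (name, algorithm, data set, cleaning rules, factors) can be pictured as one configuration record. The sketch below is a hypothetical Python data structure, not a format defined by the patent; the field names and the `UNSUPERVISED` set are assumptions. It encodes the one rule the text states explicitly: a target factor is required except for unsupervised algorithms such as clustering.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

UNSUPERVISED = {"clustering"}  # assumed label for algorithms needing no target


@dataclass
class AnalysisModelConfig:
    name: str
    algorithm: str
    params: Dict[str, object] = field(default_factory=dict)
    tables: List[str] = field(default_factory=list)
    cleaning_sql: List[str] = field(default_factory=list)
    potential_factors: List[str] = field(default_factory=list)
    target_factor: Optional[str] = None

    def validate(self) -> List[str]:
        """Return a list of human-readable problems; empty means valid."""
        problems = []
        if not self.name:
            problems.append("model name is required")
        if not self.tables:
            problems.append("select at least one metadata table")
        if not self.potential_factors:
            problems.append("select at least one potential factor")
        if self.algorithm not in UNSUPERVISED and self.target_factor is None:
            problems.append("supervised algorithms need a target factor")
        return problems


regression = AnalysisModelConfig(
    name="pump_pressure_forecast",
    algorithm="linear_regression",
    tables=["readings"],
    potential_factors=["flow_rate", "temperature"],
    target_factor="pressure",
)
clusters = AnalysisModelConfig(
    name="device_groups",
    algorithm="clustering",
    tables=["readings"],
    potential_factors=["flow_rate", "temperature"],
)
```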
2. Training the analysis model
After the analysis model is configured, the execution mode of the training model is configured. There are two modes, manual execution and scheduled execution: manual execution is a one-time training run, while scheduled execution automatically trains on complex data sets in multiple batches, with parameters such as the batch training time configurable.
After configuration is completed, the system runs in the background: it automatically prepares and cleans data from the selected massive data set, trains with the selected algorithm and configured parameters, analyzes and compares the evaluation indexes, selects the optimal algorithm parameters, and finally generates the analysis model.
Based on the training result, the model parameters of an unsatisfactory model can be adjusted and the training repeated. Every trained model can be queried by training version number, batch number, detailed analysis results, and so on.
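The train, compare, and select loop can be illustrated with a deliberately small example. The sketch below assumes a one-variable ridge regression as the algorithm and mean squared error on held-out data as the evaluation index; neither choice comes from the patent, and `TrainingRun`, `train_batch`, and the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


def fit_ridge_1d(xs: Sequence[float], ys: Sequence[float], lam: float) -> float:
    """Closed-form slope for y ~ w*x with an L2 penalty lam."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def mse(w: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Mean squared error of the prediction w*x against y."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


@dataclass
class TrainingRun:
    """Queryable record of one training execution."""
    version: int
    batch: int
    lam: float
    weight: float
    validation_mse: float


def train_batch(version, batch, train, valid, lams) -> Tuple[TrainingRun, List[TrainingRun]]:
    """Train one candidate per parameter value; return the best run and all runs."""
    runs = []
    for lam in lams:
        w = fit_ridge_1d(train[0], train[1], lam)
        runs.append(TrainingRun(version, batch, lam, w, mse(w, valid[0], valid[1])))
    best = min(runs, key=lambda r: r.validation_mse)
    return best, runs


best, runs = train_batch(
    version=1, batch=1,
    train=([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]),
    valid=([5.0, 6.0], [10.0, 12.0]),
    lams=[0.0, 1.0, 10.0],
)
```

Retaining every `TrainingRun`, not only the best one, is what makes version numbers, batch numbers, and detailed results available for later queries.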
Fourth, scheduling center
The method provides a scheduling management center that monitors and manages all task nodes in the cluster and balances load across them. Node failures are monitored in real time, and automatic failover is supported. New big data analysis tasks are distributed to the cluster according to node resource utilization. Big data analysis tasks are configured visually, a task scheduling graph is provided, and the load curve of each task node is displayed in real time.
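A toy model of load-based task distribution and failover is sketched below. It assumes that "resource utilization" can be approximated by the number of tasks a node holds, which is a simplification introduced for the example; the `Scheduler` class and node names are hypothetical.

```python
from typing import Dict, List


class Scheduler:
    """Toy scheduler: tracks tasks per node and reassigns them on failure."""

    def __init__(self, nodes: List[str]) -> None:
        self.assignments: Dict[str, List[str]] = {n: [] for n in nodes}

    def submit(self, task: str) -> str:
        # Pick the node currently holding the fewest tasks (ties: by name).
        node = min(self.assignments, key=lambda n: (len(self.assignments[n]), n))
        self.assignments[node].append(task)
        return node

    def fail(self, node: str) -> None:
        # Remove the failed node, then resubmit its tasks to the survivors.
        orphaned = self.assignments.pop(node)
        if not self.assignments:
            raise RuntimeError("no healthy nodes left")
        for task in orphaned:
            self.submit(task)


scheduler = Scheduler(["n1", "n2"])
placed = [scheduler.submit(t) for t in ("t1", "t2", "t3")]
scheduler.fail("n1")  # tasks on n1 migrate to n2
```

A production scheduler would use real utilization metrics and heartbeats to detect failures; the sketch only shows the two decisions the text describes, where to place a new task and what to do with the tasks of a failed node.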
Fifth, big data analysis
After the trained big data model has been verified to meet production requirements, the user can convert the training model into a prediction model and set real-time data as its data source. The scheduling center then processes the big data analysis tasks automatically and performs analyses such as trend analysis and prediction-based early warning on massive data.
Sixth, early warning pushing center
The method provides an early warning pushing center that supports multiple push channels, including in-platform messages, short messages (SMS), e-mail, apps, and QQ. The push targets can be set flexibly: messages can be sent in bulk, to specified individuals, or to user groups. The message content template is flexibly configurable and supports both plain text and HTML rich text messages. Early warning push messages are configured visually, and the push status of early warning messages can be queried.
Seventh, pushing early warning information
The early warning pushing center monitors the execution of each model, processes abnormal big data analysis results in real time, pushes the corresponding early warning information, and reminds and notifies the relevant personnel.
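The push flow can be sketched as: detect abnormal results, render a message from a template, and deliver it over each configured channel. In the Python sketch below, "abnormal" is assumed to mean "above a fixed limit", and the channel functions simply append to a list instead of contacting a real mail or SMS service; `push_warnings`, `CHANNELS`, and the template text are hypothetical.

```python
from string import Template
from typing import Callable, Dict, List

sent: List[str] = []  # stands in for real delivery back ends

# Hypothetical channel table; real channels would call mail/SMS services.
CHANNELS: Dict[str, Callable[[str, str], None]] = {
    "mail": lambda user, text: sent.append(f"mail->{user}: {text}"),
    "sms": lambda user, text: sent.append(f"sms->{user}: {text}"),
}

# Configurable plain-text message template.
TEMPLATE = Template("Model $model: value $value exceeds limit $limit")


def push_warnings(model, predictions, limit, recipients, channels) -> int:
    """Send one message per abnormal prediction, recipient, and channel."""
    count = 0
    for value in predictions:
        if value <= limit:
            continue  # normal result, nothing to push
        text = TEMPLATE.substitute(model=model, value=value, limit=limit)
        for user in recipients:
            for channel in channels:
                CHANNELS[channel](user, text)
                count += 1
    return count


pushed = push_warnings("pump_pressure", [9.5, 12.0], 10.0, ["ops"], ["mail", "sms"])
```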
Eighth, visual display of analysis results
1. Analysis result visualization configuration center
In the analysis result visualization configuration center, a visualization interface template for analysis results is built by drag and drop in a what-you-see-is-what-you-get (WYSIWYG) manner; customized display of lists, charts, and scrolling panels is supported.
2. Visual display of analysis results
For each model execution, the visualized analysis results can be viewed in the customized interface.
Ninth, sharing analysis models
A two-way sharing function is provided for analysis models. Well-performing analysis models from other systems can be referenced and imported, after which they are converted into analysis models within the platform. Conversely, all analysis models in the system can be exported, and the analysis model and its software interface specification are provided for other business systems to use, so that those systems do not need to develop their own big data analysis model programs.
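Two-way sharing implies a serialized model plus an agreed interface description. The patent does not define the interface specification, so the sketch below invents a minimal JSON layout purely for illustration: the field names, `INTERFACE_VERSION`, and the linear `predict` function are all assumptions.

```python
import json

INTERFACE_VERSION = "1.0"  # hypothetical version tag for the interface spec


def export_model(name, algorithm, params, weights) -> str:
    """Serialize a trained model together with a minimal interface description."""
    return json.dumps({
        "interface_version": INTERFACE_VERSION,
        "name": name,
        "algorithm": algorithm,
        "params": params,
        "weights": weights,
        "inputs": sorted(weights),  # declared input columns
    }, sort_keys=True)


def import_model(payload: str) -> dict:
    """Load an external model, rejecting payloads that miss required fields."""
    model = json.loads(payload)
    required = {"interface_version", "name", "algorithm", "weights", "inputs"}
    missing = required.difference(model)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return model


def predict(model: dict, row: dict) -> float:
    """Linear prediction over the declared inputs."""
    return sum(model["weights"][k] * row[k] for k in model["inputs"])


payload = export_model("pump_pressure_forecast", "linear_regression",
                       {}, {"flow_rate": 2.0, "temperature": 1.0})
imported = import_model(payload)
prediction = predict(imported, {"flow_rate": 3.0, "temperature": 1.0})
```

Validating required fields on import is one simple way for the receiving system to check that an external model conforms to the shared interface before it is managed as a platform model.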
The invention has been described in an illustrative manner, and it is to be understood that any simple variations, modifications or other equivalent changes which can be made by one skilled in the art without departing from the spirit of the invention fall within the scope of the invention.

Claims (1)

1. A general big data model configuration and analysis method, characterized by comprising:
(1) a configuration method for constructing an analysis model
the first step: determining the name of the analysis model, selecting an analysis algorithm from an algorithm library, and configuring algorithm parameters;
the second step: selecting one or more metadata tables from the data set, selecting the data columns to be analyzed as the data set for data analysis, and configuring data processing such as screening, grouping, and sorting, the resulting data serving as the basic data for model analysis;
the third step: filling in data cleaning rules, re-examining and verifying the basic analysis data, handling invalid values and missing values, deleting duplicate information, computing and processing data columns, and screening, grouping, and sorting the data;
the fourth step: selecting potential factor columns as the data sample and, except for unsupervised learning such as clustering, designating feature columns and selecting a target factor;
(2) a big data analysis configuration and execution method
the first step: configuring the execution mode of the training model, the execution modes comprising manual execution and scheduled execution, wherein manual execution is a one-time training run and scheduled execution performs automatic batch training on incremental data;
the second step: after the configuration is completed, the system runs in the background, automatically prepares and cleans data from the selected massive data set, trains with the selected algorithm and configured parameters, analyzes and compares the evaluation indexes, selects the optimal algorithm parameters, and finally generates the analysis model;
the third step: according to the training result, adjusting the model parameters of an unsatisfactory model and repeating the training, wherein each training execution generates a training version number, a batch number, and detailed analysis results for query;
the fourth step: converting the trained model into an actual prediction model, and designating prediction (real-time) data for prediction and early warning analysis;
(3) pushing early warning information
the early warning pushing center monitors the execution of each model, processes abnormal big data analysis results in real time, pushes the corresponding early warning information, and reminds and notifies the relevant personnel;
(4) previewing analysis results
for each model execution, the visualized analysis results can be viewed in a customized interface, and the model version number, batch number, and detailed analysis results can also be viewed;
(5) sharing analysis models
a two-way sharing function is provided for analysis models: well-performing analysis models from other systems can be referenced and imported, after which they become analysis models within the platform, and analysis models in the platform can be exported, the analysis model and its software interface specification being provided for other systems to use.
CN202010198405.6A 2020-03-19 2020-03-19 Universal big data model configuration and analysis method Pending CN111339375A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010198405.6A CN111339375A (en) 2020-03-19 2020-03-19 Universal big data model configuration and analysis method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010198405.6A CN111339375A (en) 2020-03-19 2020-03-19 Universal big data model configuration and analysis method

Publications (1)

Publication Number Publication Date
CN111339375A true CN111339375A (en) 2020-06-26

Family

ID=71184169

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010198405.6A Pending CN111339375A (en) 2020-03-19 2020-03-19 Universal big data model configuration and analysis method

Country Status (1)

Country Link
CN (1) CN111339375A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106779087A (en) * 2016-11-30 2017-05-31 福建亿榕信息技术有限公司 A kind of general-purpose machinery learning data analysis platform
CN108846567A (en) * 2018-06-04 2018-11-20 广东京信软件科技有限公司 Regulatory analysis method and system are advised up and down based on the multifactor associated enterprise of big data
CN109213482A (en) * 2018-06-28 2019-01-15 清华大学天津高端装备研究院 The graphical application platform of artificial intelligence and application method based on convolutional neural networks
CN109284298A (en) * 2018-11-09 2019-01-29 上海晏鼠计算机技术股份有限公司 A kind of contents production system handled based on machine learning and big data
CN110378463A (en) * 2019-07-15 2019-10-25 北京智能工场科技有限公司 A kind of artificial intelligence model standardized training platform and automated system

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107025509B (en) * 2016-02-01 2021-06-18 腾讯科技(深圳)有限公司 Decision making system and method based on business model
CN107025509A (en) * 2016-02-01 2017-08-08 腾讯科技(深圳)有限公司 Decision system and method based on business model
CN112269610A (en) * 2020-10-26 2021-01-26 南京燚麒智能科技有限公司 Method and device for executing batch model algorithm
CN113515500B (en) * 2021-05-24 2023-06-30 苏州维众数据技术有限公司 Visual data processing system and processing method
CN113515500A (en) * 2021-05-24 2021-10-19 苏州维众数据技术有限公司 Visual data processing system and processing method
CN113407180A (en) * 2021-05-28 2021-09-17 济南浪潮数据技术有限公司 Configuration page generation method, system, equipment and medium
CN113742315A (en) * 2021-08-17 2021-12-03 广州工业智能研究院 Manufacturing big data processing platform and method
CN113936183A (en) * 2021-09-10 2022-01-14 南方电网深圳数字电网研究院有限公司 Data prediction method and device based on model training
CN114510519A (en) * 2022-01-25 2022-05-17 北京航天云路有限公司 Visual analysis method and system based on industrial big data model
CN116821200A (en) * 2023-07-04 2023-09-29 大师兄(上海)云数据服务有限公司 Visual analysis system and visual analysis method for artificial intelligent cloud data
CN117112539A (en) * 2023-10-23 2023-11-24 北京万界数据科技有限责任公司 Machine learning-oriented data model management system
CN117112539B (en) * 2023-10-23 2024-01-05 北京万界数据科技有限责任公司 Machine learning-oriented data model management system
CN117688485A (en) * 2024-02-02 2024-03-12 北京中卓时代消防装备科技有限公司 Fire disaster cause analysis method and system based on deep learning
CN117688485B (en) * 2024-02-02 2024-04-30 北京中卓时代消防装备科技有限公司 Fire disaster cause analysis method and system based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20200626