CN108763460A - A kind of machine learning method and system based on SQL - Google Patents
A machine learning method and system based on SQL
- Publication number
- CN108763460A (application number CN201810524549.9A)
- Authority
- CN
- China
- Prior art keywords
- sql
- training
- machine learning
- parameter
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
Abstract
The invention discloses a machine learning method and system based on SQL, relating to the field of data analysis and mining. The method comprises the following steps: S1: mark the current user's data sets, which comprise a training set, a test set, and a parameter set; S2: apply feature processing to the training set and the test set according to a feature processing flow; S3: convert the parameter set into the parameter combinations to be trained; S4: take out one parameter combination, call an SQL embedded method to train a model, and select the current optimal model; S5: repeat S4 until all parameter combinations from S3 have been used; S6: use the model. The scheme lowers the threshold for non-specialists to understand and use machine learning algorithms, and reduces the workload of developers of data analysis and mining software.
Description
Technical field
The present invention relates to the field of data analysis and mining, and in particular to a machine learning method and system based on SQL.
Background
In the field of artificial intelligence, a specific data analysis or data mining problem usually passes through five key stages: data cleansing, feature conversion, model training, model evaluation, and model use. However, machine learning algorithms are numerous (there are hundreds of them), their theoretical derivations are difficult, they are updated quickly, and the problem models to which different algorithms apply differ greatly. If data mining is used in a production environment, the engineering deployment of the model must also be addressed. For technical staff in non-internet fields such as data, economics, medicine, chemistry, and communications, and for beginners in computer technology, trying to solve problems in their own field with machine learning is very difficult. Lowering the threshold for these non-specialists to understand and use machine learning algorithms is therefore an urgent need.
Summary of the invention
The object of the invention is to provide a machine learning method and system based on SQL that addresses two problems: the high threshold for non-specialists to understand and use machine learning algorithms, and the heavy workload of developers of data analysis and mining software.
The technical solution adopted by the invention is as follows:
A machine learning method and system based on SQL, comprising the following steps:
S1: Mark the current user's data sets; the data sets comprise a training set, a test set, and a parameter set;
S2: Apply feature processing to the training set and the test set according to the feature processing flow;
S3: Convert the parameter set into the parameter combinations to be trained;
S4: Take out one parameter combination, call the SQL embedded method to train a model on the feature-processed training set, and select the current optimal model;
S5: Repeat S4 until all parameter combinations from S3 have been used;
S6: Use the model.
Further, the training set, test set, and parameter set in step S1 are either specified directly by the user or specified indirectly through an SQL statement.
Further, the training set may also be a new data set generated in steps S1 to S6.
Further, the parameter combinations in step S3 form a set of tuples, namely the Cartesian product of the value sets in the parameter set from step S1.
Further, step S4 proceeds as follows:
S401: Take one tuple out of the parameter combinations without replacement;
S402: Call the SQL embedded method to train a model on the training set using the values in the tuple;
S403: Select the current optimal model according to the specified model evaluation method.
Further, step S6 proceeds as follows:
S601: Apply the current optimal model to the prediction set;
S602: Publish the optimal model to the SQL embedded method library for later use;
S603: Export the optimal model to the local hard disk or a database so that the model can be shared.
In conclusion by adopting the above-described technical solution, the beneficial effects of the invention are as follows:
1, in the present invention, by using the side for being combined SQL (structured query language) grammers with machine-learning process
Formula, using SQL query statement encapsulation training set, test set, parameter set, model training and evaluation process, and by machine learning
Specific algorithm needed for process is encapsulated into SQL function, need not coordinate other programming skill mounter learning processes again
Each step, the difficulty using data analysis and digging technology is reduced, especially for only having grasped traditional SQL technical ability
Technical staff can quick left-hand seat;Simultaneously as the result of calculation of machine learning can be rapidly inserted into SQL result sets, this is conducive to carry
High traditional database report developer, the working efficiency of data analyst and their abundant report data.
2, the present invention in, by the way that machine learning is combined with Database Systems, by the model training of machine-learning process,
Model persistence, model publication are unified to SQL statement, are conducive to the rapid build of model, issue, share, advantageously reduce number
According to the workload of processing and analysis personnel's research and development of software.
Description of the drawings
Fig. 1 is a flow chart of the SQL-based machine learning method of the invention;
Fig. 2 is a flow chart of the machine learning method of Embodiment 1 of the invention;
Fig. 3 is a diagram of the feature processing methods of the invention;
Fig. 4 is a diagram of the algorithm classification of the invention.
Detailed description
To make the purpose, technical solution, and advantages of the invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.
As shown in Fig. 1, a machine learning method and system based on SQL comprises the following steps:
S1: Mark the current user's data sets; the data sets comprise a training set, a test set, and a parameter set;
Specifically, the machine learning syntax for marking a data set is:
SET {TRD|PAD|TED} [d1] {SQL|UD}
S2: Apply feature processing to the training set and the test set according to the feature processing flow;
Specifically, the machine learning syntax for feature processing is:
TRANSFORM {TRD|[TRD.field1, TRD.field2, TRD.field3, …, TRD.fieldn]} WITH {feature_handler} [PAD]
Here feature_handler denotes a feature extraction and processing method (the methods involved are shown in Fig. 3), TRD.field1 denotes the first feature of the training set, and so on, up to TRD.fieldn, the n-th feature of the training set.
S3: Convert the parameter set into the parameter combinations to be trained;
S4: Take out one parameter combination, call the SQL embedded method to train a model on the feature-processed training set, and select the current optimal model;
The specific steps are:
S401: Take one tuple out of the parameter combinations without replacement;
S402: Call the SQL embedded method to train a model on the training set using the values in the tuple;
S403: Select the current optimal model according to the specified model evaluation method.
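Steps S401 to S403 can be pictured as a simple selection loop. The following is a minimal Python sketch, not the patent's implementation: `select_best_model`, `train`, and `evaluate` are hypothetical names, and the sketch assumes that a higher evaluation score means a better model.

```python
from typing import Any, Callable, Iterable, Optional, Tuple


def select_best_model(
    param_combos: Iterable[tuple],
    train: Callable[[tuple], Any],
    evaluate: Callable[[Any], float],
) -> Tuple[Optional[Any], Optional[tuple], float]:
    """Try every parameter tuple exactly once and keep the best-scoring model."""
    remaining = list(param_combos)
    best_model, best_params, best_score = None, None, float("-inf")
    while remaining:
        params = remaining.pop(0)   # S401: take a tuple out without replacement
        model = train(params)       # S402: train with the values in the tuple
        score = evaluate(model)     # S403: score with the chosen evaluation method
        if score > best_score:      # keep the current optimal model
            best_model, best_params, best_score = model, params, score
    return best_model, best_params, best_score


# Toy stand-ins (assumptions): the "model" is just its parameter tuple, and the
# score peaks at (0.4, 0.03).
toy_train = lambda params: params
toy_evaluate = lambda model: -((model[0] - 0.4) ** 2) - ((model[1] - 0.03) ** 2)

combos = [(0.3, 0.01), (0.3, 0.03), (0.4, 0.01), (0.4, 0.03), (0.5, 0.01), (0.5, 0.03)]
best_model, best_params, best_score = select_best_model(combos, toy_train, toy_evaluate)
```

Because each tuple is popped from the list before training, the loop ends exactly when every combination has been used, which is the stopping condition of step S5.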
Specifically, the machine learning syntax for model training is:
FIT {TRD} WITH {algorithm} [PAD] [EVALUATE BY {evaluate_method}] [EXPORTED] [model]
Here algorithm denotes a specific machine learning training algorithm (the algorithm classification involved is shown in Fig. 4), evaluate_method denotes a common model evaluation metric, and model denotes the name of the trained model.
S5: Repeat S4 until all parameter combinations from S3 have been used;
S6: Use the model.
The specific steps are:
S601: Apply the current optimal model to the prediction set;
S602: Publish the optimal model to the SQL embedded method library for later use;
S603: Export the optimal model to the local hard disk or a database so that the model can be shared.
Specifically, the machine learning syntax for using a model is:
USE {model} {DEPLOY [deployName] | EXPORTED [path] | PREDICT {TED}}
Here deployName denotes the name under which the model is published, and path denotes the location to which the model is exported.
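The three uses of a model (predict, publish, export) can be sketched in Python as follows. This is an illustrative sketch only: the patent does not describe how models are stored, so the in-memory `ModelRegistry`, the pickle file format, the linear toy model, and all function names here are assumptions.

```python
import pickle
import tempfile
from pathlib import Path


class ModelRegistry:
    """Hypothetical stand-in for the SQL embedded method library (S602)."""

    def __init__(self):
        self._deployed = {}

    def deploy(self, model, deploy_name):
        self._deployed[deploy_name] = model

    def get(self, deploy_name):
        return self._deployed[deploy_name]


def predict(model, rows):
    """S601: apply a toy linear model (a dict of weights and bias) to each row."""
    results = []
    for row in rows:
        score = sum(w * x for w, x in zip(model["weights"], row)) + model["bias"]
        results.append(1 if score > 0 else 0)
    return results


def export_model(model, path):
    """S603: write the model to disk so it can be shared or backed up."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump(model, fh)
    return path


def load_model(path):
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


# Toy model and data (assumptions, not taken from the patent).
bkmodel = {"weights": [1.0, -1.0], "bias": 0.0}
predictions = predict(bkmodel, [[2.0, 1.0], [0.0, 3.0]])

registry = ModelRegistry()
registry.deploy(bkmodel, "bkmodel")

with tempfile.TemporaryDirectory() as tmp:
    exported = export_model(bkmodel, Path(tmp) / "Models" / "bkmodel.pkl")
    restored = load_model(exported)
```

A deployed model is looked up by name, so a later prediction task can reuse it without retraining; an exported file can be copied to another machine and loaded there.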
Further, the training set, test set, and parameter set in step S1 are either specified directly by the user or specified indirectly through an SQL statement.
Further, the training set may also be a new data set generated in steps S1 to S6.
Further, the parameter combinations in step S3 form a set of tuples, namely the Cartesian product of the value sets in the parameter set from step S1.
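The Cartesian product described for step S3 can be sketched with Python's standard `itertools.product`. The function name `expand_parameter_set` is hypothetical; the parameter names and values are those used in Embodiment 1.

```python
from itertools import product


def expand_parameter_set(param_set):
    """Turn {name: [values...]} into a list of {name: value} combinations."""
    names = list(param_set)
    value_sets = [param_set[name] for name in names]
    # product() yields one tuple per element of the Cartesian product.
    return [dict(zip(names, values)) for values in product(*value_sets)]


combos = expand_parameter_set({
    "iterNum": [0.3, 0.4, 0.5],
    "regParam": [0.01, 0.03],
})
```

With three values for one parameter and two for the other, the result contains 3 × 2 = 6 combinations, and the count grows multiplicatively with every additional parameter.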
Further, step S4 proceeds as follows:
S401: Take one tuple out of the parameter combinations without replacement;
S402: Call the SQL embedded method to train a model on the training set using the values in the tuple;
S403: Select the current optimal model according to the specified model evaluation method.
Further, step S6 proceeds as follows:
S601: Apply the current optimal model to the prediction set;
S602: Publish the optimal model to the SQL embedded method library for later use;
S603: Export the optimal model to the local hard disk or a database so that the model can be shared.
The invention combines SQL (Structured Query Language) syntax with the machine learning process: SQL query statements encapsulate the training set, test set, and parameter set, and the machine learning process is encapsulated in SQL functions. The machine learning workflow is unified into five stages: acquiring the data sets, feature processing (see Fig. 3), specifying parameters, model training, and model use. At the same time, algorithm attributes such as dependent variable type, independent variable type, and whether the algorithm is supervised are catalogued (see Fig. 4) and stored in the database, so that models are ultimately trained and used through SQL statements.
Embodiment 1
The borrower history data are shown in Table 1 (bankloan):
S1: Mark the current user's data sets; the data sets comprise a training set, a test set, and a parameter set;
SET TRD select * from bankloan where empID < 9000
This marks the first 9,000 records of the bankloan table as the training set TRD.
SET TED select * from bankloan where empID >= 9000
Correspondingly, this marks the last 1,000 records of the bankloan table as the test set TED.
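The effect of the two SET statements can be reproduced with Python's standard `sqlite3` module. This is a sketch under stated assumptions: the bankloan table is filled with 10,000 synthetic rows whose empID runs from 0 to 9999, the `age` column values are made up, and `set_dataset` is a hypothetical helper that mimics SET by storing a query result under a label.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bankloan (empID INTEGER PRIMARY KEY, age INTEGER)")
# Synthetic rows (assumption): empID 0..9999 with arbitrary ages.
conn.executemany(
    "INSERT INTO bankloan (empID, age) VALUES (?, ?)",
    [(i, 20 + i % 40) for i in range(10000)],
)

datasets = {}


def set_dataset(label, query):
    """Mimic `SET <label> <SQL>`: run the query and store the rows under the label."""
    datasets[label] = conn.execute(query).fetchall()


set_dataset("TRD", "select * from bankloan where empID < 9000")
set_dataset("TED", "select * from bankloan where empID >= 9000")
```

Because the two WHERE clauses are complementary, every row lands in exactly one of the two sets.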
SET PAD bankloanPAD [iterNum, regParam] [[0.3, 0.4, 0.5], [0.01, 0.03]]
This marks the parameter set PAD: the parameter named iterNum takes three values, 0.3, 0.4, and 0.5, and the parameter named regParam takes two values, 0.01 and 0.03.
The parameter settings are based on extensive machine learning engineering experience: iterNum denotes the number of iterations and regParam denotes the regularization coefficient; all other parameters take their default values.
S2: Apply feature processing to the training set and the test set according to the feature processing flow;
TRANSFORM TRD.age WITH feature discretization method;
TRANSFORM TRD.edu, TRD.mariStat WITH type conversion;
TRANSFORM TRD.salary WITH normalization method;
The training set TRD data of Table 1 are processed with three data mining feature processing methods, namely:
(1) [Age] is discretized;
(2) [Educational level] and [Marital status] undergo type conversion: 1, 2, 3, 4, and 5 represent high school, junior college, bachelor's degree, master's degree, and doctorate respectively; correspondingly, 0 represents unmarried and 1 represents married;
(3) [Annual salary] is normalized;
The prediction set TED data go through the same feature processing flow.
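The three feature processing operations can be sketched in plain Python. The patent does not specify the bin boundaries or the normalization formula, so the age edges, the use of min-max scaling, the English category labels, and all function names below are assumptions chosen for illustration.

```python
def discretize(value, edges):
    """Return the index of the bin that `value` falls into (edges sorted ascending)."""
    return sum(value >= edge for edge in edges)


# Type conversion tables following the coding described in the text;
# the English labels are assumed renderings of the original categories.
EDU_CODES = {"high school": 1, "junior college": 2, "bachelor": 3, "master": 4, "doctorate": 5}
MARITAL_CODES = {"unmarried": 0, "married": 1}


def min_max_normalize(values):
    """Scale values linearly into [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical rows: (age, education, marital status, annual salary).
rows = [
    (25, "bachelor", "unmarried", 10),
    (35, "master", "married", 20),
    (52, "doctorate", "married", 30),
]
age_edges = [30, 40, 50]  # assumed boundaries: <30, 30-39, 40-49, >=50

salaries = min_max_normalize([r[3] for r in rows])
processed = [
    (discretize(age, age_edges), EDU_CODES[edu], MARITAL_CODES[mar], sal)
    for (age, edu, mar, _), sal in zip(rows, salaries)
]
```

In practice the minimum and maximum used for normalization would be computed on the training set and reused for the test set, so that both sets are scaled identically.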
After feature processing, the borrower history data are as shown in Table 2:
S3: Convert the parameter set into the parameter combinations to be trained;
S4: Take out one parameter combination, call the SQL embedded method to train a model on the feature-processed training set, and select the current optimal model;
The specific steps are:
S401: Take one tuple out of the parameter combinations without replacement;
S402: Call the SQL embedded method to train a model on the training set using the values in the tuple;
S403: Select the current optimal model according to the specified model evaluation method.
S5: Repeat S4 until all parameter combinations from S3 have been used;
FIT TRD WITH svm bankloanPAD EVALUATE f1 NAME bkmodel
Model training is performed on the feature-processed training set TRD with the svm algorithm, for which two parameters are specified: iterNum (number of iterations) and regParam (regularization parameter). The value set of iterNum is 0.3, 0.4, 0.5, and the value set of regParam is 0.01, 0.03;
EVALUATE specifies the model evaluation method. The training set TRD is trained six times, because there are six parameter combinations in total: (0.3, 0.01), (0.3, 0.03), (0.4, 0.01), (0.4, 0.03), (0.5, 0.01), (0.5, 0.03). Each training run takes out one parameter combination without replacement, calls the svm method to perform the computation, and selects the current optimal model according to the specified model evaluation method; the final model is named bkmodel.
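The example evaluates models with `f1`. Assuming this refers to the standard F1 score (the harmonic mean of precision and recall), which the patent does not spell out, it can be computed as in the following Python sketch; the function name and the sample labels are illustrative.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 = 2 * precision * recall / (precision + recall) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# One true positive, one false positive, one false negative.
score = f1_score([1, 1, 0, 0], [1, 0, 1, 0])
```

A score like this gives each of the six trained models a single comparable number, which is what the selection in S403 needs.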
S6: Use the model.
The specific steps are:
S601: Apply the current optimal model to the prediction set;
USE bkmodel PREDICT select * from bankloan where empID >= 9000
All records of bankloan with empID >= 9000 are taken out as the test set, and the model bkmodel is applied to that data set.
S602: Publish the optimal model to the SQL embedded method library for later use;
USE bkmodel DEPLOY
The model bkmodel trained in this run is published, also under the name bkmodel, so that it can be used directly the next time a prediction task is run, without repeating the training process on the same training set.
S603: Export the optimal model to the local hard disk or a database so that the model can be shared.
USE bkmodel EXPORTED "D:\Lab\Models"
The model bkmodel is exported to the Models folder on the local D: drive, which makes the model easy to share and back up.
The meanings of the symbols used in this scheme are explained in Table 3:
If TRD, PAD, or TED is given as user-defined data, it must satisfy the following format:
[k1, k2, …, kn] [vset1, vset2, …, vsetn] (vsetn = [v1, v2, v3, …, vi], i >= 1)
[k1, k2, …] denotes the parameter names or field names; the first set, vset1, gives the value range of k1, and correspondingly the n-th set, vsetn, gives the value range of kn.
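A reader for this user-defined format could look like the following Python sketch. It is only an illustration: `parse_user_defined` is a hypothetical name, and the sketch assumes that keys are bare identifiers and that values are numeric literals, as in the PAD declaration of Embodiment 1.

```python
import ast


def parse_user_defined(text):
    """Parse '[k1,k2,...][vset1,vset2,...]' into {k1: vset1, k2: vset2, ...}."""
    text = text.strip()
    if not text.startswith("["):
        raise ValueError("expected the key list to start with '['")
    end_of_keys = text.index("]")  # keys are bare names, so the first ']' closes them
    keys = [k.strip() for k in text[1:end_of_keys].split(",")]
    # The remainder is a list of lists of literals, which literal_eval reads safely.
    value_sets = ast.literal_eval(text[end_of_keys + 1:].strip())
    if len(keys) != len(value_sets):
        raise ValueError("each key needs exactly one value set")
    if any(len(vset) < 1 for vset in value_sets):
        raise ValueError("every value set needs at least one value (i >= 1)")
    return dict(zip(keys, value_sets))


pad = parse_user_defined("[iterNum,regParam][[0.3,0.4,0.5],[0.01,0.03]]")
```

The two checks mirror the constraints stated in the format: one value set per key, and at least one value in every set.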
The foregoing is merely a preferred embodiment of the invention and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.
Claims (6)
1. A machine learning method and system based on SQL, characterized by comprising the following steps:
S1: Mark the current user's data sets; the data sets comprise a training set, a test set, and a parameter set;
S2: Apply feature processing to the training set and the test set according to the feature processing flow;
S3: Convert the parameter set into the parameter combinations to be trained;
S4: Take out one parameter combination, call the SQL embedded method to train a model on the feature-processed training set, and select the current optimal model;
S5: Repeat S4 until all parameter combinations from S3 have been used;
S6: Use the model.
2. The machine learning method and system based on SQL according to claim 1, characterized in that the training set, test set, and parameter set in step S1 are either specified directly by the user or specified indirectly through an SQL statement.
3. The machine learning method and system based on SQL according to claim 1, characterized in that the training set may also be a new data set generated in steps S1 to S6.
4. The machine learning method and system based on SQL according to claim 1, characterized in that the parameter combinations in step S3 form a set of tuples, namely the Cartesian product of the parameter set from step S1.
5. The machine learning method and system based on SQL according to claim 4, characterized in that step S4 proceeds as follows:
S401: Take one tuple out of the parameter combinations without replacement;
S402: Call the SQL embedded method to train a model on the training set using the values in the tuple;
S403: Select the current optimal model according to the specified model evaluation method.
6. The machine learning method and system based on SQL according to claim 1, characterized in that step S6 proceeds as follows:
S601: Apply the current optimal model to the prediction set;
S602: Publish the optimal model to the SQL embedded method library for later use;
S603: Export the optimal model to the local hard disk or a database so that the model can be shared.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810524549.9A CN108763460A (en) | 2018-05-28 | 2018-05-28 | A kind of machine learning method and system based on SQL |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810524549.9A CN108763460A (en) | 2018-05-28 | 2018-05-28 | A kind of machine learning method and system based on SQL |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108763460A true CN108763460A (en) | 2018-11-06 |
Family
ID=64002798
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810524549.9A Pending CN108763460A (en) | 2018-05-28 | 2018-05-28 | A kind of machine learning method and system based on SQL |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108763460A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109918504A (en) * | 2019-02-12 | 2019-06-21 | 成都佳发教育科技有限公司 | One kind is goed over examination papers methods of marking and system |
CN111090680A (en) * | 2019-11-08 | 2020-05-01 | 中国海洋石油集团有限公司 | Shared logging data mining method |
CN112698971A (en) * | 2020-12-30 | 2021-04-23 | 平安科技(深圳)有限公司 | Rule engine based parameter conversion method, device, equipment and medium |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090193039A1 (en) * | 2008-01-28 | 2009-07-30 | Apollo Data Technologies, Llc | Data driven system for data analysis and data mining |
CN102413127A (en) * | 2011-11-09 | 2012-04-11 | 中国电力科学研究院 | Database generalization safety protection method |
CN105630957A (en) * | 2015-12-24 | 2016-06-01 | 北京大学 | User management application behavior-based application quality distinguishing method and system |
CN106056427A (en) * | 2016-05-25 | 2016-10-26 | 中南大学 | Spark-based big data hybrid model mobile recommending method |
CN106096557A (en) * | 2016-06-15 | 2016-11-09 | 浙江大学 | A kind of semi-supervised learning facial expression recognizing method based on fuzzy training sample |
CN106484914A (en) * | 2016-10-26 | 2017-03-08 | 国云科技股份有限公司 | A kind of modular assembly method for quickly realizing data mining analysis |
CN106779087A (en) * | 2016-11-30 | 2017-05-31 | 福建亿榕信息技术有限公司 | A kind of general-purpose machinery learning data analysis platform |
CN107844836A (en) * | 2017-10-24 | 2018-03-27 | 信雅达系统工程股份有限公司 | A kind of system and learning method based on machine learning |
Non-Patent Citations (1)
Title |
---|
WZY0623: "MADlib-基于SQL的数据挖掘解决方案(2)-MADlib基础", 《HTTPS://BLOG.CSDN.NET/WZY0623/ARTICLE/DETAILS/78845020》 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109918504A (en) * | 2019-02-12 | 2019-06-21 | 成都佳发教育科技有限公司 | One kind is goed over examination papers methods of marking and system |
CN111090680A (en) * | 2019-11-08 | 2020-05-01 | 中国海洋石油集团有限公司 | Shared logging data mining method |
CN112698971A (en) * | 2020-12-30 | 2021-04-23 | 平安科技(深圳)有限公司 | Rule engine based parameter conversion method, device, equipment and medium |
CN112698971B (en) * | 2020-12-30 | 2023-08-18 | 平安科技(深圳)有限公司 | Parameter conversion method, device, equipment and medium based on rule engine |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20181106 |