CN109657803A - The building of machine learning model - Google Patents


Info

Publication number
CN109657803A
Authority
CN
China
Prior art keywords
model
data
machine learning
function
prediction
Prior art date
Legal status
Granted
Application number
CN201810245188.4A
Other languages
Chinese (zh)
Other versions
CN109657803B (en)
Inventor
丁远普
Current Assignee
New H3C Big Data Technologies Co Ltd
Original Assignee
New H3C Big Data Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by New H3C Big Data Technologies Co Ltd
Priority to CN201810245188.4A (patent CN109657803B/en)
Priority to PCT/CN2019/078619 (patent WO2019179408A1/en)
Publication of CN109657803A
Application granted
Publication of CN109657803B
Legal status: Active (current)
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N99/00 Subject matter not provided for in other groups of this subclass

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Debugging And Monitoring (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This disclosure relates to the construction of a machine learning model. The method includes: performing syntax parsing on a received SQL statement to extract a function name; if the function name maps to a training function, obtaining an initial parameter, a training field identifier and a training data table identifier from the SQL statement; obtaining the algorithm corresponding to the function name from Spark MLlib and initializing the algorithm with the initial parameter to obtain an initial model; extracting data, according to the training field identifier, from the training data table corresponding to the training data table identifier, as training data; and training the initial model with the training data to obtain a machine learning model corresponding to the function name. The method and device for constructing a machine learning model according to the embodiments of the disclosure improve the convenience and ease of use of machine learning.

Description

The building of machine learning model
Technical field
This disclosure relates to the field of database technology, and in particular to a method and device for constructing a machine learning model.
Background
Spark is a distributed computing framework. It provides a comprehensive, unified framework for the big-data processing needs of data sets of heterogeneous nature (text data, graph data, etc.) and heterogeneous data sources (batch data or real-time streaming data).
SparkSQL is a distributed SQL (Structured Query Language) query engine based on Spark; it can be used to query, compute statistics on, and analyze very large data sets.
Spark MLlib (Machine Learning library) is a machine learning library based on Spark. It consists of common learning algorithms and tools covering classification, regression, clustering, collaborative filtering, dimensionality reduction and so on. Spark MLlib also includes low-level optimization primitives and a high-level pipeline API (Application Programming Interface), with APIs for languages such as Scala, Python and Java, through which model training and prediction can be performed.
Summary of the invention
In view of this, the present disclosure proposes a method and device for constructing a machine learning model that improve the convenience and ease of use of machine learning.
According to one aspect of the disclosure, a method for constructing a machine learning model is provided, comprising: performing syntax parsing on a received SQL statement to extract a function name; if the function name maps to a training function, obtaining an initial parameter, a training field identifier and a training data table identifier from the SQL statement; obtaining the algorithm corresponding to the function name from Spark MLlib, and initializing the algorithm with the initial parameter to obtain an initial model; extracting data, according to the training field identifier, from the training data table corresponding to the training data table identifier, as training data; and training the initial model with the training data to obtain a machine learning model corresponding to the function name.
According to another aspect of the disclosure, a device for constructing a machine learning model is provided, comprising an execution plan module and a data storage module. The execution plan module is configured to: perform syntax parsing on a received SQL statement to extract a function name; if the function name maps to a training function, obtain an initial parameter, a training field identifier and a training data table identifier from the SQL statement; obtain the algorithm corresponding to the function name from Spark MLlib, and initialize the algorithm with the initial parameter to obtain an initial model; extract data, according to the training field identifier, from the training data table in the data storage module corresponding to the training data table identifier, as training data; and train the initial model with the training data to obtain a machine learning model corresponding to the function name.
According to another aspect of the disclosure, a device for constructing a machine learning model is provided, comprising: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to perform the above method.
According to another aspect of the disclosure, a non-volatile computer-readable storage medium is provided, on which computer program instructions are stored that implement the above method when executed by a processor.
The method and device for constructing a machine learning model according to the aspects of the disclosure can call algorithms from Spark MLlib and train models, obtaining the corresponding machine learning model purely through SQL. Compared with performing machine learning through API calls, this eliminates a large amount of programming work and improves the convenience and ease of use of machine learning.
Other features and aspects of the disclosure will become clear from the following detailed description of exemplary embodiments with reference to the drawings.
Brief description of the drawings
The drawings, which are included in and constitute part of the specification, together with the specification illustrate exemplary embodiments, features and aspects of the disclosure, and serve to explain the principles of the disclosure.
Fig. 1 shows a flowchart of a method for constructing a machine learning model according to an embodiment of the disclosure.
Fig. 2 shows an architecture diagram of a database server according to an embodiment of the disclosure.
Fig. 3 shows a flowchart of a method for constructing a machine learning model according to an embodiment of the disclosure.
Fig. 4 shows a flowchart of a method for constructing a machine learning model according to an embodiment of the disclosure.
Fig. 5 shows a block diagram of a device for constructing a machine learning model according to an embodiment of the disclosure.
Fig. 6 shows a block diagram of a device for constructing a machine learning model according to an embodiment of the disclosure.
Detailed description of embodiments
Various exemplary embodiments, features and aspects of the disclosure are described in detail below with reference to the drawings. Identical reference numerals in the drawings denote elements with identical or similar functions. Although various aspects of the embodiments are shown in the drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used here to mean "serving as an example, embodiment or illustration". Any embodiment described here as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
In addition, numerous specific details are given in the following embodiments to better illustrate the disclosure. Those skilled in the art will understand that the disclosure can be practiced without some of these details. In some instances, methods, means, elements and circuits well known to those skilled in the art are not described in detail, so as to highlight the gist of the disclosure.
Fig. 1 shows a flowchart of a method for constructing a machine learning model according to an embodiment of the disclosure. The method can be executed by a database server. As shown in Fig. 1, the method for constructing a machine learning model includes:
S11: perform syntax parsing on the received SQL statement and extract a function name.
SQL (Structured Query Language) is a database query and programming language used to access database systems. Access operations on a database include adding, deleting, reading and modifying data, and all of them can be performed with SQL statements. SQL is a declarative language: a statement specifies an access task, and the database server must formulate an execution plan from the statement that describes how to carry out that task.
In one possible implementation, the database server receives the SQL statement from a client. The client can be deployed on the database server itself or on another server; the disclosure does not limit this. In one example, the client obtains the SQL statement from an input box.
A database server is a server built on a database system. It can consist of one or more servers running on a local network together with database management system software, and it provides data services to clients. In the embodiments of the disclosure, the database server can parse SQL statements: it splits a statement into statement blocks and determines their execution order, forming an execution plan. In one example, the database server deploys a SparkSQL module, through which it parses SQL statements and queries, computes statistics on, and analyzes large data sets. Fig. 2 shows an architecture diagram of a database server according to an embodiment of the disclosure. As shown in Fig. 2, the database server includes a model storage module, a data storage module, an execution plan module, and so on.
In one possible implementation, the database server first preprocesses the SQL statement into a standard SQL statement and then performs syntax parsing on the standard SQL statement to extract the function name. In one example, preprocessing can include: removing spaces before and after the SQL statement; replacing runs of consecutive blank characters (spaces, TABs and newlines) in the statement with a single space; unifying the case of the statement (converting it entirely to lower case or entirely to upper case); and appending the terminating symbol "ENDOFSQL" to the end of the statement.
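The preprocessing steps listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the function name `preprocess_sql` is hypothetical, lower case is chosen as the unified case, and the terminating symbol is separated from the statement by a single space.

```python
import re

# Terminating symbol named in the description.
END_MARKER = "ENDOFSQL"


def preprocess_sql(sql: str) -> str:
    """Normalize a raw SQL statement (hypothetical helper, sketch only)."""
    # Remove blank characters before and after the statement.
    sql = sql.strip()
    # Replace each run of blank characters (spaces, TABs, newlines) with one space.
    sql = re.sub(r"\s+", " ", sql)
    # Unify the case of the statement; lower case is an arbitrary choice here.
    sql = sql.lower()
    # Append the terminating symbol.
    return f"{sql} {END_MARKER}"


if __name__ == "__main__":
    print(preprocess_sql("  SELECT  LogisticRegression(a,\n b)\tFROM t "))
```

This naive version also lower-cases and collapses whitespace inside quoted literals such as `'-MaxIter 20'`; a fuller implementation might leave quoted strings untouched.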
While parsing the SQL statement, the database server splits it and determines the meaning of each part; the function name can be extracted during this process.
A machine learning function is a function used in machine learning. If the function name extracted from the SQL statement maps to a machine learning function, machine learning is to be performed. In one example, machine learning functions can be user-defined functions.
In one possible implementation, machine learning functions are of two types: training functions and prediction functions. A training function carries out commands related to training a machine learning model; a prediction function carries out commands related to making predictions with an existing machine learning model. Note that training functions and prediction functions are only examples of machine learning functions; a machine learning function can also be any other function that may be used in a machine learning process, and the disclosure does not limit this.
Because different types of machine learning functions differ in the data they acquire, in how they process that data, and in the results they produce, the database server can generate different execution plans depending on the type of machine learning function.
In one possible implementation, the database server stores a mapping between function names and function types of machine learning functions. The database server looks up the function type corresponding to the function name extracted from the SQL statement and formulates the corresponding execution plan: for a training function, an execution plan for training a machine learning model; for a prediction function, an execution plan for making predictions with an existing machine learning model.
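As a simplified illustration of extracting the function name, the sketch below uses a regular expression instead of a full SQL parser. It assumes the machine learning function is called directly after `select`, as in the examples in this description; `extract_function_name` is a hypothetical name.

```python
import re
from typing import Optional

# "select", then an identifier, then an opening parenthesis.
_FUNC_CALL = re.compile(r"^\s*select\s+([A-Za-z_][A-Za-z0-9_]*)\s*\(", re.IGNORECASE)


def extract_function_name(sql: str) -> Optional[str]:
    """Return the name of the function called in the select list, or None."""
    match = _FUNC_CALL.match(sql)
    return match.group(1) if match else None


if __name__ == "__main__":
    stmt = "select LogisticRegression('lr_model_t01',label,col1) from mltable"
    print(extract_function_name(stmt))  # LogisticRegression
```

The pattern also matches ordinary functions such as `count(*)`; deciding whether the name denotes a machine learning function is left to the function-type lookup.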
The execution plan corresponding to a training function is described in steps S12 to S15 below.
S12: if the function name maps to a training function, obtain an initial parameter, a training field identifier and a training data table identifier from the SQL statement.
A training function is a function for training a machine learning model. "Training function" is a general term: it can cover multiple functions for training different machine learning models, and these functions can be added, modified and deleted as needed; the disclosure does not limit this.
The initial parameter indicates the model parameters used during model initialization and can be configured as needed; the disclosure does not limit this.
As shown in Fig. 2, the data storage module stores training data tables, whose data can be used to train machine learning models. A training data table is identified by its training data table identifier. Training fields are the fields of the training data table that hold the data for training the machine learning model, and they are identified by training field identifiers.
S13: obtain the algorithm corresponding to the function name from Spark MLlib, and initialize the algorithm with the initial parameter to obtain an initial model.
In one possible implementation, the database server stores a correspondence between the function names of training functions and algorithm paths (an algorithm path indicates where an algorithm is called from in Spark MLlib, for example the class the algorithm belongs to). By looking up this correspondence, the database server determines the algorithm path for the function name and, following that path, obtains the algorithm corresponding to the function name from Spark MLlib.
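A minimal sketch of the correspondence between function names and algorithm paths, assuming the path is the fully qualified class name of an estimator in Spark MLlib's `org.apache.spark.ml` package. The three entries are illustrative examples of existing MLlib classes; the table contents and helper names are hypothetical, and a JVM-based server could then load the class by name, for example with Java reflection.

```python
from typing import Dict, Optional, Tuple

# Hypothetical correspondence table: training-function name -> algorithm path.
ALGORITHM_PATHS: Dict[str, str] = {
    "LogisticRegression": "org.apache.spark.ml.classification.LogisticRegression",
    "LinearRegression": "org.apache.spark.ml.regression.LinearRegression",
    "KMeans": "org.apache.spark.ml.clustering.KMeans",
}

# Case-insensitive index, since preprocessing may have changed the case.
_BY_LOWER_NAME = {name.lower(): path for name, path in ALGORITHM_PATHS.items()}


def resolve_algorithm_path(function_name: str) -> Optional[str]:
    """Return the algorithm path for a training function, or None if unknown."""
    return _BY_LOWER_NAME.get(function_name.lower())


def split_algorithm_path(path: str) -> Tuple[str, str]:
    """Split an algorithm path into (package, class name)."""
    package, _, class_name = path.rpartition(".")
    return package, class_name


if __name__ == "__main__":
    path = resolve_algorithm_path("logisticregression")
    print(path, split_algorithm_path(path))
```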
S14: according to the training field identifier, extract data from the training data table corresponding to the training data table identifier, as training data.
The database server determines the training data table from the training data table identifier, and extracts data from the fields of that table corresponding to the training field identifiers, as training data.
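Step S14 amounts to a projection query over the training data table. The sketch below uses Python's built-in `sqlite3` module as a stand-in for the data storage module (an assumption; the description uses SparkSQL). Because the table and field identifiers are spliced into the query text, they are first checked against a simple identifier pattern. The helper names are hypothetical.

```python
import re
import sqlite3
from typing import List, Sequence, Tuple

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_training_query(table: str, fields: Sequence[str]) -> str:
    """Build SELECT <fields> FROM <table> after validating the identifiers."""
    if not fields:
        raise ValueError("at least one training field is required")
    for name in (table, *fields):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"invalid identifier: {name!r}")
    return f"SELECT {', '.join(fields)} FROM {table}"


def extract_training_data(conn: sqlite3.Connection, table: str,
                          fields: Sequence[str]) -> List[Tuple]:
    """Return the rows of the training fields as training data."""
    return conn.execute(build_training_query(table, fields)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mltable (id INTEGER, label REAL, col1 REAL, col2 REAL)")
    conn.execute("INSERT INTO mltable VALUES (1, 1.0, 0.5, 0.2)")
    print(extract_training_data(conn, "mltable", ["label", "col1", "col2"]))
```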
S15: train the initial model with the training data to obtain a machine learning model corresponding to the function name.
In one possible implementation, the database server determines, according to the algorithm type, whether to perform supervised or unsupervised training of the initial model.
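To make step S15 concrete for a supervised algorithm, the sketch below trains a logistic regression model with plain batch gradient descent, where `max_iter` plays the role of an iteration-limit initial parameter such as `-MaxIter 20`. It is a toy stand-in, not Spark MLlib's `LogisticRegression`, which uses its own optimizers; it also assumes that the first training field is the label (0 or 1) and the remaining fields are features.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(rows: Sequence[Sequence[float]], max_iter: int = 20,
                              step: float = 0.5) -> Tuple[List[float], float]:
    """Fit (weights, bias) on rows of the form (label, x1, x2, ...)."""
    n_features = len(rows[0]) - 1
    weights = [0.0] * n_features  # the "initial model": all-zero parameters
    bias = 0.0
    m = len(rows)
    for _ in range(max_iter):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for label, *x in rows:
            # Gradient of the log loss for one row: (prediction - label) * input.
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - label
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - step * g / m for w, g in zip(weights, grad_w)]
        bias -= step * grad_b / m
    return weights, bias


if __name__ == "__main__":
    data = [(0, 0.0), (0, 0.5), (1, 2.5), (1, 3.0)]
    print(train_logistic_regression(data, max_iter=200))
```

An unsupervised algorithm such as K-means would instead consume all training fields as features, with no label column.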
In one example, for the SQL statement select LogisticRegression('lr_model_t01', label, col1, col2, col3, '-MaxIter 20') from mltable, the database server extracts the function name LogisticRegression. Assuming that the function name LogisticRegression maps to a training function, the database server obtains -MaxIter 20 from the SQL statement as the initial parameter, label, col1, col2 and col3 as training field identifiers, and mltable as the training data table identifier. The database server can obtain the algorithm corresponding to the function name LogisticRegression from Spark MLlib. Assuming that the training field identifiers label, col1, col2 and col3 correspond to the fields label, col1, col2 and col3, that the training data table identifier mltable corresponds to the table mltable, and that the function name LogisticRegression corresponds to the LogisticRegression algorithm, the database server initializes the LogisticRegression algorithm with the initial parameter -MaxIter 20 to obtain an initial model, extracts data from the label, col1, col2 and col3 fields of the mltable table as training data, and trains the initial model with the training data, obtaining the machine learning model corresponding to the function name LogisticRegression.
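The argument layout in this example (a quoted first argument, which this description later uses as the model table identifier, then the training field identifiers, then a quoted option string) can be parsed as in the following sketch. The layout is inferred from the example statement rather than from a stated grammar, the option syntax `-Name value` is generalized from `-MaxIter 20`, and all names in the code are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

# select <function>(<arguments>) from <table> [;]
_STATEMENT = re.compile(
    r"^\s*select\s+(\w+)\s*\((.*)\)\s*from\s+(\w+)\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)


@dataclass
class TrainingRequest:
    function_name: str
    model_table: str
    training_fields: List[str]
    initial_params: Dict[str, str]
    training_table: str


def _split_args(text: str) -> List[str]:
    """Split on commas that are not inside single-quoted strings."""
    args, current, quoted = [], [], False
    for ch in text:
        if ch == "'":
            quoted = not quoted
        if ch == "," and not quoted:
            args.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    args.append("".join(current).strip())
    return args


def _unquote(token: str) -> str:
    if len(token) >= 2 and token[0] == token[-1] == "'":
        return token[1:-1]
    return token


def _parse_options(text: str) -> Dict[str, str]:
    """Parse '-Name value' pairs such as '-MaxIter 20'."""
    tokens = text.split()
    if len(tokens) % 2 or any(not t.startswith("-") for t in tokens[::2]):
        raise ValueError(f"malformed option string: {text!r}")
    return {tokens[i][1:]: tokens[i + 1] for i in range(0, len(tokens), 2)}


def parse_training_statement(sql: str) -> TrainingRequest:
    match = _STATEMENT.match(sql)
    if match is None:
        raise ValueError("not a 'select f(...) from table' statement")
    function_name, arg_text, table = match.groups()
    args = _split_args(arg_text)
    if len(args) < 3:
        raise ValueError("expected model table, training fields and options")
    return TrainingRequest(
        function_name=function_name,
        model_table=_unquote(args[0]),
        training_fields=args[1:-1],
        initial_params=_parse_options(_unquote(args[-1])),
        training_table=table,
    )
```

In a Spark-based server, an option like `MaxIter` would presumably be mapped onto an estimator parameter such as `maxIter` of MLlib's `LogisticRegression`; this description does not specify how that mapping is done.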
The embodiments of the disclosure perform machine learning through SQL. Compared with performing machine learning through API calls, this eliminates a large amount of programming work and improves the convenience and ease of use of machine learning.
In addition, JDBC (Java Database Connectivity) is a Java API for executing SQL statements. It provides unified access to many relational databases and consists of a set of classes and interfaces written in Java. JDBC provides a baseline on which higher-level tools and interfaces can be built, enabling database developers to write database applications.
SparkSQL itself can be called through the standard JDBC interface. Because the method according to the embodiments of the disclosure turns the machine learning process into SQL, it can also be called through the standard JDBC interface, which improves the degree of standardization.
In one possible implementation, the database server stores the trained machine learning model in the HDFS file system. HDFS (Hadoop Distributed File System) is a distributed file system suited to running on commodity hardware; it offers high fault tolerance and throughput and is well suited to applications on large data sets. Because model files are large, they can be stored on HDFS, and the database server can call machine learning models directly from the HDFS file system. As shown in Fig. 2, the HDFS file system can be deployed in the model storage module.
In one possible implementation, the database server generates a model table corresponding to the machine learning model, in which the location information and parameter information of the machine learning model are recorded. The location information indicates where the machine learning model is stored on HDFS; with it, the database server can retrieve the model quickly instead of searching the large volume of data in the HDFS file system for a match, which speeds up model retrieval. The parameter information indicates the configuration variables inside the model, which define what the model computes: for example, the weights of an artificial neural network, the support vectors of a support vector machine, the coefficients of a linear or logistic regression, or the value of K in the K-means algorithm. The database server can manage machine learning models according to the parameter information. In one example, as shown in Fig. 2, the model table can be stored in the data storage module.
In one possible implementation, if the function name maps to a training function, the database server obtains a model table identifier from the SQL statement, generates the model table corresponding to that identifier, and records the location information and parameter information of the machine learning model in that model table. In one example, for the SQL statement select LogisticRegression('lr_model_t01', label, col1, col2, col3, '-MaxIter 20') from mltable, the database server generates a model table identified as lr_model_t01 and records the location information and parameter information of the machine learning model in it.
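A minimal sketch of recording and reading a model table, using Python's `sqlite3` module as a stand-in for the data storage module. The schema (a single row with a `location` column and a JSON-encoded `params` column), the example HDFS path and the helper names are all hypothetical; the description only states that location and parameter information are recorded.

```python
import json
import re
import sqlite3
from typing import Any, Dict, Tuple

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check(name: str) -> str:
    if not _IDENTIFIER.match(name):
        raise ValueError(f"invalid identifier: {name!r}")
    return name


def record_model(conn: sqlite3.Connection, model_table: str,
                 location: str, params: Dict[str, Any]) -> None:
    """Create (or overwrite) the model table with location and parameter info."""
    table = _check(model_table)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} "
                 "(location TEXT NOT NULL, params TEXT NOT NULL)")
    conn.execute(f"DELETE FROM {table}")  # one model per model table
    conn.execute(f"INSERT INTO {table} (location, params) VALUES (?, ?)",
                 (location, json.dumps(params, sort_keys=True)))
    conn.commit()


def lookup_model(conn: sqlite3.Connection, model_table: str) -> Tuple[str, Dict[str, Any]]:
    """Return (location, params) recorded in the model table."""
    table = _check(model_table)
    row = conn.execute(f"SELECT location, params FROM {table}").fetchone()
    if row is None:
        raise LookupError(f"model table {table!r} is empty")
    return row[0], json.loads(row[1])


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    record_model(conn, "lr_model_t01", "hdfs:///models/lr_model_t01",
                 {"maxIter": 20, "coefficients": [0.5, -1.2]})
    print(lookup_model(conn, "lr_model_t01"))
```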
Fig. 3 shows a flowchart of a method for constructing a machine learning model according to an embodiment of the disclosure. As shown in Fig. 3, after the function name is extracted, the method further includes:
S16: query the mapping table of function names and function types with the function name to determine the function type corresponding to the function name, the function types including training functions and prediction functions.
If the extracted function name is not found, the database server selects an existing execution plan in the execution plan module and executes it.
If the corresponding function type is a training function, the execution plan of steps S12 to S15 above is executed;
If the corresponding function type is a prediction function, the execution plan corresponding to the process shown in Fig. 4 below is executed.
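The three-way decision in step S16 can be sketched as a lookup in a mapping table followed by a dispatch. The mapping entries and the returned plan labels are hypothetical placeholders; a real execution plan module would build and run plan objects rather than return strings. Names are compared case-insensitively in case the statement was case-normalized during preprocessing.

```python
from enum import Enum
from typing import Dict, Optional


class FunctionType(Enum):
    TRAINING = "training"
    PREDICTION = "prediction"


# Hypothetical mapping table of function names (lower case) and function types.
FUNCTION_TYPES: Dict[str, FunctionType] = {
    "logisticregression": FunctionType.TRAINING,
    "logisticregressionprediction": FunctionType.PREDICTION,
}


def choose_plan(function_name: Optional[str]) -> str:
    """Select the execution plan for an extracted function name."""
    ftype = FUNCTION_TYPES.get(function_name.lower()) if function_name else None
    if ftype is FunctionType.TRAINING:
        return "training plan (S12-S15)"
    if ftype is FunctionType.PREDICTION:
        return "prediction plan (S17-S20)"
    # Not a machine learning function: fall back to an existing plan.
    return "existing plan"


if __name__ == "__main__":
    for name in ("LogisticRegression", "LogisticRegressionPrediction", "count"):
        print(name, "->", choose_plan(name))
```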
Fig. 4 shows a flowchart of a method for constructing a machine learning model according to an embodiment of the disclosure. As shown in Fig. 4, the method further includes:
S17: if the function name maps to a prediction function, obtain a model table identifier, a prediction field identifier and a prediction data table identifier from the SQL statement.
The model table identifier identifies a model table and can be the model table name. As shown in Fig. 2, the database server can obtain the corresponding model table from the data storage module according to the model table identifier, obtain the location information of the machine learning model from that model table, and load the machine learning model according to the location information.
In one possible implementation, as shown in Fig. 2, prediction data tables are stored in the data storage module of the database server.
S18: according to the prediction field identifier, extract data from the prediction data table corresponding to the prediction data table identifier, as test data.
The database server determines the prediction data table from the prediction data table identifier, and extracts data from the fields of that table corresponding to the prediction field identifiers, as test data.
S19: obtain the location information of the machine learning model from the model table corresponding to the model table identifier, and load the machine learning model according to the location information.
S20: input the test data into the loaded machine learning model to obtain prediction data.
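Steps S19 and S20 can be sketched as follows, assuming (hypothetically) that a trained logistic regression model is persisted as a JSON file holding `coefficients` and `intercept`, and that the location recorded in the model table is a local path standing in for an HDFS location. Spark MLlib models have their own persistence format, so this only illustrates the load-then-predict flow; the helper names are hypothetical.

```python
import json
import os
import tempfile
from typing import Any, Dict, List, Sequence


def save_model(model: Dict[str, Any], location: str) -> None:
    """Persist a model (stand-in for writing it to HDFS)."""
    with open(location, "w", encoding="utf-8") as fh:
        json.dump(model, fh)


def load_model(location: str) -> Dict[str, Any]:
    """S19: load the model from the location recorded in the model table."""
    with open(location, encoding="utf-8") as fh:
        return json.load(fh)


def predict(model: Dict[str, Any], test_rows: Sequence[Sequence[float]]) -> List[int]:
    """S20: feed each test row to the model and return 0/1 predictions."""
    predictions = []
    for x in test_rows:
        z = model["intercept"] + sum(w * xi for w, xi in zip(model["coefficients"], x))
        # The logistic probability is >= 0.5 exactly when z >= 0.
        predictions.append(1 if z >= 0.0 else 0)
    return predictions


if __name__ == "__main__":
    location = os.path.join(tempfile.mkdtemp(), "lr_model_t01.json")
    save_model({"coefficients": [1.0, -1.0], "intercept": 0.0}, location)
    print(predict(load_model(location), [[2.0, 1.0], [1.0, 2.0]]))  # [1, 0]
```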
In one example, for the SQL statement select LogisticRegressionPrediction('lr_model_t01', col1, col2, col3, 'id', 'pred01') from mltable, the database server extracts the function name LogisticRegressionPrediction. Assuming that the function name LogisticRegressionPrediction maps to a prediction function, the database server obtains lr_model_t01 from the SQL statement as the model table identifier, col1, col2 and col3 as prediction field identifiers, and mltable as the prediction data table identifier. Assuming that the model table identifier lr_model_t01 corresponds to the table lr_model_t01, that the prediction field identifiers col1, col2 and col3 correspond to the fields col1, col2 and col3, and that the prediction data table identifier mltable corresponds to the table mltable, the database server extracts data from the col1, col2 and col3 fields of the mltable table as test data, obtains the location information of the machine learning model from the lr_model_t01 table, loads the machine learning model according to that location information, and inputs the test data into the loaded model to obtain prediction data.
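The prediction statement in this example can be decomposed as sketched below. The argument order (model table identifier first, then the prediction field identifiers, then two quoted arguments that this description later uses as the association identifier and the prediction result table identifier) is inferred from the example, not from a stated grammar. For brevity the arguments are split on every comma, which is only safe because no quoted argument here contains a comma; all names are hypothetical.

```python
import re
from typing import List, NamedTuple

# select <function>(<arguments>) from <table> [;]
_STATEMENT = re.compile(
    r"^\s*select\s+(\w+)\s*\((.*)\)\s*from\s+(\w+)\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)


class PredictionRequest(NamedTuple):
    function_name: str
    model_table: str
    prediction_fields: List[str]
    association_id: str
    result_table: str
    prediction_table: str


def _unquote(token: str) -> str:
    token = token.strip()
    if len(token) >= 2 and token[0] == token[-1] == "'":
        return token[1:-1]
    return token


def parse_prediction_statement(sql: str) -> PredictionRequest:
    match = _STATEMENT.match(sql)
    if match is None:
        raise ValueError("not a 'select f(...) from table' statement")
    function_name, arg_text, table = match.groups()
    # Naive comma split: adequate only because no quoted argument contains a comma.
    args = [_unquote(a) for a in arg_text.split(",")]
    if len(args) < 4:
        raise ValueError("expected model table, fields, association id and result table")
    return PredictionRequest(
        function_name=function_name,
        model_table=args[0],
        prediction_fields=args[1:-2],
        association_id=args[-2],
        result_table=args[-1],
        prediction_table=table,
    )


if __name__ == "__main__":
    req = parse_prediction_statement(
        "select LogisticRegressionPrediction('lr_model_t01', col1, col2, col3, 'id', 'pred01') from mltable")
    print(req.model_table, req.prediction_fields, req.association_id, req.result_table)
```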
The embodiments of the disclosure perform prediction in machine learning through SQL. Compared with API calls, this saves a large amount of programming work and improves the convenience and ease of use of machine learning.
In one possible implementation, the SQL statement further includes an association identifier, which the database server also obtains from the SQL statement. After obtaining the prediction data, the database server generates a prediction result table, stores the prediction data in it, and associates the prediction result table with the prediction data table to which the test data belongs through the association identifier. This establishes an association between the prediction data table and the prediction result table, which facilitates subsequent evaluation and comparison of machine learning models.
In one example, for the SQL statement select LogisticRegressionPrediction('lr_model_t01', col1, col2, col3, 'id', 'pred01') from mltable, the database server obtains test data from the prediction data table identified as mltable, inputs the test data into the machine learning model to obtain prediction data, and stores the prediction data in the prediction result table identified as pred01. The database server obtains id as the association identifier and, through id, associates the prediction data table identified as mltable with the prediction result table identified as pred01.
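The association between the prediction data table and the prediction result table can be illustrated with an ordinary join on the association identifier. The sketch uses Python's `sqlite3` module in place of the data storage module; the result-table schema (the association column plus a `prediction` column) and the helper names are hypothetical.

```python
import re
import sqlite3
from typing import List, Sequence, Tuple

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check(*names: str) -> None:
    for name in names:
        if not _IDENTIFIER.match(name):
            raise ValueError(f"invalid identifier: {name!r}")


def store_predictions(conn: sqlite3.Connection, result_table: str, assoc_id: str,
                      rows: Sequence[Tuple[int, float]]) -> None:
    """Create the prediction result table and store (association id, prediction) rows."""
    _check(result_table, assoc_id)
    conn.execute(f"CREATE TABLE {result_table} ({assoc_id} INTEGER PRIMARY KEY, prediction REAL)")
    conn.executemany(
        f"INSERT INTO {result_table} ({assoc_id}, prediction) VALUES (?, ?)", rows)
    conn.commit()


def join_predictions(conn: sqlite3.Connection, data_table: str, result_table: str,
                     assoc_id: str, fields: Sequence[str]) -> List[Tuple]:
    """Join test data with its predictions through the association identifier."""
    _check(data_table, result_table, assoc_id, *fields)
    columns = [f"d.{assoc_id}"] + [f"d.{f}" for f in fields] + ["r.prediction"]
    query = (f"SELECT {', '.join(columns)} "
             f"FROM {data_table} AS d JOIN {result_table} AS r "
             f"ON d.{assoc_id} = r.{assoc_id} ORDER BY d.{assoc_id}")
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mltable (id INTEGER, col1 REAL, col2 REAL)")
    conn.executemany("INSERT INTO mltable VALUES (?, ?, ?)", [(1, 0.1, 0.2), (2, 0.4, 0.5)])
    store_predictions(conn, "pred01", "id", [(1, 1.0), (2, 0.0)])
    print(join_predictions(conn, "mltable", "pred01", "id", ["col1", "col2"]))
```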
Fig. 5 shows a block diagram of a device for constructing a machine learning model according to an embodiment of the disclosure. As shown in Fig. 5, the device 500 for constructing a machine learning model includes an execution plan module 501 and a data storage module 502, the execution plan module being configured to:
perform syntax parsing on a received SQL statement to extract a function name;
if the function name maps to a training function, obtain an initial parameter, a training field identifier and a training data table identifier from the SQL statement;
obtain the algorithm corresponding to the function name from Spark MLlib, and initialize the algorithm with the initial parameter to obtain an initial model;
extract data, according to the training field identifier, from the training data table in the data storage module 502 corresponding to the training data table identifier, as training data;
train the initial model with the training data to obtain a machine learning model corresponding to the function name.
In one possible implementation, the execution plan module 501 is further configured to:
query the mapping table of function names and function types with the function name to determine the function type corresponding to the function name, the function types including training functions and prediction functions, a prediction function being a function that makes predictions with a machine learning model.
In one possible implementation, the device 500 for constructing a machine learning model further includes a model storage module 503, and the execution plan module 501 is further configured to store the machine learning model in the HDFS file system of the model storage module 503.
In one possible implementation, the execution plan module 501 is further configured to generate a model table corresponding to the machine learning model, the model table recording the location information of the machine learning model stored by the model storage module 503 and the parameter information of the model, and the model table being stored in the data storage module 502.
In one possible implementation, the execution plan module 501 is further configured to: if the function name maps to a prediction function, obtain a model table identifier, a prediction field identifier and a prediction data table identifier from the SQL statement;
extract data, according to the prediction field identifier, from the prediction data table in the data storage module 502 corresponding to the prediction data table identifier, as test data;
obtain, from the model table in the data storage module 502 corresponding to the model table identifier, the location information of the machine learning model stored by the model storage module 503, and load the machine learning model according to the location information;
input the test data into the loaded machine learning model to obtain prediction data.
In one possible implementation, the SQL statement further includes an association identifier, and the execution plan module 501 is further configured to generate a prediction result table, store the prediction data in the prediction result table, and associate the prediction result table with the prediction data table to which the test data belongs through the association identifier, the prediction result table being stored in the data storage module 502.
The device parses the received SQL statement to extract a function name. If the function name maps to a training function, it obtains an initial parameter, a training field identifier and a training data table identifier from the SQL statement, obtains the algorithm corresponding to the function name from Spark MLlib and initializes it with the initial parameter to obtain an initial model, extracts data from the training data table corresponding to the training data table identifier according to the training field identifier as training data, and trains the initial model with the training data to obtain a machine learning model corresponding to the function name. The device for constructing a machine learning model according to the embodiments of the disclosure can thus call algorithms from Spark MLlib and train models, obtaining the corresponding machine learning model purely through SQL. Compared with performing machine learning through API calls, this eliminates a large amount of programming work and improves the convenience and ease of use of machine learning.
Fig. 6 is a block diagram of a device 900 for constructing a machine learning model according to an exemplary embodiment. Referring to Fig. 6, the device 900 may include a processor 901 and a machine-readable storage medium 902 that stores machine-executable instructions. The processor 901 communicates with the machine-readable storage medium 902 via a system bus 903. The processor 901 reads the machine-executable instructions in the machine-readable storage medium 902 that correspond to the machine learning model construction logic, and thereby executes the method for constructing a machine learning model described above.
The machine-readable storage medium 902 referred to herein may be any electronic, magnetic, optical or other physical storage device that can contain or store information such as executable instructions and data. For example, the machine-readable storage medium may be one of the following, or a combination of them:
- RAM (Random Access Memory)
- volatile memory
- non-volatile memory
- flash memory
- a storage drive (such as a hard disk drive)
- a solid state drive
- any type of storage disc (such as an optical disc or a DVD)
- a similar storage medium
Embodiments of the present disclosure have been described above. The foregoing description is exemplary, not exhaustive, and the disclosure is not limited to the embodiments described. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terms used herein were chosen to best explain the principles of the embodiments and their practical application or technical improvement over technologies on the market. They were also chosen so that others of ordinary skill in the art can understand the embodiments disclosed herein.

Claims (14)

1. A method for constructing a machine learning model, characterized in that the method comprises:
performing syntax parsing on a received SQL statement to extract a function name;
if the function name maps to a training function, obtaining an initial parameter, a training field identifier and a training data table identifier from the SQL statement;
obtaining an algorithm corresponding to the function name from Spark MLlib, and initializing the algorithm with the initial parameter to obtain an initial model;
extracting, according to the training field identifier, data from the training data table corresponding to the training data table identifier as training data;
training the initial model with the training data to obtain a machine learning model corresponding to the function name.
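The training steps of claim 1 can be sketched in a self-contained way if a toy estimator stands in for the Spark MLlib algorithm. The following names are hypothetical: `ALGORITHMS` plays the role of "obtain the algorithm corresponding to the function name", `TABLES` plays the role of the training data table, and `SimpleLinearRegression` replaces a real MLlib estimator with one-feature ordinary least squares.

```python
from dataclasses import dataclass


@dataclass
class SimpleLinearRegression:
    """Toy stand-in for an MLlib estimator: one-feature ordinary least squares."""
    fitIntercept: bool = True
    slope: float = 0.0
    intercept: float = 0.0

    def fit(self, xs, ys):
        n = len(xs)
        if self.fitIntercept:
            mean_x, mean_y = sum(xs) / n, sum(ys) / n
            sxx = sum((x - mean_x) ** 2 for x in xs)
            sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
            self.slope = sxy / sxx
            self.intercept = mean_y - self.slope * mean_x
        else:
            # Regression through the origin.
            self.slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
            self.intercept = 0.0
        return self


# Hypothetical registry standing in for looking the algorithm up in Spark MLlib.
ALGORITHMS = {"train_lr": SimpleLinearRegression}

# Hypothetical training data table.
TABLES = {"train_table": [{"x": 1.0, "y": 3.0}, {"x": 2.0, "y": 5.0}, {"x": 3.0, "y": 7.0}]}


def build_model(function_name, initial_params, training_fields, training_table_id):
    # Initialize the algorithm with the initial parameters -> initial model.
    initial_model = ALGORITHMS[function_name](**initial_params)
    # Extract the training fields (feature, label) from the training data table.
    feature, label = training_fields
    rows = TABLES[training_table_id]
    xs = [row[feature] for row in rows]
    ys = [row[label] for row in rows]
    # Train the initial model; here fit() updates and returns the same object.
    return initial_model.fit(xs, ys)


model = build_model("train_lr", {"fitIntercept": True}, ["x", "y"], "train_table")
print(model.slope, model.intercept)  # 2.0 1.0
```

On a real cluster, the registry entry could plausibly point at an MLlib estimator such as `pyspark.ml.regression.LinearRegression`. That class accepts keyword parameters like `maxIter`, `regParam` and `fitIntercept`, and its `fit(dataset)` returns a new fitted model rather than mutating the estimator.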
2. The method according to claim 1, characterized in that, after extracting the function name, the method further comprises:
querying a mapping table of function names and function types according to the function name to determine the function type corresponding to the function name, the function types including a training function and a prediction function.
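The lookup in claim 2 amounts to a table from function names to function types. The determined type then decides which identifiers the SQL statement must supply. The sketch below is illustrative only. The function names in `FUNCTION_TYPE_TABLE` and the argument labels in `EXPECTED_ARGS` are hypothetical; the argument groups mirror the identifiers named in claims 1 and 5.

```python
from enum import Enum


class FunctionType(Enum):
    TRAINING = "training"
    PREDICTION = "prediction"


# Hypothetical mapping table of function names to function types.
FUNCTION_TYPE_TABLE = {
    "train_lr": FunctionType.TRAINING,
    "train_kmeans": FunctionType.TRAINING,
    "predict": FunctionType.PREDICTION,
}

# Identifiers each function type expects from the SQL statement, in order.
EXPECTED_ARGS = {
    FunctionType.TRAINING: ("initial_params", "training_fields", "training_table"),
    FunctionType.PREDICTION: ("model_table", "prediction_fields", "prediction_table"),
}


def lookup_function_type(function_name):
    """Query the mapping table; unknown names are rejected rather than guessed."""
    try:
        return FUNCTION_TYPE_TABLE[function_name.lower()]
    except KeyError:
        raise ValueError(f"unknown function name: {function_name}") from None


def bind_arguments(function_name, args):
    """Name the positional SQL arguments according to the function's type."""
    expected = EXPECTED_ARGS[lookup_function_type(function_name)]
    if len(args) != len(expected):
        raise ValueError(f"{function_name} expects {len(expected)} arguments, got {len(args)}")
    return dict(zip(expected, args))


print(bind_arguments("PREDICT", ["model_lr_001", "x", "sales_2018"]))
# {'model_table': 'model_lr_001', 'prediction_fields': 'x', 'prediction_table': 'sales_2018'}
```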
3. The method according to claim 2, characterized in that the method further comprises:
storing the machine learning model in the HDFS file system.
4. The method according to claim 3, characterized in that the method further comprises:
generating a model table corresponding to the machine learning model, the model table recording location information and parameter information of the machine learning model.
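Claims 3 and 4 pair the stored model with a model table that records where the model lives and which parameters it was built with. The sketch below makes two substitutions that are assumptions, not part of the patent: a temporary local directory replaces HDFS, and an in-memory `sqlite3` database replaces the data storage module. The JSON serialization, the column names `location` and `params`, and the table name `model_lr_001` are hypothetical.

```python
import json
import sqlite3
import tempfile
from pathlib import Path


def save_model(model_params, root, model_id):
    """Stand-in for writing the model to HDFS: serialize it to a local JSON file."""
    location = Path(root) / f"{model_id}.json"
    location.write_text(json.dumps(model_params))
    return str(location)


def create_model_table(conn, model_table, location, params):
    """Create a model table recording the model's location and parameter information."""
    # The table name is an internal, trusted identifier (quoted here), not user input.
    conn.execute(f'CREATE TABLE "{model_table}" (location TEXT, params TEXT)')
    conn.execute(
        f'INSERT INTO "{model_table}" VALUES (?, ?)', (location, json.dumps(params))
    )


conn = sqlite3.connect(":memory:")  # stand-in for the data storage module
with tempfile.TemporaryDirectory() as root:  # stand-in for HDFS
    location = save_model({"slope": 2.0, "intercept": 1.0}, root, "lr_001")
    create_model_table(conn, "model_lr_001", location, {"fitIntercept": True})
    # Later, the prediction path reads the location back and reloads the model.
    stored_location, stored_params = conn.execute(
        'SELECT location, params FROM "model_lr_001"'
    ).fetchone()
    reloaded = json.loads(Path(stored_location).read_text())

print(reloaded, json.loads(stored_params))
# {'slope': 2.0, 'intercept': 1.0} {'fitIntercept': True}
```

With Spark itself, fitted ML models can be persisted with `model.write().overwrite().save(path)`, where `path` may be an `hdfs://` URI. They can be restored with the corresponding model class's `load(path)`. The model table would then only need to record that path.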
5. The method according to claim 4, characterized in that the method further comprises:
if the function name maps to a prediction function, obtaining a model table identifier, a prediction field identifier and a prediction data table identifier from the SQL statement;
extracting, according to the prediction field identifier, data from the prediction data table corresponding to the prediction data table identifier as test data;
obtaining location information from the model table corresponding to the model table identifier, and loading the machine learning model according to the location information;
inputting the test data into the loaded machine learning model to obtain prediction data.
6. The method according to claim 5, characterized in that the SQL statement further includes an association identifier, and the method further comprises:
generating a prediction result table, storing the prediction data in the prediction result table, and associating, by means of the association identifier, the prediction result table with the prediction data table to which the test data belongs.
7. A device for constructing a machine learning model, characterized in that the device comprises an execution plan module and a data storage module, the execution plan module being configured to:
perform syntax parsing on a received SQL statement to extract a function name;
if the function name maps to a training function, obtain an initial parameter, a training field identifier and a training data table identifier from the SQL statement;
obtain an algorithm corresponding to the function name from Spark MLlib, and initialize the algorithm with the initial parameter to obtain an initial model;
extract, according to the training field identifier, data from the training data table in the data storage module corresponding to the training data table identifier as training data;
train the initial model with the training data to obtain a machine learning model corresponding to the function name.
8. The device according to claim 7, characterized in that the execution plan module is further configured to:
query a mapping table of function names and function types according to the function name to determine the function type corresponding to the function name, the function types including a training function and a prediction function.
9. The device according to claim 8, characterized in that the device further comprises a model storage module, and the execution plan module is further configured to store the machine learning model in the HDFS file system of the model storage module.
10. The device according to claim 9, characterized in that the execution plan module is further configured to generate a model table corresponding to the machine learning model, the model table recording parameter information and the location at which the model storage module stores the machine learning model, and the model table being stored in the data storage module.
11. The device according to claim 10, characterized in that the execution plan module is further configured to: if the function name maps to a prediction function, obtain a model table identifier, a prediction field identifier and a prediction data table identifier from the SQL statement;
extract, according to the prediction field identifier, data from the prediction data table in the data storage module corresponding to the prediction data table identifier as test data;
obtain, from the model table in the data storage module corresponding to the model table identifier, the location at which the model storage module stores the machine learning model, and load the machine learning model according to that location information;
input the test data into the loaded machine learning model to obtain prediction data.
12. The device according to claim 11, characterized in that the SQL statement further includes an association identifier, and the execution plan module is further configured to generate a prediction result table, store the prediction data in the prediction result table, and associate, by means of the association identifier, the prediction result table with the prediction data table to which the test data belongs, the prediction result table being stored in the data storage module.
13. A device for constructing a machine learning model, characterized by comprising:
a processor and a machine-readable storage medium, the machine-readable storage medium storing machine-executable instructions, and the processor executing the machine-executable instructions to implement the method according to any one of claims 1 to 6.
14. A machine-readable storage medium, characterized in that the machine-readable storage medium stores machine-executable instructions which, when called and executed by a processor, cause the processor to implement the method according to any one of claims 1 to 6.
CN201810245188.4A 2018-03-23 2018-03-23 Construction of machine learning models Active CN109657803B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810245188.4A CN109657803B (en) 2018-03-23 2018-03-23 Construction of machine learning models
PCT/CN2019/078619 WO2019179408A1 (en) 2018-03-23 2019-03-19 Construction of machine learning model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810245188.4A CN109657803B (en) 2018-03-23 2018-03-23 Construction of machine learning models

Publications (2)

Publication Number Publication Date
CN109657803A true CN109657803A (en) 2019-04-19
CN109657803B CN109657803B (en) 2020-04-03

Family

ID=66110182

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810245188.4A Active CN109657803B (en) 2018-03-23 2018-03-23 Construction of machine learning models

Country Status (2)

Country Link
CN (1) CN109657803B (en)
WO (1) WO2019179408A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110851500A (en) * 2019-11-07 2020-02-28 北京集奥聚合科技有限公司 Method for generating expert characteristic dimension required by machine learning modeling
CN111523676A (en) * 2020-04-17 2020-08-11 第四范式(北京)技术有限公司 Method and device for assisting machine learning model to be online
CN112559603A (en) * 2021-02-23 2021-03-26 腾讯科技(深圳)有限公司 Feature extraction method, device, equipment and computer-readable storage medium
CN114741372A (en) * 2022-03-24 2022-07-12 北京柏睿数据技术股份有限公司 Method for realizing in-library artificial intelligence and database system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106066934A (en) * 2016-05-27 2016-11-02 山东大学苏州研究院 Early-stage Alzheimer's disease auxiliary diagnosis system based on the Spark platform
CN107103050A (en) * 2017-03-31 2017-08-29 海通安恒(大连)大数据科技有限公司 Big data modeling platform and method
CN107222472A (en) * 2017-05-26 2017-09-29 电子科技大学 User behavior anomaly detection method for Hadoop clusters
CN107480435A (en) * 2017-07-31 2017-12-15 广东精点数据科技股份有限公司 Automatic-search machine learning system and method applied to clinical data

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10262263B2 (en) * 2015-12-11 2019-04-16 International Business Machines Corporation Retrieving database score contextual information
CN105912500B (en) * 2016-03-30 2017-11-14 百度在线网络技术(北京)有限公司 Machine learning model generation method and device
CN105930413A (en) * 2016-04-18 2016-09-07 北京百度网讯科技有限公司 Training method for similarity model parameters, search processing method and corresponding apparatuses
CN106295338B (en) * 2016-07-26 2020-04-14 北京工业大学 SQL vulnerability detection method based on artificial neuron network
CN107330522B (en) * 2017-07-04 2021-06-08 北京百度网讯科技有限公司 Method, device and system for updating deep learning model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhu Weilian: "Training and prediction of algorithm models implemented with SQL scripts", 13 January 2018 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110851500A (en) * 2019-11-07 2020-02-28 北京集奥聚合科技有限公司 Method for generating expert characteristic dimension required by machine learning modeling
CN110851500B (en) * 2019-11-07 2022-10-28 北京集奥聚合科技有限公司 Method for generating expert characteristic dimension required by machine learning modeling
CN111523676A (en) * 2020-04-17 2020-08-11 第四范式(北京)技术有限公司 Method and device for assisting machine learning model to be online
CN111523676B (en) * 2020-04-17 2024-04-12 第四范式(北京)技术有限公司 Method and device for assisting machine learning model to be online
CN112559603A (en) * 2021-02-23 2021-03-26 腾讯科技(深圳)有限公司 Feature extraction method, device, equipment and computer-readable storage medium
CN114741372A (en) * 2022-03-24 2022-07-12 北京柏睿数据技术股份有限公司 Method for realizing in-library artificial intelligence and database system
WO2023178977A1 (en) * 2022-03-24 2023-09-28 北京柏睿数据技术股份有限公司 Method for implementing in-database artificial intelligence, and database system

Also Published As

Publication number Publication date
CN109657803B (en) 2020-04-03
WO2019179408A1 (en) 2019-09-26

Similar Documents

Publication Publication Date Title
US11941016B2 (en) Using specified performance attributes to configure machine learning pipepline stages for an ETL job
US9626623B2 (en) Method of automated discovery of new topics
CN109657803A (en) The building of machine learning model
US9916368B2 (en) Non-exclusionary search within in-memory databases
CN107491487A (en) A kind of full-text database framework and bitmap index establishment, data query method, server and medium
CN111597243B (en) Method and system for abstract data loading based on data warehouse
CN103970902A (en) Method and system for reliable and instant retrieval on situation of large quantities of data
CN106844369B (en) Objectification SQL sentence construction method and apparatus
US20130198117A1 (en) Systems and methods for semantic data integration
CN104969221B (en) Semi-structured data in formatted data base
CN110956271B (en) Multi-stage classification method and device for mass data
CN110321360A (en) The processing method and relevant device of list data
US20150355888A1 (en) Acquiring identification of an application lifecycle management entity associated with similar code
US9324036B1 (en) Framework for calculating grouped optimization algorithms within a distributed data store
CN111984659B (en) Data updating method, device, computer equipment and storage medium
US20180336235A1 (en) Reconciled data storage system
CN111090668B (en) Data retrieval method and device, electronic equipment and computer readable storage medium
KR102345410B1 (en) Big data intelligent collecting method and device
KR20210034547A (en) Multi-source type interoperability and/or information retrieval optimization
US11645283B2 (en) Predictive query processing
US11093509B2 (en) Data processing system for curating search result facets
US9959295B1 (en) S-expression based computation of lineage and change impact analysis
CN109376154B (en) Data reading and writing method and data reading and writing system
Papadakis et al. A hyper-box approach using relational databases for large scale machine learning
Fahrudin et al. Implementation of Big Data Analytics for Machine Learning Model Using Hadoop and Spark Environment on Resizing Iris Dataset

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant