CN109657803B - Construction of machine learning models - Google Patents

Publication number
CN109657803B
CN109657803B CN201810245188.4A CN201810245188A CN109657803B CN 109657803 B CN109657803 B CN 109657803B CN 201810245188 A CN201810245188 A CN 201810245188A CN 109657803 B CN109657803 B CN 109657803B
Authority
CN
China
Prior art keywords
model
machine learning
data
training
prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810245188.4A
Other languages
Chinese (zh)
Other versions
CN109657803A (en)
Inventor
丁远普
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
New H3C Big Data Technologies Co Ltd
Original Assignee
New H3C Big Data Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by New H3C Big Data Technologies Co Ltd
Priority to CN201810245188.4A
Priority to PCT/CN2019/078619 (WO2019179408A1)
Publication of CN109657803A
Application granted
Publication of CN109657803B
Active (legal status)
Anticipated expiration (legal status)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N99/00: Subject matter not provided for in other groups of this subclass

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Debugging And Monitoring (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure relates to the construction of machine learning models. The method includes: parsing a received SQL statement to extract a function name; if the function name is mapped to a training function, acquiring initial parameters, training field identifiers, and a training data table identifier from the SQL statement; acquiring an algorithm corresponding to the function name from Spark MLlib, and initializing the algorithm with the initial parameters to obtain an initial model; extracting data from the training data table corresponding to the training data table identifier, according to the training field identifiers, to serve as training data; and training the initial model with the training data to obtain a machine learning model corresponding to the function name. The method and device for constructing a machine learning model can improve the convenience and usability of machine learning.

Description

Construction of machine learning models
Technical Field
The disclosure relates to the technical field of databases, in particular to a method and a device for constructing a machine learning model.
Background
Spark is a distributed computing framework that provides a comprehensive, unified way to meet big data processing needs across data sets and data sources of different natures (text data, chart data, etc.), whether the data is batch data or real-time streaming data.
Spark SQL is a Spark-based distributed SQL (Structured Query Language) query engine that can be used to query, count, and analyze huge data sets.
Spark MLlib (Machine Learning library) is a Spark-based machine learning library composed of general-purpose learning algorithms and tools, including classification, regression, clustering, collaborative filtering, dimensionality reduction, and the like. Spark MLlib also includes low-level optimization primitives and a high-level pipeline API (Application Programming Interface), provides APIs in languages such as Scala, Python, and Java, and allows model training and prediction to be performed through these APIs.
Disclosure of Invention
In view of this, the present disclosure provides a method and an apparatus for constructing a machine learning model, which can improve convenience and usability of machine learning.
According to an aspect of the present disclosure, there is provided a method for constructing a machine learning model, including: analyzing the syntax of the received SQL statement, and extracting a function name; if the function name is mapped to a training function, acquiring initial parameters, training field identifications and training data table identifications from the SQL statement; acquiring an algorithm corresponding to the function name from Spark MLlib, and initializing the algorithm by adopting the initial parameters to obtain an initial model; extracting data from a training data table corresponding to the training data table identification according to the training field identification to serve as training data; and training the initial model by adopting the training data to obtain a machine learning model corresponding to the function name.
According to another aspect of the present disclosure, a device for constructing a machine learning model is provided, which includes an execution planning module and a data storage module, wherein the execution planning module is configured to perform syntax parsing on a received SQL statement and extract a function name; if the function name is mapped to a training function, acquiring initial parameters, training field identifications and training data table identifications from the SQL statement; acquiring an algorithm corresponding to the function name from Spark MLlib, and initializing the algorithm by adopting the initial parameters to obtain an initial model; extracting data from a training data table corresponding to the training data table identification in the data storage module according to the training field identification to serve as training data; and training the initial model by adopting the training data to obtain a machine learning model corresponding to the function name.
According to another aspect of the present disclosure, there is provided a machine learning model building apparatus including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to perform the above method.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the above-described method.
According to the method and device for constructing a machine learning model of the present disclosure, an algorithm can be called from Spark MLlib and model training can be performed, so that the corresponding machine learning model is obtained purely through SQL. Compared with machine learning performed through an API (application program interface), a large amount of programming work is saved, and the convenience and usability of machine learning are improved.
Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features, and aspects of the disclosure and, together with the description, serve to explain the principles of the disclosure.
Fig. 1 shows a flowchart of a method of constructing a machine learning model according to an embodiment of the present disclosure.
Fig. 2 shows an architectural diagram of a database server according to an embodiment of the present disclosure.
Fig. 3 shows a flowchart of a method of constructing a machine learning model according to an embodiment of the present disclosure.
Fig. 4 shows a flowchart of a method of constructing a machine learning model according to an embodiment of the present disclosure.
Fig. 5 shows a block diagram of a machine learning model building apparatus according to an embodiment of the present disclosure.
Fig. 6 shows a block diagram of a machine learning model building apparatus according to an embodiment of the present disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
Fig. 1 shows a flowchart of a method of constructing a machine learning model according to an embodiment of the present disclosure. The method can be executed by a database server, as shown in fig. 1, and the method for constructing the machine learning model includes:
S11, parsing the received SQL statement and extracting the function name.
SQL (Structured Query Language) is a database query and programming language used to access database systems. Access operations on a database may include adding, deleting, reading, and modifying data, and these operations can be carried out through SQL statements. An SQL statement is a declarative description of an access task, and the database server needs to formulate, from the SQL statement, an execution plan that indicates how to complete the access task.
In one possible implementation, the database server may receive the SQL statement from a client, and the client may be deployed on the database server or on another server, which is not limited in this disclosure. In one example, the client may obtain the SQL statement from an input box.
The database server is a server built on a database system. It can be composed of one or more servers operating in a local area network together with database management system software, and it can provide data services for clients. In the embodiment of the disclosure, the database server has SQL statement parsing capability: it can divide the SQL statement into statement blocks, determine the execution sequence, and form an execution plan. In one example, the database server may deploy a SparkSQL module that can parse SQL statements and query, count, and analyze large data sets. Fig. 2 shows an architectural diagram of a database server according to an embodiment of the present disclosure. As shown in fig. 2, the database server includes a model storage module, a data storage module, an execution plan module, and the like.
In a possible implementation manner, the database server may first preprocess the SQL statement to obtain a standard SQL statement, and then perform syntax parsing on the standard SQL statement to extract the function name. In one example, preprocessing may include removing leading and trailing spaces from the SQL statement, replacing each run of consecutive blank characters in the SQL statement (including spaces, tabs, carriage returns, and line feeds) with a single space, unifying the case of the SQL statement (converting the whole statement to lower case or to upper case), appending the end symbol "endsql" after the end of the SQL statement, and so on.
While parsing the SQL statement, the database server can split the statement, determine the meaning of each part, and extract the function name.
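The preprocessing steps listed above can be sketched as follows. This is a minimal illustration rather than the disclosed implementation: the function name preprocess_sql, the choice of lower case as the default, and the single space placed before the end symbol are assumptions.

```python
import re

END_MARKER = "endsql"  # end symbol named in the text


def preprocess_sql(sql: str, lower: bool = True) -> str:
    """Normalize a raw SQL statement before syntax parsing (illustrative sketch)."""
    # Remove leading and trailing whitespace.
    text = sql.strip()
    # Replace each run of spaces, tabs, and line breaks with a single space.
    text = re.sub(r"\s+", " ", text)
    # Unify the case of the whole statement.
    text = text.lower() if lower else text.upper()
    # Append the end symbol (the separating space is an assumption).
    return f"{text} {END_MARKER}"
```

Note that unifying the case in this simple way also changes quoted literals and identifiers, so a real parser would need to account for that.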
Machine learning functions may be used to represent functions employed in machine learning. If the function name extracted from the SQL statement is mapped to a machine learning function, it indicates that machine learning is going to be performed. In one example, the machine learning function may be a custom function.
In one possible implementation, machine learning functions include two types: training functions and prediction functions. A training function is used to carry out commands for training a machine learning model; a prediction function is used to carry out commands for making predictions with an existing machine learning model. It should be noted that training functions and prediction functions are only examples of machine learning functions, and a machine learning function may also be another function used in the machine learning process, which is not limited in this disclosure.
Because different types of machine learning functions obtain different data and process that data differently, the database server can generate different execution plans based on the type of the machine learning function.
In a possible implementation manner, the database server stores mapping information between the function names and the function types of machine learning functions. The database server may look up the function type corresponding to the function name extracted from the SQL statement in order to formulate the corresponding execution plan. For example, if the function is a training function, an execution plan for training a machine learning model is formulated; if the function is a prediction function, an execution plan for performing prediction with an existing machine learning model is formulated.
The execution plan corresponding to the training function may be as described in steps S12-S15.
S12, if the function name is mapped to the training function, acquiring initial parameters, training field identification and training data table identification from the SQL statement.
The training function is a function of training a machine learning model. The training function represents a class of functions, and the training function may include a plurality of functions for training different machine learning models, and these functions may be added, modified, and deleted as needed, which is not limited by this disclosure.
The initial parameters may be used to represent model parameters used in the model initialization process, and the initial parameters may be set as needed, which is not limited in this disclosure.
As shown in fig. 2, the data storage module stores a training data table, and data in the training data table can be used for training the machine learning model. The training data table may be identified by the training data table identifier. The training fields are the fields of the training data table that hold the data used to train the machine learning model, and the training fields can be identified by the training field identifiers.
S13, obtaining an algorithm corresponding to the function name from Spark MLlib, and initializing the algorithm by using the initial parameters to obtain an initial model.
In a possible implementation manner, the database server stores a correspondence between a function name of a training function and an algorithm path (the algorithm path may be used to represent a call location of a certain algorithm in the Spark MLlib, for example, a location of a class to which the algorithm belongs), and the database server may determine the algorithm path corresponding to the function name by searching the correspondence, and acquire the algorithm corresponding to the function name from the Spark MLlib according to the algorithm path.
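The correspondence between function names and algorithm paths can be pictured as a simple lookup table. The sketch below is hypothetical: the text does not list which entries the database server stores, and the two class paths shown are only examples of classes in Spark's DataFrame-based machine learning API.

```python
# Hypothetical correspondence: training function name -> algorithm path
# (here, the fully qualified name of the class the algorithm belongs to).
ALGORITHM_PATHS = {
    "LogisticRegression": "org.apache.spark.ml.classification.LogisticRegression",
    "KMeans": "org.apache.spark.ml.clustering.KMeans",
}


def resolve_algorithm_path(function_name: str) -> str:
    """Return the algorithm path registered for a training function name."""
    try:
        return ALGORITHM_PATHS[function_name]
    except KeyError:
        raise LookupError(f"no algorithm registered for {function_name!r}") from None
```

Keeping the correspondence in data rather than in code means that training functions can be added or removed without changing the parsing logic.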
S14, extracting data from the training data table corresponding to the training data table identification according to the training field identification to serve as training data.
The database server may determine the training data table according to the training data table identifier, and extract data from a field corresponding to the training field identifier in the training data table as training data.
S15, training the initial model by adopting the training data to obtain a machine learning model corresponding to the function name.
In one possible implementation, the database server may determine whether to train the initial model supervised or unsupervised depending on the type of algorithm.
In one example, for the SQL statement select LogisticRegression('lr_model_t01', label, col1, col2, col3, '-MaxIter 20') from mltable, the database server extracts the function name LogisticRegression. Assuming that the function name LogisticRegression is mapped to a training function, the database server acquires -MaxIter 20 from the SQL statement as the initial parameter, label, col1, col2 and col3 as the training field identifiers, and mltable as the training data table identifier. The database server may obtain the algorithm corresponding to the function name LogisticRegression from Spark MLlib. Assume that the training field identifiers label, col1, col2 and col3 correspond to the label, col1, col2 and col3 fields respectively, that the training data table identifier mltable corresponds to the table mltable, and that the function name LogisticRegression corresponds to the logistic regression algorithm. The database server can then initialize the logistic regression algorithm with the initial parameter -MaxIter 20 to obtain an initial model, extract data from the label, col1, col2 and col3 fields of mltable as training data, and train the initial model with the training data to obtain the machine learning model corresponding to the function name LogisticRegression.
Performing machine learning through SQL in this way, rather than through an API, saves a large amount of programming work and improves the convenience and usability of machine learning.
In addition, JDBC (Java DataBase Connectivity) is a Java API for executing SQL statements. It provides uniform access to multiple relational databases and is composed of a set of classes and interfaces written in the Java language. JDBC provides a baseline on which more advanced tools and interfaces can be built, enabling database developers to write database applications.
SparkSQL can be called through the standard JDBC interface. Once the machine learning process has been expressed in SQL according to the method for constructing a machine learning model of the embodiment of the disclosure, it can likewise be invoked through the standard JDBC interface, which improves the degree of standardization.
In one possible implementation, the database server may store the machine learning model obtained through training on the HDFS file system. HDFS (Hadoop Distributed File System) is a distributed file system suitable for running on general-purpose hardware; it has high fault tolerance and throughput and is suitable for large-scale data sets. Because the model file is large, it can be stored on HDFS, and the database server can directly call the machine learning model from the HDFS file system. As shown in fig. 2, the HDFS file system may be deployed in the model storage module.
In one possible implementation manner, the database server may generate a model table corresponding to the machine learning model, and the model table records location information and parameter information of the machine learning model. The location information may be used to indicate the storage location of the machine learning model in HDFS. The database server can quickly call the machine learning model according to the location information, which avoids searching for a match in the huge volume of data in the HDFS file system and improves the speed of calling the machine learning model. The parameter information may be used to represent configuration variables within the model that define the functionality of the model, such as the weights in an artificial neural network, the support vectors in a support vector machine, the coefficients in linear or logistic regression, or the K value in the K-means algorithm. The database server may manage the machine learning model based on the parameter information. In one example, as shown in fig. 2, the model table may be stored in the data storage module.
In a possible implementation manner, if the function name is mapped to a training function, the database server may further obtain a model table identifier from the SQL statement, generate a model table corresponding to the model table identifier, and record the location information and the parameter information of the machine learning model in that model table. In one example, for the SQL statement select LogisticRegression('lr_model_t01', label, col1, col2, col3, '-MaxIter 20') from mltable, the database server may generate a model table identified as lr_model_t01 and record the location information and parameter information of the machine learning model in the model table identified as lr_model_t01.
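To make the extraction in this example concrete, the following sketch pulls the function name, the model table identifier, the training field identifiers, the initial parameters, and the training data table identifier out of a statement of this shape, with the identifiers written without stray spaces. It is only an illustration under stated assumptions: the regular expression, the TrainingCall class, and the rule that the first argument is the model table identifier and the last argument is the parameter string are inferred from the examples in the text, and a real implementation would rely on the SQL parser.

```python
import re
from dataclasses import dataclass


@dataclass
class TrainingCall:
    function_name: str
    model_table: str
    fields: list
    params: dict
    data_table: str


# Matches statements of the form: select F(arg, ...) from T
CALL_RE = re.compile(
    r"^\s*select\s+(?P<func>\w+)\s*\((?P<args>.*)\)\s+from\s+(?P<table>\w+)\s*$",
    re.IGNORECASE | re.DOTALL,
)


def parse_training_call(sql: str) -> TrainingCall:
    match = CALL_RE.match(sql)
    if match is None:
        raise ValueError("statement does not have the form select f(...) from t")
    # Naive split: assumes no argument contains a comma.
    args = [arg.strip() for arg in match["args"].split(",")]
    if len(args) < 3:
        raise ValueError("expected a model table, at least one field, and parameters")
    model_table = args[0].strip("'")
    fields = args[1:-1]
    # Turn '-MaxIter 20' into {"MaxIter": "20"}.
    tokens = args[-1].strip("'").split()
    params = {
        tokens[i].lstrip("-"): tokens[i + 1] for i in range(0, len(tokens) - 1, 2)
    }
    return TrainingCall(match["func"], model_table, fields, params, match["table"])
```

For the statement in the example, the call yields the function name LogisticRegression, the model table lr_model_t01, the fields label, col1, col2 and col3, the parameter MaxIter with value 20, and the data table mltable.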
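A model table entry can be thought of as a small record holding the location information and the parameter information. The sketch below uses an in-memory dictionary as a stand-in for the data storage module; the names ModelTableRow, register_model and locate_model, and the example path, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ModelTableRow:
    location: str  # storage location of the model file, e.g. a path on HDFS
    parameters: dict = field(default_factory=dict)  # parameter information


# Stand-in for the data storage module: model table identifier -> row.
MODEL_TABLES = {}


def register_model(model_table_id: str, location: str, parameters: dict) -> ModelTableRow:
    """Record where a trained model is stored and which parameters it has."""
    row = ModelTableRow(location, dict(parameters))
    MODEL_TABLES[model_table_id] = row
    return row


def locate_model(model_table_id: str) -> str:
    """Look up the storage location directly, without scanning the file system."""
    return MODEL_TABLES[model_table_id].location
```

For instance, register_model("lr_model_t01", "hdfs:///models/lr_model_t01", {"MaxIter": "20"}) records an entry that locate_model("lr_model_t01") can later resolve in a single lookup.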
Fig. 3 shows a flowchart of a method of constructing a machine learning model according to an embodiment of the present disclosure. As shown in fig. 3, after extracting the function name, the method for constructing the machine learning model further includes:
S16, querying a mapping table of function names and function types according to the function name, and determining the function type corresponding to the function name, wherein the function types include a training function and a prediction function.
If the extracted function name is not found in the mapping table, the database server selects an existing execution plan in the execution plan module and executes it.
If the corresponding function type is a training function, executing the execution plan shown in the above steps S12-S15;
if the corresponding function type is a prediction function, an execution plan corresponding to the flow shown in fig. 4 below is executed.
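The branching in S16 can be sketched as a lookup followed by a three-way choice. The mapping table contents and the returned labels below are illustrative assumptions; only the two function names come from the examples in the text.

```python
# Hypothetical mapping table: function name -> function type.
FUNCTION_TYPES = {
    "LogisticRegression": "training",
    "LogisticRegressionPrediction": "prediction",
}


def choose_execution_plan(function_name: str) -> str:
    """Pick the execution plan for an extracted function name."""
    function_type = FUNCTION_TYPES.get(function_name)
    if function_type == "training":
        return "S12-S15"   # train a machine learning model
    if function_type == "prediction":
        return "S17-S20"   # predict with an existing machine learning model
    return "existing"      # not a machine learning function: use an existing plan
```

An ordinary function such as count falls through to the existing execution plan, so statements that do not involve machine learning are handled as before.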
Fig. 4 shows a flowchart of a method of constructing a machine learning model according to an embodiment of the present disclosure. As shown in fig. 4, the method for constructing a machine learning model further includes:
S17, if the function name is mapped to the prediction function, obtaining model table identification, prediction field identification and prediction data table identification from the SQL statement.
The model table identification may be used to identify the model table, and the model table identification may be a model table name. As shown in fig. 2, the database server may obtain a corresponding model table from the data storage module according to the model table identifier, obtain the location information of the machine learning model from the model table, and load the machine learning model according to the location information.
In one possible implementation, as shown in fig. 2, the prediction data table is stored in a data storage module of the database server.
S18, extracting data from the prediction data table corresponding to the prediction data table identification according to the prediction field identification to be used as test data.
The database server can determine the prediction data table according to the prediction data table identification, and extract data from the field corresponding to the prediction field identification in the prediction data table as test data.
S19, acquiring the location information of the machine learning model from the model table corresponding to the model table identification, and loading the machine learning model according to the location information.
S20, inputting the test data into the loaded machine learning model to obtain prediction data.
In one example, for the SQL statement select LogisticRegressionPrediction('lr_model_t01', col1, col2, col3, 'id', 'pred01') from mltable, the database server extracts the function name LogisticRegressionPrediction. Assuming that the function name LogisticRegressionPrediction is mapped to a prediction function, the database server acquires lr_model_t01 from the SQL statement as the model table identifier, col1, col2 and col3 as the prediction field identifiers, and mltable as the prediction data table identifier. Assume that the model table identifier lr_model_t01 corresponds to the table lr_model_t01, that the prediction field identifiers col1, col2 and col3 correspond to the col1, col2 and col3 fields respectively, and that the prediction data table identifier mltable corresponds to the table mltable. The database server may then extract data from the col1, col2 and col3 fields of mltable as test data, obtain the location information of the machine learning model from the lr_model_t01 table, load the machine learning model according to the location information, and input the test data into the loaded machine learning model to obtain the prediction data.
According to the method and the device, prediction in machine learning is carried out through SQL. Compared with prediction through an API (application program interface), a large amount of programming work is saved, and the convenience and usability of machine learning are improved.
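Steps S18 to S20 can be walked through with a toy, in-memory version of the flow. Everything concrete here is made up for illustration: the dictionaries stand in for the data storage and model storage modules, the path and the coefficients are invented, and the logistic model is a hand-written stand-in for a model loaded from HDFS.

```python
import math

# Stand-ins for the data storage module and the model storage module.
MODEL_TABLES = {"lr_model_t01": {"location": "hdfs:///models/lr_model_t01"}}
MODEL_FILES = {
    "hdfs:///models/lr_model_t01": {"weights": [0.5, -0.25, 1.0], "bias": 0.0}
}
DATA_TABLES = {
    "mltable": [
        {"id": 1, "col1": 4.0, "col2": 2.0, "col3": 1.0},
        {"id": 2, "col1": 0.0, "col2": 8.0, "col3": 0.0},
    ]
}


def load_model(location):
    """Load a toy logistic regression model from its storage location."""
    coeffs = MODEL_FILES[location]

    def predict(features):
        z = coeffs["bias"] + sum(w * x for w, x in zip(coeffs["weights"], features))
        probability = 1.0 / (1.0 + math.exp(-z))
        return 1 if probability >= 0.5 else 0

    return predict


def run_prediction(model_table_id, field_ids, data_table_id):
    # S18: extract the prediction fields from the prediction data table.
    rows = DATA_TABLES[data_table_id]
    test_data = [[row[name] for name in field_ids] for row in rows]
    # S19: read the location from the model table and load the model.
    model = load_model(MODEL_TABLES[model_table_id]["location"])
    # S20: feed the test data to the loaded model.
    return [model(features) for features in test_data]
```

With the toy data above, run_prediction("lr_model_t01", ["col1", "col2", "col3"], "mltable") returns one prediction per row of mltable.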
In a possible implementation manner, the SQL statement further includes an association identifier, and the database server may further obtain the association identifier from the SQL statement. And after the prediction data are obtained, generating a prediction result table, storing the prediction data in the prediction result table, and associating the prediction result table with the prediction data table to which the test data belong through the association identifier. Therefore, the association between the prediction data table and the prediction result table can be established, and the subsequent evaluation, comparison and the like of the machine learning model are facilitated.
In one example, the SQL statement is select LogisticRegressionPrediction('lr_model_t01', col1, col2, col3, 'id', 'pred01') from mltable. The database server may obtain test data from the prediction data table identified as mltable, input the test data into the machine learning model to obtain prediction data, and store the prediction data in a prediction result table identified as pred01. The database server may obtain id as the association identifier, by which the prediction data table identified as mltable and the prediction result table identified as pred01 are associated.
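The role of the association identifier can be shown with two small helpers: one builds the prediction result table keyed by the identifier, and the other joins it back to the prediction data table for evaluation or comparison. The helper names and the "prediction" column name are assumptions made for this sketch.

```python
def build_result_table(data_rows, predictions, association_id):
    """Store each prediction alongside the association identifier of its source row."""
    return [
        {association_id: row[association_id], "prediction": prediction}
        for row, prediction in zip(data_rows, predictions)
    ]


def join_on(data_rows, result_rows, association_id):
    """Associate the prediction data table with the prediction result table."""
    by_key = {row[association_id]: row for row in result_rows}
    return [
        {**row, **by_key[row[association_id]]}
        for row in data_rows
        if row[association_id] in by_key
    ]
```

Because each result row carries the same id value as the row it was predicted from, the two tables can be compared row by row without depending on their storage order.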
Fig. 5 shows a block diagram of a machine learning model building apparatus according to an embodiment of the present disclosure. As shown in fig. 5, the apparatus 500 for constructing a machine learning model includes an execution planning module 501 and a data storage module 502, and the execution planning module is configured to:
analyzing the syntax of the received SQL statement, and extracting a function name;
if the function name is mapped to a training function, acquiring initial parameters, training field identifications and training data table identifications from the SQL statement;
acquiring an algorithm corresponding to the function name from Spark MLlib, and initializing the algorithm by adopting the initial parameters to obtain an initial model;
extracting data from a training data table corresponding to the training data table identifier in the data storage module 502 according to the training field identifier to serve as training data;
and training the initial model by adopting the training data to obtain a machine learning model corresponding to the function name.
In one possible implementation, the execution planning module 501 is further configured to:
querying a mapping table of function names and function types according to the function name, and determining the function type corresponding to the function name, wherein the function types include a training function and a prediction function, and the prediction function is a function that performs prediction by using a machine learning model.
In a possible implementation manner, the machine learning model constructing apparatus 500 further includes a model storage module 503, and the execution planning module 501 is further configured to store the machine learning model on the HDFS file system of the model storage module 503.
In a possible implementation manner, the execution planning module 501 is further configured to generate a model table corresponding to the machine learning model, where the model table records location information and parameter information of the machine learning model stored by the model storage module 503, and the model table is stored in the data storage module 502.
In a possible implementation manner, the execution planning module 501 is further configured to obtain a model table identifier, a prediction field identifier, and a prediction data table identifier from the SQL statement if the function name is mapped to a prediction function;
extracting data from a prediction data table corresponding to the prediction data table identifier in the data storage module 502 according to the prediction field identifier, and using the data as test data;
obtaining the position information of the machine learning model stored in the model storage module 503 from the model table corresponding to the model table identifier in the data storage module 502, and loading the machine learning model according to the position information;
and inputting the test data into the loaded machine learning model to obtain prediction data.
In a possible implementation manner, the SQL statement further includes an association identifier, and the execution planning module 501 is further configured to generate a prediction result table, store the prediction data in the prediction result table, and associate the prediction result table with a prediction data table to which the test data belongs through the association identifier, where the prediction result table is stored in the data storage module 502.
The device performs syntax parsing on a received SQL statement and extracts a function name; if the function name is mapped to a training function, it obtains initial parameters, training field identifiers, and a training data table identifier from the SQL statement; it obtains the algorithm corresponding to the function name from Spark MLlib and initializes the algorithm with the initial parameters to obtain an initial model; it extracts data from the training data table corresponding to the training data table identifier, according to the training field identifiers, to serve as training data; and it trains the initial model with the training data to obtain the machine learning model corresponding to the function name. The device for constructing a machine learning model according to the embodiment of the disclosure can thus call an algorithm from Spark MLlib and perform model training, obtaining the corresponding machine learning model purely through SQL.
Fig. 6 is a block diagram illustrating an apparatus 900 for building a machine learning model according to an example embodiment. Referring to Fig. 6, the apparatus 900 may include a processor 901 and a machine-readable storage medium 902 storing machine-executable instructions. The processor 901 and the machine-readable storage medium 902 may communicate via a system bus 903. The processor 901 performs the above-described method of constructing a machine learning model by reading, from the machine-readable storage medium 902, the machine-executable instructions corresponding to the construction logic of the machine learning model.
The machine-readable storage medium 902 referred to herein may be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions and data. For example, the machine-readable storage medium may be: a RAM (Random Access Memory), a volatile memory, a non-volatile memory, a flash memory, a storage drive (e.g., a hard drive), a solid state drive, any type of storage disk (e.g., an optical disk or a DVD), a similar storage medium, or a combination thereof.
Having described embodiments of the present disclosure, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terms used herein were chosen in order to best explain the principles of the embodiments, the practical application, or technical improvements to the techniques in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (12)

1. A method for constructing a machine learning model, applied to Spark SQL, the method comprising:
parsing a received SQL statement and extracting a function name;
querying a mapping table of function names and function types according to the function name, and determining the function type corresponding to the function name, wherein the function type comprises a training function and a prediction function;
if the function name is mapped to a training function, acquiring initial parameters, a training field identifier, and a training data table identifier from the SQL statement;
acquiring an algorithm corresponding to the function name from Spark MLlib, and initializing the algorithm with the initial parameters to obtain an initial model;
extracting data, according to the training field identifier, from a training data table corresponding to the training data table identifier to serve as training data;
and training the initial model with the training data to obtain a machine learning model corresponding to the function name.
2. The method of claim 1, further comprising:
storing the machine learning model on an HDFS file system.
3. The method of claim 2, further comprising:
and generating a model table corresponding to the machine learning model, wherein the model table records the position information and the parameter information of the machine learning model.
4. The method of claim 3, further comprising:
if the function name is mapped to a prediction function, acquiring a model table identifier, a prediction field identifier and a prediction data table identifier from the SQL statement;
extracting data, according to the prediction field identifier, from a prediction data table corresponding to the prediction data table identifier to serve as test data;
acquiring position information from the model table corresponding to the model table identifier, and loading the machine learning model according to the position information;
and inputting the test data into the loaded machine learning model to obtain prediction data.
5. The method of claim 4, wherein the SQL statement further includes an association identifier, the method further comprising:
and generating a prediction result table, storing the prediction data in the prediction result table, and associating the prediction result table with the prediction data table to which the test data belongs through the association identifier.
6. An apparatus for constructing a machine learning model, applied to Spark SQL, the apparatus comprising an execution planning module and a data storage module, wherein the execution planning module is configured to:
parse a received SQL statement and extract a function name;
query a mapping table of function names and function types according to the function name, and determine the function type corresponding to the function name, wherein the function type comprises a training function and a prediction function;
if the function name is mapped to a training function, acquire initial parameters, a training field identifier, and a training data table identifier from the SQL statement;
acquire an algorithm corresponding to the function name from Spark MLlib, and initialize the algorithm with the initial parameters to obtain an initial model;
extract data, according to the training field identifier, from a training data table corresponding to the training data table identifier in the data storage module to serve as training data;
and train the initial model with the training data to obtain a machine learning model corresponding to the function name.
7. The apparatus of claim 6, further comprising a model storage module, wherein the execution planning module is further configured to store the machine learning model on an HDFS file system of the model storage module.
8. The apparatus of claim 7, wherein the execution planning module is further configured to generate a model table corresponding to the machine learning model, the model table having recorded therein location information and parameter information of the machine learning model stored in the model storage module, the model table being stored in the data storage module.
9. The apparatus of claim 8, wherein the execution planning module is further configured to:
if the function name is mapped to a prediction function, obtain a model table identifier, a prediction field identifier, and a prediction data table identifier from the SQL statement;
extract data, according to the prediction field identifier, from a prediction data table corresponding to the prediction data table identifier in the data storage module to serve as test data;
obtain the position information of the machine learning model stored in the model storage module from the model table corresponding to the model table identifier in the data storage module, and load the machine learning model according to the position information;
and input the test data into the loaded machine learning model to obtain prediction data.
10. The apparatus according to claim 9, wherein the SQL statement further includes an association identifier, and the execution planning module is further configured to generate a prediction result table, store the prediction data in the prediction result table, and associate the prediction result table with a prediction data table to which the test data belongs through the association identifier, and the prediction result table is stored in the data storage module.
11. An apparatus for constructing a machine learning model, comprising:
a processor and a machine-readable storage medium having stored thereon machine-executable instructions, the processor executing the machine-executable instructions to implement the method of any one of claims 1 to 5.
12. A machine-readable storage medium having stored thereon machine-executable instructions which, when invoked and executed by a processor, cause the processor to implement the method of any one of claims 1 to 5.

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810245188.4A CN109657803B (en) 2018-03-23 2018-03-23 Construction of machine learning models
PCT/CN2019/078619 WO2019179408A1 (en) 2018-03-23 2019-03-19 Construction of machine learning model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810245188.4A CN109657803B (en) 2018-03-23 2018-03-23 Construction of machine learning models

Publications (2)

Publication Number Publication Date
CN109657803A CN109657803A (en) 2019-04-19
CN109657803B true CN109657803B (en) 2020-04-03

Family

ID=66110182

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810245188.4A Active CN109657803B (en) 2018-03-23 2018-03-23 Construction of machine learning models

Country Status (2)

Country Link
CN (1) CN109657803B (en)
WO (1) WO2019179408A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110851500B (en) * 2019-11-07 2022-10-28 北京集奥聚合科技有限公司 Method for generating expert characteristic dimension required by machine learning modeling
CN111523676B (en) * 2020-04-17 2024-04-12 第四范式(北京)技术有限公司 Method and device for assisting machine learning model to be online
CN112559603B (en) * 2021-02-23 2021-05-18 腾讯科技(深圳)有限公司 Feature extraction method, device, equipment and computer-readable storage medium
CN114741372B (en) * 2022-03-24 2022-11-15 北京柏睿数据技术股份有限公司 Method for realizing in-library artificial intelligence and database system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106066934A (en) * 2016-05-27 2016-11-02 山东大学苏州研究院 A kind of Alzheimer based on Spark platform assistant diagnosis system in early days
CN107103050A (en) * 2017-03-31 2017-08-29 海通安恒(大连)大数据科技有限公司 A kind of big data Modeling Platform and method
CN107222472A (en) * 2017-05-26 2017-09-29 电子科技大学 A kind of user behavior method for detecting abnormality under Hadoop clusters
CN107480435A (en) * 2017-07-31 2017-12-15 广东精点数据科技股份有限公司 A kind of automatic searching machine learning system and method applied to clinical data

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10262263B2 (en) * 2015-12-11 2019-04-16 International Business Machines Corporation Retrieving database score contextual information
CN105912500B (en) * 2016-03-30 2017-11-14 百度在线网络技术(北京)有限公司 Machine learning model generation method and device
CN105930413A (en) * 2016-04-18 2016-09-07 北京百度网讯科技有限公司 Training method for similarity model parameters, search processing method and corresponding apparatuses
CN106295338B (en) * 2016-07-26 2020-04-14 北京工业大学 SQL vulnerability detection method based on artificial neuron network
CN107330522B (en) * 2017-07-04 2021-06-08 北京百度网讯科技有限公司 Method, device and system for updating deep learning model

Also Published As

Publication number Publication date
WO2019179408A1 (en) 2019-09-26
CN109657803A (en) 2019-04-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant