WO2022011946A1 - Data prediction method, apparatus, computer device, and storage medium - Google Patents


Info

Publication number
WO2022011946A1
Authority: WO (WIPO, PCT)
Prior art keywords: data, model, modeling, prediction, server
Application number: PCT/CN2020/135601
Other languages: French (fr), Chinese (zh)
Inventors: 于沃良, 麻晓珍
Original Assignee: 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2022011946A1


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/245: Query processing
    • G06F16/2458: Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465: Query processing support for facilitating data mining operations in structured databases
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28: Databases characterised by their database models, e.g. relational or object models
    • G06F16/284: Relational databases
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Definitions

  • the present application relates to the technical field of artificial intelligence, and in particular, to a data prediction method, apparatus, computer equipment and storage medium.
  • the purpose of the embodiments of the present application is to provide a data prediction method, device, computer equipment and storage medium, so as to solve the problems in the prior art that the establishment of a data mining model is complex and the mining efficiency of the established data mining model is low.
  • the embodiments of the present application provide a data prediction method, which adopts the following technical solution:
  • a data prediction method comprising the following steps:
  • receive a data prediction request, determine model information and first user information according to the data prediction request, and obtain a prediction data table from a full data table in the data processing server, wherein the full data table is formed by associating at least two initial data tables;
  • acquire a data mining model pre-generated by the model server according to the model information, and configure corresponding prediction resources in the model server according to the first user information;
  • generate a data prediction model file based on the prediction resources and the data mining model, and send it to at least one data storage server to run the data mining model on the data storage server; according to the prediction data table, obtain the feature values of the corresponding prediction input features from the data storage server, input them into the data mining model, obtain the data value of the target variable to be predicted, and complete the data prediction;
  • the generation process of the data mining model includes:
  • receive a modeling request, determine model algorithm information and second user information according to the modeling request, obtain a training data table required for modeling from the full data table, configure corresponding modeling resources in the model server according to the second user information, determine the model framework to be trained from the model server according to the model algorithm information, and extract the modeling input features and modeling target variables based on the training data table;
  • based on the modeling resources, perform model training through the model framework to be trained, the modeling input features, and the modeling target variables, and generate the data mining model.
  • the embodiments of the present application also provide a data prediction device, which adopts the following technical solution:
  • a data prediction device comprising: a data prediction information acquisition module, a prediction configuration module, a data prediction module and a model generation module;
  • the data prediction information acquisition module is configured to receive a data prediction request, determine model information and first user information according to the data prediction request, and acquire a prediction data table from a full data table in the data processing server, wherein the full data table is formed by associating at least two initial data tables;
  • the prediction configuration module is configured to obtain the data mining model pre-generated by the model generation module from the model server according to the model information, and configure corresponding prediction resources in the model server according to the first user information;
  • the data prediction module is configured to generate a data prediction model file based on the prediction resources and the data mining model, and send it to at least one data storage server to run the data mining model on the data storage server; according to the prediction data table, the module obtains the feature values of the corresponding prediction input features from the data storage server, inputs them into the data mining model, obtains the data value of the target variable to be predicted, and completes the data prediction;
  • the model generation module is specifically configured to receive a modeling request, determine model algorithm information and second user information according to the modeling request, obtain a training data table required for modeling from the full data table, configure corresponding modeling resources in the model server according to the second user information, determine the model framework to be trained from the model server according to the model algorithm information, extract the modeling input features and modeling target variables based on the training data table, and, based on the modeling resources, perform model training through the model framework to be trained, the modeling input features, and the modeling target variables to generate the data mining model.
  • the embodiments of the present application also provide a computer device, which adopts the following technical solution:
  • a computer device comprising a memory and a processor, wherein computer-readable instructions are stored in the memory, and the processor implements the following steps when executing the computer-readable instructions:
  • receive a data prediction request, determine model information and first user information according to the data prediction request, and obtain a prediction data table from a full data table in the data processing server, wherein the full data table is formed by associating at least two initial data tables;
  • acquire a data mining model pre-generated by the model server according to the model information, and configure corresponding prediction resources in the model server according to the first user information;
  • generate a data prediction model file based on the prediction resources and the data mining model, and send it to at least one data storage server to run the data mining model on the data storage server; according to the prediction data table, obtain the feature values of the corresponding prediction input features from the data storage server, input them into the data mining model, obtain the data value of the target variable to be predicted, and complete the data prediction;
  • the generation process of the data mining model includes:
  • the embodiments of the present application also provide a computer-readable storage medium, which adopts the following technical solution:
  • a computer-readable storage medium where computer-readable instructions are stored on the computer-readable storage medium, and when the computer-readable instructions are executed by a processor, the processor is caused to perform the following steps:
  • receive a data prediction request, determine model information and first user information according to the data prediction request, and obtain a prediction data table from a full data table in the data processing server, wherein the full data table is formed by associating at least two initial data tables;
  • acquire a data mining model pre-generated by the model server according to the model information, and configure corresponding prediction resources in the model server according to the first user information;
  • generate a data prediction model file based on the prediction resources and the data mining model, and send it to at least one data storage server to run the data mining model on the data storage server; according to the prediction data table, obtain the feature values of the corresponding prediction input features from the data storage server, input them into the data mining model, obtain the data value of the target variable to be predicted, and complete the data prediction;
  • the generation process of the data mining model includes:
  • receive a modeling request, determine model algorithm information and second user information according to the modeling request, obtain a training data table required for modeling from the full data table, configure corresponding modeling resources in the model server according to the second user information, determine the model framework to be trained from the model server according to the model algorithm information, and extract the modeling input features and modeling target variables based on the training data table;
  • based on the modeling resources, perform model training through the model framework to be trained, the modeling input features, and the modeling target variables, and generate the data mining model.
  • the data prediction method, device, computer equipment and storage medium mainly have the following beneficial effects:
  • one-click modeling can be realized according to the user's modeling request.
  • the training data table required for modeling is obtained from the full data table in the data processing server through the modeling request, and the model algorithm information and user information are determined.
  • the modeling input features, the modeling target variables, and the model framework to be trained are determined, and the corresponding modeling resources are configured.
  • based on the configured modeling resources, model training is performed through the model framework to be trained, the modeling input features, and the modeling target variables to generate a data mining model.
  • the user does not need to have a detailed understanding of the model algorithm, which greatly reduces the training threshold of the data mining model.
  • one-click model deployment and data prediction can be realized according to the user's data prediction request: the prediction data table is obtained from the full data table in the data processing server according to the data prediction request, the model information and user information are determined, the data mining model and prediction resources are then determined, a data prediction model file is generated based on the configured prediction resources and the data mining model, and the data prediction model file is sent to at least one data storage server, where the data mining model is run to complete the data prediction.
  • running the prediction on the data storage server ensures data security and prevents the leakage that data transmission could cause; moreover, this process is imperceptible to the user, giving a better user experience.
  • FIG. 1 is an exemplary system architecture diagram to which the present application can be applied;
  • FIG. 2 is a flowchart of an embodiment of a data prediction method according to the present application.
  • FIG. 3 is a flowchart of an embodiment of a process for generating a data mining model according to the present application.
  • FIG. 4 is a specific example of the generation process of the data mining model according to the present application.
  • FIG. 5 is a specific example of the data prediction method according to the present application.
  • FIG. 6 is a schematic structural diagram of an embodiment of a data prediction apparatus according to the present application.
  • FIG. 7 is a schematic structural diagram of an embodiment of a computer device according to the present application.
  • the system architecture 100 may include terminal devices 101 , 102 , and 103 , a network 104 and a server 105 .
  • the network 104 is a medium used to provide a communication link between the terminal devices 101 , 102 , 103 and the server 105 .
  • the network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
  • the user can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages and the like.
  • Various communication client applications may be installed on the terminal devices 101 , 102 and 103 , such as web browser applications, shopping applications, search applications, instant messaging tools, email clients, social platform software, and the like.
  • the terminal devices 101, 102, and 103 can be various electronic devices that have a display screen and support web browsing, including but not limited to smart phones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers, and the like.
  • the server 105 may be a server that provides various services, such as a background server that provides support for the pages displayed on the terminal devices 101 , 102 , and 103 .
  • the data prediction method provided by the embodiments of the present application is generally executed by a server, and accordingly, the data mining model generating apparatus and the data prediction apparatus are generally set in the server.
  • terminal devices, networks and servers in FIG. 1 are merely illustrative. There can be any number of terminal devices, networks and servers according to implementation needs.
  • FIG. 2 shows a flowchart of an embodiment of a data prediction method according to the present application, the data prediction method comprising the following steps:
  • S201 Receive a data prediction request, determine model information and first user information according to the data prediction request, and obtain a prediction data table from a full data table in a data processing server, wherein the full data table is formed by associating at least two initial data tables;
  • S202 Acquire the data mining model pre-generated by the model server according to the model information, and configure corresponding prediction resources in the model server according to the first user information;
  • S203 Generate a data prediction model file based on the prediction resources and the data mining model, and send it to at least one data storage server to run the data mining model on the data storage server; according to the prediction data table, the feature values of the corresponding prediction input features are obtained from the data storage server and input into the data mining model, the data value of the target variable to be predicted is obtained, and the data prediction is completed.
  • a user can initiate a data prediction request through a WEB page of the client, and the WEB server receives the data prediction request.
  • the data prediction request may include the first user information and model information of the client.
  • the first user information includes the user name information of the request initiator and the user name information of the prediction data storage end, etc.
  • when the prediction data storage end is a Hadoop cluster, the first user information will include the user name (HDuser) in the Hadoop cluster. The model information can be selected and generated by the user from a plurality of preset model options on the data prediction request initiation interface of the client.
  • the data prediction method includes: when it is determined that the user enters the data prediction request initiation interface, a model selection box is provided on that interface or in a new pop-up interface, so that the user can select the model required for data prediction to generate the model information.
  • the model performance parameters of each model are displayed at the same time, so that the user can select an appropriate model according to actual needs.
  • a BI system runs in the data processing server, and the full data table can be generated through the BI system.
  • the BI system obtains data from multiple data sources, analyzes the obtained data, generates multiple initial data tables according to different data sources or different topics, and then associates and integrates the multiple initial data tables to generate the full data table, obtaining the field content that can support data analysis and the content that needs to be predicted; the obtained field content can be used as prediction input features.
  • the content to be predicted refers to the target variable to be predicted in the data prediction process; depending on the target variable to be predicted, the corresponding selected prediction input features also differ.
  • a new data table, namely the prediction data table, is created by selecting the field content used as prediction input features through the BI system. The prediction data table obtained from the full data table in this embodiment is therefore a non-full data table, which may specifically be a Hive table with no upper limit, and the model server reads data according to this newly created non-full data table when performing data prediction.
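The table-association step described above can be sketched in plain SQL. The following is a hedged illustration only, not the BI system's actual implementation: the table names, columns, and the use of SQLite (instead of Hive) are invented for the example.

```python
import sqlite3

# Two hypothetical initial data tables from different data sources.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, age INTEGER)")
cur.execute("CREATE TABLE orders (cust_id INTEGER, amount REAL)")
cur.executemany("INSERT INTO customers VALUES (?, ?)", [(1, 30), (2, 45)])
cur.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (2, 120.0)])

# Associate the initial tables into the full data table.
cur.execute("""
    CREATE TABLE full_data AS
    SELECT c.cust_id, c.age, o.amount
    FROM customers c JOIN orders o ON c.cust_id = o.cust_id
""")

# Select only the field content used as prediction input features,
# yielding the (non-full) prediction data table.
cur.execute("CREATE TABLE prediction_data AS SELECT cust_id, age FROM full_data")
rows = cur.execute("SELECT * FROM prediction_data ORDER BY cust_id").fetchall()
print(rows)  # [(1, 30), (2, 45)]
```

The same pattern (join the initial tables, then project out the input-feature columns) would apply in Hive, only with HiveQL and cluster storage instead of an in-memory database.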
  • the model server can separately perform data prediction for the data prediction requests submitted by multiple users; it is therefore necessary to allocate corresponding data prediction resources to the data prediction process of each user, so that the data prediction processes of multiple users can be processed synchronously, improving data prediction efficiency.
  • a user can initiate a modeling request through a WEB page of the client, and the WEB server receives the modeling request, and the modeling request can include the second user information of the client and model algorithm information.
  • the second user information includes the user name information of the request initiator and the user name information of the training data storage end.
  • when the training data storage end is a Hadoop cluster, the second user information will include the user name (HDuser) in the Hadoop cluster. The model algorithm information can be edited and generated by the user on the client, or generated by the user selecting one or more algorithms from a plurality of preset algorithm options on the modeling request initiation interface of the client.
  • the method includes: when it is determined that the user enters the modeling request initiation interface on the client, an algorithm selection box or edit box is provided on the modeling request initiation interface or in a new pop-up interface on the client, so that the user can determine the model algorithm required for modeling to generate the model algorithm information.
  • a BI (Business Intelligence, business intelligence) system runs in the data processing server, and a full data table is generated through the BI system.
  • the BI system acquires data from multiple data sources, analyzes the acquired data, generates multiple initial data tables according to different data sources or different topics, and then associates and integrates the multiple initial data tables to generate the full data table, obtaining the field content that can support data analysis and the content that needs to be predicted.
  • the obtained field content can be used as subsequent modeling input features.
  • the content to be predicted refers to the modeling target variable used in the modeling process.
  • multiple modeling input features and the modeling target variable form a corresponding relationship; accordingly, different modeling target variables correspond to different modeling input features.
  • a new data table, namely the training data table, is created by selecting the field content used as modeling input features through the BI system. The training data table obtained from the full data table in this embodiment is therefore a non-full data table, which can be a Hive table with an upper limit; in this embodiment, the upper limit of the data in the Hive table is 300,000 rows. The model server reads training data according to this newly created non-full data table when performing model training.
  • the model server may separately perform model training for the modeling requests submitted by multiple users; by allocating corresponding modeling resources to the model training of each user, the model training of multiple users can be processed synchronously, improving model training efficiency.
  • the configuring of the corresponding modeling resources in the model server according to the second user information includes: acquiring, at a preset time interval, the information of the modeling task to be executed corresponding to the second user information from the database corresponding to the model server, and generating a modeling resource configuration request; querying, according to the modeling resource configuration request, whether the idle resources of the model server meet the needs of model training; if so, allocating corresponding modeling resources to the acquired modeling task to be executed, and otherwise rejecting the current modeling resource configuration request.
  • the modeling task to be executed in the database corresponding to the model server is re-acquired after a preset time interval, so as to execute the process of configuring modeling resources.
  • the database corresponding to the model server adopts a relational database management system, which can store modeling task information.
  • each modeling task will be queued and stored in the database corresponding to the model server, so as to be executed by the model server in sequence.
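The queue-and-poll resource configuration described above can be sketched as follows. This is a minimal illustration under assumed names (class name, CPU counting, tick method); the patent does not specify the resource unit or scheduling details, only that queued tasks are granted resources in order when idle capacity suffices and are otherwise rejected and retried at the next interval.

```python
from collections import deque

class ModelServer:
    """Toy stand-in for the model server plus its task database."""

    def __init__(self, total_cpus):
        self.idle_cpus = total_cpus
        self.queue = deque()  # stands in for the queued tasks in the DB

    def submit(self, task_id, cpus_needed):
        self.queue.append((task_id, cpus_needed))

    def poll_once(self):
        """One preset-interval tick: try to configure resources for the head task."""
        if not self.queue:
            return None
        task_id, cpus = self.queue[0]
        if cpus <= self.idle_cpus:       # idle resources meet the training need
            self.queue.popleft()
            self.idle_cpus -= cpus
            return ("allocated", task_id)
        return ("rejected", task_id)     # re-acquired on a later tick

server = ModelServer(total_cpus=4)
server.submit("task-a", 3)
server.submit("task-b", 3)
print(server.poll_once())  # ('allocated', 'task-a')
print(server.poll_once())  # ('rejected', 'task-b'); not enough idle CPUs yet
```

In the real system the queue lives in MySQL and freed resources return when a task's container is torn down; both details are elided here.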
  • configuring the modeling resources described in this embodiment includes creating a separate container for each modeling task, and the model training process is performed in the corresponding container, so that the model training processes of multiple model tasks can be isolated from each other.
  • the model server specifically uses Kubernetes to create and manage containers.
  • Kubernetes can be used to manage containerized applications on multiple hosts, making the deployment of containerized applications simple and efficient.
  • Kubernetes provides mechanisms for application deployment, scheduling, updating, and maintenance; its core feature is the ability to manage containers autonomously to ensure that containers run in accordance with the user's desired state. In Kubernetes, all containers run in Pods, and a Pod can host one or more related containers.
  • querying, according to the modeling resource configuration request, whether the idle resources of the model server meet the requirements of model training and, if so, allocating corresponding modeling resources to the acquired modeling task to be executed is specifically as follows: a request to create a Pod is sent to the Kubernetes Master according to the second user information corresponding to the modeling task to be executed; if the model server has available resources, and the available resources meet the needs of model training, a corresponding directory is created in the model server according to the second user information, a Pod is created, and the IP and Port corresponding to the Pod are generated, where the IP and Port are used for model training calls. Kubernetes Pods are used to allocate independent modeling resources for each modeling task, and the Docker (container) service associated with the created directory is started to complete container creation and modeling resource configuration.
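The Pod-creation request might look roughly like the following manifest. This is a generic Kubernetes Pod spec sketched for illustration, not the patent's actual request format; the function name, image name, resource amounts, and mount path are all invented.

```python
def build_pod_manifest(hd_user: str, task_id: str) -> dict:
    """Build a hypothetical Pod manifest for one per-user modeling task."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": f"modeling-{task_id}", "labels": {"user": hd_user}},
        "spec": {
            "containers": [{
                "name": "trainer",
                "image": "model-trainer:latest",  # assumed image name
                # Independent resources per modeling task.
                "resources": {"requests": {"cpu": "2", "memory": "4Gi"}},
                # Mount the per-user directory created in the model server.
                "volumeMounts": [{"name": "workdir", "mountPath": "/data"}],
            }],
            "volumes": [{"name": "workdir", "emptyDir": {}}],
        },
    }

manifest = build_pod_manifest("HDuser01", "task-42")
print(manifest["metadata"]["name"])  # modeling-task-42
```

Submitting such a manifest to the API server (the "Kubernetes Master" in the text) would create the Pod; the assigned Pod IP and exposed port would then be used for the training calls.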
  • the training data table contains the field content used as modeling input features, and the modeling input features correspond to the modeling target variables, so that the modeling input features and modeling target variables can be determined.
  • the model algorithm information contains the identification information of the model algorithm required for model training, so that the model server can determine the required model algorithm according to the identification information and thus obtain the model framework to be trained.
  • the method further includes: performing authentication and signature verification on the information contained in the modeling request and, if passed, generating a modeling task with a unique identifier; then determining whether a modeling task submitted by the same user already exists in the database corresponding to the model server; if so, terminating the generated modeling task, and otherwise storing the generated modeling task in the database corresponding to the model server and sending the unique identifier of the generated modeling task to the user.
  • the authentication and signature verification works as follows: a token and a key are pre-distributed to the user; when a modeling request is received, the corresponding key is queried according to the token in the request, the MD5 signature information is calculated from the key plus the request parameters, and the calculated result is checked for consistency with the signature in the request to ensure that the modeling request is legitimate.
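The token/key scheme above can be sketched as follows. The exact canonicalization of parameters (ordering, separators) is an assumption made for the example; only the token-to-key lookup and the MD5-over-key-plus-parameters idea come from the text.

```python
import hashlib

KEYS = {"token-abc": "secret-key-1"}  # pre-distributed token -> key mapping

def sign(key: str, params: dict) -> str:
    # Assumed canonical form: sorted "k=v" pairs joined by "&", prefixed by key.
    canonical = "&".join(f"{k}={params[k]}" for k in sorted(params))
    return hashlib.md5((key + canonical).encode("utf-8")).hexdigest()

def verify(request: dict) -> bool:
    # Look up the key by the token carried in the request.
    key = KEYS.get(request["token"])
    if key is None:
        return False
    # Recompute the signature and compare with the one in the request.
    return sign(key, request["params"]) == request["signature"]

params = {"model": "xgboost", "user": "UM01"}
request = {"token": "token-abc", "params": params,
           "signature": sign("secret-key-1", params)}
print(verify(request))  # True
```

A request with a tampered parameter or an unknown token fails verification, which is what makes the modeling request "legitimate" only for holders of the distributed key.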
  • in step S303, when performing model training, the data storage server is accessed according to the training data table to query the training data, obtaining the feature values of the modeling input features and the values of the corresponding modeling target variables; the feature values of the modeling input features are input into the model framework for training, and whether the training requirements are met is determined by comparing the output results of the model framework with the values of the modeling target variables. When the training requirements are met, training stops, the model performance indicators are output, and the model generation information is sent to the user.
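The stop-when-requirement-met loop described in this step can be illustrated with a deliberately tiny model. The data, the linear model, and the MSE threshold are all invented for the sketch; only the pattern (compare framework output against the target-variable values, stop once the requirement is met) comes from the text.

```python
# Feature values of one modeling input feature and the corresponding
# values of the modeling target variable (true relation: y = 2x).
features = [1.0, 2.0, 3.0, 4.0]
targets  = [2.0, 4.0, 6.0, 8.0]

w, lr, threshold = 0.0, 0.01, 1e-4
for epoch in range(10_000):
    preds = [w * x for x in features]
    # Compare model output with the target-variable values.
    mse = sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)
    if mse < threshold:  # training requirement met -> stop training
        break
    grad = sum(2 * (p - t) * x
               for p, t, x in zip(preds, targets, features)) / len(targets)
    w -= lr * grad

print(round(w, 2))  # close to 2.0
```

In the patented system the "requirement" would be a performance indicator on the chosen model framework rather than a raw MSE threshold, and the training data would come from the Hadoop cluster rather than inline lists.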
  • the model server may specifically be an artificial intelligence server (Artificial Intelligence Server, AI Server).
  • the user does not need to perform any operations; the AI Server performs training and hyperparameter adjustment for the specified model algorithm, and finally trains the optimal model, which lowers the threshold for using machine learning.
  • the AI Server analyzes each modeling input feature, computing statistics such as the mean, variance, maximum, minimum, and overall data distribution, and grades the training difficulty of the model based on these statistics; for different grades, different parameter configurations are selected to realize hyperparameter adjustment.
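A sketch of grading training difficulty from per-feature statistics and choosing a per-grade configuration is given below. The grading rule and the two configurations are invented for illustration; only the mean/variance/min/max analysis and the grade-to-parameters idea come from the text.

```python
import statistics

def feature_stats(values):
    """Compute the per-feature statistics mentioned in the text."""
    return {
        "mean": statistics.fmean(values),
        "variance": statistics.pvariance(values),
        "min": min(values),
        "max": max(values),
    }

def grade(stats):
    # Assumed rule: a feature whose variance exceeds its value range is
    # treated as harder to train on. The real grading is not specified.
    spread = stats["max"] - stats["min"]
    return "high" if stats["variance"] > spread else "low"

CONFIGS = {  # assumed per-grade hyperparameter configurations
    "low":  {"learning_rate": 0.1,  "max_depth": 3},
    "high": {"learning_rate": 0.01, "max_depth": 8},
}

stats = feature_stats([1.0, 2.0, 2.5, 100.0])  # skewed, high-variance feature
params = CONFIGS[grade(stats)]
print(params)
```

The design point is simply that hyperparameters are selected automatically from data statistics, so the user never has to tune them by hand.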
  • the data storage server may be deployed in the form of a Hadoop cluster.
  • accessing the data storage server according to the training data table specifically means accessing the Hadoop cluster to query the training data; after the training data is queried, it is sent from the Hadoop cluster to the model server, and after model training is completed, the training data is deleted from the model server.
  • the model performance indicators output after the model training is completed may be stored in the database corresponding to the model server for query by the BI system on the data processing server side.
  • the database corresponding to the model server can also be used to record the running status information of the modeling tasks, including whether a modeling task has been executed, the model training status information after a modeling task is executed, and the performance index parameters of the data mining model obtained after model training is completed. This makes it convenient for the data processing server side (such as the BI system) to monitor the running status of the modeling tasks through the database corresponding to the model server, and to query the performance index parameters of the data mining model from that database.
  • in this embodiment, the generation process of the data mining model further includes receiving a request for periodically querying the status of the modeling task, and accessing the model server to query the model training status according to that request, wherein the queried training status can be updated to the database corresponding to the model server.
  • in the following, the generation process of the data mining model is described through a complete specific example, in which the data processing server runs the BI system, the user sends a modeling request through the WEB interface, the model server is an AI Server, the AI Server uses Kubernetes services, the database (DB, Data Base) corresponding to the AI Server uses a relational database management system (MySQL), and the data storage server is a Hadoop cluster. The specific process is as follows:
  • the user logs in to the BI system through the user terminal, obtains the training data table containing the modeling input features from the full data table through the BI system, determines the modeling target variables and the model algorithm, and generates a modeling request based on these contents; the modeling request is submitted to the AI Server through the WEB interface.
  • the AI Server performs authentication and signature verification on the information contained in the modeling request, and judges whether a modeling task of the same user already exists in the AI DB (the database corresponding to the AI Server); if not, the modeling task is created, its unique identifier is generated and stored in the AI DB, and the unique identifier is fed back to the user; otherwise, the modeling task is not created.
  • after the modeling task is created, the user polls the status of the modeling task according to its unique identifier; the AI Server periodically triggers the operation of reading the modeling task and related information (such as the user name UM of the client and the user name HDuser of the Hadoop cluster) from the AI DB, and initiates a request to create a Pod to the Kubernetes Master according to the HDuser; if the AI Server has no available resources, the request to create a Pod is rejected.
  • The AI Server sends the training data table to the Hadoop cluster and queries data according to it; the Hadoop cluster feeds the queried data set back to the AI Server for model training.
  • The model training status is queried periodically and synchronized to the AI DB.
  • When training completes, the model training indicators are obtained, the data set held in the AI Server is deleted, and the indicators are written back to the AI DB.
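The training round-trip just described (fetch data set, train, report indicators, delete the data set) can be sketched in plain Python. The data set, model, and indicator computation here are trivial stand-ins — no real Hadoop cluster or AI Server is involved, and every name is an assumption for illustration.

```python
def run_training(fetch_dataset, train, ai_db):
    dataset = fetch_dataset()            # data set fed back by the Hadoop cluster
    model, indicators = train(dataset)   # model training on the AI Server
    ai_db["training_indicators"] = indicators  # indicators written back to AI DB
    del dataset                          # data set in the AI Server is deleted
    return model

# Stand-in implementations for the sketch.
def fetch_dataset():
    return [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # (feature, target) pairs

def train(dataset):
    # "Train" a trivial ratio model and report a mean-squared-error indicator.
    ratio = sum(t for _, t in dataset) / sum(f for f, _ in dataset)
    mse = sum((t - ratio * f) ** 2 for f, t in dataset) / len(dataset)
    return (lambda x: ratio * x), {"mse": mse}

ai_db = {}
model = run_training(fetch_dataset, train, ai_db)
```

The point of the shape is that only the indicators persist in the AI DB; the raw data set does not outlive the training step.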
  • Configuring the corresponding prediction resource in the model server according to the first user information includes: acquiring, at a preset time interval, the information of the to-be-executed data prediction task corresponding to the first user information from the database corresponding to the model server, and generating a prediction resource configuration request; then querying, according to the request, whether the idle resources of the model server meet the data prediction requirements. If they do, the corresponding prediction resources are allocated to the prediction task; otherwise the prediction resource configuration request is rejected.
  • After a rejection, the to-be-executed data prediction task in the database corresponding to the model server is re-acquired, so that the prediction resource configuration process is executed again.
  • The database corresponding to the model server adopts a relational database management system and stores data prediction task information.
  • Each data prediction task is queued and stored in the database corresponding to the model server, to be executed by the model server in sequence.
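The in-sequence execution described above can be sketched as a database-backed FIFO queue. An in-memory SQLite table stands in for the model server's relational database; the schema and function names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE prediction_task (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,  -- arrival order
    task_id TEXT,
    status TEXT DEFAULT 'queued')""")

def enqueue(task_id):
    conn.execute("INSERT INTO prediction_task (task_id) VALUES (?)", (task_id,))

def next_task():
    """Fetch the oldest queued task and mark it running (executed in sequence)."""
    row = conn.execute(
        "SELECT seq, task_id FROM prediction_task "
        "WHERE status = 'queued' ORDER BY seq LIMIT 1").fetchone()
    if row is None:
        return None
    conn.execute("UPDATE prediction_task SET status = 'running' WHERE seq = ?",
                 (row[0],))
    return row[1]

enqueue("t1"); enqueue("t2"); enqueue("t3")
order = [next_task(), next_task(), next_task()]
```

Ordering by the auto-incremented `seq` column guarantees tasks are picked up in the order they were stored.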
  • When acquiring the data mining model from the model server according to the model information, the data prediction method further includes synchronizing status information on whether the data mining model has been acquired to the database corresponding to the model server.
  • The prediction resource configuration described in this embodiment includes creating a separate container for each data prediction task; subsequent data prediction model files are generated in the corresponding container, so that the generation processes of multiple data prediction model files are isolated from one another.
  • The model server uses Kubernetes to create and manage containers. Specifically, according to the prediction resource configuration request, it queries whether its idle resources meet the data prediction requirements.
  • Allocating the corresponding prediction resources for the to-be-executed data prediction task is specifically: sending a request to create a Pod to the Kubernetes Master according to the first user information corresponding to the task. If the model server has available resources and those resources satisfy the needs of data prediction model file generation, a corresponding directory is created in the model server according to the first user information, a Pod is created, and the IP and Port corresponding to the Pod are generated, where the IP and Port are used for data prediction calls.
  • The Pod mechanism of Kubernetes thus allocates independent prediction resources for each data prediction task, and the Docker service associated with the created directory is started to complete container creation and prediction resource configuration.
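A "create Pod" request of the kind described above can be sketched as the plain JSON manifest that would be POSTed to the Kubernetes API. The label keys, image name, and per-task directory layout here are illustrative assumptions, not fields prescribed by the patent.

```python
def build_pod_manifest(hduser, task_id, image="prediction-runtime:latest"):
    # Per-task directory created in the model server from the first user info.
    task_dir = f"/data/{hduser}/{task_id}"
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": f"predict-{task_id}",
            "labels": {"user": hduser, "task": task_id},
        },
        "spec": {
            "containers": [{
                "name": "predict",
                "image": image,
                # Mount the per-task directory so the model file lands there.
                "volumeMounts": [{"name": "task-dir",
                                  "mountPath": "/workspace"}],
            }],
            "volumes": [{"name": "task-dir",
                         "hostPath": {"path": task_dir}}],
        },
    }

manifest = build_pod_manifest("hd_alice", "task42")
```

One Pod per task is what gives each data prediction task an isolated container and its own prediction resources.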
  • The method further includes: performing authentication and signature verification on the information contained in the data prediction request; if verification passes, generating a data prediction task with a unique identifier and determining whether a data prediction task for the same user already exists in the database corresponding to the model server. If so, the generated data prediction task is terminated; otherwise, it is stored in the database corresponding to the model server and its unique identifier is sent to the user.
  • The model server may be an AI Server, which stores multiple trained data prediction models for invocation, and the data storage server runs a Hadoop cluster and a Spark cluster, so the above data prediction model file is a model file that can be run directly on the Spark cluster.
  • A Pyspark script is generated according to the prediction data table, the determined data prediction model, and its operation configuration information; the Pyspark script is the data prediction model file. The operation configuration information includes the environment files that the data prediction model depends on at runtime and the HDFS path where they are stored in the Hadoop cluster.
  • The Pyspark script is submitted to the Spark cluster through the Knox+Livy service, and Spark distributed resources are used for data prediction, where Knox is a gateway used to verify whether the current UM has permission to use HDuser. When making predictions on tens of millions of records, the Pyspark file is uploaded to HDFS through the Knox+webHDFS service, and the Spark task is submitted through the Knox+Livy service.
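The two calls above — uploading the Pyspark file via Knox+webHDFS and submitting it via Knox+Livy — can be sketched as URL and payload construction. No network I/O is performed; the gateway host and topology path are assumptions, while `op=CREATE` and the Livy `POST /batches` body fields (`file`, `proxyUser`, `name`, `conf`) follow the public webHDFS and Livy REST APIs.

```python
KNOX = "https://knox-gateway:8443/gateway/default"  # assumed gateway address

def webhdfs_create_url(hdfs_path):
    # webHDFS file-creation operation, proxied through Knox.
    return f"{KNOX}/webhdfs/v1{hdfs_path}?op=CREATE&overwrite=true"

def livy_batch_payload(hdfs_path, hduser):
    # Livy "POST /batches" body: run the uploaded Pyspark script as HDuser.
    return {
        "file": hdfs_path,
        "proxyUser": hduser,
        "name": "data-prediction",
        "conf": {"spark.submit.deployMode": "cluster"},
    }

url = webhdfs_create_url("/user/hd_alice/predict.py")
payload = livy_batch_payload("/user/hd_alice/predict.py", "hd_alice")
```

Knox performing authentication on both calls is what enforces that the current UM is allowed to act as HDuser.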
  • During data prediction on the Spark cluster, since the prediction data table contains the fields used as model-input features for prediction, the Spark cluster reads data from the Hadoop cluster according to the prediction data table, obtains the feature values of the model-input features, inputs them into the data prediction model, and outputs the model results to the specified table, completing the distributed data prediction task.
  • The automatically generated Pyspark file is sent to the Spark cluster, so that the entire data prediction process is performed on the Hadoop cluster, which can handle massive data; the data values of the target variable obtained from the prediction are stored directly in the Hadoop cluster, which prevents data export and thus ensures data security.
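The read-features → apply-model → write-results flow of the Pyspark script can be illustrated with a plain-Python stand-in (dicts replace cluster tables, and the model is a stub); the actual script would use Spark DataFrames, and all names here are assumptions.

```python
def run_prediction(store, pred_table, feature_fields, model, output_table):
    rows = store[pred_table]                         # read from the "cluster"
    results = []
    for row in rows:
        features = [row[f] for f in feature_fields]  # model-input feature values
        row_out = dict(row)
        row_out["prediction"] = model(features)      # target-variable value
        results.append(row_out)
    store[output_table] = results                    # results stay in the cluster
    return len(results)

store = {"pred_data": [{"id": 1, "x1": 2.0, "x2": 3.0},
                       {"id": 2, "x1": 1.0, "x2": 5.0}]}
# Stub model: a simple weighted sum standing in for the trained model.
n = run_prediction(store, "pred_data", ["x1", "x2"],
                   lambda fs: 0.5 * fs[0] + 0.5 * fs[1], "pred_out")
```

Note that the output lands in another "table" in the same store — mirroring the point that prediction results never leave the Hadoop cluster.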
  • The database corresponding to the model server can also be used to record the running status information of the data prediction task, including whether the task has been executed, so that the data processing server (such as the BI system) can monitor the running status of the data prediction task through the database corresponding to the model server.
  • The data prediction method further includes receiving a request for periodically querying the data prediction task status, accessing the model server according to that request to query the running status of the data prediction model, and updating the queried status to the database corresponding to the model server.
  • In the following, the data prediction method provided by this application is illustrated with a specific example in conjunction with Figure 5, taking as an example that the BI system runs on the data processing server, the user sends a request through the WEB interface, the model server is an AI Server using Kubernetes services, the database (DB) corresponding to the AI Server adopts the MySQL relational database management system, the data storage server is a Hadoop cluster, the data mining model is run through a Spark cluster, and data transmission among the AI Server, the Hadoop cluster, and the Spark cluster is implemented through Knox+webHDFS+Livy. The specific process is as follows:
  • The user logs in to the BI system through the user terminal, obtains the prediction data table containing the model-input features from the full data table via the BI system, determines the target variable to be predicted and the model information, and generates a data prediction request from this content; the data prediction request is submitted to the AI Server through the WEB interface.
  • The AI Server performs authentication and signature verification on the information contained in the data prediction request and checks whether a data prediction task for the same user already exists in the AI DB (the database corresponding to the AI Server). If none exists, a data prediction task is created: a unique identifier for the task is generated and stored in the AI DB; otherwise no new task is created. After the task is created, its unique identifier is fed back to the client, and the client polls the status of the data prediction task by that identifier, while the AI Server periodically reads the data prediction task and related information (such as the client user name UM and the Hadoop cluster user name HDuser) from the AI DB.
  • A request to create a Pod is initiated to the Kubernetes Master under HDuser; if the AI Server has no available resources, the request to create a Pod is rejected.
  • If resources are available, a Pod is created and the IP and Port corresponding to the Pod are generated; the AI Server is then accessed through the relevant interface to obtain the data mining model selected by the user, and the data prediction model file (a Pyspark script) is generated in the AI Server. The Pyspark file is uploaded to HDFS through the Knox+webHDFS service, and the Knox+Livy service submits the data prediction model file to run in the Spark cluster; at runtime, the prediction data table is sent to the Hadoop cluster, data is queried from the Hadoop cluster according to the prediction data table, and data prediction is performed in the Hadoop cluster based on the queried data set.
  • The Livy service periodically queries the status of the data prediction task and synchronously updates it to the AI DB.
  • Finally, the prediction result is stored in the Hadoop cluster, completing the data prediction.
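The periodic status synchronization above can be sketched as a polling tick that maps Livy batch states onto a task status recorded in the AI DB. The state names follow Livy's documented batch states; the mapping itself and the DB shape are assumptions.

```python
# Map Livy batch states to the task statuses kept in the AI DB (assumed).
LIVY_TO_DB = {
    "starting": "running",
    "running": "running",
    "success": "finished",
    "dead": "failed",
    "killed": "failed",
}

def sync_status(poll_livy, ai_db, task_id):
    """One polling tick: read the Livy state and update the AI DB record."""
    state = poll_livy()
    ai_db[task_id] = LIVY_TO_DB.get(state, "unknown")
    return ai_db[task_id]

ai_db = {}
states = ["starting", "running", "success"]
history = [sync_status(lambda s=s: s, ai_db, "t1") for s in states]
```

After the final tick, the AI DB holds the terminal status that the BI system monitors.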
  • Through the above process, one-click modeling can be realized from the user's modeling request: the training data table required for modeling is obtained from the full data table in the data processing server through the modeling request, the model algorithm information and the second user information are determined, the modeling input features, the modeling target variables, and the model framework to be trained are then obtained automatically, and the corresponding modeling resources are configured; based on the configured modeling resources, model training is performed through the model framework to be trained, the modeling input features, and the modeling target variables to generate a data mining model. This embodiment does not require a detailed understanding of the model algorithm, which greatly lowers the training threshold of the data mining model.
  • Likewise, one-click deployment of the model for data prediction can be realized: the prediction data table is obtained from the full data table in the data processing server according to the data prediction request, the model information and the first user information are determined, the data mining model and the prediction resources are then determined, a data prediction model file is generated based on the configured prediction resources and the data mining model, and the file is sent to at least one data storage server, where the data mining model is run to realize data prediction.
  • Spark can use cluster resources directly to process the massive data stored in Hadoop in large batches, so that the entire processing takes place within the cluster; this ensures data security and prevents the leakage problems caused by data transmission. Moreover, this embodiment runs transparently to the user, providing a better user experience.
  • the privacy information in the data obtained during the data mining model generation and data prediction process in the above embodiment can be stored in the nodes of the blockchain.
  • The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms.
  • A blockchain is essentially a decentralized database, a chain of data blocks linked by cryptographic methods; each data block contains a batch of network transaction information, which is used to verify the validity of the information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
  • The present application may be used in numerous general-purpose or special-purpose computer system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments including any of the above systems or devices, and the like.
  • the application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer.
  • program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types.
  • the application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network.
  • program modules may be located in both local and remote computer storage media including storage devices.
  • The aforementioned computer-readable storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, or a read-only memory (Read-Only Memory, ROM), or a volatile storage medium such as a random access memory (Random Access Memory, RAM).
  • The present application provides an embodiment of a data prediction apparatus; the apparatus embodiment corresponds to the foregoing data prediction method embodiment, and specifically, the data prediction apparatus can be applied to various electronic devices.
  • the data prediction apparatus described in this embodiment includes: a data prediction information acquisition module 601 , a prediction configuration module 602 , a data prediction module 603 and a model generation module 604 .
  • The data prediction information obtaining module 601 is configured to receive a data prediction request, determine the model information and the first user information according to the data prediction request, and obtain the prediction data table from the full data table in the data processing server, where the full data table is formed by associating at least two initial data tables. The prediction configuration module 602 is configured to obtain the data mining model pre-generated by the model generation module 604 from the model server according to the model information, and configure the corresponding prediction resources in the model server according to the first user information. The data prediction module 603 is configured to generate a data prediction model file based on the prediction resources and the data mining model and send it to at least one data storage server, so as to run the data mining model on the data storage server, obtain the feature values of the corresponding model-input features from the data storage server according to the prediction data table, and input them into the data mining model to obtain the data values of the target variable to be predicted, completing the data prediction.
  • The model generation module 604 is specifically configured to receive a modeling request, determine the model algorithm information and the second user information according to the modeling request, obtain the training data table required for modeling from the full data table, configure the corresponding modeling resources in the model server according to the second user information, determine the model framework to be trained from the model server according to the model algorithm information, extract the modeling input features and modeling target variables based on the training data table, and perform model training based on the modeling resources to generate the data mining model.
  • When configuring the corresponding prediction resources in the model server according to the first user information, the prediction configuration module 602 is specifically configured to: acquire, at a preset time interval, the information of the to-be-executed data prediction task corresponding to the first user information from the database corresponding to the model server, and generate a prediction resource configuration request; query, according to the prediction resource configuration request, whether the idle resources of the model server meet the data prediction requirements; if so, allocate the corresponding prediction resources to the acquired to-be-executed data prediction task, otherwise reject the prediction resource configuration request.
  • The prediction configuration module 602 is further configured to, after the data prediction request is received, perform authentication and signature verification on the information contained in the data prediction request; if verification passes, generate a data prediction task with a unique identifier and determine whether a data prediction task for the same user already exists in the database corresponding to the model server. If so, the generated data prediction task is terminated; otherwise, it is stored in the database corresponding to the model server and its unique identifier is sent to the user.
  • When configuring the corresponding modeling resources in the model server according to the second user information, the model generation module 604 is specifically configured to: acquire, at a preset time interval, the information of the to-be-executed modeling task corresponding to the second user information from the database corresponding to the model server, and generate a modeling resource configuration request; query, according to the modeling resource configuration request, whether the idle resources of the model server meet the needs of model training; if so, allocate the corresponding modeling resources to the acquired to-be-executed modeling task, otherwise reject the current modeling resource configuration request.
  • The model generation module 604 is further configured to, after receiving the modeling request, perform authentication and signature verification on the information contained in the modeling request; if verification passes, generate a modeling task with a unique identifier and judge whether a modeling task submitted by the same user already exists in the database corresponding to the model server. If so, the generated modeling task is terminated; otherwise, it is stored in the database corresponding to the model server and its unique identifier is sent to the user.
  • For the technical content involved when the data prediction information acquisition module 601, the prediction configuration module 602, the data prediction module 603, and the model generation module 604 perform the relevant operations, reference may be made to the above embodiments of the data prediction method; the related content is not expanded here. The data prediction apparatus provided by the present application has the beneficial effects corresponding to the above embodiments of the data prediction method.
  • FIG. 7 is a basic structural block diagram of the computer device in this embodiment.
  • the computer device 7 includes a memory 71 , a processor 72 , and a network interface 73 that communicate with each other through a system bus.
  • Computer-readable instructions are stored in the memory 71, and when the processor 72 executes them, the steps of the data prediction method described in the above method embodiments are implemented, with the beneficial effects corresponding to that method, which are not expanded here.
  • The figure shows the computer device 7 with the memory 71, the processor 72, and the network interface 73, but it should be understood that implementing all of the shown components is not required; more or fewer components may be implemented instead.
  • The computer device here is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions; its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (Application Specific Integrated Circuit, ASIC), programmable gate arrays (Field-Programmable Gate Array, FPGA), digital signal processors (Digital Signal Processor, DSP), embedded devices, and the like.
  • the computer equipment may be a desktop computer, a notebook computer, a palmtop computer, a cloud server and other computing equipment.
  • the computer device can perform human-computer interaction with the user through a keyboard, a mouse, a remote control, a touch pad or a voice control device.
  • The memory 71 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, and the like.
  • the memory 71 may be an internal storage unit of the computer device 7 , such as a hard disk or a memory of the computer device 7 .
  • the memory 71 may also be an external storage device of the computer device 7, such as a plug-in hard disk, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, flash memory card (Flash Card), etc.
  • the memory 71 may also include both the internal storage unit of the computer device 7 and its external storage device.
  • the memory 71 is generally used to store the operating system and various application software installed on the computer device 7 , such as computer-readable instructions corresponding to the above-mentioned data prediction method.
  • the memory 71 can also be used to temporarily store various types of data that have been output or will be output.
  • the processor 72 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or other data processing chips. This processor 72 is typically used to control the overall operation of the computer device 7 . In this embodiment, the processor 72 is configured to execute computer-readable instructions stored in the memory 71 or process data, for example, execute computer-readable instructions corresponding to the above-mentioned data prediction method.
  • the network interface 73 may include a wireless network interface or a wired network interface, and the network interface 73 is generally used to establish a communication connection between the computer device 7 and other electronic devices.
  • The present application also provides another embodiment, namely a computer-readable storage medium storing computer-readable instructions, where the computer-readable instructions can be executed by at least one processor to cause the at least one processor to execute the steps of the above data prediction method, with the beneficial effects corresponding to that method, which are not expanded here.
  • The methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course can also be implemented by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or a CD-ROM) and includes several computer-readable instructions to make a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) execute the methods described in the various embodiments of the present application.


Abstract

A data prediction method, an apparatus, a computer device, and a storage medium, relating to the field of artificial intelligence, the method comprising: on the basis of a data prediction request, determining model information and first user information, and acquiring a prediction data table; on the basis of the model information, acquiring a pre-generated data mining model from a model server, and on the basis of the first user information, allocating a corresponding prediction resource in the model server; and on the basis of the prediction resource and the data mining model, generating a prediction model file and sending same to a data storage server, in order to run the data mining model on the data storage server, and on the basis of the prediction data table, acquiring corresponding data and inputting same into the data mining model to perform data prediction. In addition, the method further relates to blockchain technology, and the private information in the data acquired in the data mining model generation and data prediction processes may be stored in a blockchain. The present method is able to implement one-key generation of a data mining model and one-key data prediction deployment.

Description

A data prediction method, apparatus, computer device, and storage medium
This application claims priority to the Chinese patent application filed with the China Patent Office on October 23, 2020, with application number 202011148696.4 and entitled "A data prediction method, apparatus, computer device and storage medium", the entire content of which is incorporated herein by reference.
Technical Field
The present application relates to the technical field of artificial intelligence, and in particular to a data prediction method, apparatus, computer device, and storage medium.
Background
With the development of science and technology, artificial intelligence has been integrated into all aspects of life, and various industries use it to mine massive data. The inventors found that in the data mining process, on the one hand, the modeling process is too specialized and complex: training a usable and effective model requires data preprocessing, model selection, model-effect improvement, and other steps, which presents a great obstacle to non-professional modelers. On the other hand, the threshold of business knowledge is high, and professional modelers' insufficient understanding of the business leads to low mining efficiency of the established models.
Summary of the Invention
The purpose of the embodiments of the present application is to provide a data prediction method, apparatus, computer device, and storage medium, so as to solve the problems in the prior art that establishing a data mining model is complex and the established data mining model has low mining efficiency.
In order to solve the above technical problems, an embodiment of the present application provides a data prediction method, which adopts the following technical solution:
A data prediction method, comprising the following steps:
receiving a data prediction request, determining model information and first user information according to the data prediction request, and obtaining a prediction data table from a full data table in a data processing server, wherein the full data table is formed by associating at least two initial data tables;
obtaining a pre-generated data mining model from a model server according to the model information, and configuring corresponding prediction resources in the model server according to the first user information;
generating a data prediction model file based on the prediction resources and the data mining model, and sending it to at least one data storage server, so as to run the data mining model on the data storage server, obtain the feature values of the corresponding model-input features from the data storage server according to the prediction data table, and input them into the data mining model to obtain the data values of the target variable to be predicted, completing the data prediction;
wherein the generation process of the data mining model comprises:
receiving a modeling request, determining model algorithm information and second user information according to the modeling request, and obtaining a training data table required for modeling from the full data table; configuring corresponding modeling resources in the model server according to the second user information, determining a model framework to be trained from the model server according to the model algorithm information, and extracting modeling input features and modeling target variables based on the training data table; and performing model training through the model framework to be trained, the modeling input features, and the modeling target variables based on the modeling resources to generate the data mining model.
In order to solve the above technical problem, an embodiment of the present application further provides a data prediction apparatus, which adopts the following technical solution:

A data prediction apparatus, comprising: a data prediction information obtaining module, a prediction configuration module, a data prediction module and a model generation module;

the data prediction information obtaining module is configured to receive a data prediction request, determine model information and first user information according to the data prediction request, and obtain a prediction data table from a full data table in a data processing server, wherein the full data table is formed by associating at least two initial data tables;

the prediction configuration module is configured to obtain, from a model server according to the model information, a data mining model pre-generated by the model generation module, and to configure corresponding prediction resources in the model server according to the first user information;

the data prediction module is configured to generate a data prediction model file based on the prediction resources and the data mining model, send it to at least one data storage server so as to run the data mining model on the data storage server, obtain feature values of the corresponding prediction input features from the data storage server according to the prediction data table, input them into the data mining model to obtain the data value of a target variable to be predicted, and thereby complete the data prediction;

wherein the model generation module is specifically configured to receive a modeling request, determine model algorithm information and second user information according to the modeling request, obtain a training data table required for modeling from the full data table, configure corresponding modeling resources in the model server according to the second user information, determine a model framework to be trained from the model server according to the model algorithm information, extract modeling input features and a modeling target variable based on the training data table, and, based on the modeling resources, perform model training by means of the model framework to be trained, the modeling input features and the modeling target variable to generate the data mining model.
In order to solve the above technical problem, an embodiment of the present application further provides a computer device, which adopts the following technical solution:

A computer device, comprising a memory and a processor, wherein computer-readable instructions are stored in the memory, and the processor implements the following steps when executing the computer-readable instructions:

receiving a data prediction request, determining model information and first user information according to the data prediction request, and obtaining a prediction data table from a full data table in a data processing server, wherein the full data table is formed by associating at least two initial data tables;

obtaining a pre-generated data mining model from a model server according to the model information, and configuring corresponding prediction resources in the model server according to the first user information;
generating a data prediction model file based on the prediction resources and the data mining model, and sending it to at least one data storage server so as to run the data mining model on the data storage server; obtaining feature values of the corresponding prediction input features from the data storage server according to the prediction data table and inputting them into the data mining model to obtain the data value of a target variable to be predicted, thereby completing the data prediction;

wherein the generation process of the data mining model includes:

receiving a modeling request, determining model algorithm information and second user information according to the modeling request, and obtaining a training data table required for modeling from the full data table; configuring corresponding modeling resources in the model server according to the second user information, determining a model framework to be trained from the model server according to the model algorithm information, and extracting modeling input features and a modeling target variable based on the training data table; and, based on the modeling resources, performing model training by means of the model framework to be trained, the modeling input features and the modeling target variable to generate the data mining model.

In order to solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium, which adopts the following technical solution:
A computer-readable storage medium, wherein computer-readable instructions are stored on the computer-readable storage medium, and when the computer-readable instructions are executed by a processor, the processor is caused to perform the following steps:

receiving a data prediction request, determining model information and first user information according to the data prediction request, and obtaining a prediction data table from a full data table in a data processing server, wherein the full data table is formed by associating at least two initial data tables;

obtaining a pre-generated data mining model from a model server according to the model information, and configuring corresponding prediction resources in the model server according to the first user information;
generating a data prediction model file based on the prediction resources and the data mining model, and sending it to at least one data storage server so as to run the data mining model on the data storage server; obtaining feature values of the corresponding prediction input features from the data storage server according to the prediction data table and inputting them into the data mining model to obtain the data value of a target variable to be predicted, thereby completing the data prediction;

wherein the generation process of the data mining model includes:

receiving a modeling request, determining model algorithm information and second user information according to the modeling request, and obtaining a training data table required for modeling from the full data table; configuring corresponding modeling resources in the model server according to the second user information, determining a model framework to be trained from the model server according to the model algorithm information, and extracting modeling input features and a modeling target variable based on the training data table; and, based on the modeling resources, performing model training by means of the model framework to be trained, the modeling input features and the modeling target variable to generate the data mining model.
Compared with the prior art, the data prediction method, apparatus, computer device and storage medium provided by the embodiments of the present application mainly have the following beneficial effects:
On the one hand, one-click modeling can be realized according to a user's modeling request. Specifically, the training data table required for modeling is obtained from the full data table in the data processing server according to the modeling request, and the model algorithm information and user information are determined; the modeling input features, the modeling target variable and the model framework to be trained are then obtained automatically, and corresponding modeling resources are configured; based on the configured modeling resources, model training is performed by means of the model framework to be trained, the modeling input features and the modeling target variable to generate the data mining model. During modeling, the user does not need a detailed understanding of the model algorithm, which greatly lowers the threshold for training a data mining model: the data mining model can be trained imperceptibly based only on the data provided by the user. On the other hand, one-click model deployment and data prediction can be realized according to a user's data prediction request. Specifically, the prediction data table is obtained from the full data table in the data processing server according to the data prediction request, and the model information and user information are determined; the data mining model and the prediction resources are then determined; a data prediction model file is generated based on the configured prediction resources and the data mining model, and the data prediction model file is sent to at least one data storage server, on which the data mining model is run to realize the data prediction. This well guarantees data security and prevents leakage caused by data transmission; moreover, this embodiment is carried out imperceptibly to the user, providing a better user experience.
Description of Drawings
In order to explain the embodiments of the present application more clearly, the following briefly introduces the accompanying drawings required in the description of the embodiments. The drawings described below correspond to some embodiments of the present application; for those of ordinary skill in the art, other drawings can also be obtained from these drawings without creative effort.
FIG. 1 is an exemplary system architecture diagram to which the present application can be applied;

FIG. 2 is a flowchart of an embodiment of a data prediction method according to the present application;

FIG. 3 is a flowchart of an embodiment of a generation process of a data mining model according to the present application;

FIG. 4 is a specific example of the generation process of the data mining model according to the present application;

FIG. 5 is a specific example of the data prediction method according to the present application;

FIG. 6 is a schematic structural diagram of an embodiment of a data prediction apparatus according to the present application;

FIG. 7 is a schematic structural diagram of an embodiment of a computer device according to the present application.
Detailed Description
Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by those skilled in the technical field of the present application. The terms used in the specification of the application are only for the purpose of describing specific embodiments and are not intended to limit the present application. The terms "comprising" and "having" and any variations thereof in the specification and claims of the present application and in the above description of the drawings are intended to cover a non-exclusive inclusion. The terms "first", "second" and the like in the specification and claims of the present application or in the above drawings are used to distinguish different objects, rather than to describe a specific order.

Reference herein to an "embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application. The appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor are they separate or alternative embodiments mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.

In order to enable those skilled in the art to better understand the embodiments of the present application, the technical solutions in the embodiments of the present application are clearly and completely described below with reference to the accompanying drawings.
As shown in FIG. 1, a system architecture 100 may include terminal devices 101, 102 and 103, a network 104 and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber-optic cables.

A user may use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 101, 102 and 103, such as web browser applications, shopping applications, search applications, instant messaging tools, email clients and social platform software.

The terminal devices 101, 102 and 103 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop portable computers, desktop computers and the like.

The server 105 may be a server providing various services, for example, a background server providing support for the pages displayed on the terminal devices 101, 102 and 103.
It should be noted that the data prediction method provided by the embodiments of the present application is generally executed by a server; accordingly, the data mining model generation apparatus and the data prediction apparatus are generally provided in the server.

It should be understood that the numbers of terminal devices, networks and servers in FIG. 1 are merely illustrative. There may be any number of terminal devices, networks and servers according to implementation needs.
Continuing to refer to FIG. 2, which shows a flowchart of an embodiment of a data prediction method according to the present application, the data prediction method includes the following steps:
S201: receiving a data prediction request, determining model information and first user information according to the data prediction request, and obtaining a prediction data table from a full data table in a data processing server, wherein the full data table is formed by associating at least two initial data tables;

S202: obtaining a pre-generated data mining model from a model server according to the model information, and configuring corresponding prediction resources in the model server according to the first user information;

S203: generating a data prediction model file based on the prediction resources and the data mining model, and sending it to at least one data storage server so as to run the data mining model on the data storage server; obtaining feature values of the corresponding prediction input features from the data storage server according to the prediction data table, inputting them into the data mining model, and obtaining the data value of a target variable to be predicted, thereby completing the data prediction.
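As a non-authoritative illustration of steps S201 to S203, the control flow can be sketched as follows; every name here (`run_prediction`, `get_table` and so on) is hypothetical and not part of the claimed embodiments:

```python
def run_prediction(request, get_table, get_model, allocate, storage_servers):
    """Sketch of S201-S203 (all names are illustrative, not from the embodiments)."""
    # S201: parse the request and fetch the prediction data table.
    prediction_table = get_table(request["table"])
    # S202: load the pre-generated data mining model and reserve
    # prediction resources for this user.
    model = get_model(request["model_info"])
    resources = allocate(request["user"])
    # S203: bundle model and resources into a "model file" and run it on
    # each data storage server, reading feature values locally there.
    model_file = {"model": model, "resources": resources}
    results = []
    for read_features in storage_servers:
        features = read_features(prediction_table)
        results.extend(model_file["model"](row) for row in features)
    return results
```

The point of the structure is that the model file travels to the data, so the feature values never leave the storage servers.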
The steps of the above data prediction method are described in detail below.
Regarding step S201, in this embodiment, a user may initiate a data prediction request through a WEB page of a client, and a WEB server receives the data prediction request. The data prediction request may contain first user information of the client and model information. The first user information contains user name information of the request initiator and user name information of the prediction data storage end; for example, when the prediction data storage end is a Hadoop cluster, the first user information will contain the user name (HDuser) in the Hadoop cluster. The model information may be generated by the user selecting from multiple preset model options on the data prediction request initiation interface of the client. In this embodiment, the data prediction method includes: when it is determined that the user enters the data prediction request initiation interface, providing a model selection box on the data prediction request initiation interface or in a newly popped-up interface, so that the user can select the model required for data prediction to generate the model information. In the interface provided to the user, if there are multiple model options, the model performance parameters of each model are displayed at the same time, so that the user can select a suitable model according to actual needs.
In this embodiment, a BI system runs in the data processing server, and the full data table can be generated through the BI system. Specifically, the BI system obtains data from multiple data sources, analyzes the obtained data, generates multiple initial data tables according to different data sources or different topics, and then associates and integrates the multiple initial data tables to generate the full data table, thereby obtaining field contents that can support data analysis and contents that need to be predicted. The obtained field contents can be used as the prediction input features in the subsequent step S203, and the contents to be predicted refer to the target variable to be predicted in the data prediction process. In this embodiment, multiple prediction input features form a correspondence with the target variable to be predicted; accordingly, different target variables to be predicted correspond to different selected prediction input features. In this embodiment, a new data table, namely the prediction data table, is created through the BI system by selecting the field contents used as the prediction input features. Therefore, the data table obtained from the full data table in this embodiment is a non-full data table, which may specifically be a hive table without an upper limit; subsequently, when performing data prediction, the model server will read data according to the non-full data table newly created by the BI system.
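Purely for illustration (the table names `profiles` and `orders` and all columns are invented, and an in-memory SQLite database stands in for the hive tables), associating initial data tables into a full table and then selecting feature columns into a non-full prediction table might look like:

```python
import sqlite3

# Two hypothetical initial tables, keyed by user_id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE profiles (user_id INT, age INT)")
conn.execute("CREATE TABLE orders (user_id INT, order_count INT)")
conn.executemany("INSERT INTO profiles VALUES (?, ?)", [(1, 30), (2, 41)])
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 5), (2, 2)])

# Association step: join the initial tables into the full data table.
conn.execute("""
    CREATE TABLE full_table AS
    SELECT p.user_id, p.age, o.order_count
    FROM profiles p JOIN orders o ON p.user_id = o.user_id
""")

# Non-full prediction table: only the columns chosen as input features.
prediction_rows = conn.execute(
    "SELECT age, order_count FROM full_table ORDER BY user_id"
).fetchall()
```

In the embodiments this selection happens in hive via the BI system; the SQL shape of the association/selection is the same.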
Regarding step S202, in this embodiment, the model server can perform data prediction separately for data prediction requests submitted by multiple users. Therefore, corresponding data prediction resources need to be allocated to the data prediction process of each user, so as to realize synchronous processing of multi-user data prediction and improve data prediction efficiency.
In some embodiments, continuing to refer to FIG. 3, which shows a flowchart of an embodiment of the generation process of the data mining model, the process includes the following steps:

S301: receiving a modeling request, determining model algorithm information and second user information according to the modeling request, and obtaining a training data table required for modeling from the full data table;

S302: configuring corresponding modeling resources in the model server according to the second user information, determining a model framework to be trained from the model server according to the model algorithm information, and extracting modeling input features and a modeling target variable based on the training data table;

S303: based on the modeling resources, performing model training by means of the model framework to be trained, the modeling input features and the modeling target variable to generate the data mining model.
Regarding step S301, in this embodiment, a user may initiate a modeling request through a WEB page of a client, and the WEB server receives the modeling request. The modeling request may contain second user information of the client and model algorithm information. The second user information contains user name information of the request initiator and user name information of the training data storage end; for example, when the training data storage end is a Hadoop cluster, the second user information will contain the user name (HDuser) in the Hadoop cluster. The model algorithm information may be edited and generated by the user on the client, or generated by the user selecting one or more algorithms from multiple preset algorithm options on the modeling request initiation interface of the client. Based on this, the method includes: when it is determined that the user enters the modeling request initiation interface on the client, providing an algorithm selection box or edit box on the modeling request initiation interface or in a new interface popped up on the client, so that the user can determine the model algorithm required for modeling to generate the model algorithm information.
In this embodiment, a BI (Business Intelligence) system runs in the data processing server, and the full data table is generated through the BI system. Specifically, the BI system obtains data from multiple data sources, analyzes the obtained data, generates multiple initial data tables according to different data sources or different topics, and then associates and integrates the multiple initial data tables to generate the full data table, thereby obtaining field contents that can support data analysis and contents that need to be predicted. The obtained field contents can be used as subsequent modeling input features, and the contents to be predicted refer to the modeling target variable used in the modeling process. In this embodiment, multiple modeling input features form a correspondence with the modeling target variable; accordingly, different modeling target variables correspond to different selected modeling input features. In this embodiment, a new data table, namely the training data table, is created through the BI system by selecting the field contents used as the modeling input features. Therefore, the training data table obtained from the full data table in this embodiment is a non-full data table, which may specifically be a hive table with an upper limit; the upper limit of the data in the hive table of this embodiment is 300,000. Subsequently, when performing model training, the model server will read training data according to the non-full data table newly created by the BI system.
Regarding step S302, in this embodiment, the model server can perform model training separately for modeling requests submitted by multiple users. By allocating corresponding modeling resources to each user's model training, synchronous processing of multi-user model training is realized and model training efficiency is improved.
In some embodiments, the configuring of corresponding modeling resources in the model server according to the second user information includes: obtaining, at a preset time interval, information of a modeling task to be executed corresponding to the second user information from a database corresponding to the model server, and generating a modeling resource configuration request; and querying, according to the modeling resource configuration request, whether the idle resources of the model server meet the requirements of model training; if so, allocating corresponding modeling resources to the obtained modeling task to be executed, and otherwise rejecting the current modeling resource configuration request. After the current modeling resource configuration request is rejected, the modeling task to be executed is re-obtained from the database corresponding to the model server after waiting for the preset time interval, so as to execute the process of configuring modeling resources again.
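The reject-and-retry loop just described can be sketched as follows; all names (`fetch_pending_task`, `idle_resources`, `allocate`) are hypothetical stand-ins for the database query and resource checks of the embodiment:

```python
import time

def configure_modeling_resources(fetch_pending_task, idle_resources,
                                 allocate, interval_s=1.0, max_attempts=5):
    """Poll the task queue at a preset interval; allocate only when the
    model server's idle resources can satisfy the training requirement."""
    for _ in range(max_attempts):
        task = fetch_pending_task()            # read a queued task from the DB
        if task is not None and idle_resources() >= task["required"]:
            return allocate(task)              # enough idle resources: allocate
        time.sleep(interval_s)                 # request rejected: wait, retry
    return None                                # still rejected after max_attempts
```

`max_attempts` is an illustrative safeguard; the embodiment simply retries at the preset interval.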
In some embodiments, the database corresponding to the model server adopts a relational database management system and can store modeling task information. When there are multiple modeling tasks, since the resources of the model server are limited, the modeling tasks will be queued and stored in the database corresponding to the model server, so as to be subsequently executed by the model server in sequence.

Further, configuring the modeling resources in this embodiment includes creating a separate container for each modeling task, and the model training process is performed in the corresponding container, so that the model training processes of multiple modeling tasks can be isolated from each other.
In some embodiments, the model server specifically uses Kubernetes to create and manage containers. Kubernetes can be used to manage containerized applications on multiple hosts, making the deployment of containerized applications simple and efficient. Kubernetes provides mechanisms for application deployment, planning, updating and maintenance; its core feature is the ability to manage containers autonomously to ensure that containers run in the state expected by the user. In Kubernetes, all containers run in Pods, and one Pod can host one or more related containers. Correspondingly, the querying, according to the modeling resource configuration request, of whether the idle resources of the model server meet the requirements of model training and, if so, the allocating of corresponding modeling resources to the obtained modeling task to be executed is specifically: sending a request for creating a Pod to the Kubernetes Master according to the second user information corresponding to the modeling task to be executed; if the model server has available resources and the available resources meet the requirements of model training, creating a corresponding directory in the model server according to the second user information, creating a Pod, and generating the IP and Port corresponding to the Pod, where the IP and Port are used for invocation when performing model training. Through Kubernetes Pods, independent modeling resources are allocated to each modeling task, and the Docker (container) service associated with the created directory is started, completing container creation and the configuration of modeling resources.
In this embodiment, the training data table contains the field contents used as modeling input features, and the modeling input features correspond to the modeling target variable, so both can be determined from the table. Likewise, the model algorithm information contains the identification of the model algorithm required for training, so the model server can determine the required algorithm from this identification and obtain the model framework to be trained. In some embodiments, after receiving the user's modeling request, the method further includes: performing authentication and signature verification on the information contained in the modeling request; if verification passes, generating a modeling task with a unique identifier; checking whether a modeling task submitted by the same user already exists in the database corresponding to the model server; if one exists, terminating the generated modeling task, and otherwise storing it in that database and sending its unique identifier to the user. For authentication and signature verification, a token and a secret key are distributed to the user in advance; when a modeling request is received, the corresponding key is looked up from the token carried in the request, an MD5 signature is computed over the key plus the request parameters, and the result is compared with the signature carried in the request to confirm that the modeling request is legitimate.
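The token-plus-key signature check can be sketched as follows. The disclosure only states that an MD5 signature is computed over "key + parameters"; the canonicalization used here (parameters sorted by name and joined as `k=v` pairs) is an assumption for illustration, as are the token and key values.

```python
import hashlib

# Pre-distributed credentials: token -> secret key (illustrative values).
KEY_STORE = {"token-123": "s3cret-key"}

def sign(key: str, params: dict) -> str:
    """MD5 over the key concatenated with the canonicalized parameters.
    The sorted k=v&k=v canonicalization is an assumed convention."""
    canonical = "&".join(f"{k}={params[k]}" for k in sorted(params))
    return hashlib.md5((key + canonical).encode("utf-8")).hexdigest()

def verify_request(token: str, params: dict, signature: str) -> bool:
    """Look up the key from the token, recompute the signature, and compare."""
    key = KEY_STORE.get(token)
    if key is None:            # unknown token: authentication fails
        return False
    return sign(key, params) == signature

params = {"table": "train_t1", "algo": "xgb"}
good_sig = sign("s3cret-key", params)
```

A request whose recomputed signature does not match the one it carries is rejected before any modeling task is created.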
For step S303, during model training the data storage server is queried for the training data according to the training data table, yielding the feature values of the modeling input features and the values of the corresponding modeling target variable. The feature values are fed into the model framework for training, and whether the training requirement is met is determined by comparing the framework's outputs with the target-variable values. When the requirement is met, training stops, the model performance metrics are output, and model-generation information is sent to the user.
In some embodiments, the model server may specifically be an artificial intelligence server (AI Server). During model training the user need not intervene: the AI Server trains the specified model algorithm and tunes its hyperparameters, ultimately producing the optimal model and lowering the barrier to using machine learning. During hyperparameter tuning, the AI Server analyzes each modeling input feature, computing statistics such as the mean, variance, maximum, minimum, and overall data distribution; from these statistics it derives a training level, and different levels select different parameter configurations, thereby realizing hyperparameter adjustment.
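The statistics-to-level-to-configuration flow above can be sketched with toy rules. The actual levels, thresholds, and parameter configurations are not given in this disclosure; everything below the statistics computation is an illustrative assumption.

```python
import statistics

# Illustrative mapping from a derived "training level" to a parameter
# configuration; the real levels and configs are not specified in the text.
LEVEL_CONFIGS = {
    "small": {"n_estimators": 100, "learning_rate": 0.1},
    "large": {"n_estimators": 500, "learning_rate": 0.05},
}

def feature_stats(values):
    """Per-feature statistics of the kind the AI Server is said to compute."""
    return {
        "mean": statistics.mean(values),
        "variance": statistics.pvariance(values),
        "min": min(values),
        "max": max(values),
    }

def training_level(n_rows, n_features):
    # Toy rule: datasets above a size threshold get the "large" config.
    return "large" if n_rows * n_features > 1_000_000 else "small"

def choose_config(n_rows, n_features):
    return LEVEL_CONFIGS[training_level(n_rows, n_features)]

stats = feature_stats([1.0, 2.0, 3.0, 4.0])
config = choose_config(n_rows=2_000_000, n_features=30)
```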
In some embodiments, the data storage server may be deployed as a Hadoop cluster. Accordingly, accessing the data storage server according to the training data table specifically means querying the Hadoop cluster for the training data. Once the training data is found, it is sent from the Hadoop cluster to the model server; after model training finishes, the training data is deleted from the model server.
In this embodiment, the model performance metrics output after training may be stored in the database corresponding to the model server, where the BI system on the data processing server side can query them.
In some embodiments, the database corresponding to the model server may also record the running status of model tasks, including whether a task has been executed, the training status once it runs, and the performance metrics of the data mining model obtained after training. This lets the data processing server side (for example its BI system) monitor the running status of modeling tasks through that database, and also query the data mining model's performance metrics from it. Accordingly, in this embodiment the generation of the data mining model further includes receiving scheduled requests to query the modeling task status and, for each request, accessing the model server to query the model's training status; the queried status may be updated into the database corresponding to the model server.
The generation of the data mining model is now illustrated by a complete example, with reference to Figure 4, in which the data processing server runs a BI system, the user submits modeling requests through a WEB interface, the model server is an AI Server using the Kubernetes service, the AI Server's database (DB) is a relational database management system (MySQL), and the data storage server is a Hadoop cluster. The process is as follows:
The user logs in to the BI system through the client, obtains from the full data table a training data table containing the modeling input features, determines the modeling target variable and the model algorithm, and generates a modeling request from these; the request is submitted to the AI Server through the WEB interface. The AI Server authenticates the request and verifies its signature, then checks whether the AI DB (the AI Server's database) already holds a modeling task for the same user. If not, it creates the modeling task, generates its unique identifier, and stores it in the AI DB; otherwise no task is created. After creation, the task's unique identifier is returned to the client, and the client polls the task status using that identifier. The AI Server periodically reads the modeling task and related information (such as the client user name UM and the Hadoop cluster user name HDuser) from the AI DB and, using UM and HDuser, sends a Pod-creation request to the Kubernetes Master. If the AI Server has no available resources, the request is rejected; if resources are available, the Pod is created and its IP and port are generated, after which the AI Server is accessed through the corresponding interface for model training. During training, the AI Server sends the training data table to the Hadoop cluster, data is queried from the Hadoop cluster according to the table, and the cluster returns the queried dataset to the AI Server for training. In addition, during training the training status is queried at regular intervals and synchronized to the AI DB; when the status becomes successful, the model training metrics are obtained, the dataset on the AI Server is deleted, and the metrics are written back to the AI DB.
Further, in some embodiments, configuring the corresponding prediction resources in the model server according to the first user information includes: obtaining, at a preset interval, the information of the to-be-executed data prediction task corresponding to the first user information from the database corresponding to the model server, and generating a prediction resource configuration request; querying, according to that request, whether the model server's idle resources meet the data-prediction requirements; and if so, allocating the corresponding prediction resources to the acquired task, otherwise rejecting the request. After a prediction resource configuration request is rejected, the to-be-executed data prediction task is re-fetched from the database corresponding to the model server after the preset interval elapses, and the resource-configuration process is attempted again.
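The fetch–check–allocate-or-retry loop just described can be sketched as follows. The CPU-based resource check, the retry budget, and the interval value are illustrative assumptions; the disclosure only specifies polling at a preset interval and retrying after rejection.

```python
import time

def try_allocate(idle_cpus, needed_cpus):
    """Return True if the model server's idle resources cover the request
    (resource model simplified to a CPU count for illustration)."""
    return idle_cpus >= needed_cpus

def configure_prediction_resources(fetch_task, get_idle_cpus,
                                   needed_cpus=2, interval=0.01, max_tries=5):
    """Poll the task queue at a fixed interval until resources are available,
    giving up after a bounded number of attempts."""
    for _ in range(max_tries):
        task = fetch_task()  # to-be-executed task from the model server's DB
        if task is not None and try_allocate(get_idle_cpus(), needed_cpus):
            return task                      # resources allocated
        time.sleep(interval)                 # rejected: wait, then re-fetch
    return None

# Simulated environment: resources free up on the third poll.
idle = iter([0, 0, 4, 4, 4])
result = configure_prediction_resources(
    fetch_task=lambda: {"task_id": "pred-42"},
    get_idle_cpus=lambda: next(idle),
)
```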
In some embodiments, the database corresponding to the model server is a relational database management system that stores data prediction task information. When multiple data prediction tasks exist, the model server's resources being limited, the tasks are queued in that database so the model server can execute them in turn. In some embodiments, when the data mining model is obtained from the model server according to the model information, the data prediction method further includes synchronizing to that database the status indicating whether the data mining model has been obtained.
In some embodiments, the prediction resource configuration includes creating a separate container for each data prediction task; the subsequent data prediction model file is generated inside the corresponding container, isolating the generation of multiple data prediction model files from one another.
In some embodiments, the model server uses Kubernetes to create and manage containers. Specifically, querying whether the model server's idle resources meet the data-prediction requirements according to the prediction resource configuration request, and if so allocating the corresponding prediction resources to the acquired data prediction task to be executed, comprises: sending a Pod-creation request to the Kubernetes Master according to the first user information corresponding to the data prediction task to be executed; if the model server has available resources and those resources meet the requirements of generating the data prediction model file, creating a corresponding directory in the model server according to the first user information, creating the Pod, and generating the Pod's IP and port, which are used for invocation during data prediction. Through Kubernetes Pods, each data prediction task is allocated an independent set of prediction resources; the Docker service associated with the created directory is then started, completing container creation and prediction-resource configuration.
Further, in some embodiments, after receiving the user's data prediction request, the method further includes: performing authentication and signature verification on the information contained in the request; if verification passes, generating a data prediction task with a unique identifier; checking whether a data prediction task for the same user already exists in the database corresponding to the model server; if one exists, terminating the generated task, and otherwise storing it in that database and sending its unique identifier to the user.
For step S203, in this embodiment the model server may be an AI Server storing multiple trained data prediction models for invocation, while the data storage server runs a Hadoop cluster and a Spark cluster; the data prediction model file is a model file that can run directly on the Spark cluster. Specifically, when generating the data prediction model file, a Pyspark script is generated from the prediction data table, the determined data prediction model, and its runtime configuration, which includes the environment files the model depends on at run time and their HDFS paths in the Hadoop cluster; this Pyspark script is the data prediction model file. Once generated, the script is submitted to the Spark cluster through the Knox+Livy services, and data prediction is performed using Spark's distributed resources. Knox is a gateway used to verify whether the current UM is authorized to use HDuser; when predicting over tens of millions of records, the Pyspark file is uploaded to HDFS through the Knox+webHDFS services, and the Spark job is submitted through the Knox+Livy services.
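The Livy submission step can be sketched as the construction of a batch-job payload for Livy's `POST /batches` endpoint (the `file`, `proxyUser`, `args`, and `conf` fields are part of Livy's batch API). The paths, queue name, and argument convention below are illustrative assumptions, not details from this disclosure.

```python
import json

def build_livy_batch(pyspark_hdfs_path, model_hdfs_path, proxy_user):
    """Payload for Livy's POST /batches endpoint, assuming the generated
    Pyspark script has already been uploaded to HDFS via webHDFS."""
    return {
        "file": pyspark_hdfs_path,          # the generated Pyspark script
        "proxyUser": proxy_user,            # run the Spark job as HDuser
        "args": [model_hdfs_path],          # model file location on HDFS
        "conf": {"spark.yarn.queue": "default"},  # assumed queue name
    }

payload = build_livy_batch(
    "hdfs:///jobs/pred_task42/predict.py",
    "hdfs:///models/task42/model.bin",
    "hd_alice",
)
body = json.dumps(payload)  # request body POSTed through the Knox gateway
```

In the described architecture this request would pass through Knox, which checks that the caller's UM may act as the given `proxyUser` before forwarding the batch to Livy.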
In this embodiment, because the prediction data table contains the field contents used as prediction input features, during prediction the Spark cluster reads data from the Hadoop cluster according to the prediction data table, obtains the feature values of the prediction input features, feeds them into the data prediction model, and writes the model's results to the designated table, completing the distributed data prediction task. By sending the automatically generated Pyspark file to the Spark cluster, all prediction processing takes place on the Hadoop cluster, allowing massive data volumes to be handled; the target-variable values produced by prediction are stored directly in the Hadoop cluster, which prevents data export and thereby ensures data security.
In some embodiments, the database corresponding to the model server may also record the running status of data prediction tasks, including whether a task has been executed, so that the data processing server side (for example its BI system) can monitor task status through that database. Accordingly, in this embodiment the data prediction method further includes receiving scheduled requests to query the data prediction task status and, for each request, accessing the model server to query the running status of the data prediction model; the queried status may be updated into the database corresponding to the model server.
The data prediction method provided by this application is now illustrated by a specific example, with reference to Figure 5, in which the data processing server runs a BI system, the user submits data prediction requests through a WEB interface, the model server is an AI Server using the Kubernetes service, the AI Server's database (DB) is a relational database management system (MySQL), the data storage server is a Hadoop cluster, the data mining model runs on a Spark cluster, and data passes between the AI Server and the Hadoop and Spark clusters through Knox+webHDFS+Livy. The process is as follows:
The user logs in to the BI system through the client, obtains from the full data table a prediction data table containing the prediction input features, determines the target variable to be predicted and the model information, and generates a data prediction request from these; the request is submitted to the AI Server through the WEB interface. The AI Server authenticates the request and verifies its signature, then checks whether the AI DB (the AI Server's database) already holds a data prediction task for the same user. If not, it creates the data prediction task, generates its unique identifier, and stores it in the AI DB; otherwise no task is created. After creation, the task's unique identifier is returned to the client, and the client polls the task status using that identifier. The AI Server periodically reads the data prediction task and related information (such as the client user name UM and the Hadoop cluster user name HDuser) from the AI DB and, using UM and HDuser, sends a Pod-creation request to the Kubernetes Master. If the AI Server has no available resources, the request is rejected; if resources are available, the Pod is created and its IP and port are generated, after which the AI Server is accessed through the corresponding interface to obtain the data mining model selected by the user, and the data prediction model file (a Pyspark script) is generated on the AI Server. The Pyspark file is then uploaded to HDFS through the Knox+webHDFS services, and the Knox+Livy services submit the data prediction model file to the Spark cluster for execution. At run time the prediction data table is sent to the Hadoop cluster, data is queried from the Hadoop cluster according to the table, and data prediction is performed in the Hadoop cluster on the queried dataset. In addition, during prediction the data prediction task status is queried at regular intervals through the Knox+Livy services and synchronized to the AI DB; when the task finishes, the prediction results are stored in the Hadoop cluster and data prediction ends.
With the data prediction method provided by this embodiment, on the one hand, one-click modeling is achieved from the user's modeling request: the training data table required for modeling is obtained from the full data table in the data processing server, the model algorithm information and the second user information are determined, the modeling input features, the modeling target variable, and the model framework to be trained are then obtained automatically, and the corresponding modeling resources are configured; based on those resources, model training is performed with the model framework to be trained, the modeling input features, and the modeling target variable, generating the data mining model. No detailed knowledge of the model algorithm is required, which greatly lowers the barrier to training a data mining model: the model is trained transparently from the data the user provides. On the other hand, the user's data prediction request enables one-click model deployment for prediction: the prediction data table is obtained from the full data table in the data processing server, the model information and the first user information are determined, the data mining model and the prediction resources are determined from them, a data prediction model file is generated from the configured prediction resources and the data mining model and sent to at least one data storage server, and the data mining model runs on that data storage server to perform the prediction. This embodiment can use Spark to draw directly on cluster resources and process the massive data residing in Hadoop in large batches, so the whole process runs inside the cluster, which effectively protects the data and prevents leakage caused by data transfer; and since all of this happens without user intervention, the user experience is better.
It should be emphasized that, to further protect the privacy and security of the information, the private information in the data obtained during data mining model generation and data prediction in the above embodiments may be stored in nodes of a blockchain. The blockchain referred to in this application is a novel application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and cryptographic algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, each block containing a batch of network transaction information used to verify the validity of that information (anti-tampering) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, and an application service layer.
The present application may be used in numerous general-purpose or special-purpose computer system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and distributed computing environments including any of the above systems or devices. The application may be described in the general context of computer-executable instructions, such as program modules, executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices linked through a communications network; in such environments, program modules may be located in both local and remote computer storage media, including storage devices.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by computer-readable instructions directing the relevant hardware; the computer-readable instructions may be stored in a computer-readable storage medium, and when the program is executed it may include the processes of the foregoing method embodiments. The aforementioned computer-readable storage medium may be a non-volatile storage medium, such as a magnetic disk, an optical disc, or a read-only memory (ROM), or a volatile storage medium such as a random access memory (RAM).
It should be understood that although the steps in the flowcharts of the accompanying drawings are shown in the order indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, there is no strict ordering constraint on their execution, and they may be performed in other orders. Moreover, at least some of the steps in the flowcharts may comprise multiple sub-steps or stages; these need not all complete at the same moment but may be executed at different times, and they need not be executed sequentially — they may be performed in turn or alternately with other steps, or with at least part of the sub-steps or stages of other steps.
Referring to Figure 6, as an implementation of the data prediction method shown in Figure 2 above, the present application provides an embodiment of a data prediction apparatus. This apparatus embodiment corresponds to the method embodiment shown in Figure 2, and the apparatus may be applied in various electronic devices.
Specifically, the data prediction apparatus of this embodiment includes: a data prediction information acquisition module 601, a prediction configuration module 602, a data prediction module 603, and a model generation module 604.
The data prediction information acquisition module 601 is configured to receive a data prediction request, determine the model information and the first user information from that request, and obtain the prediction data table from the full data table in the data processing server, the full data table being formed by associating at least two initial data tables. The prediction configuration module 602 is configured to obtain, according to the model information, the data mining model pre-generated by the model generation module 604 from the model server, and to configure the corresponding prediction resources in the model server according to the first user information. The data prediction module 603 is configured to generate a data prediction model file based on the prediction resources and the data mining model, send it to at least one data storage server so the data mining model runs on that data storage server, obtain from the data storage server, according to the prediction data table, the feature values of the corresponding prediction input features, feed them into the data mining model, and obtain the data values of the target variable to be predicted, completing the data prediction. In generating the data mining model, the model generation module 604 is specifically configured to: receive a modeling request; determine the model algorithm information and the second user information from it; obtain the training data table required for modeling from the full data table; configure the corresponding modeling resources in the model server according to the second user information; determine the model framework to be trained from the model server according to the model algorithm information; extract the modeling input features and the modeling target variable from the training data table; and, based on the modeling resources, perform model training with the model framework to be trained, the modeling input features, and the modeling target variable, generating the data mining model.
In some embodiments, when configuring the corresponding prediction resources in the model server according to the first user information, the prediction configuration module 602 is specifically configured to: at a preset time interval, obtain from the database corresponding to the model server the information of the pending data prediction task associated with the first user information, and generate a prediction resource configuration request; and query, according to the prediction resource configuration request, whether the idle resources of the model server meet the demand of the data prediction — if so, allocate corresponding prediction resources to the obtained pending data prediction task, and otherwise reject the prediction resource configuration request.
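The idle-resource check above (allocate if demand is covered, otherwise reject the configuration request) amounts to a simple compare-and-reserve step. The sketch below is illustrative only; the resource model (a CPU/memory dictionary) is an assumption, not part of the specification.

```python
# Hypothetical sketch of the resource-configuration decision: accept and
# reserve resources only if the model server's idle resources cover the
# task's demand; otherwise reject the request and leave state unchanged.

def try_allocate(idle, demand):
    """Return True and reserve resources if idle capacity covers demand."""
    if all(idle.get(k, 0) >= v for k, v in demand.items()):
        for k, v in demand.items():
            idle[k] -= v          # reserve the resources for this task
        return True               # configuration request accepted
    return False                  # configuration request rejected

idle = {"cpu": 8, "mem_gb": 32}
assert try_allocate(idle, {"cpu": 4, "mem_gb": 16}) is True
assert idle == {"cpu": 4, "mem_gb": 16}
assert try_allocate(idle, {"cpu": 6, "mem_gb": 8}) is False  # not enough CPU
```

The same decision is applied symmetrically to modeling resource configuration requests during model training.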
In some embodiments, the prediction configuration module 602 is further configured to, after the data prediction request is received, perform authentication and signature verification on the information contained in the data prediction request; if the verification passes, generate a data prediction task with a unique identifier, and determine whether a data prediction task of the same user already exists in the database corresponding to the model server; if one exists, terminate the newly generated data prediction task, and otherwise store the generated data prediction task in the database corresponding to the model server and send the unique identifier of the generated data prediction task to the user. In some embodiments, when configuring the corresponding modeling resources in the model server according to the second user information, the model generation module 604 is specifically configured to: at a preset time interval, obtain from the database corresponding to the model server the information of the pending modeling task associated with the second user information, and generate a modeling resource configuration request; and query, according to the modeling resource configuration request, whether the idle resources of the model server meet the demand of model training — if so, allocate corresponding modeling resources to the obtained pending modeling task, and otherwise reject the current modeling resource configuration request.
In some embodiments, the model generation module 604 is further configured to, after the modeling request is received, perform authentication and signature verification on the information contained in the modeling request; if the verification passes, generate a modeling task with a unique identifier, and determine whether a modeling task submitted by the same user already exists in the database corresponding to the model server; if one exists, terminate the newly generated modeling task, and otherwise store the generated modeling task in the database corresponding to the model server and send the unique identifier of the generated modeling task to the user.
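The task-submission flow described for both modules — verify the request, mint a uniquely identified task, and reject it if the same user already has one pending — can be sketched as below. This is an illustrative sketch; the in-memory "database" and the boolean stand-in for authentication and signature verification are assumptions for illustration only.

```python
import uuid

# Hypothetical sketch of task submission with per-user deduplication:
# authenticate, check for an existing task by the same user, then either
# terminate the new task or persist it and return its unique identifier.

def submit_task(db, user_id, signature_ok=True):
    """Create a uniquely identified task unless the same user already
    has a pending task in the model server's database."""
    if not signature_ok:                       # authentication / signature check
        return None
    if any(t["user"] == user_id for t in db):  # same user already has a task?
        return None                            # terminate the new task
    task = {"id": str(uuid.uuid4()), "user": user_id}
    db.append(task)                            # persist, then return the unique id
    return task["id"]

db = []
first = submit_task(db, "alice")
assert first is not None
assert submit_task(db, "alice") is None                     # duplicate rejected
assert submit_task(db, "bob", signature_ok=False) is None   # failed verification
```

Returning the unique identifier to the user mirrors the specification's step of sending the generated task's identifier back after it is stored.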
In this embodiment, for the technical details involved when the data prediction information obtaining module 601, the prediction configuration module 602, the data prediction module 603, and the model generation module 604 perform the relevant operations, reference may be made to the related content of the foregoing embodiments of the data prediction method, which is not repeated here; the data prediction apparatus provided in the present application has the beneficial effects corresponding to those embodiments of the data prediction method.
An embodiment of the present application further provides a computer device. As shown in FIG. 7, a block diagram of the basic structure of the computer device of this embodiment, the computer device 7 includes a memory 71, a processor 72, and a network interface 73 that are communicatively connected to one another via a system bus. Computer-readable instructions are stored in the memory 71, and when the processor 72 executes the computer-readable instructions, the steps of the data prediction method described in the foregoing method embodiments are implemented, with the beneficial effects corresponding to that method, which are not repeated here.
It should be noted that only a computer device 7 having the memory 71, the processor 72, and the network interface 73 is shown in the figure; it should be understood, however, that not all of the illustrated components are required, and more or fewer components may be implemented instead. As those skilled in the art will appreciate, the computer device here is a device capable of automatically performing numerical computation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like.
The computer device may be a desktop computer, a notebook computer, a palmtop computer, a cloud server, or another computing device. The computer device may interact with the user through a keyboard, a mouse, a remote control, a touch pad, a voice-control device, or the like.
In this embodiment, the memory 71 includes at least one type of readable storage medium, including flash memory, a hard disk, a multimedia card, a card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disc, and the like. In some embodiments, the memory 71 may be an internal storage unit of the computer device 7, such as a hard disk or main memory of the computer device 7. In other embodiments, the memory 71 may also be an external storage device of the computer device 7, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the computer device 7. Of course, the memory 71 may also include both the internal storage unit of the computer device 7 and its external storage device. In this embodiment, the memory 71 is generally used to store the operating system and the various application software installed on the computer device 7, such as the computer-readable instructions corresponding to the data prediction method described above. In addition, the memory 71 may also be used to temporarily store various types of data that have been output or are to be output.
In some embodiments, the processor 72 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 72 is generally used to control the overall operation of the computer device 7. In this embodiment, the processor 72 is configured to execute the computer-readable instructions stored in the memory 71 or to process data, for example to execute the computer-readable instructions corresponding to the data prediction method described above.
The network interface 73 may include a wireless network interface or a wired network interface, and is generally used to establish a communication connection between the computer device 7 and other electronic devices.
The present application further provides another implementation, namely a computer-readable storage medium storing computer-readable instructions executable by at least one processor, so that the at least one processor performs the steps of the data prediction method described above, with the beneficial effects corresponding to that method, which are not repeated here.
From the description of the foregoing embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solutions of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and including several computer-readable instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the various embodiments of the present application.
Obviously, the embodiments described above are only some of the embodiments of the present application rather than all of them; the accompanying drawings show preferred embodiments of the present application but do not limit its patent scope. The present application may be implemented in many different forms; rather, these embodiments are provided so that the disclosure of the present application will be thorough and complete. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing specific embodiments or make equivalent replacements of some of the technical features therein. Any equivalent structure made by using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of patent protection of the present application.

Claims (20)

  1. A data prediction method, comprising the following steps:
    receiving a data prediction request, determining model information and first user information from the data prediction request, and obtaining a prediction data table from a full data table in a data processing server, wherein the full data table is formed by associating at least two initial data tables;
    obtaining a pre-generated data mining model from a model server according to the model information, and configuring corresponding prediction resources in the model server according to the first user information;
    generating a data prediction model file based on the prediction resources and the data mining model and sending it to at least one data storage server, so that the data mining model runs on the data storage server; obtaining, according to the prediction data table, feature values of corresponding prediction input features from the data storage server and inputting them into the data mining model to obtain data values of a target variable to be predicted, thereby completing the data prediction;
    wherein the generation process of the data mining model comprises:
    receiving a modeling request, determining model algorithm information and second user information from the modeling request, and obtaining a training data table required for modeling from the full data table; configuring corresponding modeling resources in the model server according to the second user information, determining a model framework to be trained from the model server according to the model algorithm information, and extracting modeling input features and a modeling target variable from the training data table; and, based on the modeling resources, performing model training with the model framework to be trained, the modeling input features, and the modeling target variable to generate the data mining model.
  2. The data prediction method according to claim 1, wherein the configuring of the corresponding modeling resources in the model server according to the second user information comprises:
    obtaining, at a preset time interval, information of a pending modeling task corresponding to the second user information from a database corresponding to the model server, and generating a modeling resource configuration request;
    querying, according to the modeling resource configuration request, whether idle resources of the model server meet the demand of model training; if so, allocating corresponding modeling resources to the obtained pending modeling task, and otherwise rejecting the current modeling resource configuration request.
  3. The data prediction method according to claim 2, wherein, after the receiving of the modeling request, the method further comprises:
    performing authentication and signature verification on the information contained in the modeling request; if the verification passes, generating a modeling task with a unique identifier, and determining whether a modeling task submitted by the same user already exists in the database corresponding to the model server; if one exists, terminating the generated modeling task, and otherwise storing the generated modeling task in the database corresponding to the model server and sending the unique identifier of the generated modeling task to the user.
  4. The data prediction method according to claim 2, wherein, during model training, the method further comprises: receiving a request for periodically querying the modeling task status, accessing the model server according to the request for querying the modeling task status to query the model training status, and updating the queried model training status to the database corresponding to the model server in real time.
  5. The data prediction method according to any one of claims 1 to 4, wherein the configuring of the corresponding prediction resources in the model server according to the first user information comprises:
    obtaining, at a preset time interval, information of a pending data prediction task corresponding to the first user information from a database corresponding to the model server, and generating a prediction resource configuration request;
    querying, according to the prediction resource configuration request, whether idle resources of the model server meet the demand of data prediction; if so, allocating corresponding prediction resources to the obtained pending data prediction task, and otherwise rejecting the prediction resource configuration request.
  6. The data prediction method according to claim 5, wherein, after the receiving of the data prediction request, the method further comprises:
    performing authentication and signature verification on the information contained in the data prediction request; if the verification passes, generating a data prediction task with a unique identifier, and determining whether a data prediction task of the same user already exists in the database corresponding to the model server; if one exists, terminating the generated data prediction task, and otherwise storing the generated data prediction task in the database corresponding to the model server and sending the unique identifier of the generated data prediction task to the user.
  7. The data prediction method according to any one of claims 1 to 4, wherein the process of obtaining the full data table comprises:
    obtaining data from a plurality of data sources for analysis, generating a plurality of the initial data tables according to different data sources or different subjects, associating and integrating the plurality of initial data tables to generate the full data table, and outputting field contents supporting the data analysis and contents to be predicted;
    wherein the field contents serve as the modeling input features or the prediction input features, and the contents to be predicted serve as the modeling target variable or the target variable to be predicted; the training data table can be generated by creating a new data table from field contents of the full data table selected as the modeling input features, and the prediction data table can be generated by creating a new data table from field contents of the full data table selected as the prediction input features.
  8. A data prediction apparatus, comprising: a data prediction information obtaining module, a prediction configuration module, a data prediction module, and a model generation module;
    wherein the data prediction information obtaining module is configured to receive a data prediction request, determine model information and first user information from the data prediction request, and obtain a prediction data table from a full data table in a data processing server, wherein the full data table is formed by associating at least two initial data tables;
    the prediction configuration module is configured to obtain, according to the model information, the data mining model pre-generated by the model generation module from a model server, and to configure corresponding prediction resources in the model server according to the first user information;
    the data prediction module is configured to generate a data prediction model file based on the prediction resources and the data mining model and send it to at least one data storage server, so that the data mining model runs on the data storage server; and to obtain, according to the prediction data table, feature values of corresponding prediction input features from the data storage server, input them into the data mining model, and obtain data values of a target variable to be predicted, thereby completing the data prediction;
    wherein the model generation module is specifically configured to receive a modeling request, determine model algorithm information and second user information from the modeling request, obtain a training data table required for modeling from the full data table, configure corresponding modeling resources in the model server according to the second user information, determine a model framework to be trained from the model server according to the model algorithm information, extract modeling input features and a modeling target variable from the training data table, and, based on the modeling resources, perform model training with the model framework to be trained, the modeling input features, and the modeling target variable to generate the data mining model.
  9. A computer device, comprising a memory and a processor, wherein computer-readable instructions are stored in the memory, and the processor, when executing the computer-readable instructions, implements the following steps:
    receiving a data prediction request, determining model information and first user information from the data prediction request, and obtaining a prediction data table from a full data table in a data processing server, wherein the full data table is formed by associating at least two initial data tables;
    obtaining a pre-generated data mining model from a model server according to the model information, and configuring corresponding prediction resources in the model server according to the first user information;
    generating a data prediction model file based on the prediction resources and the data mining model and sending it to at least one data storage server, so that the data mining model runs on the data storage server; obtaining, according to the prediction data table, feature values of corresponding prediction input features from the data storage server and inputting them into the data mining model to obtain data values of a target variable to be predicted, thereby completing the data prediction;
    wherein the generation process of the data mining model comprises:
    receiving a modeling request, determining model algorithm information and second user information from the modeling request, and obtaining a training data table required for modeling from the full data table; configuring corresponding modeling resources in the model server according to the second user information, determining a model framework to be trained from the model server according to the model algorithm information, and extracting modeling input features and a modeling target variable from the training data table; and, based on the modeling resources, performing model training with the model framework to be trained, the modeling input features, and the modeling target variable to generate the data mining model.
  10. The computer device according to claim 9, wherein, when executing the computer-readable instructions to implement the step of configuring the corresponding modeling resources in the model server according to the second user information, the processor specifically implements the following steps:
    obtaining, at a preset time interval, information of a pending modeling task corresponding to the second user information from a database corresponding to the model server, and generating a modeling resource configuration request;
    querying, according to the modeling resource configuration request, whether idle resources of the model server meet the demand of model training; if so, allocating corresponding modeling resources to the obtained pending modeling task, and otherwise rejecting the current modeling resource configuration request.
  11. The computer device according to claim 10, wherein, after executing the computer-readable instructions to implement the step of receiving the modeling request, the processor, when executing the computer-readable instructions, further implements the following step:
    performing authentication and signature verification on the information contained in the modeling request; if the verification passes, generating a modeling task with a unique identifier, and determining whether a modeling task submitted by the same user already exists in the database corresponding to the model server; if one exists, terminating the generated modeling task, and otherwise storing the generated modeling task in the database corresponding to the model server and sending the unique identifier of the generated modeling task to the user.
  12. The computer device according to claim 10, wherein, when executing the computer-readable instructions to carry out model training, the processor further implements the following steps:
    receiving a request for periodically querying the modeling task status, accessing the model server according to the request for querying the modeling task status to query the model training status, and updating the queried model training status to the database corresponding to the model server in real time.
  13. The computer device according to any one of claims 9 to 12, wherein, when executing the computer-readable instructions to implement the step of configuring the corresponding prediction resources in the model server according to the first user information, the processor specifically implements the following steps:
    obtaining, at a preset time interval, information of a pending data prediction task corresponding to the first user information from a database corresponding to the model server, and generating a prediction resource configuration request;
    querying, according to the prediction resource configuration request, whether idle resources of the model server meet the demand of data prediction; if so, allocating corresponding prediction resources to the obtained pending data prediction task, and otherwise rejecting the prediction resource configuration request.
  14. The computer device according to claim 13, wherein, after executing the computer-readable instructions to implement the step of receiving the data prediction request, the processor, when executing the computer-readable instructions, further implements the following step:
    performing authentication and signature verification on the information contained in the data prediction request; if the verification passes, generating a data prediction task with a unique identifier, and determining whether a data prediction task of the same user already exists in the database corresponding to the model server; if one exists, terminating the generated data prediction task, and otherwise storing the generated data prediction task in the database corresponding to the model server and sending the unique identifier of the generated data prediction task to the user.
  15. A computer-readable storage medium storing computer-readable instructions which, when executed by a processor, cause the processor to perform the following steps:
    receiving a data prediction request, determining model information and first user information according to the data prediction request, and obtaining a prediction data table from a full data table in a data processing server, wherein the full data table is formed by associating at least two initial data tables;
    obtaining a pre-generated data mining model from a model server according to the model information, and configuring corresponding prediction resources in the model server according to the first user information;
    generating a data prediction model file based on the prediction resources and the data mining model, and sending the file to at least one data storage server so as to run the data mining model on the data storage server; obtaining, from the data storage server according to the prediction data table, feature values of the corresponding prediction input features, and inputting the feature values into the data mining model to obtain data values of the target variable to be predicted, thereby completing the data prediction;
    wherein the generation process of the data mining model comprises:
    receiving a modeling request, determining model algorithm information and second user information according to the modeling request, and obtaining a training data table required for modeling from the full data table; configuring corresponding modeling resources in the model server according to the second user information, determining a model framework to be trained from the model server according to the model algorithm information, and extracting modeling input features and a modeling target variable based on the training data table; and performing model training based on the modeling resources by using the model framework to be trained, the modeling input features, and the modeling target variable, so as to generate the data mining model.
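The model-generation process in claim 15 (select a framework by algorithm name, extract input features and a target variable from the training table, then train) can be sketched as follows. All names here (`FRAMEWORKS`, `build_data_mining_model`, the toy mean regressor) are illustrative assumptions, not the patent's actual implementation, which does not specify a concrete algorithm or framework.

```python
# Hypothetical sketch of the modeling flow in claim 15: map the model
# algorithm information to a trainable framework, split the training data
# table into modeling input features and the modeling target variable,
# and fit the framework to produce the data mining model.

FRAMEWORKS = {
    # "framework" here is just a callable fit on (X, y); a real system
    # would map algorithm names to e.g. gradient boosting or a neural net.
    "mean_regressor": lambda X, y: (lambda row: sum(y) / len(y)),
}

def extract_features_and_target(training_table, feature_cols, target_col):
    """Split the training data table (a list of row dicts) into
    model-input features X and the modeling target variable y."""
    X = [[row[c] for c in feature_cols] for row in training_table]
    y = [row[target_col] for row in training_table]
    return X, y

def build_data_mining_model(algorithm, training_table, feature_cols, target_col):
    framework = FRAMEWORKS[algorithm]  # model framework to be trained
    X, y = extract_features_and_target(training_table, feature_cols, target_col)
    return framework(X, y)             # the trained data mining model

table = [{"age": 30, "income": 50, "label": 1.0},
         {"age": 40, "income": 70, "label": 3.0}]
model = build_data_mining_model("mean_regressor", table, ["age", "income"], "label")
print(model([35, 60]))  # 2.0 (mean of the two labels)
```

The registry-of-frameworks shape mirrors how the claim decouples "model algorithm information" in the request from the framework actually held on the model server.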
  16. The computer-readable storage medium according to claim 15, wherein the computer-readable instructions, when executed by the processor, cause the processor, when performing the step of configuring corresponding modeling resources in the model server according to the second user information, to specifically perform the following steps:
    obtaining, at a preset time interval, information of a to-be-executed modeling task corresponding to the second user information from the database corresponding to the model server, and generating a modeling resource configuration request;
    querying, according to the modeling resource configuration request, whether the idle resources of the model server meet the requirements of model training; if so, allocating corresponding modeling resources to the obtained to-be-executed modeling task; otherwise, rejecting the current modeling resource configuration request.
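Claim 16's admission-control step (check idle resources against the task's needs; allocate or reject) can be sketched as below. The dict-based resource pool and the function name are assumptions for illustration; the patent does not prescribe a resource representation.

```python
# Minimal sketch of the resource-check step in claim 16: compare the
# model server's idle resources with what model training requires,
# reserve them if sufficient, otherwise reject the configuration request.

def allocate_modeling_resources(idle, required):
    """idle/required are dicts such as {"cpu": 8, "memory_gb": 32}.
    Returns the granted allocation, or None to signal rejection."""
    if all(idle.get(k, 0) >= v for k, v in required.items()):
        for k, v in required.items():
            idle[k] -= v        # reserve the resources for this task
        return dict(required)   # allocation granted
    return None                 # reject the current configuration request

pool = {"cpu": 8, "memory_gb": 32}
print(allocate_modeling_resources(pool, {"cpu": 4, "memory_gb": 16}))  # granted
print(allocate_modeling_resources(pool, {"cpu": 16, "memory_gb": 8}))  # None: rejected
```

Returning `None` rather than raising keeps the sketch close to the claim's wording, where rejection simply ends the current configuration request; the task stays in the database for the next polling interval.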
  17. The computer-readable storage medium according to claim 16, wherein the computer-readable instructions, when executed by the processor, cause the processor to further perform the following steps after the step of receiving the modeling request:
    performing authentication and signature verification on the information contained in the modeling request; if the verification passes, generating a modeling task with a unique identifier and determining whether a modeling task submitted by the same user already exists in the database corresponding to the model server; if such a task exists, terminating the generated modeling task; otherwise, storing the generated modeling task in the database corresponding to the model server and sending the unique identifier of the generated modeling task to the user.
  18. The computer-readable storage medium according to claim 16, wherein the computer-readable instructions, when executed by the processor, cause the processor to further perform the following steps during model training:
    receiving a request for periodically querying the modeling task status, accessing the model server to query the model training status according to the request, and updating the queried model training status in real time to the database corresponding to the model server.
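The status-sync step in claim 18 amounts to: on each timed query, ask the model server for the current training status and mirror it into the task database. A sketch, with `query_status` and `task_db` as stand-in names not taken from the patent:

```python
# Illustrative sketch of claim 18's status synchronization: query the
# model server for the training status of a task and update the record
# in the database corresponding to the model server in real time.

def sync_training_status(task_id, query_status, task_db):
    """query_status: callable returning the model server's current status
    for task_id (e.g. "RUNNING", "SUCCEEDED", "FAILED");
    task_db: the task-record store corresponding to the model server."""
    status = query_status(task_id)       # ask the model server
    task_db[task_id]["status"] = status  # mirror it into the database
    return status

task_db = {"t1": {"status": "PENDING"}}
server_status = {"t1": "RUNNING"}
print(sync_training_status("t1", server_status.get, task_db))  # RUNNING
```

In practice this function would be driven by the same preset-interval timer that claim 16 uses for resource configuration, so the database view of training progress never lags the model server by more than one interval.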
  19. The computer-readable storage medium according to any one of claims 15 to 18, wherein the computer-readable instructions, when executed by the processor, cause the processor, when performing the step of configuring corresponding prediction resources in the model server according to the first user information, to specifically perform the following steps:
    obtaining, at a preset time interval, information of a to-be-executed data prediction task corresponding to the first user information from the database corresponding to the model server, and generating a prediction resource configuration request;
    querying, according to the prediction resource configuration request, whether the idle resources of the model server meet the requirements of data prediction; if so, allocating corresponding prediction resources to the obtained to-be-executed data prediction task; otherwise, rejecting the prediction resource configuration request.
  20. The computer-readable storage medium according to claim 19, wherein the computer-readable instructions, when executed by the processor, cause the processor to further perform the following steps after the step of receiving the data prediction request:
    performing authentication and signature verification on the information contained in the data prediction request; if the verification passes, generating a data prediction task with a unique identifier and determining whether a data prediction task of the same user already exists in the database corresponding to the model server; if such a task exists, terminating the generated data prediction task; otherwise, storing the generated data prediction task in the database corresponding to the model server and sending the unique identifier of the generated data prediction task to the user.
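The request-handling sequence in claims 14, 17, and 20 (verify the signature, generate a uniquely identified task, and terminate it if the same user already has one pending) can be sketched as follows. HMAC-SHA256 is an assumption here; the patent names "authentication and signature verification" without specifying a scheme, and `SECRET`, `create_prediction_task`, and `task_db` are illustrative.

```python
# Hedged sketch of the claimed request handling: verify the request's
# signature, then create a uniquely identified data prediction task only
# if the same user has no task already pending in the database.
import hashlib
import hmac
import uuid

SECRET = b"demo-secret"  # illustrative shared key, not from the patent

def verify_signature(payload: bytes, signature: str) -> bool:
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)

def create_prediction_task(user_id, payload, signature, task_db):
    if not verify_signature(payload, signature):
        return None  # authentication / signature verification failed
    if any(t["user"] == user_id for t in task_db.values()):
        return None  # same user already has a task: terminate the new one
    task_id = uuid.uuid4().hex  # the task's unique identifier
    task_db[task_id] = {"user": user_id, "payload": payload}
    return task_id  # returned to the user on success

db = {}
payload = b'{"model": "churn_v1"}'
sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
print(create_prediction_task("alice", payload, sig, db) is not None)  # True
print(create_prediction_task("alice", payload, sig, db))   # None: duplicate user
print(create_prediction_task("bob", payload, "bad", db))   # None: bad signature
```

The one-pending-task-per-user check is what the claims use instead of a queue-depth limit: duplicate submissions are dropped at creation time, so the preset-interval scheduler in claims 16 and 19 only ever sees one task per user.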
PCT/CN2020/135601 2020-10-23 2020-12-11 Data prediction method, apparatus, computer device, and storage medium WO2022011946A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011148696.4 2020-10-23
CN202011148696.4A CN112256760B (en) 2020-10-23 2020-10-23 Data prediction method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2022011946A1 true WO2022011946A1 (en) 2022-01-20

Family

ID=74261097

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/135601 WO2022011946A1 (en) 2020-10-23 2020-12-11 Data prediction method, apparatus, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN112256760B (en)
WO (1) WO2022011946A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023186099A1 (en) * 2022-04-02 2023-10-05 维沃移动通信有限公司 Information feedback method and apparatus, and device
CN117492738A (en) * 2023-11-08 2024-02-02 交通银行股份有限公司北京市分行 Full flow method and device for data mining

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030145000A1 (en) * 2002-01-31 2003-07-31 International Business Machines Corporation System and method of using data mining prediction methodology
CN107145395A (en) * 2017-07-04 2017-09-08 北京百度网讯科技有限公司 Method and apparatus for handling task
CN107888669A (en) * 2017-10-31 2018-04-06 武汉理工大学 A kind of extensive resource scheduling system and method based on deep learning neutral net
CN109935338A (en) * 2019-03-07 2019-06-25 平安科技(深圳)有限公司 Data prediction processing method, device and computer equipment based on machine learning
CN110659261A (en) * 2019-09-19 2020-01-07 成都数之联科技有限公司 Data mining model publishing method, model and model service management method


Also Published As

Publication number Publication date
CN112256760B (en) 2021-07-06
CN112256760A (en) 2021-01-22

Similar Documents

Publication Publication Date Title
US20230362165A1 (en) Identifying accounts having shared credentials
US9886563B2 (en) Personalized online content access experiences using inferred user intent to configure online session attributes
US10325076B2 (en) Personalized online content access experiences using online session attributes
CN103023875B (en) A kind of account management system and method
TWI473029B (en) Extensible and programmable multi-tenant service architecture
US11720825B2 (en) Framework for multi-tenant data science experiments at-scale
CN111241195B (en) Database processing method, device, equipment and storage medium of distributed system
WO2022011946A1 (en) Data prediction method, apparatus, computer device, and storage medium
WO2022116425A1 (en) Method and system for data lineage analysis, computer device, and storage medium
CN111797096A (en) Data indexing method and device based on ElasticSearch, computer equipment and storage medium
WO2018119589A1 (en) Account management method and apparatus, and account management system
CN112036125B (en) Document management method and device and computer equipment
WO2022095518A1 (en) Automatic interface test method and apparatus, and computer device and storage medium
CN111460394A (en) Copyright file verification method and device and computer readable storage medium
JP2018517982A (en) Automatic recharge system, method and server
CN104717197B (en) Conversation management system, session management equipment and conversation managing method
CN111338571A (en) Task processing method, device, equipment and storage medium
US11640450B2 (en) Authentication using features extracted based on cursor locations
CN112468409A (en) Access control method, device, computer equipment and storage medium
CN109683957A (en) The method and apparatus of Function Extension
CN111191200A (en) Page display method and device and electronic equipment
CN111339193A (en) Category coding method and device
CN108959309B (en) Method and device for data analysis
CN113312669B (en) Password synchronization method, device and storage medium
CN109302446B (en) Cross-platform access method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20945563

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20945563

Country of ref document: EP

Kind code of ref document: A1