CN113935434A - Data analysis processing system and automatic modeling method - Google Patents


Info

Publication number
CN113935434A
Authority
CN
China
Prior art keywords
model
algorithm
data
user
scene
Prior art date
Legal status
Pending
Application number
CN202111299347.7A
Other languages
Chinese (zh)
Inventor
路明奎
路宏琦
Current Assignee
Beijing Zetyun Tech Co ltd
Original Assignee
Beijing Zetyun Tech Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Zetyun Tech Co ltd
Priority to CN202111299347.7A
Publication of CN113935434A
Status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/285 - Selection of pattern recognition techniques, e.g. of classifiers in a multi-classifier system
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/23 - Clustering techniques
    • G06F18/232 - Non-hierarchical techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention provides a data analysis processing system and an automatic modeling method. The method comprises the following steps: displaying a user interface on which a user sets the scene and data for creating a business model; acquiring the scene and/or data set by the user on the user interface; and selecting a model strategy from a plurality of model strategies according to the acquired scene and/or data and creating a business model according to the selected model strategy, wherein the model strategy at least comprises the following information: an algorithm and a parameter tuning method for the algorithm. According to the invention, a model strategy can be selected automatically according to the scene and/or data set by the user, without the user having to select a model strategy, which improves the degree of automation of the data analysis processing system and the user experience.

Description

Data analysis processing system and automatic modeling method
This application is a divisional application of the invention application filed on June 19, 2018, with application number 2018106324996 and entitled "A data analysis and processing system and an automatic modeling method".
Technical Field
The invention relates to the technical field of data processing, in particular to a data analysis processing system and an automatic modeling method.
Background
Business model training in current data analysis processing systems mainly proceeds as follows: data for training the business model is exported from a database to a local machine, a modeler selects a model strategy according to the business requirements using a third-party modeling tool and trains the business model, and during training the model is debugged manually and repeatedly to obtain optimized model parameters and thus the trained business model.
This way of training business models has significant disadvantages: the training process is complex, the degree of automation is low, and it is not suitable for non-professional users.
Disclosure of Invention
In view of this, the present invention provides a data analysis processing system and an automatic modeling method, so as to solve the problems of complex procedure and low degree of automation when training models with existing data analysis processing systems.
In order to solve the above technical problem, the present invention provides an automatic modeling method for a data analysis processing system, comprising:
displaying a user interface for a user to set scenes and data for creating a business model;
acquiring the scene and/or data set by the user on the user interface, selecting a model strategy from a plurality of model strategies according to the acquired scene and/or data, and creating a business model according to the selected model strategy, wherein the model strategy at least comprises the following information: an algorithm and a parameter tuning method for the algorithm.
Preferably, the model strategy further comprises at least one of the following information: an evaluation method for the algorithm, a parameter setting method for the algorithm, a splitting method for the data, a processing method for the data, and a feature selection method for the data.
Preferably, the user interface is further used for the user to set target features for creating the business model.
Preferably, the step of displaying the user interface comprises:
displaying a scene form on the user interface for selection by a user;
when an operation that a user selects a scene in the scene form is detected, displaying the selected scene on the user interface;
or
Displaying a scene input area on the user interface;
when the operation that a user inputs a scene in the input area is detected, acquiring the scene input by the user;
and displaying the scenes matched with the scenes input by the user in the scene form on the user interface.
Preferably, the scene comprises at least one of: a scene corresponding to a clustering algorithm, a scene corresponding to a classification algorithm, a scene corresponding to a regression algorithm, a scene corresponding to anomaly detection, and a scene corresponding to language processing.
Preferably, when the scene is a scene corresponding to a clustering algorithm, the information of the selected model strategy includes: an algorithm and a parameter tuning method for the algorithm, wherein the algorithm comprises at least one of the following: hierarchical clustering, Bayesian Gaussian mixture, KD-tree and restricted Boltzmann machine, the parameter tuning method for the algorithm is based on hyper-parameter optimization, and the hyper-parameter optimization method comprises at least one of the following: a random parameter search method, a grid parameter search method, and a silhouette coefficient method;
when the scene is a scene corresponding to a classification algorithm, the information of the selected model strategy includes: an algorithm and a parameter tuning method for the algorithm, wherein the algorithm comprises at least one of the following: logistic regression, random forest, Bagging, AdaBoost, neural network and stacking model, the parameter tuning method for the algorithm is based on hyper-parameter optimization, and the hyper-parameter optimization method comprises at least one of the following: a random parameter search method, a grid parameter search method, and an area under the curve (AUC) score method;
when the scene is a scene corresponding to a regression algorithm, the information of the selected model strategy includes: an algorithm and a parameter tuning method for the algorithm, wherein the algorithm comprises at least one of the following: logistic regression, random forest, support vector regression and neural network, the parameter tuning method for the algorithm is based on hyper-parameter optimization, and the hyper-parameter optimization method comprises at least one of the following: a random parameter search method, a grid parameter search method, and an R2 value method;
when the scene is a scene corresponding to anomaly detection, the information of the selected model strategy includes: an algorithm and a parameter tuning method for the algorithm, wherein the algorithm comprises at least one of the following: neural network, support vector machine, robust regression, nearest neighbors and isolation forest; the parameter tuning method for the algorithm is based on hyper-parameter optimization, and the hyper-parameter optimization method comprises at least one of the following: a random parameter search method, a grid parameter search method, and an F1 score method;
when the scene is a scene corresponding to language processing, the information of the selected model strategy includes: an algorithm and a parameter tuning method for the algorithm, wherein the algorithm comprises at least one of the following: latent semantic indexing, latent Dirichlet allocation, and conditional random fields; the parameter tuning method for the algorithm comprises: deriving default parameters from the result of word frequency analysis and using the default parameters.
Preferably, after the step of creating a business model according to the selected model strategy, the method further includes:
displaying modeling design information of the created business model, wherein the modeling design information at least comprises: information of the selected model strategy.
Preferably, after the step of displaying modeling design information of the created business model, the method further includes:
updating the modeling design information when an operation of adjusting the modeling design information by a user is detected;
and when detecting that the user performs an operation to run the created business model, running the created business model according to the updated modeling design information.
Preferably, after the step of creating a business model according to the selected model strategy, the method further includes:
when detecting that the user performs an operation to run the created business model, running the created business model using the selected model strategy.
Preferably, after the step of running the created business model, the method further includes:
displaying a modeling result of the business model that has finished running, the modeling result including at least one of: the name, the score, and the output result of the business model that has finished running.
Preferably, the modeling result further comprises: information of the model strategy of the business model that has finished running, its creation time, its training information, its corresponding workflow, its state, and importance ranking information of the data features.
Preferably, the modeling result includes: information of the M highest-scoring business models among the N business models that have finished running and correspond to the selected model strategy, or information of all N business models that have finished running and correspond to the selected model strategy, wherein M is a positive integer greater than or equal to 1 and N is a positive integer greater than or equal to M.
Preferably, after the step of running the created business model, the method further includes:
displaying modeling design information of the business model that has finished running, wherein the modeling design information at least comprises: information of the selected model strategy;
updating the modeling design information when an operation of adjusting the modeling design information by a user is detected;
and when detecting that the user performs an operation to rerun the business model that has finished running, rerunning that business model according to the updated modeling design information.
Preferably, the modeling design information further includes: the scene and/or target features.
Preferably, after the step of creating a business model according to the selected model strategy, the method further includes:
creating a first workflow corresponding to the created business model, wherein the first workflow comprises a plurality of workflow modules.
Preferably, after the step of creating the first workflow corresponding to the created business model, the method further includes:
updating the first workflow when detecting an operation of running the created business model or detecting an operation of adjusting the modeling design information by the user.
Preferably, after the step of creating the first workflow corresponding to the created business model, the method further includes:
when detecting an operation of the user creating a second workflow with the same content as the first workflow, generating the second workflow, wherein the second workflow can be edited.
Preferably, after the step of displaying the user interface, the method further includes:
when an operation of viewing the set data by a user is detected, displaying visualization information corresponding to the data.
Preferably, after the step of running the created business model, the method further includes:
publishing the business model that has finished running when an operation of the user publishing that business model is detected.
Preferably, after the step of running the created business model, the method further includes:
re-evaluating the business model that has finished running or the published business model when an operation of the user re-evaluating that business model is detected.
The invention also provides a data analysis processing system, comprising:
a display module, configured to display a user interface on which a user sets the scene and data for creating a business model;
a processing module, configured to acquire the scene and/or data set by the user on the user interface, select a model strategy from a plurality of model strategies according to the acquired scene and/or data, and create a business model according to the selected model strategy, wherein the model strategy at least comprises the following information: an algorithm and a parameter tuning method for the algorithm.
Preferably, the model strategy further comprises at least one of the following information: an evaluation method for the algorithm, a parameter setting method for the algorithm, a splitting method for the data, a processing method for the data, and a feature selection method for the data.
Preferably, the user interface is further used for the user to set target features for creating the business model.
Preferably, the display module is configured to display a scene form on the user interface for selection by a user; when an operation that a user selects a scene in the scene form is detected, displaying the selected scene on the user interface;
or
The display module is used for displaying a scene input area on the user interface; when the operation that a user inputs a scene in the input area is detected, acquiring the scene input by the user; and displaying the scenes matched with the scenes input by the user in the scene form on the user interface.
Preferably, the scene comprises at least one of: a scene corresponding to a clustering algorithm, a scene corresponding to a classification algorithm, a scene corresponding to a regression algorithm, a scene corresponding to anomaly detection, and a scene corresponding to language processing.
Preferably, when the scene is a scene corresponding to a clustering algorithm, the information of the selected model strategy includes: an algorithm and a parameter tuning method for the algorithm, wherein the algorithm comprises at least one of the following: hierarchical clustering, Bayesian Gaussian mixture, KD-tree and restricted Boltzmann machine, the parameter tuning method for the algorithm is based on hyper-parameter optimization, and the hyper-parameter optimization method comprises at least one of the following: a random parameter search method, a grid parameter search method, and a silhouette coefficient method;
when the scene is a scene corresponding to a classification algorithm, the information of the selected model strategy includes: an algorithm and a parameter tuning method for the algorithm, wherein the algorithm comprises at least one of the following: logistic regression, random forest, Bagging, AdaBoost, neural network and stacking model, the parameter tuning method for the algorithm is based on hyper-parameter optimization, and the hyper-parameter optimization method comprises at least one of the following: a random parameter search method, a grid parameter search method, and an area under the curve (AUC) score method;
when the scene is a scene corresponding to a regression algorithm, the information of the selected model strategy includes: an algorithm and a parameter tuning method for the algorithm, wherein the algorithm comprises at least one of the following: logistic regression, random forest, support vector regression and neural network, the parameter tuning method for the algorithm is based on hyper-parameter optimization, and the hyper-parameter optimization method comprises at least one of the following: a random parameter search method, a grid parameter search method, and an R2 value method;
when the scene is a scene corresponding to anomaly detection, the information of the selected model strategy includes: an algorithm and a parameter tuning method for the algorithm, wherein the algorithm comprises at least one of the following: neural network, support vector machine, robust regression, nearest neighbors and isolation forest; the parameter tuning method for the algorithm is based on hyper-parameter optimization, and the hyper-parameter optimization method comprises at least one of the following: a random parameter search method, a grid parameter search method, and an F1 score method;
when the scene is a scene corresponding to language processing, the information of the selected model strategy includes: an algorithm and a parameter tuning method for the algorithm, wherein the algorithm comprises at least one of the following: latent semantic indexing, latent Dirichlet allocation, and conditional random fields; the parameter tuning method for the algorithm comprises: deriving default parameters from the result of word frequency analysis and using the default parameters.
Preferably, the display module is further configured to display modeling design information of the created business model, wherein the modeling design information at least comprises: information of the selected model strategy.
Preferably, the data analysis processing system further comprises:
a first adjusting module, configured to update the modeling design information when an operation of adjusting the modeling design information by a user is detected;
and a first running module, configured to run the created business model according to the updated modeling design information when detecting that the user performs an operation to run the created business model.
Preferably, the data analysis processing system further comprises:
a second running module, configured to run the created business model using the selected model strategy when detecting that the user performs an operation to run the created business model.
Preferably, the display module is further configured to display a modeling result of the business model that has finished running, wherein the modeling result includes at least one of: the name, the score, and the output result of the business model that has finished running.
Preferably, the modeling result further comprises: information of the model strategy of the business model that has finished running, its creation time, its training information, its corresponding workflow, its state, and importance ranking information of the data features.
Preferably, the modeling result includes: information of the M highest-scoring business models among the N business models that have finished running and correspond to the selected model strategy, or information of all N business models that have finished running and correspond to the selected model strategy, wherein M is a positive integer greater than or equal to 1 and N is a positive integer greater than or equal to M.
Preferably, the display module is further configured to display modeling design information of the business model that has finished running, wherein the modeling design information at least comprises: information of the selected model strategy;
a second adjusting module, configured to update the modeling design information when an operation of adjusting the modeling design information by a user is detected;
and a third running module, configured to rerun the business model that has finished running according to the updated modeling design information when detecting that the user performs an operation to rerun that business model.
Preferably, the modeling design information further includes: scene and/or target features.
Preferably, the data analysis processing system further comprises:
and the creating module is used for creating a first workflow corresponding to the created business model, wherein the first workflow comprises a plurality of workflow modules.
Preferably, the data analysis processing system further comprises:
and the updating module is used for updating the first workflow when detecting an operation of running the created business model or detecting an operation of adjusting the modeling design information by a user.
Preferably, the data analysis processing system further comprises:
and the copying module is used for generating a second workflow when detecting an operation of the user creating a second workflow with the same content as the first workflow, wherein the second workflow can be edited.
Preferably, the data analysis processing system further comprises:
and the visualization module is used for displaying visualization information corresponding to the data when the operation of viewing the set data by the user is detected.
Preferably, the data analysis processing system further comprises:
and the publishing module is used for publishing the business model that has finished running when detecting an operation of the user publishing that business model.
Preferably, the data analysis processing system further comprises:
and the re-evaluation module is used for re-evaluating the business model that has finished running or the published business model when detecting an operation of the user re-evaluating that business model.
The invention also provides a data analysis processing system, which comprises a processor, a memory and a computer program stored on the memory and capable of running on the processor, wherein the computer program realizes the steps of the automatic modeling method when being executed by the processor.
The invention also provides a computer-readable storage medium on which a computer program is stored, which computer program, when being executed by a processor, realizes the steps of the above-mentioned automatic modeling method.
The technical scheme of the invention has the following beneficial effects:
in the embodiment of the invention, the data analysis processing system can automatically select a model strategy according to the scene and/or data set by the user, without the user having to select the model strategy, which improves the degree of automation of the data analysis processing system and the user experience.
Drawings
FIG. 1 is a schematic flow chart of an automatic modeling method for a data analysis processing system according to a first embodiment of the present invention;
FIG. 2 is a schematic diagram of an automated modeling user interface according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a user interface for viewing information of model policies of an embodiment of the present invention;
FIG. 4 is a schematic diagram of a user interface for a list of modeling outcomes of an embodiment of the present invention;
FIG. 5 is a schematic diagram of a user interface for a modeling effort chart in accordance with an embodiment of the present invention;
FIGS. 6 and 7 are schematic diagrams of a user interface for viewing data according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a user interface of a model repository of an embodiment of the present invention;
FIG. 9 is a schematic diagram of a user interface for online model performance and resource usage according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of a data analysis processing system according to an embodiment of the present invention;
FIG. 11 is a schematic structural diagram of a data analysis processing system according to another embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the drawings of the embodiments of the present invention. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the described embodiments of the invention, are within the scope of the invention.
Referring to fig. 1, fig. 1 is a schematic flowchart illustrating an automatic modeling method of a data analysis processing system according to a first embodiment of the present invention, where the automatic modeling method includes:
step 11: displaying a user interface for a user to set scenes and data for creating a service model;
referring to fig. 2, fig. 2 is a schematic diagram of a user interface for automatic modeling of a data analysis processing system according to an embodiment of the present invention, where the user interface for automatic modeling includes an input box for "selecting a scene" and an input box for "selecting a data module" (i.e., data), and a user can set a scene for creating a business model in the input box for "selecting a scene" and set data for creating a business model in the input box for "selecting a data module" (i.e., a module for storing data). In the embodiment of the present invention, preferably, the data modules displayed on the user interface are all data modules that the user has the selection right, and the description of the data modules can be displayed on the user interface while the data modules are displayed.
Step 12: acquiring the scene and/or data set by the user on the user interface, selecting a model strategy from a plurality of model strategies according to the acquired scene and/or data, and creating a business model according to the selected model strategy, wherein the model strategy at least comprises the following information: an algorithm and a parameter tuning method for the algorithm.
The model strategy at least comprises an algorithm of the business model and a parameter tuning method for the algorithm, and the algorithm of the business model can be trained based on the information of the model strategy. In some preferred embodiments of the present invention, the model strategy may further include at least one of the following information: an evaluation method for the algorithm, a parameter setting method for the algorithm, a splitting method for the data, a processing method for the data, and a feature selection method for the data.
It is understood that, in the embodiment of the present invention, a plurality of model strategies need to be stored in the data analysis processing system in advance.
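For illustration only, a pre-stored model strategy could be represented roughly as the following Python sketch; this is not the patented implementation, and all field names and strategy entries here are hypothetical.

    from dataclasses import dataclass, field
    from typing import Dict, List

    @dataclass
    class ModelStrategy:
        # Mandatory information: candidate algorithms and how their parameters are tuned
        algorithms: List[str]
        tuning_method: str                      # e.g. "random_search" or "grid_search"
        # Optional information described in the embodiments
        evaluation_methods: List[str] = field(default_factory=list)
        data_split: str = "train_test_split"
        data_processing: List[str] = field(default_factory=list)
        feature_selection: List[str] = field(default_factory=list)

    # Strategy repository pre-stored in the data analysis processing system (illustrative values)
    STRATEGY_REPOSITORY: Dict[str, ModelStrategy] = {
        "clustering": ModelStrategy(
            algorithms=["hierarchical_clustering", "bayesian_gaussian_mixture"],
            tuning_method="grid_search",
            evaluation_methods=["silhouette", "homogeneity", "completeness", "v_measure"],
            data_processing=["cleansing", "normalization"],
            feature_selection=["pca"],
        ),
        "classification": ModelStrategy(
            algorithms=["logistic_regression", "random_forest", "adaboost"],
            tuning_method="random_search",
            evaluation_methods=["auc", "accuracy", "f1", "log_loss"],
            data_processing=["cleansing", "normalization"],
            feature_selection=["chi2", "pearson", "rfe"],
        ),
    }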
In the embodiment of the invention, the data analysis processing system can automatically select a model strategy according to the scene and/or data set by the user, without the user having to select the model strategy, which improves the degree of automation of the data analysis processing system and the user experience.
In an embodiment of the present invention, the type of the algorithm of the business model may include at least one of the following: clustering algorithms, classification algorithms, regression algorithms, anomaly detection, and language processing algorithms. Correspondingly, the scene may include at least one of: a scene corresponding to a clustering algorithm, a scene corresponding to a classification algorithm, a scene corresponding to a regression algorithm, a scene corresponding to anomaly detection, and a scene corresponding to language processing.
For example, the scenario of the corresponding clustering algorithm may include: credit card customer base (i.e. which categories the customers of the credit card have) and network group domain (i.e. analyzing the relationship between the network alarm log and the devices, clustering the network alarm log based on the devices), etc. The scenarios corresponding to the classification algorithms may include, for example: customer churn prediction, financial product recommendation prediction, and the like. Scenarios corresponding to regression algorithms may include, for example: prediction of insurance claim settlement, cash reimbursement and the like. Scenarios corresponding to anomaly detection may include, for example: fraud and unusual transactions, etc. Scenarios of corresponding language processing may include, for example: latent semantic analysis, word frequency analysis, and the like.
In the embodiment of the present invention, the scene set by the user may be of a broad type, that is, a clustering algorithm, a classification algorithm or the like is selected on the user interface, or of a specific type, for example a scene containing a business target, such as credit card customer grouping or customer churn prediction selected on the user interface. Of course, in other embodiments of the present invention, the user interface may offer only broad types or only specific types, and the present invention is not limited thereto.
In the embodiment of the present invention, preferably, the scenario refers to a service scenario for creating a service model, and the scenario is related to a type of an algorithm of the service model.
In the embodiment of the present invention, when selecting a model strategy, the scene set by the user may be analyzed to obtain the corresponding model strategy; of course, the type of the data set by the user may also be analyzed to obtain the corresponding model strategy, or the scene and the data may be analyzed together to obtain the corresponding model strategy.
In the embodiment of the present invention, preferably, the data analysis processing system may store a correspondence between the scene and/or the data and the model strategy, so as to select the model strategy according to the correspondence. Of course, in some other embodiments of the present invention, a correspondence between the scene and/or the data and the information of the model strategy may be stored, and the data analysis processing system may determine the model strategy according to that correspondence.
In the embodiment of the present invention, the scene and the data may influence each other; preferably, the scenes that can be selected differ according to the data, and the data that can be selected differ according to the scene, where differences in data include different data types, different data granularity, different selectable target columns, and the like.
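A minimal sketch of such a correspondence lookup follows, assuming the hypothetical strategy repository above: when a scene is set it maps directly to a strategy key, otherwise a coarse inference is made from the data and the target column. The returned key would then be looked up in the pre-stored repository.

    from typing import Optional
    import pandas as pd

    def select_strategy_key(scene: Optional[str], data: pd.DataFrame,
                            target_column: Optional[str] = None) -> str:
        """Return the key of a pre-stored model strategy (illustrative logic only)."""
        if scene is not None:
            return scene                       # a scene maps directly to a strategy
        if target_column is None:
            return "clustering"                # no target column: unsupervised scene
        target = data[target_column]
        if pd.api.types.is_numeric_dtype(target) and target.nunique() > 20:
            return "regression"                # continuous target: regression scene
        return "classification"                # categorical target: classification scene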
In some embodiments of the invention, the step of displaying the user interface may comprise:
step 111: and displaying a scene form on the user interface for selection by a user.
Step 112: when an operation that a user selects a scene in the scene form is detected, displaying the selected scene on the user interface;
in some other embodiments of the present invention, the step of displaying the user interface may further include:
step 111': displaying a scene input area on the user interface; the scene input area can be a text input box or a voice input key;
step 112': when the operation that a user inputs a scene in the input area is detected, acquiring the scene input by the user;
step 113': and displaying the scenes matched with the scenes input by the user in the scene form on the user interface.
Specifically, the data analysis processing system can perform semantic understanding on the input scenes in the input area, automatically identify the scenes, and determine the scenes matched with the identified scenes from the scene table.
It can be appreciated that the data analysis processing system needs to store the scene table, and the scene table contains at least one scene (usually more than one, for example 80).
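As a toy illustration only, matching a free-text scene typed by the user against the stored scene table could fall back to simple keyword overlap; a real implementation would use semantic understanding, and the scene names below are hypothetical.

    SCENE_TABLE = [
        "credit card customer grouping",
        "customer churn prediction",
        "insurance claim amount prediction",
        "fraud and abnormal transaction detection",
        "latent semantic analysis",
    ]

    def match_scenes(user_input: str, top_k: int = 3):
        """Rank stored scenes by word overlap with the user's input (toy matcher)."""
        words = set(user_input.lower().split())
        scored = [(len(words & set(s.split())), s) for s in SCENE_TABLE]
        scored = [item for item in scored if item[0] > 0]
        return [s for _, s in sorted(scored, reverse=True)[:top_k]]

    print(match_scenes("predict which customers will churn"))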
Referring to fig. 2, in the embodiment of the present invention, in addition to setting the scene and data, the user may also choose to set target features (i.e., the target column in fig. 2), and the model strategy is also determined according to the target features. For example, the target column in customer churn prediction is the churn label column. In the embodiment of the present invention, one target column may be selected; of course, multiple target columns are also possible.
That is, the user interface is also used for the user to set target features for creating the business model.
Preferably, the step of acquiring the scene and/or data set on the user interface by the user and selecting a model strategy from a plurality of model strategies according to the acquired scene and/or data includes: acquiring the scene, data and/or target features set on the user interface by the user, and selecting a model strategy from the plurality of model strategies according to the acquired scene, data and/or target features.
That is, the target features may be used to select the model strategy; in addition, the target features may be used in the process of training the business model, for example in algorithm evaluation.
In addition, referring to fig. 2, on the user interface of the automatic modeling, the user may further set the name of the business model (i.e., the automatic modeling name in fig. 2), and at the same time, the user may further set the description of the business model, the label of the business model, and the like.
The following describes the correspondence between the scene, data, target features and model strategy by way of example.
(1) Scenes corresponding to a clustering algorithm, for example: credit card customer grouping (which categories the credit card customers fall into), network group domains (the relationship between network alarm logs and devices; clustering network alarm logs based on devices), and the like.
Scene: credit card customer grouping; data: credit card customer information (e.g., credit card customer information within a fixed period, such as one year, at a certain bank).
Model strategy 1: data processing: data cleansing and/or data normalization; feature engineering: feature selection through principal component analysis; algorithm (selected based on the spatial characteristics of the clustering space), the algorithm comprising at least one of: hierarchical clustering, Bayesian Gaussian mixture, KD-tree and restricted Boltzmann machine; the parameter tuning method for the algorithm is based on hyper-parameter optimization, and the hyper-parameter optimization method comprises at least one of: a random parameter search method, a grid parameter search method, and a silhouette coefficient method (e.g., to decide how many groups to cluster into); specifically, hyper-parameters are selected based on the random parameter search method and/or the grid parameter search method, for example, a set of optimal hyper-parameters is selected from a parameter list, with the silhouette coefficient used as the evaluation index for the hyper-parameters; evaluation: the algorithm evaluation is based on the silhouette coefficient, homogeneity, completeness and/or V-measure. Each algorithm is evaluated in the same way, and the result of each algorithm is retained and analyzed further in combination with the credit card business.
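A minimal sketch of the tuning step in model strategy 1, assuming a scikit-learn implementation and synthetic stand-in customer features: a grid of hyper-parameters for hierarchical (agglomerative) clustering is searched, with the silhouette coefficient as the evaluation index for the hyper-parameters.

    import numpy as np
    from itertools import product
    from sklearn.cluster import AgglomerativeClustering
    from sklearn.decomposition import PCA
    from sklearn.metrics import silhouette_score
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 8))              # stand-in for credit card customer features

    # Data processing and feature engineering from the strategy: normalization + PCA
    X = PCA(n_components=4).fit_transform(StandardScaler().fit_transform(X))

    # Grid parameter search with the silhouette coefficient as the hyper-parameter score
    param_grid = {"n_clusters": [2, 3, 4, 5, 6], "linkage": ["ward", "average", "complete"]}
    best = None
    for n_clusters, linkage in product(param_grid["n_clusters"], param_grid["linkage"]):
        labels = AgglomerativeClustering(n_clusters=n_clusters, linkage=linkage).fit_predict(X)
        score = silhouette_score(X, labels)
        if best is None or score > best[0]:
            best = (score, {"n_clusters": n_clusters, "linkage": linkage})

    print("best silhouette %.3f with params %s" % best)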
(2) Scenes corresponding to a classification algorithm, for example: customer churn prediction, financial product recommendation prediction, and the like.
Scene: customer churn prediction; data: customer information (e.g., customer information within a fixed period, such as one year, at a certain bank); target column: churn/non-churn.
Model strategy 2: data processing: data cleansing and/or data normalization; feature engineering: feature selection through the chi-square test, the Pearson correlation coefficient method, the extremely randomized trees (extra-trees) feature selection method and/or the recursive feature elimination method; the algorithm comprises at least one of the following (some algorithms are selected based on the characteristics of the different algorithms): logistic regression, random forest, Bagging, AdaBoost, neural network and stacking model; the parameter tuning method for the algorithm is based on hyper-parameter optimization, and the hyper-parameter optimization method comprises at least one of: a random parameter search method, a grid parameter search method, and an area under the curve (AUC) score method; specifically, hyper-parameters are selected based on the random parameter search method and/or the grid parameter search method, for example, a set of optimal hyper-parameters is selected from a parameter list, with the AUC score used as the evaluation index for the hyper-parameters; evaluation: the algorithm evaluation is based on the AUC score, accuracy, precision, recall, F1 score and/or log loss. Each algorithm is evaluated in the same way, an optimal algorithm is selected, and the churn prediction probability value of each customer is output.
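A minimal sketch of the tuning step in model strategy 2 under the same scikit-learn assumption, with a synthetic stand-in churn dataset: a random parameter search over a random forest, with the AUC score as the evaluation index for the hyper-parameters.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import RandomizedSearchCV, train_test_split

    # Stand-in for customer information with a churn/non-churn target column
    X, y = make_classification(n_samples=2000, n_features=20, weights=[0.8], random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

    search = RandomizedSearchCV(
        RandomForestClassifier(random_state=0),
        param_distributions={"n_estimators": [100, 200, 400],
                             "max_depth": [4, 8, 16, None],
                             "min_samples_leaf": [1, 3, 10]},
        n_iter=10, scoring="roc_auc", cv=3, random_state=0)   # AUC as the tuning index
    search.fit(X_train, y_train)

    proba = search.best_estimator_.predict_proba(X_test)[:, 1]  # churn probability per customer
    print("best params:", search.best_params_, "test AUC: %.3f" % roc_auc_score(y_test, proba))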
(3) Scenes corresponding to a regression algorithm, for example: insurance claim amount prediction, cash reimbursement, and the like.
Scene: insurance claim amount prediction; data: client information of a certain insurance company (e.g., client information within a fixed period, such as one year); target column: claim amount.
Model strategy 3: data processing: data cleansing and/or data normalization; feature engineering: feature selection through the chi-square test, the Pearson correlation coefficient method, the extremely randomized trees (extra-trees) feature selection method and/or the recursive feature elimination method; the algorithm comprises at least one of the following (some algorithms are selected based on the characteristics of the different algorithms): logistic regression, random forest, support vector regression (SVR) and neural network; the parameter tuning method for the algorithm is based on hyper-parameter optimization, and the hyper-parameter optimization method comprises at least one of: a random parameter search method, a grid parameter search method, and an R2 value method; specifically, hyper-parameters are selected based on the random parameter search method and/or the grid parameter search method, for example, a set of optimal hyper-parameters is selected from a parameter list, with the R2 value used as the evaluation index for the hyper-parameters; evaluation: the algorithm evaluation is based on the explained variance score, mean absolute deviation, mean square error, R2 value, median absolute error and/or log mean square error. Each algorithm is evaluated in the same way, an optimal algorithm is selected, and the predicted value of the insurance claim amount is output.
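A minimal sketch of the tuning step in model strategy 3 under the same scikit-learn assumption, with a synthetic stand-in claims dataset: a grid parameter search over support vector regression, with the R2 value as the evaluation index for the hyper-parameters.

    from sklearn.datasets import make_regression
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVR

    # Stand-in for insurance client information with a claim-amount target column
    X, y = make_regression(n_samples=1000, n_features=15, noise=10.0, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

    pipe = make_pipeline(StandardScaler(), SVR())
    search = GridSearchCV(
        pipe,
        param_grid={"svr__C": [0.1, 1.0, 10.0], "svr__epsilon": [0.01, 0.1, 1.0]},
        scoring="r2", cv=3)                                   # R2 value as the tuning index
    search.fit(X_train, y_train)

    print("best params:", search.best_params_)
    print("test R2: %.3f" % search.best_estimator_.score(X_test, y_test))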
(4) Scenes corresponding to anomaly detection, for example: fraud, abnormal transactions, and the like.
Scene: anomaly detection; data: transaction information of a certain industry (e.g., transaction information within a fixed period in that industry); an anomaly/non-anomaly target column may optionally be given.
Model strategy 4: data processing: data cleansing and/or data normalization; feature engineering: feature imbalance handling (generally all features are used and feature imbalance handling is performed); the algorithm comprises at least one of the following (some algorithms are selected for anomaly detection): neural network, support vector machine, robust regression, nearest neighbors and isolation forest; the parameter tuning method for the algorithm is based on hyper-parameter optimization, and the hyper-parameter optimization method comprises at least one of: a random parameter search method, a grid parameter search method, and an F1 score method; specifically, hyper-parameters are selected based on the random parameter search method and/or the grid parameter search method, for example, a set of optimal hyper-parameters is selected from a parameter list, with the F1 score used as the evaluation index for the hyper-parameters; evaluation: the algorithm evaluation is based on the AUC score, accuracy, precision, recall, F1 score and/or log loss. Each algorithm is evaluated in the same way, an optimal algorithm is selected, and the anomaly prediction probability value of each transaction is output.
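A minimal sketch of the tuning step in model strategy 4 under the same scikit-learn assumption, with a synthetic stand-in transaction dataset: a small grid over an isolation forest is searched manually, with the F1 score (against the optional anomaly/non-anomaly target column) as the evaluation index for the hyper-parameters.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import IsolationForest
    from sklearn.metrics import f1_score

    # Stand-in for transaction records with an optional anomaly (1) / non-anomaly (0) target column
    X, y = make_classification(n_samples=3000, n_features=10, weights=[0.95], random_state=0)

    best = None
    for contamination in [0.02, 0.05, 0.1]:
        for n_estimators in [100, 200]:
            iso = IsolationForest(n_estimators=n_estimators,
                                  contamination=contamination, random_state=0).fit(X)
            pred = (iso.predict(X) == -1).astype(int)          # -1 means anomalous
            score = f1_score(y, pred)
            if best is None or score > best[0]:
                best = (score, {"contamination": contamination, "n_estimators": n_estimators})

    print("best F1 %.3f with params %s" % best)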
(5) Scenes corresponding to language processing, more specifically, for example: latent semantic analysis, word frequency analysis, and the like.
Scene: latent semantic analysis; data: corresponding text information (e.g., summary information, log information, search terms).
Model strategy 5: data processing: word segmentation and/or word frequency analysis; the algorithm comprises at least one of the following (some algorithms are selected for language processing): latent semantic indexing, latent Dirichlet allocation and conditional random fields; the parameter tuning method for the algorithm comprises: deriving default parameters from the result of the word frequency analysis and using the default parameters; further clustering is then performed: the dimensionality is reduced using at least one of locally linear embedding, spectral embedding, multi-dimensional scaling and local tangent space alignment (selected based on the spatial characteristics of the manifold space), and the result is then clustered with K-MEANS. Further analysis is performed in combination with the specific business.
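A minimal sketch of model strategy 5 under the same scikit-learn assumption, with a few hypothetical English log lines as stand-in text (Chinese text would additionally need word segmentation): word-frequency style features feed a latent semantic indexing step (truncated SVD), and the reduced documents are then clustered with K-MEANS.

    from sklearn.cluster import KMeans
    from sklearn.decomposition import TruncatedSVD
    from sklearn.feature_extraction.text import TfidfVectorizer

    docs = [
        "network alarm raised on core switch",
        "switch port flapping alarm cleared",
        "credit card payment declined by issuer",
        "card transaction approved for merchant",
    ]

    tfidf = TfidfVectorizer().fit_transform(docs)                             # word-frequency features
    lsi = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)   # latent semantic indexing
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(lsi) # further clustering
    print(dict(zip(docs, labels)))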
In this embodiment of the present invention, preferably, after the step of creating a business model according to the selected model strategy, the method may further include: displaying modeling design information of the created business model, wherein the modeling design information at least comprises: information of the selected model strategy. This allows the user to view the information of the selected model strategy. The modeling design information may further include: the target features and/or the scene.
In this embodiment of the present invention, preferably, after the step of displaying the modeling design information of the created service model, the method may further include:
updating the modeling design information when an operation of adjusting the modeling design information by a user is detected;
and when detecting that the user executes the operation for operating the created service model, operating the created service model according to the updated modeling design information.
That is, the user can customize the content of the modeling design information, thereby improving the user experience.
Running the created business model at least comprises: training the algorithm of the created business model, and may of course further comprise: splitting the data, processing the data, and/or performing feature selection on the data, etc.
In this embodiment of the present invention, preferably, after the step of creating a business model according to the selected model strategy, the method may further include: when detecting that the user performs an operation to run the created business model, running the created business model using the selected model strategy.
Referring to fig. 2, after the user sets the scene, data and so on for creating the business model on the user interface shown in fig. 2, the user may click the "new" button to display the modeling design information of the created business model. For example, the data processing methods, algorithms, algorithm parameters and/or algorithm evaluation methods in the model strategy can be reviewed. Alternatively, when the "training" button is clicked, the selected model strategy is used to create the business model and run it. That is to say, as long as the user clicks the "training" button, the data analysis processing system can create the business model according to the automatically selected model strategy and run the created business model, without the user needing to select a model strategy, which simplifies the training process, improves the degree of automation of the data analysis processing system, and improves the user experience.
In the embodiment of the present invention, after the user clicks "new", the displayed user interface may be as shown in fig. 3, and the modeling design information of the created business model displayed on this user interface includes: basic information, features, modeling, and evaluation, wherein the basic information includes the target and the training/test set, the target includes the scene and the target column, the training/test set is formed by splitting and/or sampling the data, and the modeling includes the algorithm and its parameters.
In this embodiment of the present invention, after the step of running the created business model, the method further includes: displaying modeling design information of the business model that has finished running. That is, after the business model has been run, its modeling design information can also be viewed.
In this embodiment of the present invention, after the step of displaying the modeling design information of the business model that has finished running, the method may further include: updating the modeling design information when an operation of adjusting the modeling design information by a user is detected; and when detecting that the user performs an operation to rerun the business model that has finished running, rerunning that business model according to the updated modeling design information.
The modeling design information includes the information of the model strategy of the business model that has finished running, and may also include the scene and/or target features. That is to say, in the embodiment of the present invention, after the business model training is completed, the modeling design information may be viewed or adjusted, for example the model strategy, target features and/or scene of the business model may be adjusted (any information other than the data may be adjusted), and the adjusted business model may be rerun.
In this embodiment of the present invention, after the step of running the created business model, the method further includes: displaying a modeling result of the business model that has finished running, where the modeling result may include at least one of: the name of the business model that has finished running, its score, and its output result, where the output result may be, for example, the prediction result of whether a customer will churn in customer churn prediction. The name of the business model may be, for example, the algorithm name + a timestamp.
In some preferred embodiments of the present invention, the modeling result may further include: information of the model strategy of the business model that has finished running, its creation time, its training information (such as training duration), its corresponding workflow (also called a task, described below), its state (such as success or failure), and importance ranking information of the data features.
In the embodiment of the invention, one model strategy may comprise a plurality of algorithms, so that after the created business model is run, information of a plurality of business models can be obtained. Thus, the modeling result may include: information of the M highest-scoring business models among the N business models that have finished running and correspond to the selected model strategy, or the modeling results of all N business models that have finished running and correspond to the selected model strategy, where M is a positive integer greater than or equal to 1 and N is a positive integer greater than or equal to M. That is, one model strategy may yield a plurality of business models, and after the run is completed, information of some or all of the business models may be displayed.
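As an illustrative sketch only, selecting the information of the M highest-scoring business models out of the N run-completed models could look like the following Python snippet; the field names and model names are hypothetical.

    from typing import Dict, List

    def top_m_models(results: List[Dict], m: int) -> List[Dict]:
        """Keep the M best-scoring models that have finished running (or all N if m >= N)."""
        return sorted(results, key=lambda r: r["score"], reverse=True)[:m]

    run_results = [
        {"name": "random_forest_20180619120000", "score": 0.871, "state": "success"},
        {"name": "adaboost_20180619120000", "score": 0.842, "state": "success"},
        {"name": "logistic_regression_20180619120000", "score": 0.815, "state": "success"},
    ]
    print(top_m_models(run_results, m=2))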
In the embodiment of the invention, the modeling result of the business model that has finished running can be displayed through a modeling result chart or a modeling result list. The modeling result chart shows a comparison of the different business model results from the different training runs of one automatic modeling task, making it easy to see the better business model in each training run. The modeling result list shows a comparison of all business model results of one automatic modeling task, including all business models corresponding to the different training runs, and they can be fully sorted, making it easy to see all the better business models across the training runs.
Referring to fig. 4, fig. 4 is a schematic diagram of a user interface of a modeling result list of a completed business model in an embodiment of the present invention. The user interface of the modeling result list displays a list of all models, by default sorted in reverse chronological order.
The field names are as follows:
Checkbox: can be checked only for models in the success state; after checking, the [model evaluation] button at the bottom becomes available;
Model name: displays the specific name (automatic modeling names a model as model name + timestamp; a workflow model is named after the output of its source analysis module) and the source (named after the source analysis module); displays the stored-to-repository state (smiling face); supports sorting; in the success state, clicking the name enters the [model result details] page;
Belongs to: displays the task name of the model; clicking opens the [task details] page in a new window; supports sorting;
Creator: displays creator information; supports sorting;
Creation time: date + time; supports sorting;
Status: success, failure, running, or "--" (indicating that no evaluation method was found);
Training duration: displays the training duration as h m s; units that are not needed are omitted, e.g. "59 s"; score indicators: up to 6 indicators can be displayed at the same time by default, configurable via the table settings;
Operations:
View result (eye icon): clicking enters the [model result details] page; the [view result] button is displayed in the success state;
View log (bookmark icon): clicking opens a [log details] popup; the [view log] button is displayed in all states.
Referring to fig. 5, fig. 5 is a schematic diagram of a user interface of a modeling result chart of a business model that has finished running according to an embodiment of the present invention.
The right-side content of the user interface, the task model display area, comprises:
The user interface displays a list of all models of the task selected on the left.
Task visualization display area: displays the visualization information of all models under the task, including model algorithm parameters, feature importance, training information and the like. The training content of all models can be displayed as linear graphics (curves, polylines, etc.), and hovering the mouse over a node displays more information.
Model display area: displays the color identifier, model name, status identifier, repository status, champion identifier, specific score, start time, operation items and visual chart of each model; when the mouse hovers over the table area, it switches to the selected state, corresponding one-to-one with the selected state of the task model list on the left;
Color identifier: the color identifier in front of the model name is consistent with the line chart in the [task model visualization score] area on the right; at most 13 different colors are allocated (the upper limit of supported algorithms).
Model name: displays the specific name; hovering displays the full name; once successful, clicking the model name enters the [model details] page on the current page; when the model has failed, the model name turns red and cannot be clicked; while the model is running, it cannot be clicked after being selected; when the model has no evaluation module, the model name turns red and cannot be clicked after being selected.
Status identifier: running; successful (icon not displayed); failed (icon not displayed, name in red); no evaluation method found (icon not displayed, name in red, limited to evaluation comparison within a workflow).
Repository identifier: if the model has been updated to the repository, the updated-to-repository identifier (the smiling face in the figure) is displayed. Champion identifier: within the task, the champion identifier (the trophy in the figure) is displayed in front of the better-scoring model. (Affected by the score filter bar, the score may vary depending on the filtering content selected.)
Specific score: the score is displayed to at most three decimal places (affected by the score filter bar, the score may vary depending on the filtering content selected).
Start time: displays the task start time, date + time.
Operation items:
View result (eye icon): clicking enters the model result details page; the [view result] button is displayed in the success state.
View log (bookmark icon): clicking opens a [log details] popup; the [view log] button is displayed in all states.
The left-side content of the user interface, the task model list, includes:
1. the task list generated by all automatic modeling (workflow) training is displayed and loaded by pull-down;
2. the task list is sorted top to bottom in reverse time order by default;
3. clicking the specific name of a task opens the [task detail] page in a new window;
4. a task can be deleted; a secondary confirmation prompt pops up on deletion; after successful deletion, all model content generated in the task is cleared and the associated task in the task list is deleted as well (similarly, deleting a task also deletes the associated automatic modeling content), without affecting the content in the model warehouse;
5. one task may comprise a plurality of models; each model's color identifier, model name, state identifier, warehousing state, champion identifier and specific score are displayed;
Color identifier: the color in front of the model name is consistent with the line chart in the [task model visualization score] area on the right, and at most 13 different colors are allocated (the upper limit of supported algorithms);
Model name: automatic modeling is named model name + timestamp, and a workflow model is named after the output of its analysis module; the specific name is displayed and hovering displays the complete name; clicking the row of a model name puts that row into the selected state and switches the right-side content to the current task display area, which slides to the position of that model; after the model is published successfully, clicking its name in the selected row opens the [model detail] page on the current page; when the model fails, the name turns red and cannot be clicked after being selected; when the model is loading, it cannot be clicked after being selected; when the model has no evaluation module, the name turns red and cannot be clicked after being selected;
State identifier: loading, published successfully (no icon displayed), failed (no icon displayed, name in red), or no result (no icon displayed, name in red);
Warehousing identifier: if the model has been updated into the warehouse, the updated-to-warehouse identifier (e.g. the smiling face in the figure) is displayed;
Champion identifier: within the task, the champion identifier (e.g. the trophy in the figure) is displayed in front of the better-scoring model (influenced by the score screening bar; the score varies with the screening content);
Specific score: the score is displayed with at most three decimal places (influenced by the score screening bar; the score varies with the screening content).
Model assessment is placed at the bottom of the list; it can be used only after one or more items are checked, and clicking the button pops up the [model assessment] prompt window (only checkboxes of models in the published-success state can be checked).
The modeling result chart and the modeling result list present essentially the same content (model result comparison). The chart better shows the relative quality of models generated by the same training run (e.g. task 001 or task 002), while the list better shows the relative quality of models generated by different training runs (all trainings).
The training is typically an iteratively run algorithm model, i.e. a model whose algorithm is run more than once (the hyper-parameters may include the number of iterations).
In the embodiment of the invention, a user interface of the historical modeling result can be provided, so that a user can conveniently check the historical modeling result.
In the embodiment of the present invention, the automatic modeling method may further include: creating a first workflow corresponding to the created business model, wherein the first workflow comprises a plurality of workflow modules; the workflow modules may have connection relationships, and for two connected workflow modules the output of one workflow module is used as the input of the other. For example, the workflow modules may include a data module corresponding to the data set by the user, and may further include an analysis module corresponding to the algorithm in the model strategy. The first workflow cannot be edited or modified and can only be viewed. That is, during automatic modeling, the bottom layer of the data analysis processing system simultaneously creates a task (i.e. the first workflow); the name of the task, such as model name + timestamp, can be generated automatically, and the user function authority of the first workflow is consistent with the user function authority of the automatic modeling. A minimal sketch of this workflow structure is given below.
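For illustration only, the following is a minimal sketch of the first-workflow concept described above: modules connected so that the output of one module feeds the input of the next, with a version counter that increases on each run. The class names, module names and sample data are assumptions made for this example and are not the disclosed implementation.

```python
# Sketch of a workflow composed of connected modules; illustrative only.
class WorkflowModule:
    def __init__(self, name, func):
        self.name = name
        self.func = func

    def run(self, data):
        return self.func(data)


class Workflow:
    def __init__(self, name):
        self.name = name
        self.modules = []      # executed in connection order
        self.version = 0
        self.editable = False  # the auto-created first workflow is view-only

    def connect(self, module):
        self.modules.append(module)

    def run(self, data):
        # each run of the business model executes the workflow and bumps its version
        for module in self.modules:
            data = module.run(data)
        self.version += 1
        return data


# Example: a data module feeding an analysis module, mirroring the description above.
wf = Workflow("model_name_20211103_120000")
wf.connect(WorkflowModule("data", lambda d: [x for x in d if x is not None]))
wf.connect(WorkflowModule("analysis", lambda d: sum(d) / len(d)))
print(wf.run([1, 2, None, 3]), wf.version)
```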
In this embodiment of the present invention, after the step of creating the first workflow corresponding to the created service model, the method further includes: and updating the first workflow when detecting the operation of running the created service model or detecting the operation of adjusting the modeling design information by the user.
That is, in the above embodiment, when the automatic modeling is created, a first workflow is automatically created; each time the model runs, the first workflow is executed and the version of the first workflow is updated accordingly, so the running time of the automatic modeling is the running time of the workflow. When an operation of the user adjusting the modeling design information is detected, the modeling design information is updated, and the first workflow is updated according to the updated modeling design information.
In this embodiment of the present invention, after the step of creating the workflow corresponding to the created business model, the method further includes: generating a second workflow when an operation of the user creating a second workflow with the same content as the first workflow is detected. The newly created second workflow can be viewed, modified and edited, so that the automatically modeled model can be further modified and complex scenes can be designed.
In the embodiment of the present invention, a workflow can be newly created through a user interface of a data application, that is, a second workflow having the same content as the first workflow is newly created, which specifically includes the following steps:
1. generate a new data application (i.e. workflow); the modeling result interface (modeling result chart and modeling result list) includes a button for generating a data application, and through the modeling result display part of the current workflow can be copied;
2. set the data application name;
3. the description displays the original automatic modeling content by default;
4. the label displays the original automatic modeling content by default.
In the embodiment of the present invention, referring to fig. 6 and 7, the step of displaying the user interface includes: when an operation of the user viewing the set data is detected, displaying visualization information corresponding to the data. That is, after the user sets the data, the set data can be previewed, and the visualization information may be, for example, a table or a chart, so as to help the user filter the data. In the embodiment of the invention, the data content is divided into numerical values (integer, floating point) and other non-numerical values, and the two types can be displayed separately, as in the sketch below.
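For illustration only, the following sketch splits a previewed data set into numerical (integer/floating point) columns and other non-numerical columns, as described above. The use of pandas and the sample data are assumptions for the example, not part of the disclosed system.

```python
# Sketch: split preview data into numerical and non-numerical columns.
import pandas as pd

df = pd.DataFrame({
    "age": [25, 32, 47],
    "income": [3200.5, 4100.0, 5300.75],
    "city": ["Beijing", "Shanghai", "Shenzhen"],
})

numeric_cols = df.select_dtypes(include=["number"]).columns.tolist()
other_cols = [c for c in df.columns if c not in numeric_cols]

print("numerical columns:", numeric_cols)    # could be shown e.g. as charts
print("non-numerical columns:", other_cols)  # could be shown e.g. as tables
```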
In the above embodiments, before the model training or after the model training, the information of the model strategy may be displayed for the user to view or modify, and the following describes a user interface for displaying the information of the model strategy.
(I) User interface for the training set and test set
In the embodiment of the invention, a training set and a test set are required for model training. The default method for obtaining the training set and the test set is to split the current data; alternatively, another data set may be split, the training and test data may be extracted from the current data and another data set respectively, or both may be extracted from other data sets.
In the embodiment of the present invention, the method for obtaining the training set and the test set for training the business model may include sampling and splitting. The sampling method may include: 1) no sampling, using all data; 2) original record; 3) randomly selecting X% of rows; 4) randomly selecting N rows; 5) class-balancing N rows; 6) class-balancing X% of rows, etc. The splitting method may include: 1) random splitting; 2) enabling K-fold cross validation; 3) number of folds / ratio of training data (not enabled); 4) random seed. A sketch of these sampling and splitting options follows.
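For illustration only, the following sketch maps a few of the sampling options above (random X% of rows, random N rows, class-balanced rows) and a random split onto pandas/scikit-learn calls. The libraries, column names and numbers used are assumptions for the example and are not the disclosed implementation.

```python
# Sketch of sampling followed by a random training/test split; illustrative only.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({"x": range(100), "y": [i % 2 for i in range(100)]})

sampled_pct = df.sample(frac=0.5, random_state=42)   # randomly select X% of rows
sampled_n = df.sample(n=30, random_state=42)         # randomly select N rows

# class-balance N rows: draw the same number of rows from each class
balanced = df.groupby("y", group_keys=False).sample(n=15, random_state=42)

# random split of the sampled data into a training set and a test set
train_df, test_df = train_test_split(sampled_pct, test_size=0.2, random_state=42)
print(len(train_df), len(test_df), len(balanced))
```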
(II) setting and selecting a user interface for feature engineering, including data processing and feature selection
(1) Data processing
1: category-based data processing, comprising category processing and missing values. Selectable category processing methods include: dummy-coded vectors; selectable missing-value processing methods include numeric processing, filling, deleting rows, etc.
2: numerical-based data processing, comprising numerical processing and missing values. Selectable numerical processing methods include: keeping the regular numerical feature, binarization based on a given value, binning, etc.; selectable missing-value processing methods include filling, deleting rows, and the like.
3: text-based data processing; selectable processing methods include word segmentation and word frequency analysis. A sketch of several of these data processing options follows.
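For illustration only, the following sketch applies some of the options above (missing-value filling, dummy coding of a category column, binarization at a given value, and binning of a numerical column) using pandas and scikit-learn as a stand-in for the system's own processing. All names and values are assumptions for the example.

```python
# Sketch of category-based and numerical-based data processing; illustrative only.
import numpy as np
import pandas as pd
from sklearn.preprocessing import Binarizer, KBinsDiscretizer

df = pd.DataFrame({"color": ["red", "blue", None, "red"],
                   "score": [0.2, np.nan, 0.8, 0.5]})

# category processing: fill missing values, then dummy-code the category column
df["color"] = df["color"].fillna("missing")
dummies = pd.get_dummies(df["color"], prefix="color")

# numerical processing: fill missing values, then binarize / bin
score = df[["score"]].fillna(df["score"].mean())
binary = Binarizer(threshold=0.5).fit_transform(score)
bins = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="uniform").fit_transform(score)

print(dummies.join(pd.DataFrame({"score_bin": bins.ravel(),
                                 "score_gt_0.5": binary.ravel()})))
```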
(2) The feature selection comprises:
The selectable feature selection methods include: mutual information, chi-square test, F test, Pearson correlation coefficient, recursive feature elimination, model-based feature elimination, etc.; further, feature orthogonalization, principal component analysis of features, matrix decomposition and the like may also be included. Based on the method selected by the user, the system automatically performs feature selection (a sketch of several of these methods follows below).
Subsequently, after automatic modeling, feature selection can be performed again based on the feature importance calculated by the model.
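For illustration only, the following sketch runs four of the feature selection methods listed above (chi-square test, F test, mutual information, recursive feature elimination) with scikit-learn. The data set, estimator and parameter choices are assumptions for the example, not the disclosed implementation.

```python
# Sketch of several selectable feature-selection methods; illustrative only.
from sklearn.datasets import load_iris
from sklearn.feature_selection import (RFE, SelectKBest, chi2, f_classif,
                                        mutual_info_classif)
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

chi2_sel = SelectKBest(chi2, k=2).fit(X, y)               # chi-square test
f_sel = SelectKBest(f_classif, k=2).fit(X, y)             # F test
mi_sel = SelectKBest(mutual_info_classif, k=2).fit(X, y)  # mutual information
rfe_sel = RFE(LogisticRegression(max_iter=1000),
              n_features_to_select=2).fit(X, y)           # recursive feature elimination

# each selector reports which features were kept
print(chi2_sel.get_support(), f_sel.get_support(),
      mi_sel.get_support(), rfe_sel.get_support())
```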
Specifically, the user can also directly customize the feature selection:
1: features are displayed according to the data types of the different columns, distinguishing the different variable types (i.e. features), e.g. categorical and numerical;
2: the list supports ordering by data (field names in the order of the data report), name (field names a-z, 0-9), type (categorical first, numerical second) and role (target column first, then open columns, then closed columns); an open column is a selected feature; a closed column is a feature that is not selected;
3: the data columns support multiple selection, one-key select-all and one-key clearing of the selection;
4: search is supported;
5: leaving the page and entering it again keeps the last operation state;
6: the target column is clearly distinguished from ordinary columns.
(III) user interface for setting and selecting algorithms and parameters
(1) Algorithm
1: all algorithms may display an algorithm profile;
2: on the first run, an algorithm displays its default values; on later runs, the record of the last run is retained; this is not affected by switching the algorithm's button on and off;
3: when the button corresponding to an algorithm is switched off, none of the algorithm's parameters can be adjusted, and this is clearly distinguished in the display;
The algorithms comprise: (1) clustering: K-MEANS, affinity propagation, mean shift, spectral clustering, hierarchical clustering, density-based clustering with noise, balanced iterative hierarchical clustering and the like; (2) classification: random forest, gradient boosted trees, XGBoost, decision tree, nearest neighbors (KNN), extra random trees, neural network, logistic regression, support vector machine, stochastic gradient descent and the like; (3) regression: random forest, gradient boosted trees, ridge regression, lasso regression, XGBoost, decision tree, nearest neighbors (KNN), extra random trees, neural network, lasso path, logistic regression, support vector machine, stochastic gradient descent and the like.
(2) User interface for parameters
Setting the hyper-parameters:
1: Hyper-parameter search
1) random grid search speed: whether to shuffle the original order;
2) maximum number of iterations;
3) maximum search time, positive integer and floating-point types only;
4) concurrency, positive integers and -1 only.
Where the hyper-parameters are parameters that are set to values before the learning process is started, not parameter data obtained by training. In general, the hyper-parameters need to be optimized, and a group of optimal hyper-parameters is selected to improve the performance and effect of learning.
Further, the system provides automatic tuning of the hyper-parameters, and the selectable tuning metrics include: (1) clustering: silhouette coefficient, homogeneity, completeness, V-measure, etc.; (2) classification: AUC score, accuracy, precision, recall, F1 score, log loss, etc.; (3) regression: R2 value, explained variance score, mean error, mean squared error, root mean squared log error, mean absolute error, and the like. Generally only one can be selected.
Note: the default hyper-parameter settings are: "randomised": true; "nJobs": 1; "mode": "K-FOLD"; "nFolds": 5.
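For illustration only, the following sketch maps the default settings noted above (randomized search enabled, nJobs = 1, K-fold mode with 5 folds) onto scikit-learn's RandomizedSearchCV. The estimator, search space and data set are assumptions for the example and are not the disclosed implementation.

```python
# Sketch of a randomized hyper-parameter search with K-fold validation; illustrative only.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_iris(return_X_y=True)

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={"n_estimators": [50, 100, 200], "max_depth": [3, 5, None]},
    n_iter=5,        # maximum number of iterations of the search
    cv=5,            # "mode": K-FOLD with "nFolds": 5
    n_jobs=1,        # "nJobs": 1 (concurrency; -1 would use all cores)
    random_state=0,  # "randomised": true
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```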
Cross-validation settings are included in the hyper-parameter user interface:
1: Cross validation
1) traditional mode: the training set/validation set split supports an input split ratio by default, positive integer and floating-point types only, with a default of 0.8;
2) K-Fold mode: the number of folds is supported by default, positive integers only, with a default value of 0.
Specifically, the data may be first split into a training set and a test set (see the user interface settings of the training set and the test set); the cross validation part divides the training set into a training set and a validation set. Wherein the validation set is used for cross validation and the test set is used for subsequent evaluation.
Note: usually not all of the data is used for training; a part (the validation set, which does not participate in training) is separated to test the parameters produced from the training set, so that how well those parameters fit data outside the training set can be judged relatively objectively. This idea is called cross validation. A sketch of the two validation modes follows.
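For illustration only, the following sketch first holds out a test set (see the training/test user interface above), then shows the two validation modes described here: a traditional split with the default ratio 0.8, and K-fold rotation of the training data. scikit-learn and the data set are assumptions for the example.

```python
# Sketch of traditional split vs. K-fold cross validation; illustrative only.
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, train_test_split

X, y = load_iris(return_X_y=True)

# first split off the test set used for subsequent evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# traditional mode: split the training data again with the default ratio 0.8
X_fit, X_val, y_fit, y_val = train_test_split(X_train, y_train, train_size=0.8, random_state=0)

# K-Fold mode: the training set is rotated through K validation folds
for fold, (fit_idx, val_idx) in enumerate(
        KFold(n_splits=5, shuffle=True, random_state=0).split(X_train)):
    print(f"fold {fold}: {len(fit_idx)} training rows, {len(val_idx)} validation rows")
```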
(IV) user interface for setting and selecting evaluation method
User interface for evaluation method
1: according to different types of algorithms, different model evaluation methods, single selection or one of the models is taken as a core standard of the score, and other evaluation indexes related to the core standard are displayed
The evaluation method comprises the following steps: interpretation variance score, mean absolute deviation, mean square error, R2 score, median absolute error, log mean square error, F1 value, accuracy rate, precision rate, recall rate, AUC score, log loss, cost matrix, cumulative promotion, FBeta score, contour coefficient, homogenity, completeness, V-measure, etc. The evaluation method of the corresponding clustering algorithm comprises the following steps: contour coefficient, homogeneity, completeness, V-measure; the evaluation method corresponding to the multi-classification algorithm comprises the following steps: f1 value, accuracy, precision, recall, AUC score, log loss, FBeta score; the evaluation method corresponding to the two-classification algorithm comprises the following steps: f1 value, accuracy rate, precision rate, recall rate, AUC score, log loss, cost matrix, cumulative promotion degree, FBeta score; the evaluation method of the corresponding regression algorithm comprises the following steps: interpretation variance score, mean absolute deviation, mean square error, R2 value, median absolute error, mean logarithm error.
Note: the default evaluation methods are respectively as follows: and II, classification: AUC score, multi-classification: accuracy, regression: r2 value
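For illustration only, the following sketch encodes the default mapping noted above (binary classification: AUC score, multi-classification: accuracy, regression: R2 value) as a lookup table over scikit-learn metric functions. The dictionary layout and sample values are assumptions for the example.

```python
# Sketch of selecting the default evaluation method by algorithm type; illustrative only.
from sklearn.metrics import accuracy_score, r2_score, roc_auc_score

DEFAULT_METRIC = {
    "binary_classification": roc_auc_score,  # AUC score
    "multi_classification": accuracy_score,  # accuracy
    "regression": r2_score,                  # R2 value
}

def default_score(task_type, y_true, y_pred):
    return DEFAULT_METRIC[task_type](y_true, y_pred)

print(default_score("binary_classification", [0, 1, 1, 0], [0.1, 0.9, 0.8, 0.3]))
print(default_score("regression", [1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))
```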
After adjusting any of these settings, the user can save them, click 'train', view the results and save the results. The user may also save a custom model strategy for later use or for other users.
In the embodiment of the invention, after the model is trained, the model can be published; it can be published to the warehouse, brought online and so on only when it reaches a certain standard, i.e. only content that meets a certain scoring index (evaluation criterion) can be published to the warehouse and brought online. The model here refers to an automatically modeled model or a model newly created through a generated data application. Only models published to the model warehouse can be brought online, compared and iterated.
The user interface for publishing to the model warehouse may include:
1. clicking the [publish to warehouse] button pops up the [publish to model warehouse] window on the current page;
2. the pop-up window comprises the following contents: name, description, label;
Condition satisfied: a selection drop-down box for choosing an evaluation method supported by all models; a condition drop-down box: greater than or equal to, or less than or equal to; an input box: a numerical value greater than or equal to 0; for example, with the AUC score, the condition drop-down box selects greater than or equal to (or less than or equal to) the set value;
Automatic update: on or off; when turned on, models that meet the condition and have not yet been warehoused are updated into the model warehouse; the automatic update interval defaults to 24 hours;
Submit: clicking the button pops up an update progress prompt box where the update progress of all models meeting the condition can be viewed (a sketch of this condition check follows).
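For illustration only, the following sketch expresses the publish condition above (an evaluation method, a greater-than-or-equal / less-than-or-equal condition, and a threshold value) together with the 24-hour auto-update interval. The field names and values are assumptions for the example, not the disclosed implementation.

```python
# Sketch of the "condition satisfied" check before publishing to the warehouse; illustrative only.
import operator

publish_rule = {
    "metric": "auc",            # evaluation method chosen in the drop-down box
    "condition": operator.ge,   # "greater than or equal to"
    "threshold": 0.8,           # value entered in the input box
    "auto_update": True,
    "update_interval_hours": 24,
}

def may_publish(model_scores, rule):
    score = model_scores.get(rule["metric"])
    return score is not None and rule["condition"](score, rule["threshold"])

print(may_publish({"auc": 0.87}, publish_rule))  # True  -> publish to the warehouse
print(may_publish({"auc": 0.74}, publish_rule))  # False -> do not publish
```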
After submission, the button changes to the [submitted] style. For the model warehouse, refer to fig. 8; clicking "online model monitoring" displays online model performance and resource usage, as shown in fig. 9.
Referring to FIG. 8, a list of all models that are already online may be displayed, by default in reverse order of the time they went online. The list shows the following fields: online model name, current real-time usage of the container, CPU, MEM and GPU, average/minimum/maximum response time within a certain time range (the specific time is configurable, in hours or days), number of calls, and success rate.
Clicking an online model name or the "model detail" button in the operation column opens the model result detail page, where the model's specific information can be browsed. Clicking the number of calls opens the call detail page.
In the call detail page, details of the number of calls within a certain range can be displayed: the calling situation of each province of the country is shown visually on a map of China (different colors represent different numbers of calls); hovering the mouse over a province shows details including the province name, rank, number of calls and share; the detail list shows each call, including call time, response time, call type, call mode, access state, province and source.
The published model achievement list and details may also include the following:
1. a model published to the model warehouse can be brought online only after passing an auditing mechanism;
2. models can be imported (in batches) from local storage, and all of them are displayed in the modeling result list;
3. batch re-evaluation of models is supported, and the evaluation results can be viewed;
4. iterative online deployment of models is supported: a successfully deployed model can replace a model that is already online and then becomes the online model (each model can have only one online version; by default at most three successfully deployed models wait to go online, so when the upper limit is reached an existing deployed model must be replaced);
5. when a model goes online, the system checks whether feature value configuration, resource configuration and debugging mode configuration have been performed; if not, the resource configuration mode is entered and the relevant configuration is performed; if yes, this step is skipped;
6. clicking an entry in the model result list shows the basic information of the model (name, algorithm type, training time, number of rows and columns during training, new configuration, and the data application analysis module it belongs to);
7. the model's API interface information and API key are displayed; the interface can be debugged in three debugging modes: REST, message queue, and file system (NFS), but only an online model can be called through the interface;
8. the configuration information of the feature values, resource values and debugging mode can be viewed;
9. the importance of the feature variables is displayed, together with the information of each parameter of the model evaluation indexes;
10. performance-related evaluation results such as the ROC curve and the confusion matrix are displayed as more intuitive charts;
11. the model's algorithm parameter information, training data information and detailed training information are displayed.
The published model result list differs from the modeling result list in that it also contains performance information (the situation after the model goes online, whether calls succeed, resource usage and the like).
In the embodiment of the present invention, a user interface for model re-evaluation may also be provided, in which the user can:
1. select an evaluation label: optional; all evaluation criteria;
2. select data: the names and descriptions of all data modules the user is authorized to read are displayed, sorted by letters A-Z and 0-9; clicking [preview] pops up the [data preview] page on the current page; filtering by name and description keywords is supported;
3. clicking [submit] re-evaluates with the selected evaluation method and data, and a list of evaluation results pops up.
Both the model generated by automatic modeling and the published model can be re-evaluated. Re-evaluating a model means evaluating it with new data; if the evaluation result no longer meets the current business requirement, modeling design, model training and so on are performed again, as in the sketch below.
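For illustration only, the following sketch re-evaluates an already trained model on newly selected data with a chosen evaluation method. The model object, metric and data set are assumptions for the example and are not the disclosed implementation.

```python
# Sketch of re-evaluating a trained model on new data; illustrative only.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_old, X_new, y_old, y_new = train_test_split(X, y, test_size=0.3, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_old, y_old)  # previously trained model

# re-evaluation: apply the selected evaluation method to the newly selected data
new_score = accuracy_score(y_new, model.predict(X_new))
print("re-evaluated accuracy:", round(new_score, 3))
# if new_score no longer meets the business requirement, modeling design and
# training would be performed again
```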
Referring to fig. 10, an embodiment of the present invention further provides a data analysis processing system, including:
a display module 1101, configured to display a user interface, where the user interface is used for a user to set a scene and data for creating a service model;
a processing module 1102, configured to acquire a scene and/or data set by a user on the user interface; selecting a model strategy from a plurality of model strategies according to the acquired scene and/or data, and creating a business model according to the selected model strategy, wherein the model strategy at least comprises the following information: an algorithm and a parameter tuning method of the algorithm.
Preferably, the model policy further comprises at least one of the following information: the method comprises an evaluation method of the algorithm, a parameter setting method of the algorithm, a splitting method of the data, a processing method of the data and a feature selection method of the data.
Preferably, the user interface is further used for setting target characteristics for creating the business model by the user.
Preferably, the display module 1101 is configured to display a scene form on the user interface for selection by a user; when an operation that a user selects a scene in the scene form is detected, displaying the selected scene on the user interface;
or
The display module 1101 is configured to display a scene input area on the user interface; when detecting the operation of inputting a scene in the input area by a user, acquiring the scene input by the user; and displaying the scenes matched with the scenes input by the user in the scene form on the user interface.
Preferably, the scene comprises at least one of: a scene corresponding to a clustering algorithm, a scene corresponding to a classification algorithm, a scene corresponding to a regression algorithm, a scene corresponding to anomaly detection, and a scene corresponding to language processing.
Preferably, when the scene is a scene corresponding to a clustering algorithm, the information of the selected model strategy includes: an algorithm and a parameter tuning method of the algorithm, wherein the algorithm comprises at least one of the following: hierarchical clustering, Bayesian Gaussian mixture, KD tree and restricted Boltzmann machine, wherein the parameter tuning method of the algorithm is performed based on hyper-parameter optimization, and the hyper-parameter optimization method comprises at least one of the following: a random parameter search method, a grid parameter search method, and a silhouette coefficient method;
when the scene is a scene corresponding to a classification algorithm, the information of the selected model strategy includes: an algorithm and a parameter tuning method of the algorithm, wherein the algorithm comprises at least one of the following: logistic regression, random forest, Bagging, AdaBoost, a neural network and a stacking model, wherein the parameter tuning method of the algorithm is performed based on hyper-parameter optimization, and the hyper-parameter optimization method comprises at least one of the following: a random parameter search method, a grid parameter search method, and an area under the curve (AUC) score method;
when the scene is a scene corresponding to a regression algorithm, the information of the selected model strategy includes: an algorithm and a parameter tuning method of the algorithm, wherein the algorithm comprises at least one of the following: logistic regression, random forest, support vector regression and a neural network, wherein the parameter tuning method of the algorithm is performed based on hyper-parameter optimization, and the hyper-parameter optimization method comprises at least one of the following: a random parameter search method, a grid parameter search method, and an R2 value method;
when the scene is a scene corresponding to anomaly detection, the information of the selected model strategy includes: an algorithm and a parameter tuning method of the algorithm, wherein the algorithm comprises at least one of the following: a neural network, a support vector machine, robust regression, nearest neighbors and isolation forest; the parameter tuning method of the algorithm is performed based on hyper-parameter optimization, and the hyper-parameter optimization method comprises at least one of the following: a random parameter search method, a grid parameter search method, and an F1 score method;
when the scene is a scene corresponding to language processing, the information of the selected model strategy includes: an algorithm and a parameter tuning method of the algorithm, wherein the algorithm comprises at least one of the following: latent semantic indexing, latent Dirichlet allocation and conditional random fields; the parameter tuning method of the algorithm comprises: giving default parameters according to the result of word frequency analysis and using those default parameters. A sketch of this scene-to-strategy selection follows.
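For illustration only, the following sketch expresses the scene-to-strategy pairs listed above as a lookup table: given the scene set by the user, a model strategy (candidate algorithms plus a hyper-parameter tuning method) is selected. The dictionary layout and key names are assumptions for the example, not the disclosed implementation.

```python
# Sketch of selecting a model strategy from the user-set scene; illustrative only.
MODEL_STRATEGIES = {
    "clustering": {
        "algorithms": ["hierarchical clustering", "Bayesian Gaussian mixture",
                       "KD tree", "restricted Boltzmann machine"],
        "tuning": ["random parameter search", "grid parameter search", "silhouette coefficient"],
    },
    "classification": {
        "algorithms": ["logistic regression", "random forest", "Bagging",
                       "AdaBoost", "neural network", "stacking model"],
        "tuning": ["random parameter search", "grid parameter search", "AUC score"],
    },
    "regression": {
        "algorithms": ["logistic regression", "random forest",
                       "support vector regression", "neural network"],
        "tuning": ["random parameter search", "grid parameter search", "R2 value"],
    },
    "anomaly_detection": {
        "algorithms": ["neural network", "support vector machine", "robust regression",
                       "nearest neighbors", "isolation forest"],
        "tuning": ["random parameter search", "grid parameter search", "F1 score"],
    },
    "language_processing": {
        "algorithms": ["latent semantic indexing", "latent Dirichlet allocation",
                       "conditional random field"],
        "tuning": ["default parameters from word-frequency analysis"],
    },
}

def select_model_strategy(scene):
    return MODEL_STRATEGIES[scene]

print(select_model_strategy("classification")["tuning"])
```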
Preferably, the display module is further configured to display modeling design information of the created service model, where the modeling design information at least includes: information of the selected model strategy.
Preferably, the data analysis processing system further comprises:
the first adjusting module is used for updating the modeling design information when the operation of adjusting the modeling design information by a user is detected;
and the first operation module is used for operating the created service model according to the updated modeling design information when detecting that the user executes the operation for operating the created service model.
Preferably, the data analysis processing system further comprises:
and the second operation module is used for adopting the selected model strategy to operate the created service model when detecting that the user executes the operation of operating the created service model.
Preferably,
the display module is further configured to display a modeling result of the service model completed in operation, where the modeling result includes at least one of: the name of the operated business model, the score of the operated business model and the output result of the operated business model.
Preferably, the modeling result further comprises: the information of the model strategy of the operated business model, the creation time of the operated business model, the training information of the operated business model, the workflow corresponding to the operated business model, the state of the operated business model and the importance ranking information of the data characteristics.
Preferably, the modeling effort includes: and the information of the first M service models with the highest score in the N service models which are operated and completed and correspond to the selected model strategy, or the information of all the N service models which are operated and completed and correspond to the selected model strategy, wherein M is a positive integer which is greater than or equal to 1, and N is a positive integer which is greater than or equal to M.
Preferably, the display module is further configured to display modeling design information of the service model that is completed in operation, where the modeling design information at least includes: information of the selected model strategy.
Preferably, the data analysis processing system further comprises:
the second adjusting module is used for updating the modeling design information when the operation of adjusting the modeling design information by a user is detected;
and the third running module is used for re-running the operated service model according to the updated modeling design information when detecting that the user executes the operation for re-running the operated service model.
Preferably, the modeling design information further includes: scene and/or target features.
Preferably, the data analysis processing system further comprises:
and the creating module is used for creating a first workflow corresponding to the created service model, and the first workflow comprises a plurality of workflow modules.
Preferably, the data analysis processing system further comprises:
and the updating module is used for updating the first workflow when detecting the operation of running the created service model or detecting the operation of adjusting the modeling design information by a user.
Preferably, the data analysis processing system further comprises:
and the copying module is used for generating a second workflow when detecting that the user creates the operation of creating the second workflow with the same content as the first workflow, and the second workflow can be edited.
Preferably, the data analysis processing system further comprises:
and the visualization module is used for displaying visualization information corresponding to the data when the operation of viewing the set data by the user is detected.
Preferably, the data analysis processing system further comprises:
and the issuing module is used for issuing the operation completed business model when the operation of issuing the operation completed business model by the user is detected.
Preferably, the data analysis processing system further comprises:
and the re-evaluation module is used for re-evaluating the operated service model or the issued service model when detecting that the user re-evaluates the operation of the operated service model or the issued service model.
Referring to fig. 11, fig. 11 is a schematic structural diagram of a data analysis processing system according to another embodiment of the present invention, the data analysis processing system 120 includes: a processor 1201 and a memory 1202. In an embodiment of the present invention, the data analysis processing system 120 further includes: a computer program stored on the memory 1202 and executable on the processor 1201, the computer program when executed by the processor 1201 performing the steps of:
displaying a user interface for a user to set scenes and data for creating a business model;
acquiring scenes and/or data set on the user interface by a user, selecting a model strategy from a plurality of model strategies according to the acquired scenes and/or data, and creating a service model according to the selected model strategy, wherein the model strategy at least comprises the following information: an algorithm and a parameter tuning method of the algorithm.
The processor 1201 is responsible for managing the bus architecture and general processing, and the memory 1202 may store data used by the processor 1201 when performing operations.
Preferably, the model policy further comprises at least one of the following information: the method comprises an evaluation method of the algorithm, a parameter setting method of the algorithm, a splitting method of the data, a processing method of the data and a feature selection method of the data.
Preferably, the user interface is further used for setting target characteristics for creating the business model by the user.
Preferably, the computer program when executed by the processor 1201 further implements the steps of: displaying a scene form on the user interface for selection by a user;
when an operation that a user selects a scene in the scene form is detected, displaying the selected scene on the user interface;
or
Displaying a scene input area on the user interface;
when the operation that a user inputs a scene in the input area is detected, acquiring the scene input by the user;
and displaying the scenes matched with the scenes input by the user in the scene form on the user interface.
Preferably, the scene comprises at least one of: a scene corresponding to a clustering algorithm, a scene corresponding to a classification algorithm, a scene corresponding to a regression algorithm, a scene corresponding to anomaly detection, and a scene corresponding to language processing.
Preferably, when the scene is a scene corresponding to a clustering algorithm, the information of the selected model strategy includes: an algorithm and a parameter tuning method of the algorithm, wherein the algorithm comprises at least one of the following: hierarchical clustering, Bayesian Gaussian mixture, KD tree and restricted Boltzmann machine, wherein the parameter tuning method of the algorithm is performed based on hyper-parameter optimization, and the hyper-parameter optimization method comprises at least one of the following: a random parameter search method, a grid parameter search method, and a silhouette coefficient method;
when the scene is a scene corresponding to a classification algorithm, the information of the selected model strategy includes: an algorithm and a parameter tuning method of the algorithm, wherein the algorithm comprises at least one of the following: logistic regression, random forest, Bagging, AdaBoost, a neural network and a stacking model, wherein the parameter tuning method of the algorithm is performed based on hyper-parameter optimization, and the hyper-parameter optimization method comprises at least one of the following: a random parameter search method, a grid parameter search method, and an area under the curve (AUC) score method;
when the scene is a scene corresponding to a regression algorithm, the information of the selected model strategy includes: an algorithm and a parameter tuning method of the algorithm, wherein the algorithm comprises at least one of the following: logistic regression, random forest, support vector regression and a neural network, wherein the parameter tuning method of the algorithm is performed based on hyper-parameter optimization, and the hyper-parameter optimization method comprises at least one of the following: a random parameter search method, a grid parameter search method, and an R2 value method;
when the scene is a scene corresponding to anomaly detection, the information of the selected model strategy includes: an algorithm and a parameter tuning method of the algorithm, wherein the algorithm comprises at least one of the following: a neural network, a support vector machine, robust regression, nearest neighbors and isolation forest; the parameter tuning method of the algorithm is performed based on hyper-parameter optimization, and the hyper-parameter optimization method comprises at least one of the following: a random parameter search method, a grid parameter search method, and an F1 score method;
when the scene is a scene corresponding to language processing, the information of the selected model strategy includes: an algorithm and a parameter tuning method of the algorithm, wherein the algorithm comprises at least one of the following: latent semantic indexing, latent Dirichlet allocation and conditional random fields; the parameter tuning method of the algorithm comprises: giving default parameters according to the result of word frequency analysis and using those default parameters.
Preferably, the computer program when executed by the processor 1201 further implements the steps of: after the step of selecting a model strategy from the plurality of model strategies, the method further comprises:
displaying modeling design information of the created business model, wherein the modeling design information at least comprises: information of the selected model strategy.
Preferably, the computer program when executed by the processor 1201 further implements the steps of: after the step of displaying the modeling design information of the created service model, the method further comprises the following steps:
updating the modeling design information when an operation of adjusting the modeling design information by a user is detected;
and when detecting that the user executes the operation for operating the created service model, operating the created service model according to the updated modeling design information.
Preferably, the computer program when executed by the processor 1201 further implements the steps of: after the step of selecting a model strategy from the plurality of model strategies, the method further comprises:
and when detecting that the user executes the operation for running the created service model, adopting the selected model strategy to run the created service model.
Preferably, the computer program when executed by the processor 1201 further implements the steps of: after the step of running the created service model, the method further includes:
displaying a modeling effort of the operational completed business model, the modeling effort including at least one of: the name of the operated business model, the score of the operated business model and the output result of the operated business model.
Preferably, the modeling result further comprises: the information of the model strategy of the operated business model, the creation time of the operated business model, the training information of the operated business model, the workflow corresponding to the operated business model, the state of the operated business model and the importance ranking information of the data characteristics.
Preferably, the modeling result comprises: and the information of the first M service models with the highest score in the N service models which are operated and completed and correspond to the selected model strategy, or the information of all the N service models which are operated and completed and correspond to the selected model strategy, wherein M is a positive integer which is greater than or equal to 1, and N is a positive integer which is greater than or equal to M.
Preferably, the computer program when executed by the processor 1201 further implements the steps of: after the step of running the created service model, the method further includes:
displaying modeling design information of the service model which is completed in operation, wherein the modeling design information at least comprises: information of the selected model strategy;
updating the modeling design information when an operation of adjusting the modeling design information by a user is detected;
and when detecting that the user executes the operation for rerunning the operated service model, rerunning the operated service model according to the updated modeling design information.
Preferably, the modeling design information further includes: scene and/or target features.
Preferably, the computer program when executed by the processor 1201 further implements the steps of: and creating a first workflow corresponding to the created business model, wherein the first workflow comprises a plurality of workflow modules.
Preferably, the computer program when executed by the processor 1201 further implements the steps of: after the step of creating the first workflow corresponding to the created business model, the method further includes:
and updating the first workflow when detecting the operation of running the created service model or detecting the operation of adjusting the information of the run service model by a user.
Preferably, the computer program when executed by the processor 1201 further implements the steps of: after the step of creating the first workflow corresponding to the created business model, the method further includes:
when the operation that a user creates a second workflow with the same content as the first workflow is detected, the second workflow is generated, and the second workflow can be edited.
Preferably, the computer program when executed by the processor 1201 further implements the steps of: after the step of displaying the user interface, the method further comprises:
when an operation of viewing the set data by a user is detected, displaying visual information corresponding to the data.
Preferably, the computer program when executed by the processor 1201 further implements the steps of: after the step of running the created service model, the method further includes:
and when the operation that the user publishes the operation-finished service model is detected, publishing the operation-finished service model.
Preferably, the computer program when executed by the processor 1201 further implements the steps of: after the step of running the created service model, the method further includes:
and when the operation that the user reevaluates the operated service model or the issued service model is detected, reevaluating the operated service model or the issued service model.
The embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements each process of the embodiment of the automatic modeling method, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here. The computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
The foregoing is a preferred embodiment of the present invention, and it should be noted that it is obvious to those skilled in the art that various modifications and improvements can be made without departing from the principle of the present invention, and these modifications and improvements should be construed as the protection scope of the present invention.

Claims (10)

1. An automatic modeling method for a data analysis processing system, comprising:
displaying a user interface for a user to set scenes and data for creating a business model;
acquiring scenes and/or data set on the user interface by a user, selecting a model strategy from a plurality of model strategies according to the acquired scenes and/or data, and creating a service model according to the selected model strategy, wherein the model strategy at least comprises the following information: an algorithm and a parameter tuning method of the algorithm.
2. The automated modeling method of claim 1, wherein the model policy further comprises at least one of the following information: the method comprises an evaluation method of the algorithm, a parameter setting method of the algorithm, a splitting method of the data, a processing method of the data and a feature selection method of the data.
3. The automated modeling method of claim 1, wherein the scenario includes at least one of: a scene corresponding to a clustering algorithm, a scene corresponding to a classification algorithm, a scene corresponding to a regression algorithm, a scene corresponding to anomaly detection, and a scene corresponding to language processing.
4. The automated modeling method of claim 1, further comprising, after the step of creating a business model according to the selected model policy:
displaying modeling design information of the created business model, wherein the modeling design information at least comprises: information of the selected model strategy.
5. The automated modeling method of claim 1, further comprising, after the step of creating a business model according to the selected model policy:
and when detecting that the user executes the operation for running the created service model, adopting the selected model strategy to run the created service model.
6. A data analysis processing system, comprising:
the system comprises a display module, a service module and a service module, wherein the display module is used for displaying a user interface, and the user interface is used for setting scenes and data for creating a service model by a user;
the processing module is used for acquiring scenes and/or data set on the user interface by a user; selecting a model strategy from a plurality of model strategies according to the acquired scene and/or data, and creating a business model according to the selected model strategy, wherein the model strategy at least comprises the following information: an algorithm and a parameter tuning method of the algorithm.
7. The data analysis processing system of claim 6, wherein the model policy further comprises at least one of: the method comprises an evaluation method of the algorithm, a parameter setting method of the algorithm, a splitting method of the data, a processing method of the data and a feature selection method of the data.
8. The data analysis processing system of claim 6, wherein the scenario includes at least one of: a scene corresponding to a clustering algorithm, a scene corresponding to a classification algorithm, a scene corresponding to a regression algorithm, a scene corresponding to anomaly detection, and a scene corresponding to language processing.
9. The data analysis processing system of claim 6,
the display module is further configured to display modeling design information of the created service model, where the modeling design information at least includes: information of the selected model strategy.
10. The data analysis processing system of claim 6, further comprising:
and the second operation module is used for adopting the selected model strategy to operate the created service model when detecting that the user executes the operation of operating the created service model.
CN202111299347.7A 2018-06-19 2018-06-19 Data analysis processing system and automatic modeling method Pending CN113935434A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111299347.7A CN113935434A (en) 2018-06-19 2018-06-19 Data analysis processing system and automatic modeling method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111299347.7A CN113935434A (en) 2018-06-19 2018-06-19 Data analysis processing system and automatic modeling method
CN201810632499.6A CN109389143A (en) 2018-06-19 2018-06-19 A kind of Data Analysis Services system and method for automatic modeling

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201810632499.6A Division CN109389143A (en) 2018-06-19 2018-06-19 A kind of Data Analysis Services system and method for automatic modeling

Publications (1)

Publication Number Publication Date
CN113935434A true CN113935434A (en) 2022-01-14

Family

ID=65416532

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202111299347.7A Pending CN113935434A (en) 2018-06-19 2018-06-19 Data analysis processing system and automatic modeling method
CN201810632499.6A Pending CN109389143A (en) 2018-06-19 2018-06-19 A kind of Data Analysis Services system and method for automatic modeling

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201810632499.6A Pending CN109389143A (en) 2018-06-19 2018-06-19 A kind of Data Analysis Services system and method for automatic modeling

Country Status (1)

Country Link
CN (2) CN113935434A (en)


Families Citing this family (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111724185A (en) * 2019-03-21 2020-09-29 北京沃东天骏信息技术有限公司 User maintenance method and device
CN110083637B (en) * 2019-04-23 2023-04-18 华东理工大学 Bridge disease rating data-oriented denoising method
CN110222710B (en) * 2019-04-30 2022-03-08 北京深演智能科技股份有限公司 Data processing method, device and storage medium
CN110135064B (en) * 2019-05-15 2023-07-18 上海交通大学 Method, system and controller for predicting temperature faults of rear bearing of generator
CN110443126A (en) * 2019-06-27 2019-11-12 平安科技(深圳)有限公司 Model hyper parameter adjusts control method, device, computer equipment and storage medium
CN110334955B (en) * 2019-07-08 2021-09-14 北京字节跳动网络技术有限公司 Index evaluation processing method, device, equipment and storage medium
CN110705312B (en) * 2019-09-30 2021-05-07 贵州航天云网科技有限公司 Development system for rapidly developing industrial mechanism model based on semantic analysis
CN110717535B (en) * 2019-09-30 2020-09-11 北京九章云极科技有限公司 Automatic modeling method and system based on data analysis processing system
CN110766167B (en) * 2019-10-29 2021-08-06 深圳前海微众银行股份有限公司 Interactive feature selection method, device and readable storage medium
CN110807044A (en) * 2019-10-30 2020-02-18 东莞市盟大塑化科技有限公司 Model dimension management method based on artificial intelligence technology
CN110956272B (en) * 2019-11-01 2023-08-08 第四范式(北京)技术有限公司 Method and system for realizing data processing
CN111242358A (en) * 2020-01-07 2020-06-05 杭州策知通科技有限公司 Enterprise information loss prediction method with double-layer structure
CN111784040B (en) * 2020-06-28 2023-04-25 平安医疗健康管理股份有限公司 Optimization method and device for policy simulation analysis and computer equipment
CN112380216B (en) * 2020-11-17 2023-07-28 北京融七牛信息技术有限公司 Automatic feature generation method based on intersection
CN112577955A (en) * 2020-11-23 2021-03-30 淮阴师范学院 Water bloom water body detection method and system
CN112633754A (en) * 2020-12-30 2021-04-09 国网新疆电力有限公司信息通信公司 Modeling method and system of data analysis model
CN113010946B (en) * 2021-02-26 2024-01-23 深圳市万翼数字技术有限公司 Data analysis method, electronic equipment and related products
CN113010226A (en) * 2021-03-16 2021-06-22 北京云从科技有限公司 Model loading method, system, electronic device and medium
CN113239025B (en) * 2021-04-23 2022-08-19 四川大学 Ship track classification method based on feature selection and hyper-parameter optimization
CN112884092B (en) * 2021-04-28 2021-11-02 深圳索信达数据技术有限公司 AI model generation method, electronic device, and storage medium
CN113282461B (en) * 2021-05-28 2023-06-23 中国联合网络通信集团有限公司 Alarm identification method and device for transmission network
CN113449471B (en) * 2021-06-25 2022-08-30 东北电力大学 Wind power output simulation generation method for continuously improving MC (multi-channel) by utilizing AP (access point) clustering-skipping
CN113822327A (en) * 2021-07-31 2021-12-21 云南电网有限责任公司信息中心 Algorithm recommendation method based on data characteristics and analytic hierarchy process
CN114117050B (en) * 2021-11-30 2022-08-05 济南农村商业银行股份有限公司 Full-automatic accounting flow popup window processing method, device and system

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101169798A (en) * 2007-12-06 2008-04-30 中国电信股份有限公司 Data excavation system and method
CN104850405A (en) * 2015-05-25 2015-08-19 武汉众联信息技术股份有限公司 Intelligent configurable workflow engine and implementation method therefor
CN105095436A (en) * 2015-07-23 2015-11-25 苏州国云数据科技有限公司 Automatic modeling method for data of data sources
CN106164945A (en) * 2014-04-11 2016-11-23 微软技术许可有限责任公司 Sight modeling and visualization
CN106250987A (en) * 2016-07-22 2016-12-21 无锡华云数据技术服务有限公司 A kind of machine learning method, device and big data platform
CN106997386A (en) * 2017-03-28 2017-08-01 上海跬智信息技术有限公司 A kind of OLAP precomputations model, method for automatic modeling and automatic modeling system
CN107038167A (en) * 2016-02-03 2017-08-11 普华诚信信息技术有限公司 Big data excavating analysis system and its analysis method based on model evaluation
CN107103050A (en) * 2017-03-31 2017-08-29 海通安恒(大连)大数据科技有限公司 A kind of big data Modeling Platform and method
CN107958268A (en) * 2017-11-22 2018-04-24 用友金融信息技术股份有限公司 The training method and device of a kind of data model


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114610204A (en) * 2022-03-14 2022-06-10 中国农业银行股份有限公司 Auxiliary device and method for data processing, storage medium and electronic equipment
CN114610204B (en) * 2022-03-14 2024-03-26 中国农业银行股份有限公司 Auxiliary device and method for data processing, storage medium and electronic equipment
CN115455135A (en) * 2022-06-30 2022-12-09 北京九章云极科技有限公司 Visual automatic modeling method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN109389143A (en) 2019-02-26

Similar Documents

Publication Publication Date Title
CN113935434A (en) Data analysis processing system and automatic modeling method
Kelleher et al. Data science
US6988090B2 (en) Prediction analysis apparatus and program storage medium therefor
US7818286B2 (en) Computer-implemented dimension engine
JPH0877010A (en) Method and device for data analysis
CN107924384A (en) For the system and method using study model prediction result is predicted
CN107622427A (en) The method, apparatus and system of deep learning
US11151480B1 (en) Hyperparameter tuning system results viewer
CN108491511A (en) Data digging method and device, model training method based on diagram data and device
JP7125358B2 (en) Method of presenting information on basis of prediction results for computer system and input data
CN112559900B (en) Product recommendation method and device, computer equipment and storage medium
Guruler et al. Modeling student performance in higher education using data mining
US20220172258A1 (en) Artificial intelligence-based product design
Seymen et al. Customer churn prediction using deep learning
CN113177643A (en) Automatic modeling system based on big data
JP5905651B1 (en) Performance evaluation apparatus, performance evaluation apparatus control method, and performance evaluation apparatus control program
Jeyaraman et al. Practical Machine Learning with R: Define, build, and evaluate machine learning models for real-world applications
US20210356920A1 (en) Information processing apparatus, information processing method, and program
JP2018088087A (en) Data analyzer, data analysis method and data analysis program
CN114519073A (en) Product configuration recommendation method and system based on atlas relation mining
Shah et al. Predictive Analytic Modeling: A Walkthrough
US20200342302A1 (en) Cognitive forecasting
CN113537731A (en) Design resource capacity evaluation method based on reinforcement learning
JP3452308B2 (en) Data analyzer
Liu Apache spark machine learning blueprints

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination