WO2021109928A1 - Method for creating and method for using a machine learning solution template, and apparatus - Google Patents

Method for creating and method for using a machine learning solution template, and apparatus

Info

Publication number
WO2021109928A1
Authority
WO
WIPO (PCT)
Prior art keywords
configuration
input source
template
parameter
machine learning
Prior art date
Application number
PCT/CN2020/132093
Other languages
English (en)
French (fr)
Inventor
孔维
宋尧
王萌
吕自荟
朱晓丹
李冠琳
黄缨宁
周振华
Original Assignee
第四范式(北京)技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 第四范式(北京)技术有限公司
Publication of WO2021109928A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/71 Version control; Configuration management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Definitions

  • The present disclosure relates generally to the field of artificial intelligence and, more specifically, to a method for creating a machine learning solution template, a method for using it, and corresponding apparatus.
  • Machine learning is an inevitable product of artificial intelligence research reaching a certain stage of development. It aims to improve the performance of a system itself through computational means, using experience.
  • Experience usually exists in the form of "data".
  • A "model" can be generated from data by a machine learning algorithm. That is, when empirical data is provided to the machine learning algorithm, a model can be generated from that data.
  • When faced with a new situation, the model provides the corresponding judgment, that is, the prediction result. How to generate a model from empirical data (i.e., the machine learning modeling process) is therefore the key to machine learning technology.
  • The exemplary embodiments of the present disclosure aim to overcome the disadvantages of the prior art, in which the machine learning modeling process is time-consuming and has a high threshold.
  • A method for creating a machine learning solution template is provided, including: obtaining a template solution that describes at least part of a machine learning process with respect to at least one input source tag, wherein the machine learning process involves at least one of model training and model application; acquiring input source configuration limitation information about the template solution, wherein the input source configuration limitation information is used to generate an input source configuration interface, so that at least one configuration input source configured via the input source configuration interface replaces at least one input source tag in the template solution; and generating a template file of the machine learning solution template based on the obtained template solution and the input source configuration limitation information.
  • A method for executing a machine learning process based on a machine learning solution template is provided, including: obtaining a template file of the machine learning solution template, wherein the template file includes a template solution and input source configuration limitation information, the template solution describes at least part of a machine learning process with respect to at least one input source tag, the machine learning process involves at least one of model training and model application, and the input source configuration limitation information is used to generate an input source configuration interface; showing a second user the input source configuration interface generated based on the input source configuration limitation information; acquiring at least one configuration input source configured by the second user via the input source configuration interface; replacing at least one input source tag in the template solution with the acquired at least one configuration input source to obtain a modified template solution; and executing the machine learning process based on the modified template solution.
  • An apparatus for creating a machine learning solution template is provided, including: a first acquisition module for acquiring a template solution that describes at least part of a machine learning process with respect to at least one input source tag, wherein the machine learning process involves at least one of model training and model application; a second acquisition module for acquiring input source configuration limitation information about the template solution, wherein the input source configuration limitation information is used to generate an input source configuration interface, so that at least one configuration input source configured via the input source configuration interface replaces at least one input source tag in the template solution; and a generating module for generating a template file of the machine learning solution template based on the acquired template solution and the input source configuration limitation information.
  • An apparatus for executing a machine learning process based on a machine learning solution template is provided, including: a first obtaining module for obtaining a template file of the machine learning solution template, wherein the template file includes a template scheme and input source configuration limitation information, the template scheme describes at least part of a machine learning process with respect to at least one input source tag, the machine learning process involves at least one of model training and model application, and the input source configuration limitation information is used to generate an input source configuration interface; a first display module for showing a second user the input source configuration interface generated based on the input source configuration limitation information; a second acquisition module for acquiring at least one configuration input source configured by the second user via the input source configuration interface; a first replacement module for replacing at least one input source tag in the template scheme with the acquired at least one configuration input source to obtain a modified template scheme; and an execution module for executing the machine learning process based on the modified template scheme.
  • A system is provided, including at least one computing device and at least one storage device storing instructions, wherein the instructions, when executed by the at least one computing device, cause the at least one computing device to execute the steps of a method for creating a machine learning solution template or of a method for executing a machine learning process based on a machine learning solution template.
  • The steps of the method for creating a machine learning solution template include: obtaining a template solution that describes at least part of a machine learning process with respect to at least one input source tag, wherein the machine learning process involves at least one of model training and model application; acquiring input source configuration limitation information about the template solution, wherein the input source configuration limitation information is used to generate an input source configuration interface, so that at least one configuration input source configured via the input source configuration interface replaces at least one input source tag in the template solution; and generating a template file of the machine learning solution template based on the acquired template solution and the input source configuration limitation information; wherein the steps of the method for executing a machine learning process based on
  • A computer-readable storage medium storing instructions is provided, wherein the instructions, when executed by at least one computing device, cause the at least one computing device to execute the method of the first aspect or the second aspect of the present disclosure.
  • With the method for creating a machine learning solution template, the method for using it, and the apparatus of the present disclosure, reusing a machine learning solution template lowers the modeling threshold and reduces modeling time. In addition, the input source configuration limitation information in the template file can be used to solve the data matching problem between actual business data and the data inherent in the template solution, so that the machine learning solution template achieves a better modeling effect when applied to business data with different data structures in the same business direction.
  • Fig. 1 shows a flowchart of a method for creating a machine learning solution template according to an exemplary embodiment of the present disclosure;
  • Fig. 2 shows a schematic diagram of the creation interface of a machine learning solution template;
  • Fig. 3 shows a flowchart of a method for executing a machine learning process based on a machine learning scheme template according to an exemplary embodiment of the present disclosure;
  • Fig. 4 shows a schematic diagram of the configuration interface of the machine learning solution template;
  • Fig. 5 shows a structural block diagram of an apparatus for creating a machine learning solution template according to an exemplary embodiment of the present disclosure;
  • Fig. 6 shows a structural block diagram of an apparatus for creating a machine learning scheme template according to an exemplary embodiment of the present disclosure.
  • "Perform at least one of step one and step two" covers the following three parallel situations: (1) perform step one; (2) perform step two; (3) perform both step one and step two. That is, "A and/or B" can also be expressed as "at least one of A and B", and "perform step one and/or step two" can also be expressed as "perform at least one of step one and step two".
  • Fig. 1 shows a flowchart of a method for creating a machine learning scheme template according to an exemplary embodiment of the present disclosure.
  • the method shown in FIG. 1 can be completely implemented in software through a computer program, and the method shown in FIG. 1 can also be executed by a specially configured computing device.
  • In step S110, a template solution that describes at least part of the machine learning process with respect to at least one input source tag is obtained.
  • Machine learning modeling schemes often differ considerably between business scenarios. For example, a modeling solution for a time series scenario focuses on the construction of time series windows and is more sensitive to time series data, while a modeling solution for a marketing scenario pays more attention to data such as products and user tags.
  • The template solution can be regarded as a general machine learning modeling solution for a business scenario, defined by a user (referred to as the first user for ease of distinction) by summarizing that business scenario.
  • The first user may be a scientist with rich experience in machine learning modeling.
  • the template solution may include at least part of the configuration of each step of the machine learning process.
  • the template solution can describe at least part of the processing objects, processing methods, processing results and other configuration information involved in each step of the machine learning process.
  • model training and/or model application can also be expressed as: at least one of model training and model application.
  • Model training refers to the training process of a machine learning model, which can include but is not limited to at least one of the following steps: data import, data splitting, feature extraction, model training, model testing, and model evaluation. For a detailed description of each step, refer to existing machine learning knowledge; it is not repeated in this disclosure.
  • Model application refers to the application process of a machine learning model. For example, it can refer to a process of using a trained machine learning model to predict data to obtain a prediction result. As an example, it can include processing such as packaging applications, deploying online, and providing services.
  • the template scheme may be a document written based on a specific language for describing at least part of the machine learning process.
  • the template scheme may be a DAG (directed acyclic graph) file, and the DAG file may describe the configuration information of the machine learning steps represented by each node (that is, the processing node described below) in the directed acyclic graph.
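  • As an illustration of a template scheme expressed as a directed acyclic graph, the following is a minimal sketch. The dictionary layout, node IDs, step names, and the `execution_order` helper are all hypothetical; the disclosure does not specify a file format. The sketch only shows that per-node configuration plus dependencies is enough to recover an execution order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical template scheme: each processing node has an ID, a machine
# learning step, its configuration, and the IDs of the nodes it depends on.
TEMPLATE_SCHEME = {
    "n1": {"step": "data_import", "config": {"table": "transactions"}, "depends_on": []},
    "n2": {"step": "data_split", "config": {"ratio": 0.8}, "depends_on": ["n1"]},
    "n3": {"step": "feature_extraction", "config": {"fields": ["amount", "ts"]}, "depends_on": ["n2"]},
    "n4": {"step": "model_training", "config": {"algorithm": "gbdt"}, "depends_on": ["n3"]},
}


def execution_order(scheme):
    """Return node IDs so that every node comes after the nodes it depends on."""
    graph = {node_id: set(node["depends_on"]) for node_id, node in scheme.items()}
    return list(TopologicalSorter(graph).static_order())


order = execution_order(TEMPLATE_SCHEME)
```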
  • the template scheme includes one or more input sources, and the input source refers to the input source used in the machine learning process described by the template scheme.
  • the input source may include, but is not limited to, input tables and/or fields.
  • An input table is a data table used in the machine learning process described by the template scheme, and a field is a field used in that process, such as a field in an input table. Note that "input tables and/or fields" can also be expressed as "at least one of input tables and fields".
  • the template solution can be regarded as a general machine learning modeling solution in a specific business scenario.
  • Business data with different data structures may be used in the same business scenario. That is, the input sources in the template solution and the actual business data may differ to some extent in data structure.
  • The input source in the template solution may therefore be an input source that can be replaced by actual business data.
  • For the replacement, refer to the relevant description below.
  • the present disclosure may use input source tags to characterize input sources that can be replaced in the template scheme.
  • The input source tag is only used to refer to an input source that can be replaced in the template solution, and the present disclosure does not restrict the form such an input source takes in the template solution. That is, a replaceable input source in the template scheme may or may not be identified by a special mark.
  • In step S120, the input source configuration limitation information about the template solution is acquired.
  • the input source configuration defining information is used to generate an input source configuration interface, so that at least one configuration input source configured via the input source configuration interface replaces at least one input source tag in the template solution.
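  • The replacement of input source tags can be pictured with a short sketch. It assumes, purely for illustration, that each node of the template solution names its input table with a tag string and that the configuration interface returns a tag-to-table mapping; `replace_input_sources` and the table names are hypothetical.

```python
import copy


def replace_input_sources(template_scheme, configured_sources):
    """Return a modified copy of the scheme with tagged tables replaced.

    Tags without a configured replacement are left unchanged, and the
    original template scheme is not mutated.
    """
    modified = copy.deepcopy(template_scheme)
    for node in modified["nodes"]:
        tag = node.get("input_table")
        if tag in configured_sources:
            node["input_table"] = configured_sources[tag]
    return modified


scheme = {"nodes": [
    {"id": "n1", "step": "data_import", "input_table": "TEMPLATE_ORDERS"},
    {"id": "n2", "step": "feature_extraction", "input_table": "TEMPLATE_USERS"},
]}
modified = replace_input_sources(scheme, {"TEMPLATE_ORDERS": "shop_orders_2020"})
```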
  • The input source configuration interface is the interface shown to users of the machine learning solution template (referred to as the second user for ease of distinction). It helps the second user map actual business data to the input source tags in the template solution.
  • the input source configuration limited information can be set by the first user, and the first user can set the input source configuration limited information for one or more input sources in the template solution.
  • The present disclosure can obtain the input source configuration limitation information set by the first user in various ways. For example, the first user can generate a file containing the input source configuration limitation information by, but not limited to, editing a document; the present disclosure can provide the first user with a file upload interface and obtain the input source configuration limitation information from the file uploaded through that interface.
  • the present disclosure may also provide a visual operation interface to the first user, and obtain the input source configuration limitation information set by the first user according to an operation performed by the first user on the visual operation interface.
  • a control for setting input source configuration limitation information may be generated based on the acquired template scheme, the generated control is displayed to the first user, and the input source configuration limitation information set by the first user through the control is received.
  • Taking an input source that indicates an input table as an example, the template scheme can be parsed to determine the input tables it involves, and the first user can be provided with a control for adding input tables. The first user can add an input table through this control and can set the related input source configuration limitation information for that table through other controls.
  • the input source configuration limited information may include any information that can be used to assist the second user to associate actual business data with the input source in the template solution.
  • The input source configuration limitation information may include, but is not limited to, at least one of the following items: the name of at least one input table to be configured, as displayed on the input source configuration interface; the processing node corresponding to each input table; the name of each field to be configured under each input table; and indication information on whether each field is displayed as an optional field on the input source configuration interface.
  • The input table name is the table name displayed to the second user.
  • The first user can use the name of the input table in the template scheme as the table name displayed to the second user, or can name the input table according to its business meaning, so that the second user can configure a data table with the same or a similar business meaning for it.
  • the processing node corresponding to the input table is used to represent the node that processes the input table in the machine learning process described by the template scheme, that is, in which machine learning step the input table is processed.
  • the template solution may be the DAG file mentioned above.
  • Each processing node in the DAG file may have a corresponding node ID, and the node ID may be used to characterize the processing node corresponding to the input table.
  • The field name is the field name displayed to the second user.
  • The first user can use the original name of the field in the input table as the field name displayed to the second user, or can rename the field according to its business meaning, so that the second user can match fields in the business data table to fields in the input table by business meaning.
  • The fields to be configured under each input table can also include fields that do not exist in the input table, that is, extension fields. For example, suppose input table A includes field a, field b, and field c. If the fields to be configured under input table A also include a field d, then field d is an extension field.
  • the extension field can be set by the first user.
  • Starting from the business scenario, the first user can add to the input table one or more extension fields that do not exist in it, according to the business data structures that may occur in that scenario, and can set the field name of each extension field.
  • The processing method of each extension field can also be set.
  • Each extension field added to the input table can also be regarded as a field category, and different extension fields correspond to different field categories.
  • Setting extension fields makes it possible to match additional fields in the actual business data that go beyond the fields involved in the template scheme. The additional fields can be associated with extension fields whose processing method is preset, so the additional fields can also take part in machine learning and realize their data value, which improves the data adaptability of the template solution.
  • the indication information is used to indicate whether the field is optional.
  • the optional field means that the second user can decide whether to configure the field in the actual service data for the field according to the actual situation.
  • the non-optional field that is, the required field, means that the second user needs to configure at least one field in the actual service data for the field.
  • the field can be set as an optional field or a required field according to the characteristics (such as versatility, importance) of the field in the business scenario.
  • the first user may set a small number of fields commonly used in the industry as mandatory fields, and set other fields as optional fields.
  • the optional fields include not only the fields that exist in the template scheme, but also the extended fields that do not exist in the template scheme.
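  • One way to picture the input source configuration limitation information described above is as a small data structure. The sketch below is an assumption for illustration only: the class names, attribute names, and the `missing_required` helper do not come from the disclosure. It records table names, the corresponding processing node, field names, the optional flag, and extension fields, and checks that every required field has at least one business field configured.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FieldSpec:
    name: str                # field name shown to the second user
    optional: bool = False   # False means the field is required
    extension: bool = False  # True if the field is absent from the template's input table


@dataclass
class InputTableSpec:
    name: str                # table name shown to the second user
    node_id: str             # ID of the processing node that handles this table
    fields: List[FieldSpec] = field(default_factory=list)


def missing_required(table: InputTableSpec, mapping: Dict[str, List[str]]) -> List[str]:
    """Names of required fields for which no business field was configured."""
    return [f.name for f in table.fields if not f.optional and not mapping.get(f.name)]


table = InputTableSpec(
    name="Transactions",
    node_id="n1",
    fields=[
        FieldSpec("user_id"),
        FieldSpec("amount"),
        FieldSpec("channel", optional=True),
        FieldSpec("extra_numeric", optional=True, extension=True),
    ],
)
# The second user's configuration: template field -> business field(s).
mapping = {"user_id": ["cust_no"], "extra_numeric": ["fee", "tax"]}
unfilled = missing_required(table, mapping)
```

  • In this sketch, two business fields are mapped to the single extension field `extra_numeric`, so both would receive the processing preset for that field.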
  • The field format corresponding to at least one field can be set to allow one or more fields in the actual business data to be configured for a single field, so that all of the configured fields are processed in the same way as that single field in the template scheme.
  • the input source configuration limitation information may also include the field format corresponding to each field.
  • the input source configuration limitation information may further include processing items for defining the processing that at least one configuration input source configured via the input source configuration interface undergoes before at least one input source tag in the template solution is replaced.
  • the processing items may include but are not limited to check items for each field.
  • the check item may include the allowable format and/or allowable value range of each field, and may also include information indicating whether to perform the check. It should be noted here that the allowable format and/or allowable value range can also be expressed as: at least one of the allowable format and the allowable value range.
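  • A check item of this kind can be sketched as follows. The dictionary keys (`enabled`, `format`, `range`) and the `run_check` function are hypothetical names chosen for the example; the allowable format is modeled as a regular expression and the allowable value range as numeric bounds.

```python
import re


def run_check(value, check):
    """Return True if the value passes the check item (or the check is disabled)."""
    if not check.get("enabled", True):
        return True
    pattern = check.get("format")
    if pattern is not None and not re.fullmatch(pattern, str(value)):
        return False
    bounds = check.get("range")
    if bounds is not None:
        low, high = bounds
        if not (low <= float(value) <= high):
            return False
    return True


# Hypothetical check item for an "amount" field: non-negative decimal, at most 10000.
amount_check = {"enabled": True, "format": r"\d+(\.\d+)?", "range": (0, 10000)}
```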
  • In step S130, a template file of the machine learning solution template is generated based on the obtained template solution and the input source configuration limitation information.
  • the template file of the machine learning solution template includes the template solution and input source configuration limitation information.
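  • Steps S110 to S130 end with a single file that carries both parts. A minimal sketch follows, assuming JSON as the container format (the disclosure does not name one); the key names and `build_template_file` are hypothetical.

```python
import json


def build_template_file(template_scheme, input_source_config):
    """Serialize the template scheme and its configuration limitation info together."""
    return json.dumps(
        {"template_scheme": template_scheme, "input_source_config": input_source_config},
        ensure_ascii=False,
        indent=2,
    )


template_file = build_template_file(
    {"nodes": [{"id": "n1", "step": "data_import", "input_table": "TEMPLATE_ORDERS"}]},
    {"tables": [{"name": "Orders", "node_id": "n1",
                 "fields": [{"name": "amount", "optional": False}]}]},
)
loaded = json.loads(template_file)
```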
  • The template solution can be regarded as a general machine learning modeling solution for a business scenario, defined by a scientist with rich modeling experience on the basis of that scenario.
  • The template solution captures machine learning modeling know-how accumulated by the scientist, and the second user can reuse this know-how by using the template solution, which lowers the modeling threshold and reduces modeling time. The input source configuration limitation information helps the second user map actual business data to the input source tags in the template solution, which solves the data matching problem between the actual business data and the data inherent in the template solution, so that the template solution achieves a better modeling effect when applied to business data with different data structures in the same business direction.
  • the template file can also be tested to determine whether the template file can meet expectations. If it meets the expectations, the machine learning solution template can be released, and if it does not meet the expectations, the machine learning solution template can be debugged.
  • During testing, the input source configuration interface generated based on the input source configuration limitation information may be shown to a third user, who may be a tester. At least one configuration input source, configured by the third user via the input source configuration interface based on a test data table corresponding to the test scenario, is then acquired. Here, a configuration input source is an input source that the third user configures, according to the test data, for an input source tag in the template scheme; it may include a test data table configured for an input table and/or fields in the test data table configured for fields in the input table. For the specific configuration process, refer to the description given with reference to Fig. 3 below, which is not repeated here. Next, at least one input source tag in the template solution is replaced with the at least one configuration input source to obtain a modified machine learning solution template; at least part of the machine learning process is executed based on the modified machine learning solution template to obtain an execution result; and the execution result is evaluated to obtain a test result, which is used to decide whether to release the machine learning solution template or to debug it.
  • Note that "the test data table configured for the input table and/or the fields in the test data table configured for the fields in the input table" can also be expressed as "at least one of the test data table configured for the input table and the fields in the test data table configured for the fields in the input table".
  • the template scheme can also include at least one parameter placeholder.
  • The parameter placeholder can take any agreed-upon form that is not commonly used in code, such as " ⁇ $placeholder$ ⁇ ".
  • The parameters represented by the parameter placeholders are parameters that can be determined by the second user, and may include, but are not limited to, script parameters and/or running parameters. When setting up the template scheme, the first user can therefore also modify it based on experience and set the parameters that may change, such as feature combination methods, hyperparameters, and running resources, as parameter placeholders.
  • the script parameter and/or the running parameter can also be expressed as: at least one of the script parameter and the running parameter.
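  • Placeholder substitution can be sketched as below. The `{$name$}` syntax is an assumption made for this example only, since the disclosure allows any agreed-upon form; `fill_placeholders` and the parameter names are likewise hypothetical.

```python
import re

# Assumed placeholder syntax for this sketch: {$name$}
PLACEHOLDER = re.compile(r"\{\$(\w+)\$\}")


def fill_placeholders(text, values):
    """Replace every placeholder with its configured value; fail on missing ones."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value configured for placeholder {name!r}")
        return str(values[name])
    return PLACEHOLDER.sub(substitute, text)


script = "train(learning_rate={$lr$}, max_depth={$depth$}, workers={$workers$})"
filled = fill_placeholders(script, {"lr": 0.05, "depth": 6, "workers": 4})
```

  • Raising an error for an unconfigured placeholder, rather than leaving it in the text, is a design choice of the sketch: it keeps a half-configured template from being executed.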
  • The present disclosure can also obtain parameter configuration limitation information about the template scheme.
  • the parameter configuration limitation information is used to generate a parameter configuration interface, so that at least one configuration parameter configured via the parameter configuration interface replaces at least one parameter placeholder in the template solution.
  • the parameter configuration interface refers to an interface presented to the second user, and is used to assist the second user in configuring the parameters represented by the parameter placeholders.
  • the parameter configuration limited information can be set by the first user, and the first user can set the parameter configuration limited information for each parameter placeholder in the template scheme.
  • the parameter configuration limitation information can be obtained in various ways.
  • For example, the first user can generate a file containing the parameter configuration limitation information by, but not limited to, editing a document; the present disclosure can provide the first user with a file upload interface and obtain the parameter configuration limitation information from the file uploaded through that interface.
  • the present disclosure may also provide a visual operation interface to the first user, and obtain the parameter configuration limitation information set by the first user according to the operation performed by the first user on the visual operation interface.
  • a control for setting parameter configuration limited information may be generated based on the obtained template scheme, the generated control is displayed to the first user, and the parameter configuration limited information set by the first user through the control is received.
  • The template scheme can be parsed to identify the parameter placeholders it contains, and controls for setting the parameter configuration limitation information related to those placeholders can be displayed to the first user, so as to obtain the parameter configuration limitation information that the first user sets through the controls.
  • the parameter configuration limited information may include any information that can be used to assist the second user in configuring the parameter represented by the parameter placeholder.
  • The parameter configuration limitation information may include, but is not limited to, at least one of the following items for each parameter placeholder to be configured on the parameter configuration interface: type information, input method information, display information, a default value, and an actual value.
  • The type information can indicate script parameters and/or running parameters. The input method information can include a fill-in method and/or a selection method, where the fill-in method means the parameter is entered by typing it in and the selection method means the parameter is entered by choosing from the provided options. The display information can include prompt information that helps the second user understand the parameter to be entered, such as the name of the parameter represented by the parameter placeholder.
  • the default value can be
  • The actual value is the underlying value of a provided option. The display value can differ from the actual value: the display value is a translated value that is easier for users to understand.
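  • The distinction between display values and actual values can be illustrated with a small sketch. The layout of `PARAM_CONFIG`, its key names, and the `resolve` function are hypothetical, and storing the default as a display value is an assumption of this example rather than something the disclosure states.

```python
# Hypothetical parameter configuration limitation information for one placeholder.
PARAM_CONFIG = {
    "depth": {
        "type": "script",              # script parameter vs. running parameter
        "input_method": "select",      # "fill" or "select"
        "label": "Model complexity",   # prompt shown to the second user
        "options": [
            {"display": "Low", "actual": 3},
            {"display": "Medium", "actual": 6},
            {"display": "High", "actual": 10},
        ],
        "default": "Medium",
    },
}


def resolve(param_name, chosen_display=None, config=PARAM_CONFIG):
    """Map the option the user picked (or the default) to its actual value."""
    spec = config[param_name]
    display = chosen_display if chosen_display is not None else spec["default"]
    for option in spec["options"]:
        if option["display"] == display:
            return option["actual"]
    raise ValueError(f"{display!r} is not an option for {param_name!r}")
```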
  • the script parameter and/or the running parameter can also be expressed as: at least one of the script parameter and the running parameter.
  • the parameter configuration limitation information may further include processing items for defining the processing that at least one configuration parameter configured via the parameter configuration interface undergoes before replacing at least one parameter placeholder in the template solution.
  • the processing item may include, but is not limited to, a verification item for verifying at least one configuration parameter.
  • the check item may include the allowable format and/or allowable value range of the configuration parameter, and may also include the indication information whether to perform the check.
  • the present disclosure may generate the template file of the machine learning solution template based on the acquired template solution, parameter configuration limitation information, and input source configuration limitation information. That is, the template file can include not only template scheme and input source configuration limitation information, but also parameter configuration limitation information.
  • the parameter configuration limited information can assist the second user to configure the parameters represented by the parameter placeholders in the template solution, so as to ensure that the overall modeling effect is compatible with the actual business scenario and improve the application effect of the template solution in the actual business scenario.
  • the parameter configuration limitation information may also include classification information used to limit the configuration of the parameter placeholders according to the classification area on the parameter configuration interface.
  • the present disclosure may also show the first user a control for classifying the parameter placeholders, and obtain the classification information according to the first user's classification of the parameter placeholders through the control.
  • the generated parameter configuration interface can classify and display the parameter placeholders that need to be configured according to the classification information, wherein the parameter placeholders of different classifications are displayed in different classification areas, so that the configuration feels more logical to the second user.
  • the present disclosure can also show a control for uploading a description document to the first user, receive the description document uploaded by the first user through the control, and merge the description document into a template file.
  • the description document may be a document used to inform the user how to configure the template scheme.
  • the description document may be written in a business language that is easy for the second user to understand, so that in most cases the second user can use the machine learning solution template without needing to understand machine learning concepts.
  • the description document can include instructions for setting sample labels, so that users do not need to understand the machine learning concepts of positive samples and negative samples, but only need to answer according to the business facts.
  • users only need to clarify which transactions are problematic and which are normal; the entire modeling template can then be operated through ordinary business understanding, which lowers the threshold for using the template.
  • the present disclosure can also show the first user a control for setting resource configuration information, receive the resource configuration information set by the first user through the control, and merge the resource configuration information into the template file. The resource configuration information is used to characterize the resource configuration for performing at least part of the machine learning process.
  • the resource configuration information mentioned here can be regarded as the running resource setting provided by the first user.
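  • as a minimal sketch of how these pieces could be merged into one template file, the following Python example serializes them as JSON. The JSON layout, the key names, and the function `build_template_file` are illustrative assumptions; the disclosure does not specify a file format.

```python
import json


def build_template_file(template_scheme, input_source_limits,
                        parameter_limits=None, description=None, resources=None):
    """Bundle the template scheme and its limitation information into one JSON text.

    The template scheme and the input source configuration limitation information
    are always present; the other parts are merged in only when they were provided.
    """
    content = {
        "template_scheme": template_scheme,
        "input_source_config": input_source_limits,
    }
    optional_parts = {
        "parameter_config": parameter_limits,
        "description_document": description,
        "resource_config": resources,
    }
    content.update({key: value for key, value in optional_parts.items() if value is not None})
    return json.dumps(content, ensure_ascii=False, indent=2)
```

  • keeping the optional parts out of the file when they are absent lets a reader of the file distinguish "not provided" from "provided but empty".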
  • the method shown in FIG. 1 can be executed by a machine learning platform used to implement machine learning related services.
  • Figure 2 shows a schematic diagram of the creation interface of the machine learning solution template presented to the user by the machine learning platform.
  • the user mentioned here refers to the user who creates the machine learning solution template, that is, the first user mentioned above, who may be, for example, a scientist with rich experience in machine learning modeling.
  • the creation of a machine learning solution template can be divided into four parts, which are basic information configuration, input configuration, placeholder configuration, and classification configuration.
  • the basic information may include, but is not limited to, the name of the modeling template, the visible status of the modeling template, the modeling template DAG, and the modeling template configuration description document.
  • the name of the modeling template refers to the name of the machine learning solution template displayed to the user (the template user, that is, the second user mentioned above) to view.
  • the scientist can name the machine learning solution template according to the applicable business scenarios and the functions of the machine learning solution template.
  • for example, a scientist can name the machine learning solution template created for the marketing scenario "general marketing modeling template".
  • the visibility status of the modeling template refers to whether the modeling template is visible to other users. If invisible is selected, other users cannot view the template in the front end; otherwise, they can view it at the front-end entrance.
  • the modeling template DAG refers to the DAG file of the machine learning solution template.
  • Engineers can upload DAG files by clicking the upload control. After the upload is complete, the back end can automatically scan the placeholder information in the DAG file and, based on the scan results, display in the interface the controls with which the scientist configures the placeholders.
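  • as a minimal sketch of such a scan, the following Python example collects placeholder names from the text of an uploaded DAG file. The `${name}` placeholder syntax and the function name `scan_placeholders` are assumptions made for illustration; the disclosure does not state how placeholders are written in the DAG file.

```python
import re

# Assumed placeholder syntax: ${name}, where name is an identifier.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def scan_placeholders(dag_text: str) -> list:
    """Return the placeholder names found in the DAG text, in first-seen order, without duplicates."""
    seen = {}
    for name in PLACEHOLDER.findall(dag_text):
        seen.setdefault(name, None)  # dict keys keep insertion order
    return list(seen)
```

  • each returned name could then be given its own configuration control in the creation interface.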
  • the modeling template configuration description document is used to inform users how to configure the modeling template. It can contain detailed descriptions of each configuration field to help users understand.
  • Input configuration refers to the scientist's setting of the input source configuration limitation information.
  • the input table module information includes a control used to fill in the name of the input table, a control used to fill in the node ID (that is, the node ID shown in the figure), a control used to add a field, and a field display table.
  • the field display table includes controls for filling in field names, controls for setting whether field verification is required, and controls for selecting field types.
  • the scientist can set the input table name according to the business meaning of the input table, so that the user can select the appropriate data table in the actual business data for matching based on the input table name.
  • the scientist can also rename the field name according to the meaning of the field to ensure that the field name seen by the user is easy to understand.
  • the scientist can also set whether field verification is required. If field verification is required, the field type also needs to be set, so that it can be verified whether the type of the field provided by the user is consistent with the scientist's requirement.
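  • as a minimal sketch, the input table settings described above and the resulting field type verification could look as follows in Python. The class names, the type names, and the idea of receiving the business column types as a dictionary are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class FieldSpec:
    name: str              # field name shown to the template user
    verify: bool = False   # whether field verification is required
    field_type: str = ""   # expected type; only meaningful when verify is True


@dataclass
class InputTableSpec:
    table_name: str        # input table name, chosen by business meaning
    node_id: str           # ID of the DAG node that reads this table
    fields: list = field(default_factory=list)


def type_mismatches(spec, mapping, business_types):
    """Return the template fields whose mapped business column has an unexpected type.

    mapping: template field name -> business column name chosen by the user.
    business_types: business column name -> type name of that column.
    """
    errors = []
    for f in spec.fields:
        if not f.verify or f.name not in mapping:
            continue  # verification not requested, or field not configured
        if business_types.get(mapping[f.name]) != f.field_type:
            errors.append(f.name)
    return errors
```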
  • Each placeholder module contains an option box, a placeholder name, and the option content corresponding to the option box.
  • the option box offers two choices: selection item and input (fill-in) item.
  • a selection item means that the user will see a select-type entry. The select type supports only single selection and requires the corresponding options to be set; there must be at least two options, and more can be added by clicking Add option.
  • a fill-in item means that the user will see a text box labeled with the item name, in which the user can fill in the content.
  • the option name is used to help users understand the information that needs to be filled in for the option, and the option name should be as consistent as possible with the modeling template configuration description document. If verification of the field type is selected, the content is verified according to the field type set by the scientist; if it is not selected, the field is not verified.
  • the field type refers to the field type set by the scientist, which makes it possible to verify whether the type of the field provided by the user is consistent with the scientist's requirement. It can include but is not limited to enums and String. If checking of the field threshold is selected, the content is checked against the field threshold; if it is not selected, the data filled in by the user is not checked.
  • the field threshold is optional. For example, a regular expression or a closed interval (that is, an interval whose lower and upper bounds are included) can be filled in to characterize the field threshold. Once it is filled in, the platform verifies whether the data in the corresponding form field meets the requirement. If the threshold is a regular expression, a prompt text describing the regular expression can also be provided.
  • the main content is to explain what field information needs to be filled in.
  • Run parameter field types can support but are not limited to String, Int, Double, Boolean, Long.
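  • as a minimal sketch of how a run parameter entered as text could be converted to one of these field types, consider the following Python example. The function name `parse_run_parameter` and the 32-bit and 64-bit ranges assumed for Int and Long are illustrative assumptions and are not stated in the disclosure.

```python
def parse_run_parameter(raw: str, field_type: str):
    """Convert the text entered by the user to the declared run parameter type.

    Raises ValueError when the text does not fit the declared type.
    """
    if field_type == "String":
        return raw
    if field_type in ("Int", "Long"):
        value = int(raw)
        bits = 32 if field_type == "Int" else 64  # assumed widths
        if not -(2 ** (bits - 1)) <= value < 2 ** (bits - 1):
            raise ValueError(f"{raw} does not fit into {field_type}")
        return value
    if field_type == "Double":
        return float(raw)
    if field_type == "Boolean":
        lowered = raw.strip().lower()
        if lowered in ("true", "false"):
            return lowered == "true"
        raise ValueError(f"{raw} is not a Boolean")
    raise ValueError(f"unsupported field type: {field_type}")
```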
  • for each placeholder variable, there is a button to add a classification line.
  • the placeholder variables between a classification line and the previous classification line above it are displayed to the user as one module. If there is no other classification line above it, the placeholder variables from the first placeholder variable to the classification line form one module.
  • the category name is used to represent the category information of the placeholder variables displayed as a module. For example, if the placeholder variables between two classification lines are the marketing timing window configuration and the wealth management purchase timing window configuration, the scientist can set the category name to "timing parameter setting".
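  • as a minimal sketch of this grouping, the following Python example splits an ordered list of placeholder variables into modules at each classification line. It assumes that a classification line closes the module above it and carries that module's category name; this reading, the tuple representation, and the function name are illustrative assumptions.

```python
def group_by_classification_lines(items):
    """Split placeholders into modules at each classification line.

    items: ordered list of ("placeholder", name) or ("line", category_name).
    Returns a list of (category_name, [placeholder names]) in display order.
    """
    modules, current = [], []
    for kind, value in items:
        if kind == "placeholder":
            current.append(value)
        elif kind == "line":
            modules.append((value, current))  # the line closes the module above it
            current = []
        else:
            raise ValueError(f"unknown item kind: {kind}")
    if current:
        modules.append((None, current))  # trailing placeholders without a category
    return modules
```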
  • after completing the settings, the scientist clicks the save scheme control, and the platform saves the current modeling template.
  • the creation time is the time at which the save scheme control is clicked.
  • if the scientist clicks Cancel, a reminder message can be shown: "All current settings will not be retained after cancellation, please cancel with caution".
  • the present disclosure also proposes a method for executing a machine learning process based on a machine learning scheme template.
  • the machine learning solution template may be generated based on the method of creating a machine learning solution template of the present disclosure. Therefore, the method for executing the machine learning process based on the machine learning scheme template of the present disclosure may further include the steps shown in FIG. 1.
  • Fig. 3 shows a flowchart of a method for executing a machine learning process based on a machine learning scheme template according to an exemplary embodiment of the present disclosure.
  • the method shown in FIG. 3 can be completely implemented in software through a computer program, and the method shown in FIG. 3 can also be executed by a specially configured computing device.
  • In step S310, a template file of a machine learning solution template is obtained.
  • the template file includes the template scheme and input source configuration limitation information.
  • the template solution is used to describe at least part of the machine learning process for at least one input source tag.
  • the machine learning process involves model training and/or model application, and the input source configuration limited information is used to generate the input source configuration interface.
  • for the template scheme, input source mark, and input source configuration limitation information, please refer to the relevant description above, which is not repeated here.
  • In step S320, the input source configuration interface generated based on the input source configuration limitation information is displayed to the second user.
  • the input source configuration interface is used to assist the second user in matching the actual business data with the input source mark in the template scheme.
  • the second user refers to a user who uses the machine learning solution template.
  • the second user can be a scientist with rich experience in machine learning modeling, or a business person with a lack of experience in machine learning modeling.
  • In step S330, at least one configuration input source configured by the second user via the input source configuration interface is acquired.
  • the input source configuration interface may include a control for setting a configuration input source, and may receive the configuration input source set by the second user through the control.
  • the configuration input source comes from actual business data.
  • the configuration input source refers to the input source configured by the second user, according to actual business data, for the input source mark in the template solution. Take as an example an input source tag used to identify an input table and/or field that can be replaced in the template scheme.
  • the field under the input table in the template scheme can be called the first field, and the field under the business data table can be called the second field.
  • the configuration input source may include a business data table configured for the input table and/or a second field under the business data table configured for the first field.
  • the business data table refers to a data table generated in an actual business scenario, and the business data table represents actual business data.
  • "the business data table configured for the input table and/or the second field under the business data table configured for the first field" can also be expressed as: at least one of the business data table configured for the input table and the second field under the business data table configured for the first field.
  • the input source configuration limited information may include at least one of the following items: at least one of the required configuration displayed on the input source configuration interface
  • the input source configuration interface may also include at least one of the following items: the name of at least one input table that needs to be configured, the name of each first field that needs to be configured under each input table, and indication information of whether each first field is optional.
  • the second user can first configure the corresponding business data table according to the input table name displayed on the input source configuration interface, and then, according to the name of each first field that needs to be configured under each input table and the indication information of whether it is optional, configure a corresponding second field for the first field, so that the second field in the business data table corresponds to the first field in the input table.
  • for a first field that is an optional field, the second user can determine whether there is a second field matching the first field in the business data table; if there is a matching second field, it is set as the second field corresponding to the first field, and if there is no matching second field, no second field needs to be set for the first field. For a first field that is not an optional field (that is, a required field), the second user needs to configure at least one second field in the business data table for the field.
  • whether the first field is an optional field can be determined according to the versatility or importance of the field in the business scenario: fields that are not universal or are of low importance are set as optional fields. Therefore, when the second user matches the second fields in the business data table with the first fields in the input table, every first field that is a mandatory field has at least one corresponding second field, which ensures that the modeling effect is not too poor when the template solution is applied to an actual business scenario. Whether a first field that is an optional field has a corresponding second field is decided by the second user according to the actual situation; that is, the second user does not need to configure a corresponding second field for every first field under the input table, so that the adaptability to different data tables is improved while the modeling effect is ensured.
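  • as a minimal sketch of the required and optional rule, the following Python example lists the required first fields that the second user has not yet matched with any second field. The dictionary representation and the function name `missing_required_fields` are illustrative assumptions.

```python
def missing_required_fields(first_fields, mapping):
    """Return the required first fields that have no configured second field.

    first_fields: first field name -> True if the field is optional.
    mapping: first field name -> list of second fields chosen by the second user.
    """
    return [
        name
        for name, optional in first_fields.items()
        if not optional and not mapping.get(name)
    ]
```

  • an empty result means that the configuration can proceed even if some optional first fields were left unmatched.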
  • the first field that needs to be configured under each input table may not only include the original field that exists in the input table, but also may include a field that does not exist in the input table, that is, an extended field.
  • the extension field may be added to the input table according to the data structure that may exist in the business scenario, and the added extension field may also have a corresponding processing mode. Therefore, when the second user matches the second field in the business data table with the first field in the input table, the second field can also be associated with a field that does not exist in the input table (that is, an extension field).
  • each extension field added to the input table can also be regarded as a field category, and different extension fields correspond to different field categories.
  • the second user can divide the additional fields into the corresponding categories of extension fields, so that the additional fields can also participate in machine learning to realize their data value.
  • the field format corresponding to at least one first field is set to allow one or more second fields in the actual business data to be configured for a single first field, so that the configured one or more second fields undergo field processing in the template scheme in the same way as the single first field.
  • the control used to set the second field for the first field in the input source configuration interface is generated based on the field format corresponding to the first field, so that the second user can use the control to configure the second field for the first field according to the field format corresponding to the first field.
  • the field format of the first field may allow only one second field to be configured for this field, or it may allow multiple second fields to be configured for this field. That is, the first fields that need to be configured under the input table in the input source configuration interface may include first fields that support "one-to-one" configuration and may also include first fields that support "one-to-many" configuration.
  • for a first field that supports "one-to-one" configuration, the second user is allowed to configure at most one second field for this field, and for a first field that supports "one-to-many" configuration, the second user is allowed to configure multiple second fields for this field.
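  • as a minimal sketch of the "one-to-many" behavior, the following Python example copies each processing step defined for a first field once per configured second field, so that every second field is processed in the same way as the first field. The step dictionaries, the field names, and the function name `expand_field_steps` are illustrative assumptions.

```python
def expand_field_steps(steps, mapping, formats):
    """Copy each template step once per second field configured for its first field.

    steps: list of dicts, each with a "field" key naming a first field.
    mapping: first field name -> list of configured second fields.
    formats: first field name -> "one-to-one" or "one-to-many".
    """
    expanded = []
    for step in steps:
        first = step["field"]
        seconds = mapping.get(first, [])
        if formats.get(first, "one-to-one") == "one-to-one" and len(seconds) > 1:
            raise ValueError(f"{first} accepts at most one second field")
        for second in seconds:
            expanded.append({**step, "field": second})  # same processing, new field
    return expanded
```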
  • the input table displayed in the input source configuration interface may include at least one optional form, and the optional form may include multiple column names.
  • the column names in the optional form can be the existing basic fields (such as card number, account opening time, etc.) in the template scheme, and these basic fields have corresponding processing methods in the template scheme.
  • the second user can fill in such additional fields under the column names corresponding to the optional form. For example, if the additional field is activation time, the field name of the activation time can be filled under "Account Opening Time", so that the activation time can be processed according to the account opening time processing method.
  • the column name in the optional form can also be a field category (for example, it can be the extension field mentioned above), and each field category has a corresponding processing mode.
  • the second user can fill in such additional fields under the column names corresponding to the optional form. For example, if the additional field is activation time, the field name of the activation time can be filled under the "time" category, so that the activation time can be processed according to the processing method of the "time" category.
  • the input source configuration limitation information may also include the field format corresponding to each first field.
  • the input source configuration limitation information may further include processing items for defining the processing that at least one configuration input source configured via the input source configuration interface undergoes before replacing at least one input source tag in the template solution.
  • the present disclosure can also process the configuration input source according to the processing items, and display the processing result in the input source configuration interface.
  • the processing item may include a check item for each first field, and the check item may include an allowable format and/or an allowable value range of each first field.
  • the format and/or value of the second field configured for the first field can be verified according to the check item of the first field, and the processing result is used to indicate whether the format and/or value of the second field configured for the first field meets the check item.
  • the check item may also include information indicating whether to perform the check. It should be noted here that the format and/or value of the second field can also be expressed as: at least one of the format and value of the second field.
  • In step S340, at least one input source tag in the template scheme is replaced with the acquired at least one configuration input source to obtain a modified template scheme.
  • the input table in the template scheme can be replaced with the configured business data table.
  • the processing method of the input table and the first field under the input table in the template scheme is known.
  • after the replacement, the processing method of the business data table can be set according to the processing method of the input table, and the processing method of the second field can be set according to the processing method of the first field for which it is configured. Therefore, the modified template scheme describes the machine learning process for the configuration input source.
  • when the second user uses the machine learning solution template, the second user does not need to generate a corresponding form according to the field requirements of the template solution. The second user only needs to match the fields that have the same business meaning to complete the introduction of actual business data, which avoids repeated modification of the data structure by the user.
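  • as a minimal sketch of the replacement in step S340, the following Python example walks a template scheme held as nested dictionaries and lists and swaps every input source tag for the configured business table or field. The nested representation, the `@name` tag syntax, and the function name `replace_tags` are illustrative assumptions.

```python
def replace_tags(node, replacements):
    """Return a copy of the template scheme with input source tags replaced.

    node: nested dicts, lists, and strings describing the template scheme.
    replacements: input source tag -> configured business table or field.
    """
    if isinstance(node, dict):
        return {key: replace_tags(value, replacements) for key, value in node.items()}
    if isinstance(node, list):
        return [replace_tags(value, replacements) for value in node]
    if isinstance(node, str):
        return replacements.get(node, node)  # only whole-string tags are replaced
    return node
```

  • because a new structure is returned, the original template scheme stays unchanged and can be reused for another set of business data.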
  • In step S350, the machine learning process is executed based on the modified template scheme.
  • the modified template scheme describes the machine learning process for the configuration input source, and the configuration input source represents the actual business data. Therefore, the machine learning process is executed based on the modified template scheme, and the machine learning results that meet the actual business scenario can be obtained.
  • the template solution can be regarded as a set of general machine learning modeling solutions in the business scenario defined by a scientist with rich modeling experience according to the business scenario.
  • the template solution includes some machine learning modeling knowledge precipitated by the scientist.
  • the second user can reuse this modeling know-how by using the machine learning modeling solution, which lowers the modeling threshold and reduces modeling time. In the process of using the template, the second user does not need to understand the modeling principle of the machine learning solution template or the modeling process; simply by visually matching the actual business data with the data in the template solution, the second user can obtain machine learning results in line with the business scenario.
  • the template solution may also include at least one parameter placeholder
  • the template file may also include parameter configuration limitation information used to generate the parameter configuration interface.
  • for parameter placeholders, the parameter configuration interface, and parameter configuration limitation information, please refer to the relevant description above, which will not be repeated here.
  • the present disclosure can also show the second user a parameter configuration interface generated based on the parameter configuration limitation information, obtain at least one configuration parameter configured by the second user via the parameter configuration interface, and replace at least one parameter placeholder in the template solution with the obtained at least one configuration parameter.
  • some configurable parameters can be opened to the second user.
  • These configurable parameters can be parts that need to be adjusted according to business requirements. In this way, it can be ensured that the execution effect of the machine learning process is adapted to the actual business scenario, which improves the effect of using the machine learning solution template.
  • the parameter configuration interface may include controls for setting configuration parameters, and may receive the configuration parameters set by the second user through the controls.
  • the parameter configuration limitation information may also include but is not limited to at least one of the following items for the parameter placeholders that need to be configured and are displayed on the parameter configuration interface: type information, input method information, display information, default value, and actual value.
  • the parameter configuration interface may also include but is not limited to at least one of the following items: type information of parameter placeholders that need to be configured, input method information, display information, and default values.
  • for type information, input method information, display information, default value, and actual value, please refer to the relevant description above, which will not be repeated here.
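  • as a minimal sketch of how default values, display values, and actual values could work together when the parameter placeholders are filled, consider the following Python example. It assumes `${name}` placeholders, which is the syntax supported by Python's standard `string.Template`; the specification dictionaries and function names are illustrative assumptions.

```python
from string import Template


def resolve_parameters(specs, chosen):
    """Turn what the user picked on the interface into actual parameter values.

    specs: placeholder name -> {"default": actual value,
                                "options": {display value: actual value}} (options optional).
    chosen: placeholder name -> display value picked, or text typed, by the user.
    """
    resolved = {}
    for name, spec in specs.items():
        if name not in chosen:
            resolved[name] = spec["default"]  # untouched parameters keep the default
            continue
        options = spec.get("options")
        resolved[name] = options[chosen[name]] if options else chosen[name]
    return resolved


def fill_placeholders(template_text, values):
    """Replace every ${name} placeholder; raises KeyError if a value is missing."""
    return Template(template_text).substitute(values)
```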
  • the parameter configuration limitation information may further include processing items used to define the processing that at least one configuration parameter configured via the parameter configuration interface undergoes before at least one parameter placeholder in the template solution is replaced.
  • the present disclosure may also process the configuration parameters according to the processing items, and display the processing results in the input source configuration interface.
  • the processing item may include a check item for checking at least one configuration parameter. Wherein, the check item may include, but is not limited to, the allowable format and/or allowable value range of the configuration parameter, and may also include information indicating whether to perform the check.
  • the parameter configuration limited information can also include classification information used to limit the configuration of the parameter placeholders according to the classification area on the parameter configuration interface.
  • the parameter configuration interface can classify and display the parameter placeholders that need to be configured according to the classification information, with the parameter placeholders of different classifications displayed in different classification areas. Classified display means that the parameters belonging to the same category are displayed as a group, which makes filling them in feel more logical to the user.
  • the template file may also include a description document for assisting the second user to understand and/or configure the template solution, and the present disclosure may also provide the second user with a description document.
  • the description document may be described in a business language that is easy for the second user to understand, so that the second user does not need to understand the concept of machine learning in most cases to use the machine learning solution template.
  • "the template file may also include a description document for assisting the second user to understand and/or configure the template solution" can also be expressed as: the template file may also include at least one of a description document for assisting the second user in understanding the template solution and a description document for assisting the second user in configuring the template solution.
  • the template file may also include resource configuration information.
  • the resource configuration information is used to characterize the resource configuration for performing at least part of the machine learning process, where the resource configuration information may be set by the first user.
  • the machine learning process may be executed based on the modified template scheme using the resource configuration represented by the resource configuration information.
  • the resource configuration required in the machine learning process of the modified template scheme can also be predicted, and the machine learning process can be performed using the predicted resource configuration.
  • the required resource configuration can be predicted by using, but not limited to, rule formulas and prediction models.
  • for example, the required resource configuration can be calculated according to a rule formula that combines the sample data volume, the feature extraction method, the model training algorithm, and so on. As another example, modeling tasks with various data volumes can be trial-run under different resource configurations (for example, resource configurations increasing from small to large); the data volumes and resource configurations at the time of trial-run success and failure are taken as samples to pre-train a machine learning model for estimating the required resource configuration, and this machine learning model is then used to predict the required resources.
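  • as a minimal sketch of the trial-run idea, the following Python example tries each data volume under resource levels from small to large and records every success and failure as a sample. The callable `run_trial`, the integer resource levels, and the function names are illustrative assumptions; training the estimating model on the collected samples is left out.

```python
def collect_trial_samples(data_volumes, resource_levels, run_trial):
    """Trial-run each data volume under increasing resource levels.

    run_trial(volume, level) is assumed to return True on success.
    Returns a list of (volume, level, succeeded) samples.
    """
    samples = []
    for volume in data_volumes:
        for level in sorted(resource_levels):
            succeeded = run_trial(volume, level)
            samples.append((volume, level, succeeded))
            if succeeded:
                break  # larger configurations are not needed for this volume
    return samples


def smallest_successful(samples):
    """Map each data volume to the smallest resource level that succeeded."""
    best = {}
    for volume, level, succeeded in samples:
        if succeeded and (volume not in best or level < best[volume]):
            best[volume] = level
    return best
```

  • the failed trials are kept in the samples on purpose, since the disclosure uses both successful and failed runs as training data.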
  • the method shown in FIG. 3 may be executed by a machine learning platform used to implement machine learning related services.
  • Figure 4 shows a schematic diagram of the configuration interface of the machine learning solution template displayed to the user by the machine learning platform.
  • the user mentioned here refers to the user of the machine learning solution template, that is, the second user mentioned above.
  • the name of the modeling template is displayed in the upper left corner, and the configuration description of the modeling template is the description document uploaded when the first user creates the modeling template.
  • through the modeling template configuration instructions, the user can understand which data tables and which fields need to be configured in the current modeling template, together with their business meanings.
  • the user table, product table, and behavior table are the names of the input tables that need to be configured in the template plan.
  • the user needs to fill in the corresponding business data table.
  • the user can directly fill in the table name of the business data table.
  • while the user is typing, the platform can search for business data tables that match the input, and the matching items pop up in a drop-down box.
  • the matching items can be arranged in reverse chronological order.
  • Each page displays 5 business data tables, which the user can view by scrolling down.
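  • as a minimal sketch of this search behavior, the following Python example filters business data tables by the typed text, orders the matches in reverse chronological order, and returns one page of 5 items. Case-insensitive substring matching, the `(name, created_at)` tuples, and the function name `search_tables` are illustrative assumptions.

```python
def search_tables(tables, query, page=0, page_size=5):
    """Return one page of table names matching the query, newest first.

    tables: list of (name, created_at) tuples with comparable created_at values.
    """
    matches = [table for table in tables if query.lower() in table[0].lower()]
    matches.sort(key=lambda table: table[1], reverse=True)  # reverse chronological order
    start = page * page_size
    return [name for name, _ in matches[start:start + page_size]]
```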
  • user_id, age, sex, etc. are the fields that need to be configured under the user table.
  • the user can select the corresponding field in the business data table for the fields that need to be configured under the user table.
  • the corresponding error field will display a red star; if the field threshold requirement is not met, a reminder such as "#field name# only accepts #threshold lower limit#-#threshold upper limit# data, please check the corresponding table data" can be shown.
  • Timing parameter settings and sample parameter settings refer to the parameter placeholders that need to be configured.
  • the corresponding name and input content can be displayed according to the parameter configuration limitation information. If the parameter configuration limitation information includes one or more options that have been set, the options are displayed for the user to choose from. As an example, a default value that can be adjusted by the user can be shown.
  • after the user finishes filling in, the platform can replace the replaceable parts (such as input source tags and parameter placeholders) in the template scheme according to the content filled in by the user, and use the replaced template scheme to perform the machine learning process.
  • replaceable parts such as input source tags, parameter placeholders
  • the method for creating a machine learning solution template of the present disclosure can also be implemented as a device for creating a machine learning solution template.
  • Fig. 5 shows a structural block diagram of an apparatus for creating a machine learning scheme template according to an exemplary embodiment of the present disclosure.
  • the functional unit of the device for creating a machine learning solution template can be implemented by hardware, software, or a combination of hardware and software that implements the principles of the present disclosure.
  • the functional units described in FIG. 5 can be combined or divided into sub-units to realize the principle of the above-mentioned invention. Therefore, the description herein may support any possible combination, or division, or further limitation of the functional units described herein.
  • the apparatus 500 for creating a machine learning solution template includes a first obtaining module 510, a second obtaining module 520, and a generating module 530.
  • the first obtaining module 510 is configured to obtain a template solution used to describe at least part of the machine learning process for at least one input source tag, where the machine learning process involves model training and/or model application.
  • the second acquisition module 520 is configured to acquire input source configuration limitation information about the template solution, where the input source configuration limitation information is used to generate an input source configuration interface so that at least one configuration input source configured via the input source configuration interface replaces at least one input source tag in the template solution.
  • the second acquisition module 520 may generate a control for setting the input source configuration limitation information based on the obtained template solution, show the generated control to the first user, and receive the input source configuration limitation information set by the first user through the control.
  • the generating module 530 is configured to generate a template file of a machine learning solution template based on the obtained template solution and input source configuration limitation information.
  • the template scheme may include at least one parameter placeholder, and the apparatus 500 for creating a machine learning scheme template may also include a third acquisition module.
  • the third acquisition module is used to acquire parameter configuration limitation information about the template solution, where the parameter configuration limitation information is used to generate a parameter configuration interface so that at least one configuration parameter configured via the parameter configuration interface replaces at least one parameter placeholder in the template solution.
  • the generating module 530 may generate a template file of the machine learning solution template based on the acquired template solution, parameter configuration limited information, and input source configuration limited information.
  • for the parameter configuration limitation information and the parameter configuration interface, please refer to the relevant description above, which will not be repeated here.
  • the third obtaining module generates a control for setting parameter configuration limited information based on the obtained template scheme, shows the generated control to the first user, and receives the parameter configuration limited information set by the first user through the control.
  • the parameter configuration limitation information may also include classification information used to limit the configuration of the parameter placeholders according to the classification area on the parameter configuration interface.
  • the third obtaining module also displays a control for classifying the parameter placeholder to the first user, and obtains classification information according to the classification of the parameter placeholder by the first user through the control.
  • the apparatus 500 for creating a machine learning solution template may further include a first display module, a first receiving module, and a first merging module.
  • the first display module is used to display the control for uploading the description document to the first user;
  • the first receiving module is used to receive the description document uploaded by the first user through the control;
  • the first merging module is used to merge the description document into the template file.
  • the apparatus 500 for creating a machine learning solution template may further include a second display module, a second receiving module, and a second merging module.
  • the second display module is used to display the control for setting resource configuration information to the first user;
  • the second receiving module is used to receive the resource configuration information set by the first user through the control, where the resource configuration information is used to characterize the resource configuration for executing at least part of the machine learning process;
  • the second merging module is used to merge the resource configuration information into the template file.
  • the apparatus 500 for creating a machine learning solution template may further include a third display module, a fourth acquisition module, a replacement module, an execution module, an evaluation module, and a release or debugging module.
  • the third display module is used to display the input source configuration interface generated based on the input source configuration limitation information to the third user; the fourth acquisition module is used to obtain at least one configuration input source that the third user configures via the input source configuration interface based on the test data table corresponding to the test scenario;
  • the replacement module is used to replace at least one input source tag in the template solution with the at least one configuration input source to obtain a modified machine learning solution template;
  • the execution module is used to execute the at least part of the machine learning process based on the modified machine learning solution template to obtain the execution result of the at least part of the machine learning process; the evaluation module is used to evaluate the execution result to obtain the test result; the release or debugging module is used to determine, based on the test result, whether to release the machine learning solution template, or to debug the machine learning solution template based on the test result.
  • the method for executing a machine learning process based on a machine learning solution template of the present disclosure can also be implemented as a device for executing a machine learning process based on a machine learning solution template.
  • Fig. 6 shows a structural block diagram of an apparatus for executing a machine learning process based on a machine learning solution template according to an exemplary embodiment of the present disclosure.
  • the functional unit of the device for executing the machine learning process based on the machine learning solution template can be implemented by hardware, software, or a combination of hardware and software that implements the principles of the present disclosure.
  • the functional units described in FIG. 6 can be combined or divided into sub-units to realize the principle of the above-mentioned invention. Therefore, the description herein may support any possible combination, or division, or further limitation of the functional units described herein.
  • the apparatus 600 for executing a machine learning process based on a machine learning solution template includes a first acquisition module 610, a first display module 620, a second acquisition module 630, a first replacement module 640 and an execution module 650.
  • the first obtaining module 610 is used to obtain a template file of a machine learning solution template, where the template file includes a template solution and input source configuration limitation information, the template solution is used to describe at least part of the machine learning process for at least one input source tag, the machine learning process involves model training and/or model application, and the input source configuration limitation information is used to generate the input source configuration interface.
  • for the template solution, the input source tag, and the input source configuration limitation information, please refer to the relevant description above, which is not repeated here.
  • the first display module 620 is configured to display the input source configuration interface generated based on the input source configuration limitation information to the second user.
  • for the input source configuration interface, please refer to the relevant description above, which will not be repeated here.
  • the second obtaining module 630 is configured to obtain at least one configuration input source configured by the second user via the input source configuration interface.
  • the input source configuration interface may include a control for setting the configuration input source, and the second acquisition module 630 may receive the configuration input source set by the second user through the control.
  • for the configuration input source, please refer to the relevant description above, which will not be repeated here.
  • the first replacement module 640 is configured to replace at least one input source mark in the template scheme with the acquired at least one configuration input source to obtain a modified template scheme.
  • the execution module 650 is used to execute the machine learning process based on the modified template scheme.
  • the template file may also include resource configuration information, which is used to characterize the resource configuration for performing at least part of the machine learning process.
  • the execution module 650 may perform the machine learning process based on the modified template scheme using the resource configuration represented by the resource configuration information. Or the execution module 650 may predict the resource configuration required during the machine learning process of the modified template solution, and use the predicted resource configuration to perform the machine learning process.
  • the input source configuration limitation information further includes processing items for defining the processing that at least one configuration input source configured via the input source configuration interface undergoes before it replaces at least one input source tag in the template solution, and the apparatus 600 for executing a machine learning process based on a machine learning solution template may also include a first processing module and a second display module. The first processing module is used to process the configuration input source according to the processing items; the second display module is used to display the processing result in the input source configuration interface.
  • for the processing items, please refer to the relevant description above, which will not be repeated here.
  • the template solution may also include at least one parameter placeholder.
  • the template file may also include parameter configuration limited information.
  • the parameter configuration limited information is used to generate a parameter configuration interface.
  • the apparatus 600 for executing a machine learning process based on a machine learning solution template may also include a third display module, a third acquisition module, and a second replacement module.
  • the third display module is used to display the parameter configuration interface generated based on the parameter configuration limitation information to the second user; the third acquisition module is used to obtain at least one configuration parameter configured by the second user via the parameter configuration interface; the second replacement module is used to replace at least one parameter placeholder in the template scheme with the obtained at least one configuration parameter to obtain a modified template scheme.
  • the parameter configuration interface may include controls for setting configuration parameters, and the third acquisition module may receive the configuration parameters set by the second user through the controls.
  • the parameter configuration limitation information may further include processing items for defining the processing that at least one configuration parameter configured via the parameter configuration interface undergoes before it replaces at least one parameter placeholder in the template solution, and the apparatus 600 for executing a machine learning process based on a machine learning solution template may further include a second processing module and a fourth display module.
  • the second processing module is used to process the configuration parameters according to the processing items; the fourth display module is used to display the processing results in the input source configuration interface.
  • for the processing items, please refer to the relevant description above, which will not be repeated here.
  • the template file also includes a description document for assisting the second user to understand and/or configure the template solution.
  • the apparatus 600 for performing a machine learning process based on the machine learning solution template may also include a providing module for providing the second user with the description document.
  • for the specific implementation of the apparatus 600 for executing a machine learning process based on a machine learning solution template, refer to the description above, given in conjunction with FIGS. 3 and 4, of the method for executing a machine learning process based on a machine learning solution template; it is not repeated here.
  • the creation method, use method and device of the machine learning solution template according to the exemplary embodiments of the present disclosure are described above with reference to FIGS. 1 to 6. It should be understood that the above method can be implemented by a program recorded on a computer-readable medium.
  • a computer-readable storage medium storing instructions can be provided, wherein the computer-readable medium records a computer program for executing the method for creating a machine learning scheme template of the present disclosure (for example, as shown in FIG. 1) or the method for executing a machine learning process based on the machine learning scheme template (for example, as shown in FIG. 3).
  • the computer program in the above-mentioned computer-readable medium can be run in an environment deployed in computer equipment such as a client, a host, an agent device, or a server. It should be noted that, in addition to the steps shown in FIG. 1 or FIG. 3, the computer program can also be used to perform additional steps or to perform more specific processing when performing those steps. The content of these additional steps and further processing has been described with reference to FIGS. 1 and 3 and, to avoid repetition, is not repeated here.
  • the apparatus for creating a machine learning scheme template and the apparatus for executing a machine learning process based on the machine learning scheme template may rely entirely on the running of a computer program to realize the corresponding functions; that is, each apparatus corresponds to a step in the functional architecture of the computer program, so that the entire device is invoked through a special software package (for example, a lib library) to implement the corresponding functions.
  • each device shown in FIG. 5 and FIG. 6 can also be implemented by hardware, software, firmware, middleware, microcode, or any combination thereof.
  • the program code or code segment used to perform the corresponding operation can be stored in a computer-readable medium such as a storage medium, so that the processor can read and run the corresponding program code or code segment to perform the corresponding operation.
  • the exemplary embodiments of the present disclosure may also be implemented as a computing device, which includes a storage component and a processor.
  • the storage component stores a set of computer-executable instructions. When the set of computer-executable instructions is executed by the processor, the method for creating a machine learning solution template or the method for performing a machine learning process based on the machine learning solution template is performed.
  • the computing device can be deployed in a server or a client, and can also be deployed on a node device in a distributed network environment.
  • the computing device may be a PC computer, a tablet device, a personal digital assistant, a smart phone, a web application, or other devices capable of executing the above set of instructions.
  • the computing device does not have to be a single computing device, and may also be any device or a collection of circuits that can execute the above-mentioned instructions (or instruction sets) individually or jointly.
  • the computing device may also be a part of an integrated control system or a system manager, or may be configured as a portable electronic device interconnected with a local or remote (e.g., via wireless transmission) interface.
  • the processor may include a central processing unit (CPU), a graphics processing unit (GPU), a programmable logic device, a dedicated processor system, a microcontroller, or a microprocessor.
  • the processor may also include an analog processor, a digital processor, a microprocessor, a multi-core processor, a processor array, a network processor, and the like.
  • Certain operations described in the method for creating a machine learning solution template or the method for executing a machine learning process based on the machine learning solution template according to an exemplary embodiment of the present disclosure may be implemented in software, and some operations may be implemented in hardware. In addition, these operations can also be achieved through a combination of software and hardware.
  • the processor can run instructions or code stored in the storage component, which can also store data. Instructions and data can also be sent and received over a network via a network interface device, which can use any known transmission protocol.
  • the storage component can be integrated with the processor, for example, RAM or flash memory is arranged in an integrated circuit microprocessor or the like.
  • the storage component may include an independent device, such as an external disk drive, a storage array, or any other storage device that can be used by a database system.
  • the storage component and the processor may be operatively coupled, or may communicate with each other, for example, through an I/O port, a network connection, or the like, so that the processor can read files stored in the storage component.
  • the computing device may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, a mouse, a touch input device, etc.). All components of the computing device may be connected to each other via a bus and/or network.
  • an apparatus for creating a machine learning scheme template or an apparatus for performing a machine learning process based on the machine learning scheme template may include a storage component and a processor, wherein the storage component stores a set of computer-executable instructions, and when the set of computer-executable instructions is executed by the processor, the method for creating a machine learning solution template described above or the method for executing a machine learning process based on the machine learning solution template is executed.
  • in the creation method, use method, and apparatus for a machine learning solution template according to the exemplary embodiments of the present disclosure, reusing the machine learning solution template lowers the modeling threshold and reduces modeling time, while the input source configuration limitation information in the template file of the machine learning solution template can be used to solve the data matching problem between the actual business data and the data inherent in the template solution, so that the machine learning solution template achieves a good modeling effect when applied to business data with different data structures in the same business direction.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Security & Cryptography (AREA)
  • User Interface Of Digital Computer (AREA)
  • Stored Programmes (AREA)

Abstract

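Disclosed are a method for creating a machine learning solution template, a method for using it, and corresponding apparatuses. A template solution describing at least part of a machine learning process for at least one input source tag is obtained; input source configuration limitation information about the template solution is obtained, where the input source configuration limitation information is used to generate an input source configuration interface so that at least one configuration input source configured via the input source configuration interface replaces at least one input source tag in the template solution; and a template file of the machine learning solution template is generated based on the obtained template solution and input source configuration limitation information. In this way, reusing the created machine learning solution template lowers the modeling threshold and reduces modeling time, and the data matching problem between actual business data and the data inherent in the template solution can be solved, so that the machine learning solution template achieves a good modeling effect when applied to business data with different data structures in the same business direction.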

Description

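Method for creating and method for using a machine learning solution template, and apparatus
This application claims priority to Chinese patent application No. 201911225347.5, filed on December 4, 2019 and entitled "Method for creating and method for using a machine learning solution template, and apparatus", the disclosure of which is incorporated herein by reference.
Technical Field
The present disclosure generally relates to the field of artificial intelligence and, more specifically, to a method for creating a machine learning solution template, a method for using it, and corresponding apparatuses.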
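Background
Machine learning is an inevitable product of artificial intelligence research reaching a certain stage of development. It is devoted to improving the performance of a system itself through computation, using experience. In computer systems, "experience" usually exists in the form of "data", and a machine learning algorithm can produce a "model" from data. That is, when empirical data is provided to a machine learning algorithm, a model can be produced based on that data, and when faced with a new situation the model provides a corresponding judgment, namely a prediction result. It can be seen that how to produce a model from empirical data (that is, the machine learning modeling process) is the key to machine learning technology.
At present, when machine learning technology is applied to a specific business scenario, modelers usually need to carry out modeling research from scratch according to the characteristics of that scenario. The process from data preparation to model debugging consumes considerable time and places high demands on the modelers' expertise; for the business, it is both time-consuming and subject to a high entry threshold.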
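Summary
Exemplary embodiments of the present disclosure aim to overcome the defects of the prior art that the machine learning modeling process is time-consuming and has a high entry threshold.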
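According to a first aspect of the present disclosure, a method for creating a machine learning solution template is provided, including: obtaining a template solution describing at least part of a machine learning process for at least one input source tag, where the machine learning process involves at least one of model training and model application; obtaining input source configuration limitation information about the template solution, where the input source configuration limitation information is used to generate an input source configuration interface so that at least one configuration input source configured via the input source configuration interface replaces at least one input source tag in the template solution; and generating a template file of the machine learning solution template based on the obtained template solution and input source configuration limitation information.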
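According to a second aspect of the present disclosure, a method for executing a machine learning process based on a machine learning solution template is provided, including: obtaining a template file of the machine learning solution template, where the template file includes a template solution and input source configuration limitation information, the template solution describes at least part of a machine learning process for at least one input source tag, the machine learning process involves at least one of model training and model application, and the input source configuration limitation information is used to generate an input source configuration interface; displaying to a second user the input source configuration interface generated based on the input source configuration limitation information; obtaining at least one configuration input source configured by the second user via the input source configuration interface; replacing at least one input source tag in the template solution with the obtained at least one configuration input source to obtain a modified template solution; and executing the machine learning process based on the modified template solution.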
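According to a third aspect of the present disclosure, an apparatus for creating a machine learning solution template is provided, including: a first obtaining module for obtaining a template solution describing at least part of a machine learning process for at least one input source tag, where the machine learning process involves at least one of model training and model application; a second obtaining module for obtaining input source configuration limitation information about the template solution, where the input source configuration limitation information is used to generate an input source configuration interface so that at least one configuration input source configured via the input source configuration interface replaces at least one input source tag in the template solution; and a generating module for generating a template file of the machine learning solution template based on the obtained template solution and input source configuration limitation information.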
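According to a fourth aspect of the present disclosure, an apparatus for executing a machine learning process based on a machine learning solution template is provided, including: a first obtaining module for obtaining a template file of the machine learning solution template, where the template file includes a template solution and input source configuration limitation information, the template solution describes at least part of a machine learning process for at least one input source tag, the machine learning process involves at least one of model training and model application, and the input source configuration limitation information is used to generate an input source configuration interface; a first display module for displaying to a second user the input source configuration interface generated based on the input source configuration limitation information; a second obtaining module for obtaining at least one configuration input source configured by the second user via the input source configuration interface; a first replacement module for replacing at least one input source tag in the template solution with the obtained at least one configuration input source to obtain a modified template solution; and an execution module for executing the machine learning process based on the modified template solution.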
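According to a fifth aspect of the present disclosure, a system including at least one computing device and at least one storage device storing instructions is provided, where the instructions, when run by the at least one computing device, cause the at least one computing device to perform the steps of a method for creating a machine learning solution template or of a method for executing a machine learning process based on a machine learning solution template. The steps of the method for creating a machine learning solution template include: obtaining a template solution describing at least part of a machine learning process for at least one input source tag, where the machine learning process involves at least one of model training and model application; obtaining input source configuration limitation information about the template solution, where the input source configuration limitation information is used to generate an input source configuration interface so that at least one configuration input source configured via the input source configuration interface replaces at least one input source tag in the template solution; and generating a template file of the machine learning solution template based on the obtained template solution and input source configuration limitation information. The steps of the method for executing a machine learning process based on a machine learning solution template include: obtaining a template file of the machine learning solution template, where the template file includes a template solution and input source configuration limitation information, the template solution describes at least part of a machine learning process for at least one input source tag, the machine learning process involves at least one of model training and model application, and the input source configuration limitation information is used to generate an input source configuration interface; displaying to a second user the input source configuration interface generated based on the input source configuration limitation information; obtaining at least one configuration input source configured by the second user via the input source configuration interface; replacing at least one input source tag in the template solution with the obtained at least one configuration input source to obtain a modified template solution; and executing the machine learning process based on the modified template solution.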
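According to a sixth aspect of the present disclosure, a computer-readable storage medium storing instructions is provided, where the instructions, when run by at least one computing device, cause the at least one computing device to perform the method described in the first aspect or the second aspect of the present disclosure.
In the method for creating a machine learning solution template, the method for using it, and the apparatuses according to exemplary embodiments of the present disclosure, reusing the machine learning solution template lowers the modeling threshold and reduces modeling time, while the input source configuration limitation information in the template file of the machine learning solution template can be used to solve the data matching problem between actual business data and the data inherent in the template solution, so that the machine learning solution template achieves a good modeling effect when applied to business data with different data structures in the same business direction.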
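Brief Description of the Drawings
These and/or other aspects and advantages of the present disclosure will become clearer and easier to understand from the following detailed description of embodiments of the present disclosure taken in conjunction with the drawings, in which:
Fig. 1 shows a flowchart of a method for creating a machine learning solution template according to an exemplary embodiment of the present disclosure;
Fig. 2 shows a schematic diagram of an interface for creating a machine learning solution template;
Fig. 3 shows a flowchart of a method for executing a machine learning process based on a machine learning solution template according to an exemplary embodiment of the present disclosure;
Fig. 4 shows a schematic diagram of an interface for configuring a machine learning solution template;
Fig. 5 shows a structural block diagram of an apparatus for creating a machine learning solution template according to an exemplary embodiment of the present disclosure;
Fig. 6 shows a structural block diagram of an apparatus for executing a machine learning process based on a machine learning solution template according to an exemplary embodiment of the present disclosure.
Detailed Description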
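To help those skilled in the art better understand the present disclosure, exemplary embodiments of the present disclosure are described in further detail below with reference to the drawings and specific implementations. It should be noted that "at least one of several items" in the present disclosure covers three parallel cases: "any one of the items", "a combination of any several of the items", and "all of the items". "And/or" in the present disclosure means at least one of the two or more items it connects. For example, "including at least one of A and B" and "including A and/or B" cover the following three parallel cases: (1) including A; (2) including B; (3) including A and B. As another example, "performing at least one of step one and step two" and "performing step one and/or step two" cover the following three parallel cases: (1) performing step one; (2) performing step two; (3) performing step one and step two. In other words, "A and/or B" may also be expressed as "at least one of A and B", and "performing step one and/or step two" may also be expressed as "performing at least one of step one and step two".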
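Fig. 1 shows a flowchart of a method for creating a machine learning solution template according to an exemplary embodiment of the present disclosure. The method shown in Fig. 1 may be implemented entirely in software by a computer program, or may be executed by a specifically configured computing device.
Referring to Fig. 1, in step S110, a template solution describing at least part of a machine learning process for at least one input source tag is obtained.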
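Machine learning modeling solutions often differ considerably across business scenarios. For example, a modeling solution for a time-series scenario focuses on building time-series windows and is more sensitive to time-series data, whereas a modeling solution for a marketing scenario pays more attention to data such as products and user labels.
In the present disclosure, a template solution can be regarded as a machine learning modeling solution that a user (referred to as the first user for ease of distinction) has summarized and defined for a business scenario and that is generally applicable within that scenario. The first user may be a scientist with extensive machine learning modeling experience.
The template solution may include the configuration of each step of at least part of the machine learning process. For example, it may describe one or more kinds of configuration information for each step, such as the objects processed, the processing method, and the processing result.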
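The machine learning process involves model training and/or model application. It should be noted that "model training and/or model application" may also be expressed as "at least one of model training and model application".
Model training refers to the training process of a machine learning model and may include, but is not limited to, at least one of the following steps: data import, data splitting, feature extraction, model training, model testing, and model evaluation. For details of each step, refer to existing machine learning knowledge; they are not repeated in the present disclosure. Model application refers to the process of applying a machine learning model, for example using a trained model to make predictions on data to obtain prediction results; as an example, it may include packaging the application, deploying it online, and providing services.
The template solution may be a file, written in a specific language, that describes at least part of the machine learning process. For example, the template solution may be a DAG (directed acyclic graph) file, which describes the configuration information of the machine learning step represented by each node of the directed acyclic graph (that is, the processing nodes mentioned below).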
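As a rough illustration of a DAG-style template solution, the following Python sketch models a template as a dictionary of processing nodes and derives an execution order from it. The node IDs, the `op` names, the `{{input:...}}` tag notation, and the overall layout are assumptions made for this example; the disclosure does not fix a file format or a tag syntax.

```python
from graphlib import TopologicalSorter

# Hypothetical template solution: each node stands for one machine learning step.
TEMPLATE_SOLUTION = {
    "nodes": {
        "n1": {"op": "import_data", "input": "{{input:user_table}}", "deps": []},
        "n2": {"op": "split_data", "ratio": 0.8, "deps": ["n1"]},
        "n3": {"op": "extract_features", "deps": ["n2"]},
        "n4": {"op": "train_model", "deps": ["n3"]},
    }
}


def execution_order(template):
    """Return node IDs in an order that respects the DAG dependencies."""
    graph = {nid: set(node["deps"]) for nid, node in template["nodes"].items()}
    return list(TopologicalSorter(graph).static_order())


order = execution_order(TEMPLATE_SOLUTION)
print(order)
```

Keeping the dependencies explicit in each node lets the same structure carry per-step configuration and the replaceable input source in one place.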
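The template solution includes one or more input sources, that is, the input sources used by the machine learning process that the template solution describes. An input source may include, but is not limited to, an input table and/or a field. An input table is a data table used by the machine learning process described by the template solution, and a field is a field used by that process, for example a field in an input table. It should be noted that "an input table and/or a field" may also be expressed as "at least one of an input table and a field".
A template solution can be regarded as a machine learning modeling solution that is generally applicable within a specific business scenario. However, business data with different data structures may be used within the same business scenario, so the input sources in the template solution may differ somewhat in data structure from the actual business data. To allow the input sources in the template solution to adapt to business data with different data structures, they may be input sources that can be replaced by actual business data; for how the replacement is implemented, see the relevant description below.
For ease of distinction, the present disclosure uses the term input source tag to denote an input source in the template solution that can be replaced. The term merely refers to such a replaceable input source; the present disclosure does not limit the form in which the input source is defined in the template solution. That is, a replaceable input source in the template solution may or may not be identified by a special mark.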
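In step S120, input source configuration limitation information about the template solution is obtained.
The input source configuration limitation information is used to generate an input source configuration interface so that at least one configuration input source configured via that interface replaces at least one input source tag in the template solution. The input source configuration interface is an interface displayed to the user of the machine learning solution template (referred to as the second user for ease of distinction) and helps the second user map actual business data to the input source tags in the template solution.
The input source configuration limitation information may be set by the first user, who may set it for one or more input sources in the template solution. The present disclosure may obtain the information set by the first user in several ways. For example, the first user may produce a file containing the input source configuration limitation information by, but not limited to, editing a document; a file upload interface may be provided to the first user, and the information may be obtained from the file uploaded through that interface. As another example, a visual operation interface may be provided to the first user, and the input source configuration limitation information may be obtained from the operations the first user performs on that interface.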
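As an example, controls for setting the input source configuration limitation information may be generated based on the obtained template solution and displayed to the first user, and the information set by the first user through the controls may be received. Taking the case where the input source is an input table as an example, after the template solution is obtained, it may be parsed to determine the input tables it involves, and a control for adding an input table may be provided to the first user. The first user may add an input table through this control and set the related input source configuration limitation information for that table through other controls.
The input source configuration limitation information may include any information that helps the second user map actual business data to the input sources in the template solution. Taking the case where the input source tags identify replaceable input tables and/or fields in the template solution as an example, the input source configuration limitation information may include, but is not limited to, at least one of the following: the name of at least one input table to be configured as displayed on the input source configuration interface, the processing node corresponding to each input table, the name of each field to be configured under each input table, and indication information on whether each field is displayed as an optional field on the input source configuration interface.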
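The items listed above can be pictured as a small data structure. The sketch below is one possible representation under assumed names (`FieldSpec`, `InputTableSpec`, `required_fields`); the disclosure names the kinds of information but not a concrete schema. The field names `user_id`, `age`, and `sex` follow the user-table example mentioned elsewhere in the document.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class FieldSpec:
    name: str               # field name shown to the second user
    optional: bool = True   # False marks a required field


@dataclass
class InputTableSpec:
    table_name: str         # table name shown on the configuration interface
    node_id: str            # processing node that consumes this table
    fields: List[FieldSpec] = field(default_factory=list)


user_table = InputTableSpec(
    table_name="user table",
    node_id="n1",
    fields=[FieldSpec("user_id", optional=False), FieldSpec("age"), FieldSpec("sex")],
)


def required_fields(spec):
    """Names of the fields the second user must map to business data."""
    return [f.name for f in spec.fields if not f.optional]


# A plain-dict form that could be serialized into a template file.
limitation_info = asdict(user_table)
print(required_fields(user_table))
```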
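The input table name is the table name displayed to the second user. The first user may use the name of the input table in the template solution as the displayed table name, or may name it according to the business meaning of the input table, so that the second user can configure a business data table with the same or a similar business meaning for it.
The processing node corresponding to an input table denotes the node that processes the input table in the machine learning process described by the template solution, that is, the machine learning step in which the input table is processed. As an example, the template solution may be the DAG file mentioned above; each processing node in the DAG file may have a corresponding node ID, and the node ID may be used to denote the processing node corresponding to an input table.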
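The field name is the field name displayed to the second user. The first user may use the original name of a field in the input table as the displayed name, or may rename the field according to its business meaning, so that the second user can map fields in the business data table to fields in the input table by business meaning. It should be noted that the fields to be configured under each input table may include not only original fields that exist in the input table but also fields that do not exist in it, namely extension fields. For example, suppose input table A includes fields a, b, and c; the fields to be configured under input table A may include fields a, b, and c and also a field d that does not exist in input table A, and field d is an extension field. Extension fields may be set by the first user. For example, starting from the business scenario and the business data structures that may occur in it, the first user may add to an input table one or more extension fields that the table does not contain and set their field names. When adding an extension field, the first user may also set how it is processed. As an example, each extension field added to an input table may also be regarded as a field category, with different extension fields corresponding to different field categories. Setting extension fields supports the matching of additional fields in the actual business data that go beyond the fields involved in the template solution: additional fields can be mapped to extension fields, and because the processing of extension fields is set in advance, the additional fields can also take part in machine learning and realize their data value, which makes the template solution more adaptable to different data.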
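The indication information indicates whether a field is optional. An optional field is one for which the second user may decide, according to the actual situation, whether to configure a field from the actual business data. A non-optional field, that is, a required field, is one for which the second user must configure at least one field from the actual business data. Fields may be set as optional or required according to their characteristics in the business scenario (such as generality and importance). For example, the first user may set a small number of fields that are common in the industry as required fields and set the other fields as optional. Optional fields may include, in addition to fields that exist in the template solution, extension fields that do not exist in it; for extension fields, see the relevant description above. Designating fields that are common in the industry or highly important as required ensures that the modeling effect is not too poor, while designating optional fields improves the adaptability of the template solution to different data structures. The field format corresponding to at least one field is set to allow one or more fields in the actual business data to be configured for a single field, so that all of the configured fields are processed in the same way that the template solution processes the single field. With this "one-to-many" field mapping, even if the actual business data contains a large number of fields, they can be mapped to fields in the input table, so that all fields in the actual business data can take part in the machine learning process (such as feature construction), realize their data value, and influence the final result.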
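The required/optional distinction and the "one-to-many" mapping can be sketched as a small check-and-expand step. This is a minimal illustration under assumed data shapes (`field_specs` and `mapping` are hypothetical names, as are the example columns); it is not presented as the disclosure's implementation.

```python
def check_and_expand(field_specs, mapping):
    """Validate a field mapping and expand one-to-many entries.

    field_specs: {template_field: is_optional}
    mapping:     {template_field: [business columns chosen by the second user]}
    Returns (template_field, business_column) pairs; every pair is meant to be
    processed the way the template processes that single template field.
    """
    missing = [name for name, optional in field_specs.items()
               if not optional and not mapping.get(name)]
    if missing:
        raise ValueError(f"required fields not configured: {missing}")
    pairs = []
    for name in field_specs:
        for column in mapping.get(name, []):
            pairs.append((name, column))
    return pairs


# "extra_numeric" plays the role of an extension field that accepts several columns.
specs = {"user_id": False, "age": True, "extra_numeric": True}
chosen = {"user_id": ["uid"], "extra_numeric": ["balance", "points"]}
pairs = check_and_expand(specs, chosen)
print(pairs)
```

Leaving `age` unmapped is accepted because it is optional, whereas omitting `user_id` raises an error.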
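In addition, the input source configuration limitation information may include the field format corresponding to each field.
Furthermore, the input source configuration limitation information may include processing items that define the processing that at least one configuration input source configured via the input source configuration interface undergoes before it replaces at least one input source tag in the template solution. The processing items may include, but are not limited to, validation items for each field. A validation item may include the allowed format and/or allowed value range of each field, and may also include indication information on whether validation is performed. It should be noted that "allowed format and/or allowed value range" may also be expressed as "at least one of allowed format and allowed value range".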
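In step S130, a template file of the machine learning solution template is generated based on the obtained template solution and input source configuration limitation information.
The template file of the machine learning solution template includes the template solution and the input source configuration limitation information. The template solution can be regarded as a machine learning modeling solution, generally applicable within a business scenario, that an experienced scientist has summarized and defined for that scenario; it contains machine learning modeling know-how accumulated by the scientist, and the second user can reuse this know-how by using the template solution, which lowers the modeling threshold and reduces modeling time. The input source configuration limitation information helps the second user map actual business data to the input source tags in the template solution, which solves the data matching problem between actual business data and the data inherent in the template solution, so that the template solution achieves a good modeling effect when applied to business data with different data structures in the same business direction.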
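One way to picture step S130 is to bundle the parts into a single archive. The sketch below assumes a zip container with JSON members; the container type, the member names, and the function names are all assumptions for illustration, since the disclosure does not specify the template file format.

```python
import io
import json
import zipfile


def build_template_file(template_solution, input_source_info, param_info=None, doc_text=None):
    """Pack the parts of a machine learning solution template into zip bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("template_solution.json", json.dumps(template_solution))
        archive.writestr("input_source_config.json", json.dumps(input_source_info))
        if param_info is not None:   # optional parameter configuration limitation information
            archive.writestr("parameter_config.json", json.dumps(param_info))
        if doc_text is not None:     # optional description document
            archive.writestr("README.txt", doc_text)
    return buffer.getvalue()


def read_template_file(data):
    """Return {member name: text content} for a packed template file."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return {name: archive.read(name).decode("utf-8") for name in archive.namelist()}


packed = build_template_file({"nodes": {}}, {"tables": []}, doc_text="How to configure")
parts = read_template_file(packed)
print(sorted(parts))
```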
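After the template file is generated, it may be tested to determine whether it meets expectations. If it does, the machine learning solution template may be released; if not, the machine learning solution template may be debugged.
As an example, the input source configuration interface generated based on the input source configuration limitation information may be displayed to a third user, who may be a tester. At least one configuration input source that the third user configures via the input source configuration interface, based on a test data table corresponding to a test scenario, is obtained. Here a configuration input source is an input source that the third user configures for an input source tag in the template solution according to the test data; it may include a test data table configured for an input table and/or a field of a test data table configured for a field of an input table. For the configuration process, see the description below in connection with Fig. 3. The at least one configuration input source replaces at least one input source tag in the template solution to obtain a modified machine learning solution template; at least part of the machine learning process is executed based on the modified machine learning solution template to obtain an execution result; the execution result is evaluated to obtain a test result; and based on the test result it is determined whether to release the machine learning solution template, or the machine learning solution template is debugged. For example, if the test result meets expectations, the template may be released; if not, the template may be debugged, for instance by modifying the input source configuration limitation information in the template file. It should be noted that "a test data table configured for an input table and/or a field of a test data table configured for a field of an input table" may also be expressed as "at least one of a test data table configured for an input table and a field of a test data table configured for a field of an input table".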
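The test flow above (replace, execute, evaluate, then release or debug) can be sketched as a single function. Everything concrete here is hypothetical: the `{{input:...}}` tag notation, the `run` and `evaluate` callables, the `auc` metric, and the numeric threshold standing in for "meets expectations".

```python
def trial_run_template(template_text, tag_to_table, run, evaluate, threshold):
    """Replace tags, run the process, score the result, and decide release or debug."""
    modified = template_text
    for tag, table in tag_to_table.items():
        modified = modified.replace(tag, table)
    result = run(modified)        # execute at least part of the machine learning process
    score = evaluate(result)      # turn the execution result into a test result
    decision = "release" if score >= threshold else "debug"
    return decision, score


def fake_run(solution_text):
    """Stand-in executor: a real platform would run the modified template."""
    return {"auc": 0.81, "solution": solution_text}


decision, score = trial_run_template(
    "load {{input:user_table}}",
    {"{{input:user_table}}": "test_users"},
    fake_run,
    lambda result: result["auc"],
    threshold=0.75,
)
print(decision, score)
```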
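The template solution may also include at least one parameter placeholder. A parameter placeholder may be any agreed notation that is not commonly used in code, for example "{$placeholder$}". The parameter represented by a parameter placeholder is one that can be determined by the second user and may include, but is not limited to, a script parameter and/or a running parameter. Thus, when setting up the template solution, the first user may also adapt it based on experience and replace the parameters that may vary with placeholders; for example, parameters such as the feature combination method, hyperparameters, and running resources may be set as parameter placeholders. It should be noted that "script parameter and/or running parameter" may also be expressed as "at least one of script parameter and running parameter".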
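A placeholder notation such as "{$placeholder$}" lends itself to a regular-expression scan and substitution. The sketch below assumes exactly that notation; the function names and the sample parameter names (`learning_rate`, `window_days`) are made up for illustration.

```python
import re

# Matches {$name$} and captures the name between the markers.
PLACEHOLDER = re.compile(r"\{\$(.+?)\$\}")


def find_placeholders(text):
    """Return placeholder names in order of first appearance, without duplicates."""
    seen = []
    for name in PLACEHOLDER.findall(text):
        if name not in seen:
            seen.append(name)
    return seen


def fill_placeholders(text, values):
    """Replace every {$name$} with its configured value; KeyError if one is missing."""
    return PLACEHOLDER.sub(lambda match: str(values[match.group(1)]), text)


template = "lr={$learning_rate$}; window={$window_days$}; lr2={$learning_rate$}"
names = find_placeholders(template)
filled = fill_placeholders(template, {"learning_rate": 0.1, "window_days": 7})
print(names, filled)
```

The scan result is what a creation interface would need in order to show one configuration control per placeholder, and the substitution is what runs once the second user has supplied values.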
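The present disclosure may also obtain parameter configuration limitation information about the template solution. The parameter configuration limitation information is used to generate a parameter configuration interface so that at least one configuration parameter configured via the parameter configuration interface replaces at least one parameter placeholder in the template solution. The parameter configuration interface is an interface displayed to the second user and helps the second user configure the parameters represented by the parameter placeholders.
The parameter configuration limitation information may be set by the first user, who may set it for each parameter placeholder in the template solution. The present disclosure may obtain the parameter configuration limitation information in several ways. For example, the first user may produce a file containing the parameter configuration limitation information by, but not limited to, editing a document; a file upload interface may be provided to the first user, and the information may be obtained from the file uploaded through that interface. As another example, a visual operation interface may be provided to the first user, and the parameter configuration limitation information may be obtained from the operations the first user performs on that interface.
As an example, controls for setting the parameter configuration limitation information may be generated based on the obtained template solution and displayed to the first user, and the information set by the first user through the controls may be received. For example, after the template solution is obtained, it may be parsed to identify the parameter placeholders it contains, and controls for setting the parameter configuration limitation information related to those placeholders may be displayed to the first user, so that the information set by the first user can be obtained through the controls.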
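The parameter configuration limitation information may include any information that helps the second user configure the parameters represented by the parameter placeholders. As an example, it may include, but is not limited to, at least one of the following items displayed on the parameter configuration interface for a parameter placeholder to be configured: type information, input mode information, display information, default value, and actual value. The type information may indicate a script parameter and/or a running parameter. The input mode information may include a fill-in mode and/or a selection mode: in the fill-in mode the input is typed in, and in the selection mode the input is made by choosing among several provided options. The display information may include prompt information that helps the second user understand the parameter to be entered, for example the name of the parameter represented by the placeholder. The default value may be the display value of an option provided when the input mode is the selection mode, and the actual value may be the actual value of that option. The display value differs from the actual value: it may be translated content that is easy for the user to understand, so that when options are provided, the actual values of the options can be hidden and only the translated display values exposed to the user. This makes the options easier for the user to understand and also protects the actual values set by the first user as know-how. It should be noted that "script parameter and/or running parameter" may also be expressed as "at least one of script parameter and running parameter".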
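The split between display values and actual values can be illustrated with a small lookup. The option labels and the hyperparameter strings below are invented for the example; the point is only that the interface exposes `display` while the substitution uses `actual`.

```python
# Hypothetical options for one selection-mode placeholder.
OPTIONS = [
    {"display": "Conservative", "actual": "max_depth=3,learning_rate=0.05"},
    {"display": "Aggressive", "actual": "max_depth=8,learning_rate=0.3"},
]


def displayed_choices(options):
    """What the second user sees on the parameter configuration interface."""
    return [option["display"] for option in options]


def resolve_choice(options, chosen_display):
    """Map the user's choice back to the hidden actual value used for replacement."""
    for option in options:
        if option["display"] == chosen_display:
            return option["actual"]
    raise ValueError(f"unknown option: {chosen_display}")


print(displayed_choices(OPTIONS))
print(resolve_choice(OPTIONS, "Conservative"))
```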
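The parameter configuration limitation information may also include processing items that define the processing that at least one configuration parameter configured via the parameter configuration interface undergoes before it replaces at least one parameter placeholder in the template solution. The processing items may include, but are not limited to, validation items for validating the at least one configuration parameter. A validation item may include the allowed format and/or allowed value range of the configuration parameter, and may also include indication information on whether validation is performed.
Where parameter configuration limitation information has been obtained for the template solution, the template file of the machine learning solution template may be generated based on the obtained template solution, parameter configuration limitation information, and input source configuration limitation information. That is, the template file may include not only the template solution and the input source configuration limitation information but also the parameter configuration limitation information. The parameter configuration limitation information helps the second user configure the parameters represented by the parameter placeholders in the template solution, so that the overall modeling effect suits the actual business scenario and the template solution performs better in that scenario.
The parameter configuration limitation information may also include classification information that defines how the parameter placeholders are configured by classification area on the parameter configuration interface. While obtaining the parameter configuration limitation information about the template solution, controls for classifying the parameter placeholders may be displayed to the first user, and the classification information may be obtained from the classification the first user performs through those controls. The generated parameter configuration interface can then display the parameter placeholders to be configured by category according to the classification information, with placeholders of different categories shown in different classification areas, which makes the configuration more logical for the second user.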
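The present disclosure may also display to the first user a control for uploading a description document, receive the description document uploaded by the first user through the control, and merge the description document into the template file. The description document may tell the user how to configure the template solution, and it may be written in business language that the second user can easily understand, so that in most cases the second user can use the machine learning solution template without understanding machine learning concepts. For example, the description document may include instructions for setting sample labels, so that the user does not need to understand the machine learning concepts of positive and negative samples and only needs to describe the business situation as it is. In an anti-fraud business, for instance, the user only needs to specify which transactions are problematic and which are normal, and can operate the entire modeling template through ordinary business understanding, which lowers the threshold for using the template.
The present disclosure may also display to the first user a control for setting resource configuration information, receive the resource configuration information set by the first user through the control, where the resource configuration information characterizes the resource configuration for executing at least part of the machine learning process, and merge the resource configuration information into the template file. The resource configuration information mentioned here can be regarded as the running resource settings provided by the first user.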
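The method shown in Fig. 1 may be executed by a machine learning platform used to implement machine learning related services. Fig. 2 shows a schematic diagram of the interface for creating a machine learning solution template that the machine learning platform displays to the user. The user mentioned here is the user who creates the machine learning solution template, that is, the first user mentioned above, for example a scientist with extensive machine learning modeling experience.
As shown in Fig. 2, creating a machine learning solution template can be divided into four parts: basic information configuration, input configuration, placeholder configuration, and classification configuration.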
1、基础信息配置
基础信息可以包括但不限于建模模板名称、建模模板可见状态、建模模板DAG、建模模板配置说明文档。
建模模板名称是指展示给用户(模板使用者,也即上文述及的第二用户)查看的机器学习方案模板的名称。科学家可以根据机器学习方案模板适用的业务场景、机器学习方案模板的功能进行命名,例如科学家可以将针对营销场景创建的机器学习方案模板命名为通用营销建模模板。
建模模板可见状态是指建模模板是否对其他用户可见。如选择不可见则用户无法在前台查看,反之用户可以在前台入口查看。
建模模板DAG是指机器学习方案模板的DAG文件。科学家可以通过点击上传控件上传DAG文件,上传完成后后台可以自动扫描DAG文件中的占位符信息,基于扫描结果在界面中展示用于科学家对占位符进行配置的控件。
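The automatic scan of the uploaded DAG file can be sketched as a simple pattern search. The disclosure does not specify the placeholder syntax or the DAG file format; the `${name}` form and the sample text below are assumptions made purely for illustration.

```python
import re
from typing import List

# Assumed placeholder syntax: ${name}. The real syntax is not given in the text.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def scan_placeholders(dag_text: str) -> List[str]:
    """Return placeholder names in order of first appearance, without duplicates."""
    seen: List[str] = []
    for name in PLACEHOLDER.findall(dag_text):
        if name not in seen:
            seen.append(name)
    return seen


# Hypothetical DAG content with three distinct placeholders.
dag_text = """
nodes:
  - id: feature_extract
    script: "window = ${marketing_window}; label = ${label_column}"
  - id: train
    params: {max_depth: ${max_depth}, window: ${marketing_window}}
"""

placeholders = scan_placeholders(dag_text)
```

Each name in `placeholders` would then get its own configuration control on the creation page.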
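The modeling template configuration document is a document that tells the user how to configure the modeling template. It may contain a detailed description of each configuration field to help the user understand.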
2、输入配置
输入配置是指由科学家设置输入源配置限定信息。
初始情况下,界面中可以仅显示一个“添加输入表”控件,第一用户点击“添加输入表”控件后,界面中可以显示一个输入表模块信息。
输入表模块信息包括用于填写输入表名称的控件、用于填写节点ID(即图中示出的node ID)的控件、用于添加字段的控件以及字段展示表。字段展示表中包括用于填写字段名的控件、用于设置是否需要进行字段校验的控件以及用于选择字段类型的控件。
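The scientist can name the input table according to its business meaning so that the user can choose the appropriate data table from the actual business data to match it. Likewise, the scientist can rename fields according to their meaning so that the field names the user sees are easy to understand. The scientist can also set whether field checking is required. If it is, the field type must also be set so that the type of the field supplied by the user can be checked against the scientist's requirement.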
3、占位符配置
可以读取科学家上传的DAG中的占位符并展示对应的占位符模块,每个占位符模块包含一个选项框、一个占位符名称和与选项框对应的选项内容。占位符变量有脚本变量和运行参数两种。
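When a placeholder is set to a script variable, the option box offers two choices: selection item and fill-in item. A selection item means the user will see a selection-type input. Only single selection is supported, the scientist sets the corresponding options, at least two options are required, and more can be added by clicking Add Option. A fill-in item means the user will see a text box labeled with the name of the fill-in item and simply enters the content.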
选项名用于帮助用户了解该选项需要填写的信息,选项名的设置应尽量与建模模板配置说明文档一致。如选择校验字段类型,需根据科学家设置字段类型进行填写内容校验,如不选择校验字段类型则无需校验字段。字段类型是指科学家设置的字段类型,便于校验用户提供的字段类型与科学家要求是否一致,可以包含但不限于enums、String。如选择校验字段阈值,需根据字段阈值校验,如不选择,则不校验用户填写的数据信息。字段阈值可填可不填,例如可填写正则表达式、闭合区间(代表大于等于,小于等于的区间)表征字段阈值。填写后会校验对应表单字段列下的数据是否符合区间要求,如填写为正则表达式,还可以提供正则表达式提示文案,主要内容为说明需要填写什么字段信息。
占位符选择运行参数时,只允许用户填写,科学家可以通过节点别名(节点名称,例如可以是节点ID)指定算子并明确参数的名称(代码内英文名),设置显示给用户填写的参数名、默认的参数项、参数字段类型和对应的阈值正则表达式及正则表达式的提示文案。运行参数字段类型可以支持但不限于String、Int、Double、Boolean、Long。
4、分类配置
在每一个占位符变量下有一个添加分类线按钮。分类线以上至上一设置的分类线在展示给用户时作为一个模块展示,如分类线以上没有其他分类线模块设置,则该分类线到第一个占位符变量组成一个模块展示。
科学家需要输入分类名称以用于分类展示,分类名称用于表征作为一个模块展示的占位符变量的类别信息。以两个分类线之间的占位符变量分别为营销时序窗口配置和理财购买时序窗口配置为例,科学家可以将分类名称设置为时序参数设置。
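The grouping rule for classification lines can be sketched as below. The function name and the sample placeholder names are hypothetical. Placing placeholders that follow the last classification line in an unnamed group is an assumption, since the text does not say how they are shown.

```python
from typing import Dict, List, Optional, Tuple


def group_by_category_lines(
    placeholders: List[str], lines: Dict[int, str]
) -> List[Tuple[Optional[str], List[str]]]:
    """Group ordered placeholders into modules.

    `lines` maps the index of the placeholder a classification line sits
    under to the category name entered for that line.
    """
    groups: List[Tuple[Optional[str], List[str]]] = []
    current: List[str] = []
    for index, name in enumerate(placeholders):
        current.append(name)
        if index in lines:
            # Everything since the previous line (or the start) forms one module.
            groups.append((lines[index], current))
            current = []
    if current:
        groups.append((None, current))  # assumption: trailing items stay unnamed
    return groups


placeholders = ["marketing_window", "purchase_window", "positive_label", "sample_ratio"]
lines = {1: "Time-series parameter settings", 3: "Sample parameter settings"}
modules = group_by_category_lines(placeholders, lines)
```

Here the two window placeholders form one module named "Time-series parameter settings", matching the example in the text.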
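After completing the settings, the scientist clicks the Save Scheme control and the platform saves the current modeling template. The scientist can also view created modeling templates on the modeling template management page, where the creation time is the time at which Save Scheme was clicked. If the scientist clicks Cancel, a reminder may be shown: "After canceling, none of the current settings will be kept; please cancel with caution."
The flow of the method for creating a machine learning solution template has now been described in detail with reference to FIG. 1 and FIG. 2. The present disclosure also provides a method for executing a machine learning process based on a machine learning solution template. The machine learning solution template may be one generated by the method for creating a machine learning solution template of the present disclosure. Accordingly, the method for executing a machine learning process based on a machine learning solution template may further include the steps shown in FIG. 1.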
图3示出了根据本公开示例性实施例的基于机器学习方案模板执行机器学习过程的方法的流程图。图3所示的方法可完全通过计算机程序以软件方式实现,还可通过特定配置的计算装置来执行图3所示的方法。
参见图3,在步骤S310,获取机器学习方案模板的模板文件。
模板文件包括模板方案和输入源配置限定信息。模板方案用于描述针对至少一个输入源标记的至少部分机器学习过程,机器学习过程涉及模型训练和/或模型应用,输入源配置限定信息用于生成输入源配置界面。关于模板方案、输入源标记、输入源配置限定信息可以参见上文相关描述,此处不再赘述。
在步骤S320,向第二用户展示基于输入源配置限定信息而生成的输入源配置界面。输入源配置界面用于辅助第二用户将实际业务数据与模板方案中输入源标记对应起来。其中,第二用户是指使用机器学习方案模板的用户。第二用户可以是机器学习建模经验丰富的科学家,也可以是机器学习建模经验欠缺的业务人员。
在步骤S330,获取第二用户经由输入源配置界面而配置的至少一个配置输入源。输入源配置界面中可以包括用于设置配置输入源的控件,可以接收第二用户通过控件所设置的配置输入源。
配置输入源来自于实际业务数据。配置输入源是指由第二用户根据实际业务数据针对模板方案中的输入源标记而配置的输入源。以输入源标记用于标识模板方案中能够被替换的输入表和/或字段为例,模板方案中输入表下的字段可以称为第一字段,业务数据表下的字段可以称为第二字段,配置输入源可以包括针对输入表配置的业务数据表和/或针对第一字段配置的业务数据表下的第二字段。其中,业务数据表是指实际业务场景中生成的数据表,业务数据表表征的是实际业务数据。在此需要说明的是,针对输入表配置的业务数据表和/或针对第一字段配置的业务数据表下的第二字段,还可以表述为:针对输入表配置的业务数据表和针对第一字段配置的业务数据表下的第二字段之中的至少一项。
以输入源标记用于标识模板方案中能够被替换的输入表和/或字段为例,输入源配置限定信息可以包括以下项之中的至少一个:输入源配置界面上展示的需要配置的至少一个输入表名称、各输入表对应的处理节点、各输入表下需要配置的各第一字段的名称、各第一字段在输入源配置界面上是否展示为可选字段的指示信息。输入源配置界面中还可以包括以下项之中的至少一个:需要配置的至少一个输入表名称、各输入表下需要配置的各第一字段的名称、各第一字段是否为可选字段的指示信息。
作为示例,第二用户首先可以根据输入源配置界面上展示的输入表名称,配置对应的业务数据表,然后根据各输入表下需要配置的各第一字段的名称及其是否为可选字段的指示信息,为第一字段配置对应的第二字段,以将该业务数据表中的第二字段与输入表中的第一字段对应起来。
对于属于可选字段的第一字段,第二用户可以判断业务数据表中是否存在与该第一字段匹配的第二字段,如果存在匹配的第二字段,则为该第一字段设置对应的第二字段,如果不存在匹配的第二字段,则可以不为该第一字段设置第二字段。对于不属于可选字段(也即属于必选字段)的第一字段,第二用户需要为该字段配置业务数据表中至少一个第二字段。
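The difference between optional and required first fields can be expressed as a small completeness check. This is a sketch only; the function name, the dictionary layout, and the sample field names are assumptions.

```python
from typing import Dict, List


def unmapped_required_fields(
    field_specs: Dict[str, bool], mapping: Dict[str, List[str]]
) -> List[str]:
    """Return required first fields that have no second field configured.

    `field_specs` maps each first field to True if it is optional.
    `mapping` maps each first field to the second fields chosen for it.
    """
    return [
        name
        for name, optional in field_specs.items()
        if not optional and not mapping.get(name)
    ]


# user_id is required; age and sex are optional (hypothetical settings).
field_specs = {"user_id": False, "age": True, "sex": True}
complete = {"user_id": ["cust_no"], "age": []}
incomplete = {"age": ["customer_age"]}
```

An empty result means the configuration may proceed even though some optional fields were left unmapped.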
如上文所述,第一字段是否为可选字段,可以是根据业务场景中字段的通用性或重要性确定的,如可以将业务通用或重要性较高的字段设置为必选字段,将业务不通用或重要性不高的字段设置为可选字段。由此,第二用户在将业务数据表中的第二字段和输入表中的第一字段进行匹配时,属于必选字段的第一字段至少存在一个对应的第二字段,如此可以保证模板方案应用到实际业务场景中时建模效果不会太差,而属于可选字段的第一字段是否存在对应的第二字段是由第二用户根据实际情况设定的,即第二用户无需为输入表下各个第一字段都配置对应的第二字段,使得在保证建模效果的同时还可以优化数据表的适应性。
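As described above, the first fields to be configured under each input table may include not only original fields that exist in the input table but also fields that do not exist in it, namely extended fields. Extended fields may be added to the input table according to data structures that may occur in the business scenario, and each added extended field may have a corresponding processing method. Therefore, when matching second fields in the business data table with first fields in the input table, the second user may also map second fields to extended fields. As a result, even if the business data table has more second fields than the input table in the template scheme has fields, the second user can map the surplus fields to extended fields so that they also take part in machine learning, realize their data value, and influence the final result. For example, as described above, each extended field added to the input table may be regarded as a field category, with different extended fields corresponding to different categories. For additional fields in the business data table beyond those covered by the input table, the second user can assign each additional field to the extended field of the matching category, so that the additional fields also take part in machine learning.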
例如,至少一个第一字段对应的字段格式被设置为允许针对单个第一字段配置实际业务数据中的一个或多个第二字段,使得所配置的一个或多个第二字段均按照模板方案中处理单个第一字段的同样方式进行字段处理,输入源配置界面中用于设置针对第一字段的第二字段的控件是基于第一字段对应的字段格式生成的,以使得第二用户通过该控件能够按照第一字段对应的字段格式针对第一字段配置第二字段。
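In other words, for a first field to be configured under an input table in the input source configuration interface, the field format may allow only one second field to be configured for it, or may allow multiple second fields to be configured for it. The first fields to be configured may therefore include both fields that support "one-to-one" configuration and fields that support "one-to-many" configuration. For a first field that supports "one-to-one" configuration, the second user may configure at most one second field. For a first field that supports "one-to-many" configuration, the second user may configure multiple second fields.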
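How a field format could drive both the generated control and the acceptance of a selection is sketched below. The format strings `one_to_one` and `one_to_many` and the control names are hypothetical labels, not identifiers from the disclosure.

```python
from typing import List


def control_for(field_format: str) -> str:
    """Choose the kind of control to generate for a first field."""
    if field_format == "one_to_one":
        return "single_select"
    if field_format == "one_to_many":
        return "multi_select"
    raise ValueError(f"unknown field format: {field_format!r}")


def selection_allowed(field_format: str, second_fields: List[str]) -> bool:
    """Accept at most one second field for one-to-one, any number otherwise."""
    control = control_for(field_format)  # raises on an unknown format
    return control == "multi_select" or len(second_fields) <= 1
```

With this sketch, `selection_allowed("one_to_one", ["opened_at", "activated_at"])` is rejected, while the same selection is accepted for a `one_to_many` field.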
由此,即便业务数据表中存在大量字段,通过这种“一对多”的字段对应方式,也可以将这些字段与输入表中的第一字段对应起来,进而使得业务数据表中的所有字段都可以参与到建模过程,实现其数据价值,并对最终建模结果产生影响。
作为本公开的一个示例,输入源配置界面中展示的输入表可以包括至少一个可选表单,该可选表单中可以包括多个列名。可选表单中的列名可以是模板方案中已有的基本字段(例如卡号、开户时间等),这些基本字段在模板方案中已经具有相应的处理方式。针对实际业务数据中超出模板方案涉及的字段的附加字段,第二用户可将这样的附加字段填写到可选表单对应的列名之下。比如,附加字段为激活时间,则可把该激活时间的字段名称填写到“开户时间”之下,使得可以按照开户时间的处理方式来处理激活时间。
可选表单中的列名也可以是字段类别(例如可以是上文述及的扩展字段),其中每个字段类别具有对应的处理方式。针对模板方案中不涉及的附加字段,第二用户可将这样的附加字段填写到可选表单对应的列名之下。比如,附加字段为激活时间,则可把该激活时间的字段名称填写到“时间”类别之下,使得可以按照“时间”类别的处理方式来处理激活时间。
此外,输入源配置限定信息还可以包括各第一字段对应的字段格式。
另外,输入源配置限定信息还可以包括用于限定经由输入源配置界面而配置的至少一个配置输入源在替换模板方案中的至少一个输入源标记之前所经过的处理的处理项。本公开还可以按照处理项对配置输入源进行处理,并在输入源配置界面中展示处理结果。
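As an example, the processing item may include a check item for each first field, and the check item may include the allowed format and/or allowed value range of each first field. In this case, the format and/or value of the second field configured for a first field may be checked against the check item of that first field, and the processing result indicates whether the format and/or value of the second field configured for the first field satisfies the check item. Optionally, the check item may also include indication information on whether to perform the check. Note that "the format and/or value of the second field" may also be expressed as: at least one of the format and the value of the second field.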
在步骤S340,用获取的至少一个配置输入源替换模板方案中的至少一个输入源标记,以得到修改后的模板方案。
以配置输入源包括针对输入表配置的业务数据表和针对第一字段配置的业务数据表下的第二字段为例,可以将模板方案中的输入表替换为所配置的业务数据表。输入表及输入表下的第一字段在模板方案中的处理方式是已知的,在将配置输入源替换掉模板方案中的输入源标记时,可以根据输入表的处理方式设定替换后的业务数据表的处理方式,根据第一字段的处理方式设定为其配置的第二字段的处理方式。由此,修改后的模板方案描述的是针对配置输入源的机器学习过程。
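The replacement step can be sketched on a simplified representation of the template scheme. Representing the scheme as a list of steps with `table`, `field`, and `op` keys is an assumption made for this example, as are all table, field, and operation names.

```python
from typing import Dict, List, Tuple


def apply_input_sources(
    template_steps: List[Dict[str, str]],
    table_map: Dict[str, str],
    field_map: Dict[Tuple[str, str], List[str]],
) -> List[Dict[str, str]]:
    """Replace input table and first-field markers with configured sources.

    Every second field inherits the processing (`op`) of the first field it
    was configured for; first fields with no mapping produce no step.
    """
    modified = []
    for step in template_steps:
        business_table = table_map[step["table"]]
        for second_field in field_map.get((step["table"], step["field"]), []):
            modified.append(
                {"table": business_table, "field": second_field, "op": step["op"]}
            )
    return modified


template_steps = [
    {"table": "user_table", "field": "age", "op": "bucketize"},
    {"table": "user_table", "field": "open_time", "op": "time_features"},
]
table_map = {"user_table": "crm_customers"}
field_map = {
    ("user_table", "age"): ["customer_age"],
    ("user_table", "open_time"): ["opened_at", "activated_at"],
}
modified = apply_input_sources(template_steps, table_map, field_map)
```

In the result, both `opened_at` and `activated_at` are processed the way the template processes `open_time`, which mirrors the account opening time and activation time example in the text.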
由此,第二用户在使用机器学习方案模板时,不需要根据模板方案的字段要求而产生对应的表格,只需要将相同业务含义的字段做匹配,就可以完成实际业务数据的引入,避免了用户对数据结构的反复修改工作。
在步骤S350,基于修改后的模板方案来执行机器学习过程。
修改后的模板方案描述的是针对配置输入源的机器学习过程,而配置输入源表征的则是实际业务数据。因此基于修改后的模板方案执行机器学习过程,可以得到符合实际业务场景的机器学习结果。
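In summary, the template scheme can be regarded as a general machine learning modeling scheme for a business scenario, summarized and defined by an experienced modeling scientist. It contains machine learning modeling know-how accumulated by the scientist, and the second user can reuse that know-how by using the machine learning solution template, which lowers the modeling threshold and reduces modeling time. While using the template, the second user does not need to understand the modeling principles of the machine learning solution template or the modeling process. By visually mapping actual business data to the data in the template scheme, the second user can obtain machine learning results that fit the business scenario.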
如上文所述,模板方案还可以包括至少一个参数占位符,模板文件还包括用于生成参数配置界面的参数配置限定信息。关于参数占位符、参数配置界面、参数配置限定信息可以参见上文相关描述,此处不再赘述。
本公开还可以向第二用户展示基于参数配置限定信息而生成的参数配置界面,获取第二用户经由参数配置界面而配置的至少一个配置参数,用获取的至少一个配置参数替换模板方案中的至少一个参数占位符,以得到修改后的模板方案。由此,可以向第二用户开放一些可配置的参数,这些可配置的参数可以是需要根据业务要求调整的部分,通过这种方式,可以保证机器学习过程的执行效果与实际业务场景相适应,提升机器学习方案模板的使用效果。
参数配置界面中可以包括用于设置配置参数的控件,可以接收第二用户通过控件所设置的配置参数。
参数配置限定信息还可以包括但不限于以下项之中的至少一个:参数配置界面上展示的需要针对参数占位符进行配置的类型信息、输入方式信息、展示信息、默认取值、实际取值。参数配置界面中还可以包括但不限于以下项之中的至少一个:需要配置的参数占位符的类型信息、输入方式信息、展示信息、默认取值。关于类型信息、输入方式信息、展示信息、默认取值、实际取值可以参见上文相关描述,此处不再赘述。
参数配置限定信息还可以包括用于限定经由参数配置界面而配置的至少一个配置参数在替换模板方案中的至少一个参数占位符之前所经过的处理的处理项,本公开还可以按照处理项对配置参数进行处理,并在输入源配置界面中展示处理结果。处理项可以包括对至少一个配置参数进行校验的校验项。其中,校验项可以包括但不限于配置参数的允许格式和/或允许取值范围,还可以包括是否进行校验的指示信息。
参数配置限定信息还可以包括用于限定在参数配置界面上按照分类区域对参数占位符进行配置的分类信息,参数配置界面可以按照分类信息将需要配置的参数占位符进行分类显示,不同分类的参数占位符被显示在不同分类区域。分类显示也即属于同一类别的参数作为一个分组进行显示,使得用户填写的时候更加有逻辑感。
模板文件还可以包括用于辅助第二用户了解和/或配置模板方案的说明文档,本公开还可以向第二用户提供说明文档。说明文档可以是用便于第二用户理解的业务语言描述的,使得第二用户在大部分情况下不需要理解机器学习的概念即可使用机器学习方案模板。在此需要说明的是,模板文件还可以包括用于辅助第二用户了解和/或配置模板方案的说明文档,还可以表述为:模板文件还可以包括用于辅助第二用户了解模板方案的说明文档和用于辅助第二用户配置模板方案的说明文档之中的至少一个。
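As an example of the present disclosure, the template file may further include resource configuration information, which represents the resource configuration for executing at least part of the machine learning process and may be set by the first user. When step S350 is executed, the machine learning process may be executed based on the modified template scheme using the resource configuration represented by the resource configuration information.
As another example of the present disclosure, when step S350 is executed, the resource configuration that the modified template scheme requires to execute the machine learning process may be predicted, and the machine learning process may be executed with the predicted resource configuration. The required resource configuration may be predicted by means including, but not limited to, a rule formula or a prediction model. For example, the required resource configuration may be estimated from a rule formula combined with the sample data volume, the feature extraction method, the model training algorithm, and so on. As another example, trial runs of the modeling task may be performed for various data volumes under different resource configurations (for example, resource configurations increasing from small to large). The data volumes and resource configurations of successful and failed trial runs are then used as samples to pre-train a machine learning model for estimating the required resource configuration, and that model is used to predict the required resources.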
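The trial-run idea can be illustrated with a deliberately simple lookup. This sketch does not train a machine learning model as the text describes. It substitutes a conservative rule that picks the smallest memory setting that succeeded on a trial with at least the requested data volume. The record layout and all numbers are hypothetical.

```python
from typing import Dict, List, Optional, Union

Trial = Dict[str, Union[int, bool]]


def predict_memory_gb(trials: List[Trial], rows: int) -> Optional[int]:
    """Pick the smallest memory that succeeded on a trial at least this large.

    Returns None when no successful trial covers the requested data volume.
    """
    candidates = [
        t["memory_gb"] for t in trials if t["succeeded"] and t["rows"] >= rows
    ]
    return min(candidates) if candidates else None


# Hypothetical trial-run records with increasing resource configurations.
trials = [
    {"rows": 1_000_000, "memory_gb": 4, "succeeded": False},
    {"rows": 1_000_000, "memory_gb": 8, "succeeded": True},
    {"rows": 10_000_000, "memory_gb": 16, "succeeded": False},
    {"rows": 10_000_000, "memory_gb": 32, "succeeded": True},
]
```

The same trial records could instead serve as training samples for a prediction model, which is the approach the text itself describes.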
图3所示的方法可以由用于实现机器学习相关业务的机器学习平台执行。图4示出了由机器学习平台向用户展示的机器学习方案模板的配置界面示意图。此处述及的用户是指机器学习方案模板的使用者,也即上文述及的第二用户。
如图4所示,左上角显示的是建模模板的名称,建模模板配置说明为第一用户创建该建模模板时上传的说明文档。通过建模模板配置说明可了解当前建模模板需要配置哪些数据表、哪些字段、相应业务含义。
用户表、商品表、行为表为模板方案中需要配置的输入表的名称,用户需填写对应的业务数据表,用户可直接填写业务数据表的表名称,在用户填写过程中可以搜索与用户输入的内容相匹配的业务数据表,并由下拉框弹出匹配项,匹配项可以按照时间倒序排列,每页显示5条业务数据表,用户可通过下滑操作查看。
以用户表为例,user_id、age、sex等为用户表下需要配置的字段。用户在为用户表选好对应的业务数据表后,可以针对用户表下需要配置的字段,选择业务数据表中与其对应的字段。字段填写完成后,可以校验所填字段是否符合要求,可以对字段类型、字段阈值进行校验,如不符合字段类型要求,可以提醒“#字段名#要求输入string字段,当前配置字段不符合要求请重新配置”,对应错误字段显示红星;如不符合字段阈值要求,可以提醒“#字段名#只接受#阈值下限#-#阈值上限#的数据,请检查对应表数据”。
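The two checks and their reminder messages can be sketched as follows. The function signature is an assumption; the message wording follows the reminders quoted above, translated into English.

```python
from typing import List, Optional, Tuple


def check_field(
    field_name: str,
    expected_type: str,
    actual_type: str,
    values: Optional[List[float]] = None,
    bounds: Optional[Tuple[float, float]] = None,
) -> Optional[str]:
    """Return a reminder message if the configured field fails a check, else None."""
    if actual_type != expected_type:
        return (
            f"{field_name} requires a {expected_type} field; the configured "
            "field does not meet the requirement, please reconfigure"
        )
    if bounds is not None and values is not None:
        low, high = bounds
        # The threshold is a closed interval: low <= value <= high.
        if any(not (low <= v <= high) for v in values):
            return (
                f"{field_name} only accepts data in {low}-{high}; "
                "please check the data in the corresponding table"
            )
    return None
```

A non-`None` result would be displayed next to the offending field, for example together with the red star mentioned above.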
时序参数设置、样本参数设置是指需要配置的参数占位符。可以根据参数配置限定信息展示对应的名称和输入内容。如有参数配置限定信息包括设置的一个或多个选择项,则展示选择项,供用户选择。作为示例,可以展示可供用户调整的默认值。
用户填写完成后,可以根据用户填写的内容,对模板方案中的可替换部分(如输入源标记、参数占位符)进行替换,使用替换后的模板方案执行机器学习过程。
本公开的用于创建机器学习方案模板的方法,还可以实现为一种用于创建机器学习方案模板的装置。图5示出了根据本公开示例性实施例的用于创建机器学习方案模板的装置的结构框图。其中,用于创建机器学习方案模板的装置的功能单元可以由实现本公开原理的硬件、软件或硬件和软件的结合来实现。本领域技术人员可以理解的是,图5所描述的功能单元可以组合起来或者划分成子单元,从而实现上述发明的原理。因此,本文的描述可以支持对本文描述的功能单元的任何可能的组合、或者划分、或者更进一步的限定。
下面就用于创建机器学习方案模板的装置可以具有的功能单元以及各功能单元可以执行的操作做简要说明,对于其中涉及的细节部分可以参见上文相关描述,这里不再赘述。
参见图5,用于创建机器学习方案模板的装置500包括第一获取模块510、第二获取模块520以及生成模块530。
第一获取模块510用于获取用于描述针对至少一个输入源标记的至少部分机器学习过程的模板方案,其中,机器学习过程涉及模型训练和/或模型应用。
第二获取模块520用于获取关于模板方案的输入源配置限定信息,其中,输入源配置限定信息用于生成输入源配置界面,使得经由输入源配置界面而配置的至少一个配置输入源替换模板方案中的至少一个输入源标记。
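As an example, the second acquisition module 520 may generate, based on the obtained template scheme, controls for setting the input source configuration limiting information, display the generated controls to the first user, and receive the input source configuration limiting information set by the first user through the controls.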
关于模板方案、输入源标记、输入源配置限定信息、输入源配置界面可以参见上文相关描述,此处不再赘述。
生成模块530用于基于获取的模板方案和输入源配置限定信息生成机器学习方案模板的模板文件。
模板方案中可以包括至少一个参数占位符,用于创建机器学习方案模板的装置500还可以包括第三获取模块。第三获取模块用于获取关于模板方案的参数配置限定信息,其中,参数配置限定信息用于生成参数配置界面,使得经由参数配置界面而配置的至少一个配置参数替换模板方案中的至少一个参数占位符。生成模块530可以基于获取的模板方案、参数配置限定信息和输入源配置限定信息来生成机器学习方案模板的模板文件。关于参数配置限定信息、参数配置界面可以参见上文相关描述,此处不再赘述。
作为示例,第三获取模块基于获取的模板方案来产生用于设置参数配置限定信息的控件,向第一用户展示产生的控件,接收第一用户通过控件所设置的参数配置限定信息。参数配置限定信息还可以包括用于限定在参数配置界面上按照分类区域对参数占位符进行配置的分类信息。所述第三获取模块还向第一用户展示用于对参数占位符进行分类的控件,根据第一用户通过所述控件对参数占位符进行的分类来获取分类信息。
用于创建机器学习方案模板的装置500还可以包括第一展示模块、第一接收模块以及第一合并模块。第一展示模块用于向第一用户展示用于上传说明文档的控件;第一接收模块用于接收第一用户通过控件所上传的说明文档;第一合并模块用于将说明文档合并入模板文件。
用于创建机器学习方案模板的装置500还可以包括第二展示模块、第二接收模块以及第二合并模块。第二展示模块用于向第一用户展示用于设置资源配置信息的控件;第二接收模块用于接收第一用户通过控件所设置的资源配置信息,资源配置信息用于表征执行至少部分机器学习过程的资源配置;第二合并模块用于将资源配置合并入模板文件。
用于创建机器学习方案模板的装置500还可以包括第三展示模块、第四获取模块、替换模块、执行模块、评估模块以及发布或调试模块。第三展示模块用于向第三用户展示基于输入源配置限定信息生成的输入源配置界面;第四获取模块用于获取第三用户基于与测试场景对应的测试数据表,经由输入源配置界面所配置的至少一个配置输入源;替换模块用于用所述至少一个配置输入源替换模板方案中的至少一个输入源标记,以得到修改后的机器学习方案模板;执行模块用于基于修改后的机器学习方案模板执行所述至少部分机器学习过程,以得到所述至少部分机器学习过程的执行结果;评估模块用于对所述执行结果进行评估,以得到测试结果;发布或调试模块用于基于所述测试结果确定是否发布所述机器学习方案模板,或者基于所述测试结果对所述机器学习方案模板进行调试。
应该理解,根据本公开示例性实施例的用于创建机器学习方案模板的装置500的具体实现方式可参照结合图1、图2针对用于创建机器学习方案模板的方法的相关描述来实现,在此不再赘述。
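The method of the present disclosure for executing a machine learning process based on a machine learning solution template may also be implemented as a device for executing a machine learning process based on a machine learning solution template. FIG. 6 shows a structural block diagram of a device for executing a machine learning process based on a machine learning solution template according to an exemplary embodiment of the present disclosure. The functional units of this device may be implemented by hardware, software, or a combination of hardware and software that realizes the principles of the present disclosure. Those skilled in the art will understand that the functional units described in FIG. 6 may be combined or divided into sub-units to realize the principles of the invention described above. Therefore, the description herein supports any possible combination, division, or further definition of the functional units described herein.
The functional units that the device for executing a machine learning process based on a machine learning solution template may have, and the operations each functional unit may perform, are briefly described below. For the details involved, refer to the related description above, which is not repeated here.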
参见图6,基于机器学习方案模板执行机器学习过程的装置600包括第一获取模块610、第一展示模块620、第二获取模块630、第一替换模块640以及执行模块650。
第一获取模块610用于获取机器学习方案模板的模板文件,其中,模板文件包括模板方案和输入源配置限定信息,模板方案用于描述针对至少一个输入源标记的至少部分机器学习过程,机器学习过程涉及模型训练和/或模型应用,输入源配置限定信息用于生成输入源配置界面。关于模板方案、输入源标记、输入源配置限定信息可以参见上文相关描述,此处不再赘述。
第一展示模块620用于向第二用户展示基于输入源配置限定信息而生成的输入源配置界面。关于输入源配置界面可以参见上文相关描述,此处不再赘述。
第二获取模块630用于获取第二用户经由输入源配置界面而配置的至少一个配置输入源。输入源配置界面中可以包括用于设置配置输入源的控件,第二获取模块630可以接收第二用户通过控件所设置的配置输入源。关于配置输入源可以参见上文相关描述,此处不再赘述。
第一替换模块640用于用获取的至少一个配置输入源替换模板方案中的至少一个输入源标记,以得到修改后的模板方案。
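The execution module 650 is configured to execute the machine learning process based on the modified template scheme. The template file may further include resource configuration information, which represents the resource configuration for executing at least part of the machine learning process. The execution module 650 may execute the machine learning process based on the modified template scheme using the resource configuration represented by the resource configuration information. Alternatively, the execution module 650 may predict the resource configuration that the modified template scheme requires to execute the machine learning process and execute the machine learning process with the predicted resource configuration. For the implementation details of the execution module 650, refer to the related description above.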
输入源配置限定信息还包括用于限定经由输入源配置界面而配置的至少一个配置输入源在替换模板方案中的至少一个输入源标记之前所经过的处理的处理项,基于机器学习方案模板执行机器学习过程的装置600还可以包括第一处理模块和第二展示模块。第一处理模块用于按照处理项对配置输入源进行处理;第二展示模块用于在输入源配置界面中展示处理结果。关于处理项可以参见上文相关描述,此处不再赘述。
模板方案中还可以包括至少一个参数占位符,模板文件还可以包括参数配置限定信息,参数配置限定信息用于生成参数配置界面,基于机器学习方案模板执行机器学习过程的装置600还可以包括第三展示模块、第三获取模块以及第二替换模块。第三展示模块用于向第二用户展示基于参数配置限定信息而生成的参数配置界面;第三获取模块用于获取第二用户经由参数配置界面而配置的至少一个配置参数;第二替换模块用于用获取的至少一个配置参数替换模板方案中的至少一个参数占位符,以得到修改后的模板方案。参数配置界面中可以包括用于设置配置参数的控件,第三获取模块可以接收第二用户通过控件所设置的配置参数。关于参数配置限定信息、参数配置界面可以参见上文相关描述,此处不再赘述。
参数配置限定信息还可以包括用于限定经由参数配置界面而配置的至少一个配置参数在替换模板方案中的至少一个参数占位符之前所经过的处理的处理项,基于机器学习方案模板执行机器学习过程的装置600还可以包括第二处理模块和第四展示模块。第二处理模块用于按照处理项对配置参数进行处理;第四展示模块用于在输入源配置界面中展示处理结果。关于处理项可以参见上文相关描述,此处不再赘述。
模板文件还包括用于辅助第二用户了解和/或配置模板方案的说明文档,基于机器学习方案模板执行机器学习过程的装置600还可以包括提供模块,提供模块用于向第二用户提供说明文档。
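It should be understood that the specific implementation of the device 600 for executing a machine learning process based on a machine learning solution template according to an exemplary embodiment of the present disclosure may follow the description given above, with reference to FIG. 3 and FIG. 4, of the method for executing a machine learning process based on a machine learning solution template, which is not repeated here.
The method for creating a machine learning solution template, the method for using it, and the corresponding devices according to exemplary embodiments of the present disclosure have been described above with reference to FIG. 1 to FIG. 6. It should be understood that the above methods may be implemented by a program recorded on a computer-readable medium. For example, according to an exemplary embodiment of the present disclosure, a computer-readable storage medium storing instructions may be provided, on which is recorded a computer program for executing the method for creating a machine learning solution template of the present disclosure (for example, as shown in FIG. 1) or the method for executing a machine learning process based on a machine learning solution template (for example, as shown in FIG. 3).
The computer program in the above computer-readable medium may run in an environment deployed on computer equipment such as a client, a host, an agent device, or a server. It should be noted that, in addition to the steps shown in FIG. 1 or FIG. 3, the computer program may also be used to perform additional steps or to perform more specific processing when performing those steps. These additional steps and further processing have been described with reference to FIG. 1 and FIG. 3 and are not repeated here.
It should be noted that the device for creating a machine learning solution template and the device for executing a machine learning process based on a machine learning solution template according to exemplary embodiments of the present disclosure may rely entirely on the running of a computer program to realize their functions. That is, each device corresponds to the respective steps in the functional architecture of the computer program, so that the whole device is invoked through a dedicated software package (for example, a lib library) to realize the corresponding functions.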
另一方面,图5、图6所示的各个装置也可以通过硬件、软件、固件、中间件、微代码或其任意组合来实现。当以软件、固件、中间件或微代码实现时,用于执行相应操作的程序代码或者代码段可以存储在诸如存储介质的计算机可读介质中,使得处理器可通过读取并运行相应的程序代码或者代码段来执行相应的操作。
例如,本公开的示例性实施例还可以实现为计算装置,该计算装置包括存储部件和处理器,存储部件中存储有计算机可执行指令集合,当所述计算机可执行指令集合被所述处理器执行时,执行用于创建机器学习方案模板的方法或基于机器学习方案模板执行机器学习过程的方法。
具体说来,所述计算装置可以部署在服务器或客户端中,也可以部署在分布式网络环境中的节点装置上。此外,所述计算装置可以是PC计算机、平板装置、个人数字助理、智能手机、web应用或其他能够执行上述指令集合的装置。
这里,所述计算装置并非必须是单个的计算装置,还可以是任何能够单独或联合执行上述指令(或指令集)的装置或电路的集合体。计算装置还可以是集成控制系统或系统管理器的一部分,或者可被配置为与本地或远程(例如,经由无线传输)以接口互联的便携式电子装置。
在所述计算装置中,处理器可包括中央处理器(CPU)、图形处理器(GPU)、可编程逻辑装置、专用处理器系统、微控制器或微处理器。作为示例而非限制,处理器还可包括模拟处理器、数字处理器、微处理器、多核处理器、处理器阵列、网络处理器等。
根据本公开示例性实施例的用于创建机器学习方案模板的方法或基于机器学习方案模板执行机器学习过程的方法中所描述的某些操作可通过软件方式来实现,某些操作可通过硬件方式来实现,此外,还可通过软硬件结合的方式来实现这些操作。
处理器可运行存储在存储部件之一中的指令或代码,其中,所述存储部件还可以存储数据。指令和数据还可经由网络接口装置而通过网络被发送和接收,其中,所述网络接口装置可采用任何已知的传输协议。
存储部件可与处理器集成为一体,例如,将RAM或闪存布置在集成电路微处理器等之内。此外,存储部件可包括独立的装置,诸如,外部盘驱动、存储阵列或任何数据库系统可使用的其他存储装置。存储部件和处理器可在操作上进行耦合,或者可例如通过I/O端口、网络连接等互相通信,使得处理器能够读取存储在存储部件中的文件。
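In addition, the computing device may further include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, a mouse, or a touch input device). All components of the computing device may be connected to one another via a bus and/or a network.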
根据本公开示例性实施例的用于创建机器学习方案模板的方法或基于机器学习方案模板执行机器学习过程的方法所涉及的操作可被描述为各种互联或耦合的功能块或功能示图。然而,这些功能块或功能示图可被均等地集成为单个的逻辑装置或按照非确切的边界进行操作。
例如,如上所述,根据本公开示例性实施例的用于创建机器学习方案模板的装置或基于机器学习方案模板执行机器学习过程的装置可包括存储部件和处理器,其中,存储部件中存储有计算机可执行指令集合,当所述计算机可执行指令集合被所述处理器执行时,执行上文述及的用于创建机器学习方案模板的方法或基于机器学习方案模板执行机器学习过程的方法。
以上描述了本公开的各示例性实施例,应理解,上述描述仅是示例性的,并非穷尽性的,本公开不限于所披露的各示例性实施例。在不偏离本公开的范围和精神的情况下,对于本技术领域的普通技术人员来说许多修改和变更都是显而易见的。因此,本公开的保护范围应该以权利要求的范围为准。
工业实用性
在根据本公开示例性实施例的机器学习方案模板的创建方法、使用方法及装置中,通过复用机器学习方案模板可以降低建模门槛并减少建模耗时,而机器学习方案模板的模板文件中的输入源配置限定信息可以用于解决实际业务数据与模板方案中固有数据之间的数据匹配问题,使得机器学习方案模板在应用于同一业务方向下不同数据结构的业务数据时,均能够获取较好的建模效果。

Claims (68)

  1. 一种用于创建机器学习方案模板的方法,包括:
    获取用于描述针对至少一个输入源标记的至少部分机器学习过程的模板方案,其中,机器学习过程涉及模型训练和模型应用之中的至少一个;
    获取关于所述模板方案的输入源配置限定信息,其中,所述输入源配置限定信息用于生成输入源配置界面,使得经由所述输入源配置界面而配置的至少一个配置输入源替换模板方案中的至少一个输入源标记;以及
    基于获取的模板方案和输入源配置限定信息生成机器学习方案模板的模板文件。
  2. 根据权利要求1所述的方法,其中,获取关于所述模板方案的输入源配置限定信息的步骤包括:
    基于获取的模板方案来产生用于设置输入源配置限定信息的控件;
    向第一用户展示产生的控件;以及
    接收第一用户通过所述控件所设置的输入源配置限定信息。
  3. 根据权利要求2所述的方法,其中,
    所述输入源标记用于标识模板方案中能够被替换的输入表和字段之中的至少一个,所述输入源配置限定信息包括以下项之中的至少一个:输入源配置界面上展示的需要配置的至少一个输入表名称、各输入表对应的处理节点、各输入表下需要配置的各字段的名称、各字段在输入源配置界面上是否展示为可选字段的指示信息。
  4. 根据权利要求3所述的方法,其中,
    所述输入源配置限定信息还包括各字段对应的字段格式,其中,至少一个字段对应的字段格式被设置为允许针对单个字段配置实际业务数据中的一个或多个字段,使得所配置的一个或多个字段均按照模板方案中处理所述单个字段的同样方式进行字段处理。
  5. 根据权利要求3所述的方法,其中,
    所述输入源配置限定信息还包括用于限定经由所述输入源配置界面而配置的至少一个配置输入源在替换模板方案中的至少一个输入源标记之前所经过的处理的处理项。
  6. 根据权利要求5所述的方法,其中,
    所述处理项包括关于各字段的校验项,其中,校验项包括各字段的允许格式和允许取值范围之中的至少一个。
  7. 根据权利要求6所述的方法,其中,
    所述校验项还包括是否进行校验的指示信息。
  8. 根据权利要求1所述的方法,其中,所述模板方案包括至少一个参数占位符,并且,所述方法还包括:
    获取关于所述模板方案的参数配置限定信息,其中,所述参数配置限定信息用于生成参数配置界面,使得经由所述参数配置界面而配置的至少一个配置参数替换模板方案中的至少一个参数占位符,
    所述基于获取的模板方案和输入源配置限定信息生成机器学习方案模板的模板文件的步骤包括:基于获取的模板方案、参数配置限定信息和输入源配置限定信息来生成机器学习方案模板的模板文件。
  9. 根据权利要求8所述的方法,其中,
    所述参数配置限定信息包括以下项之中的至少一个:参数配置界面上展示的需要针对参数占位符进行配置的类型信息、输入方式信息、展示信息、默认取值、实际取值。
  10. 根据权利要求9所述的方法,其中,
    所述类型信息指示脚本参数和运行参数之中的至少一个。
  11. 根据权利要求10所述的方法,其中,
    所述参数配置限定信息还包括用于限定经由所述参数配置界面而配置的至少一个配置参数在替换模板方案中的至少一个参数占位符之前所经过的处理的处理项。
  12. 根据权利要求11所述的方法,其中,
    所述处理项包括对所述至少一个配置参数进行校验的校验项。
  13. 根据权利要求8所述的方法,其中,所述获取关于模板方案的参数配置限定信息的步骤包括:
    基于获取的模板方案来产生用于设置参数配置限定信息的控件;
    向第一用户展示产生的控件;以及
    接收第一用户通过所述控件所设置的参数配置限定信息。
  14. 根据权利要求13所述的方法,其中,所述参数配置限定信息还包括:用于限定在参数配置界面上按照分类区域对参数占位符进行配置的分类信息,其中,所述获取关于模板方案的参数配置限定信息的步骤还包括:
    向第一用户展示用于对参数占位符进行分类的控件;以及
    根据第一用户通过所述控件对参数占位符进行的分类来获取分类信息。
  15. 根据权利要求1所述的方法,还包括:
    向第一用户展示用于上传说明文档的控件;
    接收第一用户通过所述控件所上传的说明文档;以及
    将说明文档合并入模板文件。
  16. 根据权利要求1所述的方法,还包括:
    向第一用户展示用于设置资源配置信息的控件;
    接收第一用户通过所述控件所设置的资源配置信息,所述资源配置信息用于表征执行所述至少部分机器学习过程的资源配置;以及
    将资源配置合并入模板文件。
  17. 根据权利要求1至16中任一项所述的方法,还包括:
    向第三用户展示基于输入源配置限定信息生成的输入源配置界面;
    获取第三用户基于与测试场景对应的测试数据表,经由输入源配置界面所配置的至少一个配置输入源;
    用所述至少一个配置输入源替换模板方案中的至少一个输入源标记,以得到修改后的机器学习方案模板;
    基于修改后的机器学习方案模板执行所述至少部分机器学习过程,以得到所述至少部分机器学习过程的执行结果;
    对所述执行结果进行评估,以得到测试结果;
    基于所述测试结果确定是否发布所述机器学习方案模板,或者基于所述测试结果对所述机器学习方案模板进行调试。
  18. 一种基于机器学习方案模板执行机器学习过程的方法,包括:
    获取机器学习方案模板的模板文件,其中,所述模板文件包括模板方案和输入源配置限定信息,所述模板方案用于描述针对至少一个输入源标记的至少部分机器学习过程,所述机器学习过程涉及模型训练和模型应用之中的至少一个,所述输入源配置限定信息用于生成输入源配置界面;
    向第二用户展示基于所述输入源配置限定信息而生成的输入源配置界面;
    获取第二用户经由所述输入源配置界面而配置的至少一个配置输入源;
    用获取的至少一个配置输入源替换所述模板方案中的至少一个输入源标记,以得到修改后的模板方案;
    基于修改后的模板方案来执行机器学习过程。
  19. 根据权利要求18所述的方法,其中,所述输入源配置界面中包括用于设置配置输入源的控件,所述获取第二用户经由所述输入源配置界面而配置的至少一个配置输入源的步骤包括:
    接收第二用户通过所述控件所设置的配置输入源。
  20. 根据权利要求19所述的方法,其中,
    所述输入源标记用于标识模板方案中能够被替换的输入表和字段之中的至少一个,
    所述输入源配置限定信息包括以下项之中的至少一个:输入源配置界面上展示的需要配置的至少一个输入表名称、各输入表对应的处理节点、各输入表下需要配置的各第一字段的名称、各第一字段在输入源配置界面上是否展示为可选字段的指示信息,
    所述输入源配置界面中还包括以下项之中的至少一个:需要配置的至少一个输入表名称、各输入表下需要配置的各第一字段的名称、各第一字段是否为可选字段的指示信息,
    所述配置输入源包括以下项之中的至少一个:针对所述输入表配置的业务数据表、针对所述第一字段配置的业务数据表下的第二字段。
  21. 根据权利要求20所述的方法,其中,
    所述输入源配置限定信息还包括各第一字段对应的字段格式,其中,至少一个第一字段对应的字段格式被设置为允许针对单个第一字段配置实际业务数据中的一个或多个第二字段,使得所配置的一个或多个第二字段均按照模板方案中处理所述单个第一字段的同样方式进行字段处理,
    所述输入源配置界面中用于设置针对所述第一字段的第二字段的控件是基于所述第一字段对应的字段格式生成的,以使得第二用户通过该控件能够按照第一字段对应的字段格式针对第一字段配置第二字段。
  22. 根据权利要求20所述的方法,其中,
    所述输入源配置限定信息还包括用于限定经由所述输入源配置界面而配置的至少一个配置输入源在替换模板方案中的至少一个输入源标记之前所经过的处理的处理项,
    该方法还包括:按照所述处理项对所述配置输入源进行处理;以及在所述输入源配置界面中展示处理结果。
  23. 根据权利要求22所述的方法,其中,
    所述处理项包括关于各第一字段的校验项,其中,校验项包括各第一字段的允许格式和允许取值范围之中的至少一个,
    按照所述处理项对所述配置输入源进行处理的步骤包括:按照所述第一字段的检验项,对所述第一字段配置的第二字段的格式和取值之中的至少一个进行校验,
    其中,所述处理结果用于指示针对所述第一字段配置的第二字段的格式和取值之中的至少一个是否符合所述校验项。
  24. 根据权利要求23所述的方法,其中,
    所述校验项还包括是否进行校验的指示信息。
  25. 根据权利要求18所述的方法,其中,所述模板方案包括至少一个参数占位符,所述模板文件还包括参数配置限定信息,所述参数配置限定信息用于生成参数配置界面,该方法还包括:
    向第二用户展示基于参数配置限定信息而生成的参数配置界面;
    获取第二用户经由所述参数配置界面而配置的至少一个配置参数;
    用获取的至少一个配置参数替换模板方案中的至少一个参数占位符,以得到修改后的模板方案。
  26. 根据权利要求25所述的方法,其中,所述参数配置界面中包括用于设置配置参数的控件,所述获取第二用户经由所述参数配置界面而配置的至少一个配置参数的步骤包括:
    接收第二用户通过所述控件所设置的配置参数。
  27. 根据权利要求26所述的方法,其中,
    所述参数配置限定信息包括以下项之中的至少一个:参数配置界面上展示的需要针对参数占位符进行配置的类型信息、输入方式信息、展示信息、默认取值、实际取值,
    所述参数配置界面中还包括以下项之中的至少一个:需要配置的参数占位符的类型信息、输入方式信息、展示信息、默认取值。
  28. 根据权利要求27所述的方法,其中,
    所述参数配置限定信息还包括用于限定经由所述参数配置界面而配置的至少一个配置参数在替换模板方案中的至少一个参数占位符之前所经过的处理的处理项,
    该方法还包括:按照所述处理项对所述配置参数进行处理;以及在所述输入源配置界面中展示处理结果。
  29. 根据权利要求28所述的方法,其中,
    所述处理项包括对所述至少一个配置参数进行校验的校验项。
  30. 根据权利要求27所述的方法,其中,
    所述类型信息指示脚本参数和运行参数之中的至少一个。
  31. 根据权利要求25所述的方法,其中,
    所述参数配置限定信息还包括:用于限定在参数配置界面上按照分类区域对参数占位符进行配置的分类信息,
    所述参数配置界面按照所述分类信息将需要配置的参数占位符进行分类显示,不同分类的参数占位符被显示在不同分类区域。
  32. 根据权利要求18所述的方法,其中,
    所述模板文件还包括用于辅助第二用户了解所述模板方案的说明文档和用于辅助第 二用户配置所述模板方案的说明文档之中的至少一个,该方法还包括:向所述第二用户提供所述说明文档。
  33. 根据权利要求18所述的方法,其中,
    所述模板文件还包括资源配置信息,所述资源配置信息用于表征执行所述至少部分机器学习过程的资源配置,所述基于修改后的模板方案来执行机器学习过程的步骤包括:
    基于修改后的模板方案,使用所述资源配置信息所表征的资源配置执行所述机器学习过程,或者,对修改后的模板方案在执行机器学习过程中所需的资源配置进行预测,使用预测得到的资源配置执行所述机器学习过程。
  34. 一种用于创建机器学习方案模板的装置,包括:
    第一获取模块,用于获取用于描述针对至少一个输入源标记的至少部分机器学习过程的模板方案,其中,机器学习过程涉及模型训练和模型应用之中的至少一个;
    第二获取模块,用于获取关于所述模板方案的输入源配置限定信息,其中,所述输入源配置限定信息用于生成输入源配置界面,使得经由所述输入源配置界面而配置的至少一个配置输入源替换模板方案中的至少一个输入源标记;以及
    生成模块,用于基于获取的模板方案和输入源配置限定信息生成机器学习方案模板的模板文件。
  35. 一种基于机器学习方案模板执行机器学习过程的装置,包括:
    第一获取模块,用于获取机器学习方案模板的模板文件,其中,所述模板文件包括模板方案和输入源配置限定信息,所述模板方案用于描述针对至少一个输入源标记的至少部分机器学习过程,所述机器学习过程涉及模型训练和模型应用之中的至少一个,所述输入源配置限定信息用于生成输入源配置界面;
    第一展示模块,用于向第二用户展示基于输入源配置限定信息而生成的输入源配置界面;
    第二获取模块,用于获取第二用户经由所述输入源配置界面而配置的至少一个配置输入源;
    第一替换模块,用于用获取的至少一个配置输入源替换模板方案中的至少一个输入源标记,以得到修改后的模板方案;
    执行模块,用于基于修改后的模板方案来执行机器学习过程。
  36. 一种包括至少一个计算装置和至少一个存储指令的存储装置的系统,其中,所述指令在被所述至少一个计算装置运行时,促使所述至少一个计算装置执行用于创建机器学习方案模板的方法或基于机器学习方案模板执行机器学习过程的方法的步骤,
    其中,所述用于创建机器学习方案模板的方法的步骤包括:
    获取用于描述针对至少一个输入源标记的至少部分机器学习过程的模板方案,其中,机器学习过程涉及模型训练和模型应用之中的至少一个;
    获取关于所述模板方案的输入源配置限定信息,其中,所述输入源配置限定信息用于生成输入源配置界面,使得经由所述输入源配置界面而配置的至少一个配置输入源替换模板方案中的至少一个输入源标记;以及
    基于获取的模板方案和输入源配置限定信息生成机器学习方案模板的模板文件;
    其中,所述基于机器学习方案模板执行机器学习过程的方法的步骤包括:
    获取机器学习方案模板的模板文件,其中,所述模板文件包括模板方案和输入源配置限定信息,所述模板方案用于描述针对至少一个输入源标记的至少部分机器学习过程,所述机器学习过程涉及模型训练和模型应用之中的至少一个,所述输入源配置限定信息用 于生成输入源配置界面;
    向第二用户展示基于输入源配置限定信息而生成的输入源配置界面;
    获取第二用户经由所述输入源配置界面而配置的至少一个配置输入源;
    用获取的至少一个配置输入源替换模板方案中的至少一个输入源标记,以得到修改后的模板方案;
    基于修改后的模板方案来执行机器学习过程。
  37. 根据权利要求36所述的系统,其中,获取关于所述模板方案的输入源配置限定信息的步骤包括:
    基于获取的模板方案来产生用于设置输入源配置限定信息的控件;
    向第一用户展示产生的控件;以及
    接收第一用户通过所述控件所设置的输入源配置限定信息。
  38. 根据权利要求37所述的系统,其中,
    所述输入源标记用于标识模板方案中能够被替换的输入表和字段之中的至少一个,所述输入源配置限定信息包括以下项之中的至少一个:输入源配置界面上展示的需要配置的至少一个输入表名称、各输入表对应的处理节点、各输入表下需要配置的各字段的名称、各字段在输入源配置界面上是否展示为可选字段的指示信息。
  39. 根据权利要求38所述的系统,其中,
    所述输入源配置限定信息还包括各字段对应的字段格式,其中,至少一个字段对应的字段格式被设置为允许针对单个字段配置实际业务数据中的一个或多个字段,使得所配置的一个或多个字段均按照模板方案中处理所述单个字段的同样方式进行字段处理。
  40. 根据权利要求38所述的系统,其中,
    所述输入源配置限定信息还包括用于限定经由所述输入源配置界面而配置的至少一个配置输入源在替换模板方案中的至少一个输入源标记之前所经过的处理的处理项。
  41. 根据权利要求40所述的系统,其中,
    所述处理项包括关于各字段的校验项,其中,校验项包括各字段的允许格式和允许取值范围之中的至少一个。
  42. 根据权利要求41所述的系统,其中,
    所述校验项还包括是否进行校验的指示信息。
  43. 根据权利要求36所述的系统,其中,所述模板方案包括至少一个参数占位符,并且,所述指令在被所述至少一个计算装置运行时,促使所述至少一个计算装置还执行以下步骤:
    获取关于模板方案的参数配置限定信息,其中,所述参数配置限定信息用于生成参数配置界面,使得经由所述参数配置界面而配置的至少一个配置参数替换模板方案中的至少一个参数占位符,
    所述基于获取的模板方案和输入源配置限定信息生成机器学习方案模板的模板文件的步骤包括:基于获取的模板方案、参数配置限定信息和输入源配置限定信息来生成机器学习方案模板的模板文件。
  44. 根据权利要求43所述的系统,其中,
    所述参数配置限定信息包括以下项之中的至少一个:参数配置界面上展示的需要针 对参数占位符进行配置的类型信息、输入方式信息、展示信息、默认取值、实际取值。
  45. 根据权利要求44所述的系统,其中,
    所述类型信息指示脚本参数和运行参数之中的至少一个。
  46. 根据权利要求45所述的系统,其中,
    所述参数配置限定信息还包括用于限定经由所述参数配置界面而配置的至少一个配置参数在替换模板方案中的至少一个参数占位符之前所经过的处理的处理项。
  47. 根据权利要求46所述的系统,其中,
    所述处理项包括对所述至少一个配置参数进行校验的校验项。
  48. 根据权利要求43所述的系统,其中,所述获取关于模板方案的参数配置限定信息的步骤包括:
    基于获取的模板方案来产生用于设置参数配置限定信息的控件;
    向第一用户展示产生的控件;以及
    接收第一用户通过所述控件所设置的参数配置限定信息。
  49. 根据权利要求48所述的系统,其中,所述参数配置限定信息还包括:用于限定在参数配置界面上按照分类区域对参数占位符进行配置的分类信息,所述获取关于模板方案的参数配置限定信息的步骤还包括:
    向第一用户展示用于对参数占位符进行分类的控件;以及
    根据第一用户通过所述控件对参数占位符进行的分类来获取分类信息。
  50. 根据权利要求36所述的系统,所述指令在被所述至少一个计算装置运行时,促使所述至少一个计算装置还执行以下步骤:
    向第一用户展示用于上传说明文档的控件;
    接收第一用户通过所述控件所上传的说明文档;以及
    将说明文档合并入模板文件。
  51. 根据权利要求36所述的系统,所述指令在被所述至少一个计算装置运行时,促使所述至少一个计算装置还执行以下步骤:
    向第一用户展示用于设置资源配置信息的控件;
    接收第一用户通过所述控件所设置的资源配置信息,所述资源配置信息用于表征执行所述至少部分机器学习过程的资源配置;以及
    将资源配置合并入模板文件。
  52. 根据权利要求36至51中任一项所述的系统,所述指令在被所述至少一个计算装置运行时,促使所述至少一个计算装置还执行以下步骤:
    向第三用户展示基于输入源配置限定信息生成的输入源配置界面;
    获取第三用户基于与测试场景对应的测试数据表,经由输入源配置界面所配置的至少一个配置输入源;
    用所述至少一个配置输入源替换模板方案中的至少一个输入源标记,以得到修改后的机器学习方案模板;
    基于修改后的机器学习方案模板执行所述至少部分机器学习过程,以得到所述至少部分机器学习过程的执行结果;
    对所述执行结果进行评估,以得到测试结果;
    基于所述测试结果确定是否发布所述机器学习方案模板,或者基于所述测试结果对所述机器学习方案模板进行调试。
  53. 根据权利要求36所述的系统,其中,所述输入源配置界面中包括用于设置配置输入源的控件,获取第二用户经由所述输入源配置界面而配置的至少一个配置输入源的步骤包括:
    接收第二用户通过所述控件所设置的配置输入源。
  54. 根据权利要求53所述的系统,其中,
    所述输入源标记用于标识模板方案中能够被替换的输入表和字段之中的至少一个,
    所述输入源配置限定信息包括以下项之中的至少一个:输入源配置界面上展示的需要配置的至少一个输入表名称、各输入表对应的处理节点、各输入表下需要配置的各第一字段的名称、各第一字段在输入源配置界面上是否展示为可选字段的指示信息,
    所述输入源配置界面中还包括以下项之中的至少一个:需要配置的至少一个输入表名称、各输入表下需要配置的各第一字段的名称、各第一字段是否为可选字段的指示信息,
    所述配置输入源包括以下项之中的至少一个:针对所述输入表配置的业务数据表、针对所述第一字段配置的业务数据表下的第二字段。
  55. 根据权利要求54所述的系统,其中,
    所述输入源配置限定信息还包括各第一字段对应的字段格式,其中,至少一个第一字段对应的字段格式被设置为允许针对单个第一字段配置实际业务数据中的一个或多个第二字段,使得所配置的一个或多个第二字段均按照模板方案中处理所述单个第一字段的同样方式进行字段处理,
    所述输入源配置界面中用于设置针对所述第一字段的第二字段的控件是基于所述第一字段对应的字段格式生成的,以使得第二用户通过该控件能够按照第一字段对应的字段格式针对第一字段配置第二字段。
  56. 根据权利要求54所述的系统,其中,
    所述输入源配置限定信息还包括用于限定经由所述输入源配置界面而配置的至少一个配置输入源在替换模板方案中的至少一个输入源标记之前所经过的处理的处理项,
    该方法还包括:按照所述处理项对所述配置输入源进行处理;以及在所述输入源配置界面中展示处理结果。
  57. 根据权利要求56所述的系统,其中,
    所述处理项包括关于各第一字段的校验项,其中,校验项包括各第一字段的允许格式和允许取值范围之中的至少一个,
    所述按照所述处理项对所述配置输入源进行处理的步骤包括:按照所述第一字段的检验项,对所述第一字段配置的第二字段的格式和取值之中的至少一个进行校验,
    其中,所述处理结果用于指示针对所述第一字段配置的第二字段的格式和取值之中的至少一个是否符合所述校验项。
  58. 根据权利要求57所述的系统,其中,
    所述校验项还包括是否进行校验的指示信息。
  59. 根据权利要求36所述的系统,其中,所述模板方案包括至少一个参数占位符,所述模板文件还包括参数配置限定信息,所述参数配置限定信息用于生成参数配置界面,该方法还包括:
    向第二用户展示基于参数配置限定信息而生成的参数配置界面;
    获取第二用户经由所述参数配置界面而配置的至少一个配置参数;
    用获取的至少一个配置参数替换模板方案中的至少一个参数占位符,以得到修改后的模板方案。
  60. 根据权利要求59所述的系统,其中,所述参数配置界面中包括用于设置配置参数的控件,所述获取第二用户经由所述参数配置界面而配置的至少一个配置参数的步骤包括:
    接收第二用户通过所述控件所设置的配置参数。
  61. 根据权利要求60所述的系统,其中,
    所述参数配置限定信息包括以下项之中的至少一个:参数配置界面上展示的需要针对参数占位符进行配置的类型信息、输入方式信息、展示信息、默认取值、实际取值,
    所述参数配置界面中还包括以下项之中的至少一个:需要配置的参数占位符的类型信息、输入方式信息、展示信息、默认取值。
  62. 根据权利要求61所述的系统,其中,
    所述参数配置限定信息还包括用于限定经由所述参数配置界面而配置的至少一个配置参数在替换模板方案中的至少一个参数占位符之前所经过的处理的处理项,
    该方法还包括:按照所述处理项对所述配置参数进行处理;以及在所述输入源配置界面中展示处理结果。
  63. 根据权利要求62所述的系统,其中,
    所述处理项包括对所述至少一个配置参数进行校验的校验项。
  64. 根据权利要求61所述的系统,其中,
    所述类型信息指示脚本参数和运行参数之中的至少一个。
  65. 根据权利要求59所述的系统,其中,
    所述参数配置限定信息还包括:用于限定在参数配置界面上按照分类区域对参数占位符进行配置的分类信息,
    所述参数配置界面按照所述分类信息将需要配置的参数占位符进行分类显示,不同分类的参数占位符被显示在不同分类区域。
  66. 根据权利要求36所述的系统,其中,
    所述模板文件还包括用于辅助第二用户了解所述模板方案的说明文档和用于辅助第二用户配置所述模板方案的说明文档之中的至少一个,该方法还包括:向所述第二用户提供所述说明文档。
  67. 根据权利要求36所述的系统,其中,
    所述模板文件还包括资源配置信息,所述资源配置信息用于表征执行所述至少部分机器学习过程的资源配置,基于修改后的模板方案来执行机器学习过程的步骤包括:
    基于修改后的模板方案,使用所述资源配置信息所表征的资源配置执行所述机器学习过程,或者,对修改后的模板方案在执行机器学习过程中所需的资源配置进行预测,使用预测得到的资源配置执行所述机器学习过程。
  68. 一种存储指令的计算机可读存储介质,其中,当所述指令被至少一个计算装置 运行时,促使所述至少一个计算装置执行如权利要求1到33中的任一权利要求所述的方法。
PCT/CN2020/132093 2019-12-04 2020-11-27 机器学习方案模板的创建方法、使用方法及装置 WO2021109928A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911225347.5A CN110990053A (zh) 2019-12-04 2019-12-04 机器学习方案模板的创建方法、使用方法及装置
CN201911225347.5 2019-12-04

Publications (1)

Publication Number Publication Date
WO2021109928A1 true WO2021109928A1 (zh) 2021-06-10

Family

ID=70089913

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/132093 WO2021109928A1 (zh) 2019-12-04 2020-11-27 机器学习方案模板的创建方法、使用方法及装置

Country Status (2)

Country Link
CN (1) CN110990053A (zh)
WO (1) WO2021109928A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113610242A (zh) * 2021-08-10 2021-11-05 中国工商银行股份有限公司 数据处理方法、装置和服务器
CN113742242A (zh) * 2021-09-16 2021-12-03 中国银行股份有限公司 一种接口测试方法及装置
CN114615027A (zh) * 2022-02-24 2022-06-10 奇安信科技集团股份有限公司 行为数据处理方法、装置、设备和存储介质
CN118194292A (zh) * 2024-03-15 2024-06-14 北京奇虎科技有限公司 机器学习框架的测试数据生成方法、装置及电子设备

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110990053A (zh) * 2019-12-04 2020-04-10 第四范式(北京)技术有限公司 机器学习方案模板的创建方法、使用方法及装置
CN111523676B (zh) * 2020-04-17 2024-04-12 第四范式(北京)技术有限公司 辅助机器学习模型上线的方法及装置
CN111552713A (zh) * 2020-04-30 2020-08-18 国网信息通信产业集团有限公司 一种数据校验方法及装置
CN111666100B (zh) * 2020-05-13 2023-12-15 深圳思为科技有限公司 软件框架生成方法、装置、电子设备及存储介质
WO2022037689A1 (zh) * 2020-08-20 2022-02-24 第四范式(北京)技术有限公司 一种基于数据形式的数据处理方法和应用机器学习的方法
CN112884166A (zh) * 2021-03-31 2021-06-01 联想(北京)有限公司 机器学习流程图的生成方法及装置、设备
CN113971032B (zh) * 2021-12-24 2022-03-18 百融云创科技股份有限公司 一种代码生成的机器学习模型全流程自动部署方法及系统

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8521757B1 (en) * 2008-09-26 2013-08-27 Symantec Corporation Method and apparatus for template-based processing of electronic documents
CN108710949A (zh) * 2018-04-26 2018-10-26 第四范式(北京)技术有限公司 用于创建机器学习建模模板的方法及系统
CN110414689A (zh) * 2019-08-06 2019-11-05 中国工商银行股份有限公司 一种机器学习模型线上更新方法及装置
CN110990053A (zh) * 2019-12-04 2020-04-10 第四范式(北京)技术有限公司 机器学习方案模板的创建方法、使用方法及装置

Also Published As

Publication number Publication date
CN110990053A (zh) 2020-04-10


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20897041

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20897041

Country of ref document: EP

Kind code of ref document: A1