CN110209902B - Method and system for visualizing feature generation process in machine learning process - Google Patents

Info

Publication number: CN110209902B
Application number: CN201810941689.6A
Authority: CN (China)
Prior art keywords: information, data, feature, display control, processing
Legal status: Active (granted). (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Other languages: Chinese (zh)
Other versions: CN110209902A (application publication)
Inventors: 方荣, 杨博文, 黄亚建, 杨慧斌, 詹镇江
Current and original assignee: 4Paradigm Beijing Technology Co Ltd (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Events: application filed by 4Paradigm Beijing Technology Co Ltd; publication of application CN110209902A; application granted; publication of CN110209902B; anticipated expiration

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 — Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 — Details of database functions independent of the retrieved data types
    • G06F 16/904 — Browsing; Visualisation therefor

Abstract

A method and system for visualizing a feature generation process in a machine learning process are provided. The method comprises: determining a feature whose generation process is to be visualized; parsing at least one data processing step used in the machine learning process to generate the feature, so as to obtain generation process information of the feature, wherein the generation process information comprises data information and/or processing information of the at least one data processing step; generating, based on the generation process information, a process presentation view depicting the generation process of the feature; and graphically displaying the process presentation view.

Description

Method and system for visualizing feature generation process in machine learning process
Technical Field
The present invention relates to the field of machine learning, and more particularly, to a method and system for visualizing a feature generation process in a machine learning process.
Background
With the advent of the big data age, many industries are producing massive amounts of data, and the variety, scale, and dimensionality of that data keep expanding. Machine learning techniques are therefore increasingly used to discover knowledge and value in mass data.
Data is the raw material of the machine learning process and has a decisive influence on the effect of a machine learning model. To make data usable for machine learning, it usually needs to undergo corresponding processing, such as data cleaning, data filling, data splicing, or feature extraction.
In practice, the data processing process may be implemented by running code written by a programmer, or carried out by a machine learning platform according to scripts, configurations and/or interactions input by a user; the whole process often involves a huge amount of data or complex processing operations. Existing machine learning platforms interact poorly with users: an ordinary user cannot intuitively understand the logic and working details of the data processing process, i.e., cannot understand how a specific feature is generated. Even a user who knows every step of the overall machine learning process finds it difficult to quickly discern which data processing steps a particular feature is associated with. Thus, when an abnormality or error occurs in the machine learning process, it is difficult for the user to quickly trace back to its source; likewise, when the user is interested in certain features, it is difficult to quickly learn the details relevant only to those features. On an existing machine learning platform, the user can only decompose and analyze the whole machine learning process step by step and then work out the meaning of a specific feature and its generation process on his or her own. This increases the user's burden and seriously hinders the popularization and application of machine learning technology.
Disclosure of Invention
According to an exemplary embodiment of the present invention, there is provided a method of visualizing a feature generation process in a machine learning process, the method comprising: determining a feature whose generation process is to be visualized; parsing at least one data processing step used in the machine learning process to generate the feature, so as to obtain generation process information of the feature, wherein the generation process information comprises data information and/or processing information of the at least one data processing step; generating, based on the generation process information, a process presentation view depicting the generation process of the feature; and graphically displaying the process presentation view.
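The four claimed steps — determine the feature, parse its generating steps, build a process presentation view, and display it — can be sketched as a small pipeline. Everything below (function names, the dict-based process description, the string rendering) is an illustrative assumption, not the patent's implementation:

```python
# Minimal runnable sketch of the four claimed steps. All names and the
# dict-based "view" are illustrative assumptions, not the patent's API.

def determine_feature(ml_process, feature_name):
    # Step 1: determine the feature whose generation process is to be visualized.
    return next(f for f in ml_process["features"] if f["name"] == feature_name)

def parse_generation_steps(ml_process, feature):
    # Step 2: parse the data processing step(s) that generate the feature,
    # collecting their data information (inputs/outputs) and processing information.
    return [s for s in ml_process["steps"] if feature["name"] in s["outputs"]]

def build_process_view(steps):
    # Step 3: build a process presentation view depicting the generation process.
    return {"nodes": [s["name"] for s in steps]}

def render(view):
    # Step 4: graphically display the view (stubbed as a string here).
    return " -> ".join(view["nodes"])

process = {
    "features": [{"name": "age_bucket"}],
    "steps": [
        {"name": "import user_table", "outputs": ["age"]},
        {"name": "discretize(age)", "outputs": ["age_bucket"]},
    ],
}
feature = determine_feature(process, "age_bucket")
view = build_process_view(parse_generation_steps(process, feature))
print(render(view))  # prints "discretize(age)"
```

A real platform would of course render a graphical flowchart rather than a string; the point is only the separation of the four steps.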
Optionally, the data information of the at least one data processing step comprises information about an input item and/or an output item of the at least one data processing step, and the processing information of the at least one data processing step comprises information about a processing procedure of the at least one data processing step.
Optionally, the process presentation view is a flowchart representing a generation process of the feature, wherein nodes in the flowchart represent input items, output items and/or processing processes of corresponding data processing steps, respectively. The process of graphically presenting the process presentation view includes: information about the input items, output items and/or processes of the corresponding data processing steps is presented in the display control of each node.
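A flowchart whose nodes respectively represent input items, output items, and processing procedures, each carrying the text for its display control, might be modeled as follows; the `FlowNode` structure and all of its field names are hypothetical:

```python
# Illustrative node structure for the process flowchart; the "kind" field
# distinguishes input items, output items, and processing procedures.
from dataclasses import dataclass, field

@dataclass
class FlowNode:
    kind: str      # "input", "output", or "process"
    label: str     # text shown in the node's display control
    children: list = field(default_factory=list)  # e.g., per-method child nodes

def display_text(node):
    # What the display control of the node would show.
    return f"[{node.kind}] {node.label}"

src = FlowNode("input", "transaction_amount")   # a source field
proc = FlowNode("process", "extraction",
                children=[FlowNode("process", "log_transform")])
out = FlowNode("output", "amount_log")          # the generated feature
print(display_text(proc))  # prints "[process] extraction"
```

The `children` list corresponds to the optional child nodes for individual processing methods described further below.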
Optionally, the at least one data processing step comprises a feature extraction step for generating the feature. The data information of the feature extraction step comprises information about input items and/or output items of the feature extraction step. The processing information of the feature extraction step includes information on a processing procedure of the feature extraction step.
Optionally, the flowchart includes: a node representing a source field of an input item of the feature extraction step, a node representing an extraction process as a process of the feature extraction step, and/or a node representing the feature as an output item of the feature extraction step. The process of graphically presenting the process presentation view further includes: the name of the source field is presented in a display control of the node representing the source field, the name and/or flow information of the extraction process is presented in a display control of the node representing the extraction process, and/or the name of the feature is presented in a display control of the node representing the feature.
Optionally, the flow information of the extraction process includes names of one or more processing methods applied in the extraction process. The nodes representing the extraction process include child nodes respectively representing the one or more processing methods. The process of graphically presenting the process presentation view further includes: and respectively displaying names of the one or more processing methods applied in the extraction processing process in the display control of the child node.
Optionally, the flowchart further includes: a node representing the source data table of the source field. The process of graphically presenting the process presentation view further includes: presenting the name of the source data table in the display control of the node representing the source data table.
Optionally, the at least one data processing step further comprises an upstream processing step of the feature extraction step, wherein the upstream processing step is used for generating a source data table of the source field.
Optionally, the upstream processing step includes one or more data table splicing steps. The data information of the one or more data table splicing steps comprises information about the input items and/or output items of those steps, and the processing information comprises information about the processing procedures of those steps.
Optionally, the flowchart further includes: nodes representing the input data tables serving as inputs of the one or more data table splicing steps, and/or nodes representing the splicing processing procedures of those steps. The process of graphically presenting the process presentation view further includes: presenting the names of the input data tables in the display controls of the nodes representing the input data tables, and/or presenting the names of the splicing processing procedures in the display controls of the nodes representing the splicing processing procedures.
Optionally, the display control of the node corresponding to the feature extraction step, the display control of the node corresponding to the source field, the display control of the node corresponding to the splicing processing procedure, the display control of the node corresponding to the source data table, and/or the display control of the node corresponding to the input data table each have a respective, distinguishable form.
Optionally, the process of graphically presenting the process presentation view further comprises: in response to a user selection operation on a specific display control, listing detail information about input items, output items and/or processing procedures displayed in the specific display control in detail display controls corresponding to the specific display control.
Optionally, the detail information about the input item and/or the output item includes at least one of a name corresponding to the input item and/or the output item, a description added by a user, a number of rows of the data table, a number of columns of the data table, a field name of the data table, a field type of the data table, at least a part of data in the data table, statistical analysis information of data in the data table, and statistical analysis information of data of the field. The detail information about the process includes at least one of a name corresponding to the process, a description added by a user, code information, and a transformation process of example data.
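The selection behavior described above — clicking a specific display control lists its detail information (name, user-added description, row/column counts, code information, an example transformation) in a detail display control — can be sketched as a lookup; all keys and values below are invented for illustration:

```python
# Hypothetical detail lookup: selecting a display control lists detail
# information for the item it shows (names/keys are illustrative).

DETAILS = {
    "user_table": {            # an input item (a data table)
        "name": "user_table",
        "description": "imported customer records",  # user-added description
        "rows": 10000, "cols": 12,
        "fields": {"age": "int", "city": "str"},
    },
    "discretize": {            # a processing procedure
        "name": "discretize",
        "code": "bucket(age, width=10)",             # code information
        "example_transform": {"age": 37, "age_bucket": 3},
    },
}

def on_select(control_id):
    # List detail information in the detail display control for control_id.
    return DETAILS.get(control_id, {})

print(on_select("discretize")["example_transform"])  # {'age': 37, 'age_bucket': 3}
```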
According to an exemplary embodiment of the present invention, a computer readable medium for visualizing a feature generation process in a machine learning process is provided, wherein a computer program for executing the aforementioned method for visualizing a feature generation process in a machine learning process by one or more processors is recorded on the computer readable medium.
According to an exemplary embodiment of the present invention, a computing device for visualizing a feature generation process in a machine learning process is provided, comprising one or more storage devices and one or more processors, wherein a set of computer-executable instructions is stored in the one or more storage devices, which when executed by the one or more processors, perform the aforementioned method of visualizing a feature generation process in a machine learning process.
According to an exemplary embodiment of the present invention, there is provided a system for visualizing a feature generation process in a machine learning process, including: determining means for determining a feature whose generation process is to be visualized; interpretation means for parsing at least one data processing step used in the machine learning process to generate the feature, so as to obtain generation process information of the feature, wherein the generation process information comprises data information and/or processing information of the at least one data processing step; view generation means for generating, based on the generation process information, a process presentation view depicting the generation process of the feature; and a display device for graphically displaying the process presentation view.
Optionally, the data information of the at least one data processing step comprises information about input items and/or output items of the at least one data processing step. The processing information of the at least one data processing step includes information about a processing procedure of the at least one data processing step.
Optionally, the process presentation view is a flowchart representing a generation process of the feature, wherein nodes in the flowchart represent input items, output items and/or processing processes of corresponding data processing steps, respectively. The display device is also used for: information about the input items, output items and/or processes of the corresponding data processing steps is presented in the display control of each node.
Optionally, the at least one data processing step comprises a feature extraction step for generating the feature. The data information of the feature extraction step comprises information about input items and/or output items of the feature extraction step. The processing information of the feature extraction step includes information on a processing procedure of the feature extraction step.
Optionally, the flowchart includes: a node representing a source field of an input item of the feature extraction step, a node representing an extraction process as a process of the feature extraction step, and/or a node representing the feature as an output item of the feature extraction step. The display device is also used for: the name of the source field is presented in a display control of the node representing the source field, the name and/or flow information of the extraction process is presented in a display control of the node representing the extraction process, and/or the name of the feature is presented in a display control of the node representing the feature.
Optionally, the flow information of the extraction process includes names of one or more processing methods applied in the extraction process, and the node representing the extraction process includes child nodes respectively representing the one or more processing methods. The display device is also used for: and respectively displaying the names of the one or more processing methods in the display control of the child node.
Optionally, the flowchart further includes: a node representing the source data table of the source field, and the display device is further configured to: present the name of the source data table in the display control of the node representing the source data table.
Optionally, the at least one data processing step further comprises an upstream processing step of the feature extraction step, wherein the upstream processing step is used for generating a source data table of the source field.
Optionally, the upstream processing step comprises one or more data table splicing steps, the data information of the one or more data table splicing steps comprises information about the input items and/or output items of those steps, and the processing information of the one or more data table splicing steps comprises information about the processing procedures of those steps.
Optionally, the flowchart further includes: nodes representing the input data tables serving as inputs of the one or more data table splicing steps, and/or nodes representing the splicing processing procedures of those steps. Accordingly, the display device is further configured to: present the names of the input data tables in the display controls of the nodes representing the input data tables, and/or present the names of the splicing processing procedures in the display controls of the nodes representing the splicing processing procedures.
Optionally, the display control of the node corresponding to the feature extraction step, the display control of the node corresponding to the source field, the display control of the node corresponding to the splicing processing procedure, the display control of the node corresponding to the source data table, and/or the display control of the node corresponding to the input data table each have a respective, distinguishable form.
Optionally, the display device is further configured to: in response to a user selection operation on a specific display control, listing detail information about input items, output items and/or processing procedures displayed in the specific display control in detail display controls corresponding to the specific display control.
Optionally, the detail information about the input item and/or output item includes at least one of: a name corresponding to the input item and/or output item, a user-added description, the number of rows of the data table, the number of columns of the data table, the field names of the data table, the field types of the data table, at least part of the data in the data table, statistical analysis information of the data in the data table, and statistical analysis information of the data of the field.
The detail information about the processing procedure includes at least one of: a name corresponding to the processing procedure, a user-added description, code information, and a transformation process of example data.
Advantageous effects
By applying the method and system for visualizing a feature generation process in a machine learning process according to exemplary embodiments of the present invention, a user of a machine learning platform can visually trace back how a feature is generated and intuitively learn the relevant details of its generation process. This enhances the interaction between the machine learning platform and the user and thus helps the user control the machine learning process more precisely.
Additional aspects and/or advantages of the present general inventive concept will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the general inventive concept.
Drawings
The foregoing and other objects and features of exemplary embodiments of the invention will become more apparent from the following description taken in conjunction with the accompanying drawings which illustrate exemplary embodiments in which:
fig. 1 is an example of a machine learning process configured by building a Directed Acyclic Graph (DAG) in a prior art machine learning platform.
Fig. 2 illustrates a system for visualizing a feature generation process in a machine learning process according to an exemplary embodiment of the invention.
Fig. 3 shows a flowchart of a method of visualizing a feature generation process in a machine learning process according to an exemplary embodiment of the invention.
Fig. 4 shows a process presentation view of a generation process for depicting features according to an exemplary embodiment of the invention.
Hereinafter, the present invention will be described in detail with reference to the drawings, wherein the same or similar elements will be designated with the same or similar reference numerals throughout the drawings.
Detailed Description
The following description is provided with reference to the accompanying drawings to assist in a comprehensive understanding of exemplary embodiments of the invention defined by the claims and their equivalents. The description includes various specific details to aid in understanding, but these are to be considered exemplary only. Thus, one of ordinary skill in the art will recognize that: various changes and modifications may be made to the embodiments described herein without departing from the scope and spirit of the invention. In addition, descriptions of well-known functions and constructions may be omitted for clarity and conciseness.
With the advent of mass data, artificial intelligence technology has developed rapidly. Machine learning (including deep learning) is a product of that development; it aims to mine valuable potential information from massive data by computational means and to improve system performance through experience. In computer systems, "experience" usually exists in the form of "data", from which a "model" can be generated by a machine learning algorithm: experience data are provided to the algorithm, which produces a model based on them, and the model, when faced with a new situation, provides a corresponding decision, i.e., a prediction result. Machine learning may be implemented as "supervised learning", "unsupervised learning", or "semi-supervised learning"; it should be noted that exemplary embodiments of the present invention are not limited to any particular machine learning algorithm. In embodiments of the invention, the data processing process is at least a part of the pipeline from importing raw data to outputting samples; the whole pipeline may also be referred to as feature engineering. The data processing process may comprise one or more data processing steps, whose details can be obtained by parsing according to exemplary embodiments of the invention.
The present invention proposes a method and system for visualizing a feature generation process in a machine learning process, so that a user can quickly and intuitively understand how a feature is generated.
Fig. 1 is an example of a machine learning process configured by building a Directed Acyclic Graph (DAG) in a prior art machine learning platform.
In the example shown in fig. 1, each module represents a step in the machine learning process, and it can be seen that the data processing steps selected by the thick frame account for a significant part of the work. This is because the data fields on which sample features are based may come from a wide table generated by splicing multiple data tables; for example, when a bank uses a machine learning model to judge fraudulent transactions, a user information table, a bank card information table, and a transaction record table are spliced into one wide table for processing. Further, as an example, the original data records in a data table may need to undergo a series of operations such as cleaning, format conversion (e.g., date format conversion), and time-series splicing.
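The bank example above — splicing a user information table and a transaction record table into one wide table — can be sketched as a key-based join. This pure-Python stand-in uses invented field names and is not the platform's actual splicing step:

```python
# Sketch of the "wide table" splicing from the bank example: joining a
# user information table and a transaction record table on user_id.
# Field names and values are invented for illustration.

users = [{"user_id": 1, "age": 30}, {"user_id": 2, "age": 45}]
transactions = [
    {"user_id": 1, "amount": 120.0},
    {"user_id": 2, "amount": 9999.0},
]

def splice(left, right, key):
    # Index the left table by key, then merge each right-hand row
    # onto the matching left-hand row.
    index = {row[key]: row for row in left}
    return [{**index[row[key]], **row} for row in right if row[key] in index]

wide = splice(users, transactions, "user_id")
print(wide[0])  # {'user_id': 1, 'age': 30, 'amount': 120.0}
```

A production platform would typically perform this with a database join or a dataframe merge; the sketch only shows why the resulting wide table carries fields from several source tables.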
Although in the example shown in fig. 1 the relevant steps are modularized by presenting them as nodes in the DAG, this does not help the user quickly grasp the overall idea of the data processing process, the specific work performed, or the processing result, which may be a sample with multiple features. In particular, this modular approach does not help the user intuitively understand the details of how a specific feature in the sample is generated. A user who wishes to see such details has to inspect the content and processing result of each module one by one, or proactively add explanations to each module; either way, the user alone must work out the details of how one or more features are generated. Not only does this burden the user, but in many cases the content of these modules is simply the original processing code, so the user must have a certain level of expertise to understand, from the code, the meaning of each feature and its generation process.
Fig. 2 illustrates a system 10 for visualizing a feature generation process in a machine learning process in accordance with an exemplary embodiment of the present invention. The system 10 comprises determining means 100, interpreting means 101, view generating means 102 and presenting means 103.
The determining means 100 is configured to determine the features for which the generation process is to be visualized. Specifically, before, at the same time as, or after the machine learning process is run, the determining apparatus 100 may determine or identify a feature (i.e., a feature to be traced back) to be visualized for its generation process from all features that have been generated or are to be generated by the machine learning process. For example, the determining apparatus 100 may identify important features generated in the machine learning process according to a preset of the machine learning platform to determine the important features as features to be visualized for the generation process thereof. Alternatively, the determining apparatus 100 may also determine all the features that will be generated in the machine learning process as features whose generation process is to be visualized. Alternatively, the determining apparatus 100 may identify the feature related to the abnormality or error from the result report of the machine learning model, and thereby determine that the generation process of the feature related to the abnormality or error is to be visualized in order to trace back the cause of the abnormality or error. Alternatively, the user may select a feature to be focused on the display interface of the machine learning platform, and the determining apparatus 100 determines that the generation process of the user-selected feature is to be visualized in response to the selection of the user. The operation of the determining apparatus 100 is not limited to the above example, and the feature to be traced may be determined by other means according to the user's demand.
The interpretation means 101 may parse at least one data processing step of the machine learning process for generating the feature to obtain generation process information of the feature, wherein the generation process information comprises data information and/or processing information of the at least one data processing step. Here, according to actual circumstances, the parsing process may be performed on the corresponding at least one data processing step before, at the same time as, or after the machine learning process is run, so that information such as input, output, intermediate result, processing details, etc. about the at least one data processing step can be acquired. Here, it should be noted that the at least one data processing step parsed by the interpretation means 101 is traced back from the point of view of generating the feature, i.e. the processing object or processing result for which the at least one data processing step is directed can be used directly or indirectly for generating the feature. The following will specifically explain with reference to examples.
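The backward tracing described here — selecting only the steps whose processing objects or results contribute, directly or indirectly, to the feature — can be sketched as a reverse walk over produces/consumes links. Step and item names below are invented:

```python
# Hypothetical backward trace: starting from the item to visualize (the
# feature), walk produces -> consumes links to collect only the steps that
# directly or indirectly contribute to it.

STEPS = {
    "import_users": {"inputs": [],                           "outputs": ["user_table"]},
    "import_txns":  {"inputs": [],                           "outputs": ["txn_table"]},
    "splice":       {"inputs": ["user_table", "txn_table"],  "outputs": ["wide_table"]},
    "extract_feat": {"inputs": ["wide_table"],               "outputs": ["amount_log"]},
    "unrelated":    {"inputs": ["other_table"],              "outputs": ["other_feat"]},
}

def trace(feature):
    # Map each output item to the step that produces it.
    producers = {out: name for name, s in STEPS.items() for out in s["outputs"]}
    relevant, frontier = set(), [feature]
    while frontier:
        item = frontier.pop()
        step = producers.get(item)
        if step and step not in relevant:
            relevant.add(step)
            frontier.extend(STEPS[step]["inputs"])  # recurse upstream
    return relevant

print(sorted(trace("amount_log")))
# ['extract_feat', 'import_txns', 'import_users', 'splice']
```

Note that the step named `unrelated` is excluded, matching the point that only steps usable for generating the feature are parsed.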
The view generation means 102 may generate a process presentation view for depicting a generation process of the feature based on the acquired generation process information. Here, the view generating means 102 may form a process presentation view capable of reflecting the dependency relationship between the data processing steps on the one hand and the data information and/or the processing information of each data processing step itself on the other hand, based on the analyzed information of the respective data processing steps themselves.
The presentation means 103 may present the process presentation view in a graphical manner. Here, the presentation means 103 may show the process presentation view to the user via an output device such as a display (not shown); for example, it may present the view in a specific form or with specific effects to help the user understand, through the view, the generation process of the corresponding feature.
The process by which the system 10 visualizes the feature generation process in the machine learning process is described in detail below in conjunction with fig. 3 and 4.
In an embodiment of the invention, the machine learning process is set by a user of the machine learning platform. For example, the machine learning process may be represented as a Directed Acyclic Graph (DAG) generated by a user by dragging node modules, wherein the user may configure data and/or operations corresponding to each node module. For another example, the machine learning process may be embodied as computer program code that is written manually by a user.
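A user-built DAG of node modules, each carrying its configured data and/or operation, might be represented as a plain adjacency structure; running the modules in dependency order is then a topological sort. Module names and configuration keys below are illustrative:

```python
# Illustrative representation of a user-built DAG of node modules, each
# with its configured data/operation (module names invented).

dag = {
    "nodes": {
        "A": {"op": "import",  "config": {"table": "user_table"}},
        "B": {"op": "clean",   "config": {"drop_null": True}},
        "C": {"op": "extract", "config": {"feature": "age_bucket"}},
    },
    "edges": [("A", "B"), ("B", "C")],  # A feeds B, B feeds C
}

def topo_order(dag):
    # Kahn's algorithm: a module runs only after all its inputs are ready.
    indeg = {n: 0 for n in dag["nodes"]}
    for _, dst in dag["edges"]:
        indeg[dst] += 1
    ready = [n for n, d in indeg.items() if d == 0]
    order = []
    while ready:
        n = ready.pop()
        order.append(n)
        for src, dst in dag["edges"]:
            if src == n:
                indeg[dst] -= 1
                if indeg[dst] == 0:
                    ready.append(dst)
    return order

print(topo_order(dag))  # ['A', 'B', 'C']
```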
Fig. 3 shows a flowchart of a method of visualizing a feature generation process in a machine learning process according to an exemplary embodiment of the invention. As shown in fig. 3, the method includes steps S21, S22, S23, and S24.
In step S21, the determining apparatus 100 may determine a feature whose generation process is to be visualized. Here, the generation process of the feature may include at least one data processing step, which may include a data importing step, a data cleaning step, a data splicing step, a time-series aggregation step, and/or a feature extraction step, among others. The processing results of these data processing steps may be fields involved in the extraction of the feature, or a complete output table comprising such fields. The feature generation process may be selectively visualized, according to user requirements or predetermined settings, before, while, or after the machine learning process is run.
For example, before, concurrently with, or after the machine learning process is run, the determining apparatus 100 may determine or identify a feature (i.e., a feature to be traced) for which the generation process is to be visualized, from among features generated by the machine learning process. Alternatively, the determining apparatus 100 may identify important features generated in the machine learning process according to a preset of the machine learning platform to determine the important features as features to be visualized for the generation process thereof. Alternatively, the determining apparatus 100 may also determine all the features generated in the machine learning process as features whose generation process is to be visualized. Alternatively, the determining apparatus 100 may identify the feature related to the abnormality or error from the result report of the machine learning model, and thereby determine that the generation process of the feature related to the abnormality or error is to be visualized in order to trace back the cause of the abnormality or error. Alternatively, the user may select a feature to be focused on the display interface of the machine learning platform, and the determining apparatus 100 determines that the generation process of the user-selected feature is to be visualized in response to the selection of the user. The operation of the determining apparatus 100 is not limited to the above example, and the feature to be traced may be determined by other means according to the user's demand.
Then, in step S22, the interpretation apparatus 101 parses at least one data processing step for generating the feature in the machine learning process to obtain generation process information of the feature, wherein the generation process information includes data information and/or processing information of the at least one data processing step.
Here, according to actual circumstances, the parsing process may be performed on the corresponding at least one data processing step before, at the same time as, or after the machine learning process is run, so that information about the at least one data processing step, such as inputs, outputs, intermediate results, and processing details, can be acquired. It should be noted that the at least one data processing step parsed by the interpretation apparatus 101 is traced back from the point of view of generating the feature; that is, the processing object or processing result of the at least one data processing step can be used, directly or indirectly, to generate the feature. For example, the at least one data processing step may involve a feature extraction process for generating the feature, where the feature extraction process may indicate only the extraction process for generating that feature (without involving the extraction processes of other features). As another example, the at least one data processing step may involve a splicing process for splicing a data table (which may be a direct or indirect source data table of the fields on which the feature depends), and the data information associated with the splicing process may involve all fields in the data table or only the fields related to the generation of the feature. In this way, the data processing steps related to the feature of interest can be selected from among the complex data processing steps of the overall machine learning process, helping the user understand the meaning of the feature more clearly.
In step S23, the view generating means 102 may generate a process presentation view for describing the generation process of the feature based on the generation process information. Here, the view generating means 102 may form a process presentation view capable of reflecting the dependency relationship between the data processing steps on the one hand and the data information and/or the processing information of each data processing step itself on the other hand, based on the analyzed information of the respective data processing steps themselves.
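Conceptually, the view generating means turns the parsed step information into a directed graph of nodes and edges. The following is a minimal sketch of that idea; the step-record structure (dicts with hypothetical "name", "inputs", and "outputs" keys) is an invented stand-in, not the data model actually used by the platform.

```python
# Sketch: build a flowchart (nodes + directed edges) from parsed
# data-processing step records. The record layout here is hypothetical.

def build_flowchart(steps):
    """Each step is a dict with assumed keys:
    'name', 'inputs' (list of item names), 'outputs' (list of item names)."""
    nodes, edges = set(), []
    for step in steps:
        nodes.add(step["name"])
        for item in step["inputs"]:
            nodes.add(item)
            edges.append((item, step["name"]))   # input item -> processing node
        for item in step["outputs"]:
            nodes.add(item)
            edges.append((step["name"], item))   # processing node -> output item
    return nodes, edges

steps = [
    {"name": "join", "inputs": ["table_a", "table_b"], "outputs": ["joined"]},
    {"name": "extract", "inputs": ["joined"], "outputs": ["feature_x"]},
]
nodes, edges = build_flowchart(steps)
```

The edge list directly encodes the dependency relationships between steps, while each node carries the data or processing information to be shown in its display control.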
The presentation means 103 may present the process presentation view graphically at step S24. Here, the presentation device 103 may present the process presentation view to the user via an output device such as a display (not shown), for example, the presentation device 103 may present the process presentation view in a specific form or effect to help the user understand the generation process of the feature through the presented process presentation view.
Optionally, the data information of the at least one data processing step may comprise information about an input item and/or an output item of the at least one data processing step, and the processing information of the at least one data processing step may comprise information about a processing procedure of the at least one data processing step. Here, as described above, the input or output of the at least one data processing step may relate only to a field involved in the extraction operation of the feature, or may relate to a complete output table including that field. In addition, the processing information of the at least one data processing step may relate to the respective processing procedure of each data processing step, which may include at least one sub-step, where the information of each sub-step may be acquired by a parsing process.
Alternatively, the process presentation view may be a flow chart representing a generation process of the feature, wherein nodes in the flow chart may represent input items, output items and/or processing processes of corresponding data processing steps, respectively. Accordingly, the process of graphically presenting the process presentation view may include: presentation device 103 may present information regarding the input items, output items, and/or processes of the corresponding data processing steps in the display controls of each node. Here, each node may have a corresponding display control, which may be a display box having various shapes, within or around which information about the input item, the output item, and/or the process may be further presented. It should be noted that the above information may be displayed directly within or around the display frame; in addition, the information can also be displayed in a hidden manner, so that the related content is displayed after the user performs the corresponding triggering operation (for example, clicks the display control). Here, as an example, which information will be listed in the display control of each node may be preset by the machine learning platform, and the information to be listed in the respective display control may also be set or adjusted according to the selection of the user.
Optionally, the at least one data processing step may comprise a feature extraction step for generating the feature. The data information of the feature extraction step may include information about an input item and/or an output item of the feature extraction step, and the processing information of the feature extraction step may include information about a processing procedure of the feature extraction step. Here, the feature extraction step refers to a process of processing one or more source fields in the corresponding data table according to a specific extraction method to obtain features. As an example, the extraction methods here include, but are not limited to: direct means, such as taking a complete field as a feature; conversion means, such as intercepting a partial field (e.g., the year portion of a complete date field); arithmetic means, such as rounding a numeric field or taking its logarithm; and feature calculation means, such as discretizing a continuous-value feature or combining different features. Accordingly, the data information may include information about the source fields, information about the output features or intermediate results, and/or information about the data table including the source fields, etc. The processing information may include information about the various feature extraction means or their further refinement operations.
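The extraction means listed above can be illustrated in a few lines of pandas; the column names, dates, and bin boundaries below are invented for the example and are not from the patent.

```python
import pandas as pd
import numpy as np

# Hypothetical example data; field names are invented.
df = pd.DataFrame({
    "amount": [12.7, 3.2, 98.4],
    "trx_date": pd.to_datetime(["2018-03-01", "2018-07-15", "2018-12-31"]),
})

f_direct = df["amount"]                      # direct means: take the complete field
f_year = df["trx_date"].dt.year              # conversion means: intercept the year part
f_round = np.floor(df["amount"])             # arithmetic means: rounding
f_log = np.log(df["amount"])                 # arithmetic means: logarithm
f_bucket = pd.cut(df["amount"],
                  bins=[0, 10, 50, 100],
                  labels=False)              # calculation means: discretize into buckets
```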
Optionally, the flow chart in the process presentation view may include: a node representing a source field of an input item of the feature extraction step, a node representing an extraction process as a process of the feature extraction step, and/or a node representing the feature as an output item of the feature extraction step. Accordingly, the process of graphically presenting the process presentation view may further comprise: the presentation means 103 may present the name of the source field in a display control of the node representing the source field, the name and/or flow information of the extraction process in a display control of the node representing the extraction process, and/or the name of the feature in a display control of the node representing the feature. According to an exemplary embodiment of the present invention, separate nodes may be provided in the process presentation view to represent corresponding input items, output items, and processes, respectively. That is, to more clearly trace back the critical information involved in the generation of the features, separate display controls may be provided for the critical information corresponding to the individual data processing steps. In the display control, the names of the key information and/or the flow information may be further listed.
Alternatively, the flow information of the extraction process may include the names of one or more processing methods applied in the extraction process, and the node representing the extraction process may include child nodes that represent the one or more processing methods, respectively. Accordingly, the process of graphically presenting the process presentation view may further comprise: the presentation device 103 may present the names of the one or more processing methods in the display controls of the child nodes, respectively. Here, the extraction process may involve one or more processing methods, such as an operation that rounds a numeric field and then takes its logarithm. The processing methods may collectively correspond to a sub-flowchart, where each processing method corresponds to a child node, the connection relationships between the child nodes reflect the dependency relationships between the processing methods, and the name of the corresponding processing method may be listed in the display control of each child node.
Optionally, the flowchart may further include: a node of a source data table representing the source field. Accordingly, the process of graphically presenting the process presentation view may further comprise: presentation device 103 may present the name of the source data table in a display control representing the node of the source data table. Here, in order to more clearly understand the data involved in the feature generation process, the nodes of the data table where the source field representing the feature is located may be further introduced into the flowchart. That is, in an exemplary embodiment of the present invention, the presentation of the input item may be accomplished by having a plurality of nodes that include relationships or progressive relationships, for example, a data table that is an indirect source of the feature (e.g., a data table in which the source field is located) may be further displayed in addition to the nodes that are the source fields of the direct source of the feature in the flowchart. Here, the name of the source data table and/or other relevant information may be listed in a display control of the source data table.
Optionally, the at least one data processing step may further comprise an upstream processing step of the feature extraction step, wherein the upstream processing step may be used to generate a source data table of the source field. Here, in order to trace back the root source of the feature generation more clearly, the flowchart may further include steps other than the feature extraction step, and these steps may mainly obtain the data table where the source field of the feature is located by means of introducing or stitching.
Optionally, the upstream processing step may include one or more data table splicing steps. Accordingly, the data information of the one or more data table splicing steps may include information about an input item and/or an output item of the one or more data table splicing steps, and the processing information of the one or more data table splicing steps may include information about a processing procedure of the one or more data table splicing steps. According to an exemplary embodiment of the present invention, as an example, the source data table in which the source field of the feature is located may be the final output of one or more data table splicing operations, in which case the at least one data processing step displayed in the process presentation view may further include a data table splicing step corresponding to each splicing operation. Parsing a data table splicing step can yield the names of the data tables being spliced, the fields actually used in the splice, the name of the data table generated after splicing, and the fields included in that table, and can also yield information about the specific splicing process, such as the master-slave relationship and the alignment fields when two or more data tables are spliced.
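As an illustration of such a splicing step, the sketch below left-joins a hypothetical master transaction table to a slave fraud table on a single alignment field; all table, field, and value names are invented for the example.

```python
import pandas as pd

# Hypothetical master and slave tables; names are illustrative only.
trx = pd.DataFrame({"user_id": [1, 2, 3],
                    "trx_date": ["2018-01-02", "2018-01-05", "2018-01-09"]})
fraud = pd.DataFrame({"user_id": [2], "flag": [1]})

# Master-slave splicing: 'trx' is the master table,
# 'user_id' is the alignment field.
joined = trx.merge(fraud, on="user_id", how="left")
joined["flag"] = joined["flag"].fillna(0).astype(int)  # unmatched rows get flag 0
```

A parser for this step would record the input table names, the alignment field (`user_id`), the master-slave relationship, and the fields of the output table.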
Optionally, the flowchart may further include: nodes representing input data tables as input to the one or more data table stitching steps and/or nodes representing stitching processes as processing of the one or more data table stitching steps. Accordingly, the process of graphically presenting the process presentation view may further comprise: the presentation means 103 may present the names of the input data tables in display controls representing the nodes of the input data tables, respectively, and/or the presentation means 103 may present the names of the splicing process in display controls representing the nodes of the splicing process, respectively. According to an exemplary embodiment of the present invention, the data information and/or processing information of the data table stitching step may be presented in various manners similar to the feature extraction step. Further, as an example, for the case of multiple data stitching, in order to avoid duplication, only a node corresponding to an input item may be set, and no node corresponding to an output item may be set. This is because, in some cases, the input table of the subsequent data splicing step is also the output table of the previous data splicing step, and thus, the above-described manner can avoid the occurrence of duplicate nodes indicating the same data table.
Optionally, the display control of the node corresponding to the feature extraction step, the display control of the node corresponding to the source field, the display control of the node corresponding to the splicing process, the display control of the node corresponding to the source data table, and/or the display control of the node corresponding to the input data table may each have a distinct form. For example, at least one of the shape of the display control, the border line type, the border color, the background pattern, the font format in the display control, the font style (e.g., bold, italic, and/or underlined), the font color, etc., may differ depending on the content the node corresponds to.
Optionally, the process of graphically presenting the process presentation view may further include: presentation device 103 may, in response to a user selection operation of a particular display control in a process presentation view, list, in a detail display control corresponding to the particular display control, details information regarding an input item, an output item, and/or a process presented in the particular display control. According to an exemplary embodiment of the present invention, in addition to presenting at least a portion of the information of each relevant data processing step by means of the above described flow chart node, further details regarding the input items, output items and/or the processing procedure of each step listed in the flow chart node may be presented in a special detail display control. Here, the detail display control may be disposed around the corresponding display control, or may be arranged at any position in the entire display interface. Further, as another example, the detail display control may be augmented by an original display control, for example, when a user selects a particular display control, the particular display control may be further augmented to accommodate the detail information that needs to be displayed.
Optionally, the detail information about the input item and/or the output item may include at least one of a name corresponding to the input item and/or the output item, a description added by a user, a number of rows of the data table, a number of columns of the data table, a field name of the data table, a field type of the data table, at least a part of data in the data table, and statistical analysis information of data in the data table. The detail information about the process may include at least one of a name corresponding to the process, a description added by a user, code information, and a transformation process of example data. Here, the detail information about the data content may include not only attribute information or statistical information about the data but also at least a part of the example data itself. Further, the details regarding the process may relate to code content, such as configuration or script, related to the data processing, or may further include a process presentation of at least a portion of the example data. By further displaying the detail information corresponding to each display content on the basis of the process display view, the user is helped to intuitively understand various details related to the whole feature generation process in all aspects, and therefore the machine learning process is designed or operated more effectively.
To describe the process presentation view more intuitively, assume that in one embodiment according to the invention, the user is interested in the feature f_trxdate_register_diff generated in the machine learning process and wishes to further understand how this feature was generated. A process presentation view depicting the generation process of the feature will be described in detail below with reference to fig. 4; however, the invention is not limited thereto, and the feature may be any one or more features generated in a machine learning process.
Fig. 4 shows a process presentation view of a generation process for depicting features according to an exemplary embodiment of the invention. The process presentation view shown in fig. 4 is generated using the method of visualizing the generation of features according to the present invention.
The left flowchart in fig. 4 is a flowchart in which display controls are connected according to the dependency relationships between corresponding generation process elements, wherein the dependency relationships between display controls are indicated by arrows between the display controls. Herein, the generation process elements include various elements involved in the generation process of the feature, for example, the feature, the process, the processing method in the process, the source field, the source data table, and the input data table.
As shown in fig. 4, the feature name f_trxdate_region_diff of the feature of interest to the user is listed in the display control 401.
The generation process information of the feature can be obtained by parsing the data processing step used for generating the feature in the machine learning process. From the generation process information, it may be determined that the feature is generated by a feature extraction step. By analyzing the feature extraction step, data information and/or processing information of the feature extraction step can be obtained. The data information of the feature extraction step may comprise information about input items and/or output items of the feature extraction step. The processing information of the feature extraction step may include information on how to generate the feature f_trxdate_register_diff based on the source field.
In the embodiment shown in fig. 4, the data information of the feature extraction step may include names trx_date and register_date of source fields as input items of the feature extraction step, and a feature name f_trxdate_register_diff of the feature as an output item of the feature extraction step. Further, the processing information of the feature extraction step may include information on the extraction processing procedure of the feature extraction step, that is, may include the name and/or flow information of the feature extraction step. In this embodiment, by parsing the feature extraction step, the extraction process for generating the feature may be determined:
f_trxdate_register_diff=discrete(lineartrans(datediff(trx_date,register_date),"0.01","0"))
where datediff, lineartrans("0.01", "0"), and discrete are the names of the processing methods applied during the extraction process, executed in the order datediff → lineartrans("0.01", "0") → discrete. Such information may be included in the flow information of the feature extraction step.
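The nested expression above can be read as a small pipeline. The sketch below re-implements the three processing methods with plausible semantics (day difference, a linear transform with the given scale and offset, and unit-width binning); the actual operator definitions are internal to the platform, so these semantics are assumptions.

```python
from datetime import date

# Assumed semantics for the three processing methods named in the expression.

def datediff(trx_date, register_date):
    """Number of days between two dates (assumed meaning of datediff)."""
    return (trx_date - register_date).days

def lineartrans(x, scale="0.01", offset="0"):
    """Linear transform x * scale + offset; parameters arrive as strings."""
    return float(scale) * x + float(offset)

def discrete(x, bin_width=1.0):
    """Discretize a continuous value into an integer bucket (assumed bin width 1)."""
    return int(x // bin_width)

# Example: transaction on 2018-08-01 for a user registered on 2018-01-01.
f = discrete(lineartrans(datediff(date(2018, 8, 1), date(2018, 1, 1)), "0.01", "0"))
```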
As shown in fig. 4, the name (FE) and flow information of the extraction process are presented in a display control 402. Alternatively, the flow information may be presented through a sub-flow diagram constituted by display controls of the sub-nodes. Names of corresponding processing methods are shown in display controls 402a, 402b, and 402c, respectively. In addition, the names of the corresponding source fields may also be presented in display control 402 or display controls 403 and 404 upstream of display control 402a, respectively.
Optionally, the generation process of the source data table and/or the source field may further be shown. According to the generation process information of the feature, it can be determined that the source data table is generated through a data table splicing step. Optionally, the input data tables and the splicing process of the data table splicing step may be further shown. The source data table of the source field may be the output table of the data table splicing step (not shown in the example of fig. 4).
As shown in fig. 4, in display control 405 upstream of display controls 403 and 404, the name sql:01_join_fraud of the splicing process of the data table splicing step is shown. In display controls 406 and 407 upstream of display control 405, the names cmb0404_app_trx_detail and cmb0404_fraud of the input data tables of the data table splicing step are shown, respectively.
The generation of a particular feature (named f_trxdate_register_diff) can be intuitively understood from the left-hand flowchart in fig. 4: two input data tables (named cmb0404_app_trx_detail and cmb0404_fraud, respectively) are input to a data table splicing step (the name of the splicing process is sql:01_join_fraud) to perform data table splicing; a feature extraction step is performed after the data table splicing step, and only two fields (named trx_date and register_date, respectively) in the output data table of the splicing step are involved in the extraction process of the particular feature; these may be referred to as source fields. The particular feature is then generated by applying a sequence of processing methods (datediff → lineartrans("0.01", "0") → discrete) to the source fields during the extraction process of the feature extraction step.
Alternatively, the display controls may have different forms depending on the types of the corresponding generation process elements. For example, as shown in fig. 4, display controls 406 and 407 correspond to input data tables and may be presented as oval controls; display control 405 corresponds to a splicing process and may be presented as a rectangular control; display controls 403 and 404 correspond to source fields and may be presented as parallelogram controls; display control 402 corresponds to the extraction process and contains display controls 402a, 402b, and 402c corresponding to its processing methods, so display control 402 may be presented as a rectangular control embedded with a plurality of oval controls (402a, 402b, and 402c); and display control 401 corresponds to the particular feature and may be presented as a rounded-rectangle control.
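Such a shape mapping is exactly what a rendering backend like Graphviz expresses. The sketch below emits a DOT description using the shapes described above; the element-type names and node sets are illustrative, and the patent does not name any particular rendering library.

```python
# Sketch: map generation-process element types to display-control shapes
# and emit a Graphviz DOT flowchart. Type names are hypothetical.

SHAPES = {
    "input_table": "ellipse",
    "splice": "box",
    "source_field": "parallelogram",
    "feature": "box",  # drawn with rounded corners via the style below
}

def to_dot(nodes, edges):
    lines = ["digraph G {"]
    for name, kind in nodes:
        style = ", style=rounded" if kind == "feature" else ""
        lines.append(f'  "{name}" [shape={SHAPES[kind]}{style}];')
    for src, dst in edges:
        lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)

dot = to_dot(
    nodes=[("cmb0404_fraud", "input_table"), ("sql:01_join_fraud", "splice"),
           ("trx_date", "source_field"), ("f_trxdate_register_diff", "feature")],
    edges=[("cmb0404_fraud", "sql:01_join_fraud"),
           ("trx_date", "f_trxdate_register_diff")],
)
```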
Optionally, the differences in morphology are not limited to differences in shape of the display control, but may include differences in at least one of shape of the display control, bezel linearity, bezel color, background pattern, font format in the display control, font style (e.g., bolded, italic, and/or underlined), font color, and the like.
A process presentation view according to the present invention may comprise only the left-hand flow chart in fig. 4. Additionally, as an alternative, in response to a user selection operation on a specific display control in the flowchart, detailed information about an input item, an output item and/or a processing procedure displayed in the specific display control may be listed in a detailed display control corresponding to the specific display control.
As shown in fig. 4, if the user clicks on display control 406, a corresponding detail display control 506 may be generated and presented. Listed in detail display control 506 are the name of the input data table corresponding to display control 406 (cmb0404_app_trx_detail), the user-added description (transaction table), and the numbers of rows and columns of the input data table (80000 rows, 18 columns).
If the user clicks on display control 405, a corresponding detail display control 505 may be generated and presented. Listed in detail display control 505 are the name of the splicing process corresponding to display control 405 (sql:01_join_fraud), the user-added description (splice the transaction table with confirmed risk transactions, generating the label field flag), code information (lines 1-4 of the code), and the numbers of rows and columns of the output data table (80000 rows, 18 columns).
If the user clicks on display control 403, a corresponding detail display control 503 may be generated and presented. The detail display control 503 lists data statistics of the source field corresponding to the display control 403, where the data statistics may include summary, statistics, high frequency values, and the like.
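The field statistics listed in such a detail control (summary, statistics, high-frequency values) can be sketched with pandas; the field contents below are invented example data.

```python
import pandas as pd

# Hypothetical source-field data for the statistics shown in a detail control.
field = pd.Series(["2018-01-02", "2018-01-02", "2018-01-05", None],
                  name="trx_date")

summary = {
    "non_null": int(field.notna().sum()),   # populated rows
    "missing": int(field.isna().sum()),     # missing rows
    "distinct": int(field.nunique()),       # distinct values
}
high_freq = field.value_counts().head(3).to_dict()  # top high-frequency values
```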
If the user clicks on display control 402a, a corresponding detail display control 502a may be generated and presented. Detail display control 502a lists the transformation of example data by the processing method (named datediff) corresponding to display control 402a: for example, the transformation from input example data (data of the trx_date and register_date fields, respectively) to output example data (the processing result of the datediff method), where the field type of the output example data is integer (Int). This schematically illustrates how the datediff processing method processes data. It should be understood that a portion of the example data records may be displayed through the transformation processes of some or all of the feature extraction steps.
In addition, a shortcut entry to a data preview and/or a shortcut entry to the program configuration of the processing procedure may further be provided in each detail display control.
The process presentation view according to the present invention is not limited to the example shown in fig. 4, in which more or less generated process information may be presented for a specific feature according to user needs or settings. For example, only the related information of the processing procedure of directly generating the specific feature may be displayed, the related information of the entire generation procedure from the introduction of the original data until the generation of the specific feature may be displayed, or the related information of a part of the generation procedure in the entire generation procedure may be displayed in detail while the related information of the remaining generation procedure may be simplified or omitted.
On the other hand, the respective means included in the system 10 for visualizing the feature generation process in the machine learning process according to the exemplary embodiment of the present invention may also be implemented by hardware, software, firmware, middleware, microcode, or any combination thereof.
When implemented in software, firmware, middleware or microcode, the program code or code segments to perform the corresponding operations may be stored in a computer-readable medium, such as a storage medium, so that the processor can perform the corresponding operations by reading and executing the corresponding program code or code segments. For example, exemplary embodiments of the present invention may be implemented as a computer-readable medium on which a computer program for executing a method of visualizing a feature generation process in a machine learning process by one or more processors is recorded.
As another example, exemplary embodiments of the invention may also be implemented as a computing device for visualizing a feature generation process in a machine learning process, the computing device comprising one or more storage devices and one or more processors, wherein a set of computer-executable instructions is stored in the one or more storage devices, which when executed by the one or more processors, perform a method for performing the visualization of a feature generation process in a machine learning process.
In particular, the computing device may be deployed in a server or a client, or on a node device in a distributed network environment. Further, the computing device may be a PC, a tablet device, a personal digital assistant, a smartphone, a web application, or any other device capable of executing the above instruction set.
Here, the computing device need not be a single computing device, but may be any device or collection of circuits capable of executing the above instructions (or instruction set), alone or in combination. The computing device may also be part of an integrated control system or system manager, or may be configured as a portable electronic device that is interconnected locally or remotely (e.g., via wireless transmission).
In the computing device, the processor may include a Central Processing Unit (CPU), a Graphics Processor (GPU), a programmable logic device, a special purpose processor system, a microcontroller, or a microprocessor. By way of example, and not limitation, processors may also include analog processors, digital processors, microprocessors, multi-core processors, processor arrays, network processors, and the like.
Some of the operations described in the method of visualizing a feature generation process in a machine learning process according to an exemplary embodiment of the present invention may be implemented in software, some of the operations may be implemented in hardware, and furthermore, the operations may be implemented in a combination of software and hardware.
The processor may execute instructions or code stored in one of the memory devices, which may also store data. Instructions and data may also be transmitted and received over a network via a network interface device, which may employ any known transmission protocol.
The storage device may be integral to the processor, for example, RAM or flash memory disposed within an integrated circuit microprocessor or the like. Further, the storage devices may include stand-alone devices, such as external disk drives, storage arrays, or other storage devices usable by any database system. The storage device and the processor may be operatively coupled or may communicate with each other, such as through an I/O port, network connection, etc., such that the processor is able to read files stored in the storage device.
In addition, the computing device may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, mouse, touch input device, etc.). All components of the computing device may be connected to each other via buses and/or networks.
The operations involved in a method of visualizing the feature generation process in a machine learning process according to exemplary embodiments of the present invention may be described as various interconnected or coupled functional blocks or functional diagrams. However, these functional blocks or functional diagrams may equally be integrated into a single logic device or operated with non-exact boundaries.
The foregoing description of exemplary embodiments of the invention has been presented only to be understood as illustrative and not exhaustive, and the invention is not limited to the exemplary embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. Therefore, the protection scope of the present invention shall be subject to the scope of the claims.

Claims (22)

1. A method of visualizing a feature generation process in a machine learning process, comprising:
determining a feature to be traced for visualizing the generation process thereof;
parsing at least one data processing step of the machine learning process for generating the feature to obtain generation process information of the feature, wherein the generation process information comprises data information and/or processing information of the at least one data processing step, the data information of the at least one data processing step comprises information about an input item and/or an output item of the at least one data processing step, and the processing information of the at least one data processing step comprises information about a processing process of the at least one data processing step;
generating a process presentation view for depicting a generation process of the feature based on the generation process information, wherein the process presentation view is a flow chart representing the generation process of the feature, nodes in the flow chart representing input items, output items and/or processing processes of corresponding data processing steps, respectively; and
graphically presenting the process presentation view, wherein graphically presenting the process presentation view comprises: displaying information about the input item, the output item and/or the processing procedure of the corresponding data processing step in a display control of each node; and, in response to a user's selection of a specific display control in the process presentation view, listing, in a detail display control corresponding to the specific display control, detail information about the input item, the output item and/or the processing procedure displayed in the specific display control, wherein each detail display control is provided with an entry for quick access to a data preview and/or an entry for quick access to the program configuration of the processing procedure.
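The method of claim 1 — tracing a chosen feature backwards through the data processing steps that produced it and rendering the lineage as a flow chart of input, processing, and output nodes — can be illustrated by the following non-normative sketch. All step, table, and feature names here are hypothetical; the code is an editorial illustration, not an implementation from the patent.

```python
from dataclasses import dataclass

@dataclass
class Step:
    """One data processing step in the machine learning process."""
    name: str      # name of the processing procedure
    inputs: list   # names of input items (source fields / data tables)
    outputs: list  # names of output items (e.g. the generated feature)

def trace_feature(feature, steps):
    """Walk backwards from `feature` and collect every step that contributes to it."""
    lineage, frontier = [], {feature}
    for step in reversed(steps):  # steps are assumed to be in execution order
        if frontier & set(step.outputs):
            lineage.append(step)
            frontier |= set(step.inputs)
    return list(reversed(lineage))

def to_flowchart(lineage):
    """Render the lineage as directed edges: input -> process -> output nodes."""
    edges = []
    for step in lineage:
        for item in step.inputs:
            edges.append((item, step.name))
        for item in step.outputs:
            edges.append((step.name, item))
    return edges

# Hypothetical pipeline: join two tables, then extract a feature from the result.
steps = [
    Step("join_tables", ["orders", "users"], ["wide_table"]),
    Step("extract_avg_amount", ["wide_table"], ["avg_order_amount"]),
]
edges = to_flowchart(trace_feature("avg_order_amount", steps))
for src, dst in edges:
    print(f"{src} -> {dst}")
```

Each edge pair here corresponds to one node-to-node arrow of the claimed flow chart; a real presentation layer would additionally attach display controls and detail controls to the nodes.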
2. The method of claim 1, wherein the at least one data processing step includes a feature extraction step for generating the feature, and,
the data information of the feature extraction step comprises information about input items and/or output items of the feature extraction step,
the processing information of the feature extraction step includes information on a processing procedure of the feature extraction step.
3. The method of claim 2, wherein the flow chart comprises:
a node representing a source field as an input item of the feature extraction step, a node representing an extraction process as the processing procedure of the feature extraction step, and/or a node representing the feature as an output item of the feature extraction step, and,
the process of graphically presenting the process presentation view further includes: presenting the name of the source field in the display control of the node representing the source field, presenting the name and/or flow information of the extraction process in the display control of the node representing the extraction process, and/or presenting the name of the feature in the display control of the node representing the feature.
4. The method of claim 3, wherein the flow information of the extraction process includes names of one or more processes applied in the extraction process,
the nodes representing the extraction process include sub-nodes respectively representing the one or more processes,
the process of graphically presenting the process presentation view further includes: respectively displaying, in the display controls of the sub-nodes, the names of the one or more processes applied in the extraction process.
5. The method of claim 4, wherein the flow chart further comprises: a node of a source data table representing the source field, and,
the process of graphically presenting the process presentation view further includes: showing the name of the source data table in the display control of the node representing the source data table.
6. The method of claim 5, wherein the at least one data processing step further comprises an upstream processing step of a feature extraction step, wherein the upstream processing step is used to generate a source data table of the source field.
7. The method of claim 6, wherein the upstream processing step comprises one or more data table stitching steps, and,
The data information of the one or more data table stitching steps comprises information about the input items and/or the output items of the one or more data table stitching steps,
the processing information of the one or more data table stitching steps includes information regarding the processing procedures of the one or more data table stitching steps.
8. The method of claim 7, wherein,
the flow chart further includes: nodes representing input data tables as input items to the one or more data table stitching steps and/or nodes representing stitching processes as processes of the one or more data table stitching steps, and,
the process of graphically presenting the process presentation view further includes: respectively displaying the names of the input data tables in the display controls of the nodes representing the input data tables, and/or respectively displaying the names of the stitching processes in the display controls of the nodes representing the stitching processes.
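The data table stitching steps of claims 7 and 8 correspond, in practice, to table joins. A minimal, purely illustrative sketch using pandas (the table and column names are hypothetical and not taken from the patent):

```python
import pandas as pd

# Two hypothetical input data tables (the input items of the stitching step).
users = pd.DataFrame({"user_id": [1, 2], "age": [30, 41]})
orders = pd.DataFrame({"user_id": [1, 1, 2], "amount": [10.0, 5.0, 8.0]})

# The stitching process: join the input tables on a shared key field.
# The resulting table would appear as the output-item node of this step.
wide_table = orders.merge(users, on="user_id", how="left")

print(wide_table.shape)  # (3, 3): three rows; user_id, amount, age columns
```

In the claimed flow chart, `orders` and `users` would be input-data-table nodes, the `merge` call the stitching-process node, and `wide_table` the source data table feeding the downstream feature extraction step.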
9. The method of claim 8, wherein the display control of the node corresponding to the feature, the display control of the node corresponding to the feature extraction step, the display control of the node corresponding to the source field, the display control of the node corresponding to the stitching process, the display control of the node corresponding to the source data table, and/or the display control of the node corresponding to the input data table each have a respective display form.
10. The method of claim 1, wherein the detail information about the input item and/or the output item includes at least one of a name corresponding to the input item and/or the output item, a user-added description, a number of rows of the data table, a number of columns of the data table, a field name of the data table, a field type of the data table, at least a portion of data in the data table, statistical analysis information of data in the data table, and statistical analysis information of data of the field,
the detail information about the process includes at least one of a name corresponding to the process, a description added by a user, code information, and a transformation process of example data.
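The detail information enumerated in claim 10 — row and column counts, field names and types, a portion of the data, and per-field statistics — could be gathered for a data table roughly as follows (an illustrative sketch only; the DataFrame and its columns are hypothetical):

```python
import pandas as pd

# A hypothetical data table behind one node of the process presentation view.
table = pd.DataFrame({"user_id": [1, 2, 3], "amount": [10.0, 5.0, 8.0]})

detail = {
    "rows": len(table),                           # number of rows of the data table
    "cols": table.shape[1],                       # number of columns of the data table
    "fields": list(table.columns),                # field names of the data table
    "types": {c: str(t) for c, t in table.dtypes.items()},  # field types
    "sample": table.head(2).to_dict("records"),   # at least a portion of the data
    "stats": {"amount": {"mean": float(table["amount"].mean()),
                         "max": float(table["amount"].max())}},  # per-field statistics
}
print(detail["rows"], detail["fields"])
```

A detail display control would render such a dictionary when the user selects the corresponding node, alongside the claimed quick-access entries for data preview and program configuration.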
11. A computer-readable medium on which is recorded a computer program that, when executed by one or more processors, performs the method of visualizing a feature generation process in a machine learning process as claimed in any one of claims 1 to 10.
12. A computing device for visualizing a feature generation process in a machine learning process, comprising one or more storage devices and one or more processors, wherein a set of computer-executable instructions is stored in the one or more storage devices that, when executed by the one or more processors, perform the method of visualizing a feature generation process in a machine learning process as recited in any one of claims 1 to 10.
13. A system for visualizing a feature generation process in a machine learning process, comprising:
determining means configured to determine a feature to be traced for visualizing the generation process thereof;
parsing means configured to parse at least one data processing step of the machine learning process for generating the feature to obtain generation process information of the feature, wherein the generation process information comprises data information and/or processing information of the at least one data processing step, the data information of the at least one data processing step comprises information about an input item and/or an output item of the at least one data processing step, and the processing information of the at least one data processing step comprises information about a processing procedure of the at least one data processing step;
view generation means configured to generate a process presentation view for depicting a generation process of the feature based on the generation process information, wherein the process presentation view is a flowchart representing the generation process of the feature, nodes in the flowchart representing input items, output items, and/or processing processes of corresponding data processing steps, respectively; and
presentation means configured to graphically present the process presentation view, wherein information about the input item, the output item and/or the processing procedure of the corresponding data processing step is presented in the display control of each node of the process presentation view; and, in response to a user's selection of a specific display control in the process presentation view, detail information about the input item, the output item and/or the processing procedure displayed in the specific display control is listed in a detail display control corresponding to the specific display control, wherein each detail display control is provided with an entry for quick access to a data preview and/or an entry for quick access to the program configuration of the processing procedure.
14. The system of claim 13, wherein the at least one data processing step includes a feature extraction step for generating the feature, and,
the data information of the feature extraction step comprises information about input items and/or output items of the feature extraction step,
the processing information of the feature extraction step includes information on a processing procedure of the feature extraction step.
15. The system of claim 14, wherein,
the flow chart includes: a node representing a source field as an input item of the feature extraction step, a node representing an extraction process as the processing procedure of the feature extraction step, and/or a node representing the feature as an output item of the feature extraction step, and,
the presentation means is further configured to: present the name of the source field in the display control of the node representing the source field, present the name and/or flow information of the extraction process in the display control of the node representing the extraction process, and/or present the name of the feature in the display control of the node representing the feature.
16. The system of claim 15, wherein the flow information of the extraction process includes names of one or more processes applied in the extraction process,
the nodes representing the extraction process include sub-nodes respectively representing the one or more processes,
the presentation means is further configured to: respectively display, in the display controls of the sub-nodes, the names of the one or more processes.
17. The system of claim 16, wherein the flow chart further comprises: a node of a source data table representing the source field, and,
the presentation means is further configured to: show the name of the source data table in the display control of the node representing the source data table.
18. The system of claim 17, wherein the at least one data processing step further comprises an upstream processing step of a feature extraction step, wherein the upstream processing step is used to generate a source data table of the source fields.
19. The system of claim 18, wherein the upstream processing step comprises one or more data table stitching steps, and,
the data information of the one or more data table stitching steps comprises information about the input items and/or the output items of the one or more data table stitching steps,
the processing information of the one or more data table stitching steps includes information regarding the processing procedures of the one or more data table stitching steps.
20. The system of claim 19, wherein the flow chart further comprises:
nodes representing input data tables as input items to the one or more data table stitching steps and/or nodes representing stitching processes as processes of the one or more data table stitching steps, and,
the presentation means is further configured to: respectively display the names of the input data tables in the display controls of the nodes representing the input data tables, and/or respectively display the names of the stitching processes in the display controls of the nodes representing the stitching processes.
21. The system of claim 20, wherein the display control of the node corresponding to the feature, the display control of the node corresponding to the feature extraction step, the display control of the node corresponding to the source field, the display control of the node corresponding to the stitching process, the display control of the node corresponding to the source data table, and/or the display control of the node corresponding to the input data table each have a respective display form.
22. The system of claim 13, wherein the detail information about the input item and/or the output item includes at least one of a name corresponding to the input item and/or the output item, a user-added description, a number of rows of the data table, a number of columns of the data table, a field name of the data table, a field type of the data table, at least a portion of data in the data table, statistical analysis information of data in the data table, and statistical analysis information of data of the field,
The detail information about the process includes at least one of a name corresponding to the process, a description added by a user, code information, and a transformation process of example data.
CN201810941689.6A 2018-08-17 2018-08-17 Method and system for visualizing feature generation process in machine learning process Active CN110209902B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810941689.6A CN110209902B (en) 2018-08-17 2018-08-17 Method and system for visualizing feature generation process in machine learning process

Publications (2)

Publication Number Publication Date
CN110209902A CN110209902A (en) 2019-09-06
CN110209902B (en) 2023-11-14

Family

ID=67779990

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810941689.6A Active CN110209902B (en) 2018-08-17 2018-08-17 Method and system for visualizing feature generation process in machine learning process

Country Status (1)

Country Link
CN (1) CN110209902B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111047046A (en) * 2019-11-01 2020-04-21 东方微银科技(北京)有限公司 Visual generation method and equipment of machine learning model
CN112256537B (en) * 2020-11-12 2024-03-29 腾讯科技(深圳)有限公司 Model running state display method and device, computer equipment and storage medium
CN112434032B (en) * 2020-11-17 2024-04-05 北京融七牛信息技术有限公司 Automatic feature generation system and method

Citations (3)

Publication number Priority date Publication date Assignee Title
CN105677353A (en) * 2016-01-08 2016-06-15 北京物思创想科技有限公司 Feature extraction method and machine learning method and device thereof
CN107169575A (en) * 2017-06-27 2017-09-15 北京天机数测数据科技有限公司 A kind of modeling and method for visualizing machine learning training pattern
CN108228861A (en) * 2018-01-12 2018-06-29 第四范式(北京)技术有限公司 For performing the method and system of the Feature Engineering of machine learning

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US10572822B2 (en) * 2016-07-21 2020-02-25 International Business Machines Corporation Modular memoization, tracking and train-data management of feature extraction


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant