CN112883242A - Tree-shaped machine learning model visualization method and device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN112883242A
Authority
CN
China
Prior art keywords
model
data
node
tree
tree structure
Legal status
Pending
Application number
CN202110462855.6A
Other languages
Chinese (zh)
Inventor
王小东
吕文勇
周智杰
杨军
赵小诣
Current Assignee
Chengdu New Hope Finance Information Co Ltd
Original Assignee
Chengdu New Hope Finance Information Co Ltd
Application filed by Chengdu New Hope Finance Information Co Ltd
Priority to CN202110462855.6A
Publication of CN112883242A
Pending (current legal status)

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
            • G06F16/90 Details of database functions independent of the retrieved data types
              • G06F16/901 Indexing; Data structures therefor; Storage structures
                • G06F16/9027 Trees
              • G06F16/904 Browsing; Visualisation therefor
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application provides a visualization method and device for a tree-shaped machine learning model, an electronic device, and a storage medium, and relates to the technical field of machine learning. The method comprises: obtaining a model file; extracting tree structure data from the model file and combining it into data in a specified JSON format, wherein the specified JSON data comprises node data of the tree structure data, and the node data indicates whether each node is a leaf node, the node name, and the node rule; and displaying and rendering the specified JSON data as a tree on a page through D3.js. The method parses the tree structure of a trained model into JSON, loads the JSON with D3, and visualizes the model, so that the model can be better understood and its interpretability is improved.

Description

Tree-shaped machine learning model visualization method and device, electronic equipment and storage medium
Technical Field
The application relates to the technical field of machine learning, in particular to a tree-shaped machine learning model visualization method and device, electronic equipment and a storage medium.
Background
Benefiting from the rapid development of artificial intelligence technology, many practical problems can now be solved with it, such as risk assessment, user credit scoring, prediction of users' willingness to repay, prediction of users' fund adequacy, user risk prediction, and identification of suspected compensation cases in the financial field. These problems can be solved well with machine learning algorithms. Among the many machine learning algorithms, those that currently perform best across various fields are mainly tree models, such as decision trees, random forests, GBDT, XGBoost, and CatBoost. The advantages of these models are that they can be used for both regression and classification, do not require feature scaling, and have good interpretability. Visualization is not only a good way to understand a model but also a useful tool for explaining to others how your model works. However, visually analyzing tree models so that more people can understand their decision-making mechanism remains a significant problem.
In particular, decision trees, random forests, XGBoost, CatBoost, and GBDT are the most common tree-shaped machine learning models. Because they are not black-box models and all have tree structures, they can be understood through visualization and displayed with tree-visualization techniques. Most machine learning packages in Python output a model file, and the model can be visualized with the Graphviz tool, but the result is mostly a static image without interactive functions; large tree models are sometimes displayed incompletely, and the specific information of each node cannot be seen. Meanwhile, parsing a model file into a structured tree to form a recursive JSON file is also difficult. As a result, existing visualizations of machine learning models are poor.
Disclosure of Invention
In view of the above, an object of the embodiments of the present application is to provide a method and an apparatus for visualizing a tree-shaped machine learning model, an electronic device, and a storage medium, so as to solve the problem in the prior art that existing machine learning models are poorly visualized.
An embodiment of the application provides a tree-shaped machine learning model visualization method, comprising: obtaining a model file; extracting tree structure data from the model file and combining it into data in a specified JSON format, wherein the specified JSON data comprises node data of the tree structure data, and the node data indicates whether each node is a leaf node, the node name, and the node rule; and displaying and rendering the specified JSON data as a tree on a page through D3.js.
In this implementation, the tree structure of the model is parsed into JSON data, and the JSON data is loaded with D3 to visualize the model. This improves visualization efficiency, helps users understand the model better, and improves its interpretability. Because whether each node is a leaf node, its name, and its rule are displayed on the page, the displayed information about the model is also more complete.
Optionally, obtaining the model file includes: obtaining model training data in CSV format; setting model parameters according to the model type; and training the model based on the model training data and the model parameters to obtain the model file.
In this implementation, the model can be trained automatically in the subsequent Python environment from the CSV training data and the model parameters, which improves training efficiency.
Optionally, training the model based on the model training data and the model parameters includes: reading the model training data with the Python file reading function read_csv; converting the model training data into DataFrame data and extracting the features X and the label column Y from it; obtaining the model parameters and constructing a model based on them; and substituting the features X and the label column Y into the model.
In this implementation, Python is used to read the training data, the features and the label column are extracted from the DataFrame data, and they are then substituted into the model.
Optionally, extracting the tree structure data of the model file includes: obtaining the tree list of the model in the model file by using models_; recursively parsing each tree in the model based on the tree list to obtain model data, wherein the model data comprises tree information, features, labels, and/or tree indexes; and assembling the model data into the tree structure data.
In this implementation, the model file is parsed to determine the information required for displaying the tree structure, such as tree information, features, labels, and/or tree indexes, and this information is assembled into basic tree structure data, which provides the data basis for the subsequent tree visualization.
Optionally, displaying and rendering the specified JSON data as a tree on a page through D3.js includes: applying the D3.js tree mode to a container on the page through the D3.js tree layout; constructing D3.js nodes and links based on the specified JSON data; defining display attributes for each node; and displaying and rendering the tree on the page based on the D3.js nodes, links, and display attributes.
In this implementation, the specified JSON data is rendered as a tree through D3.js, so the model together with its internal structure and internal parameters can be displayed. This realizes visualization of tree-shaped machine learning models based on the specified JSON data and makes the displayed information more complete.
Optionally, after defining the display attributes of each node, the method further includes: adding mouseover and mouseout events to each node, which are used to expand and display a node's content when it is too long.
In this implementation, the mouseover and mouseout events allow more detailed information about each node of the model, such as its decision principle, threshold, features, and samples, to be shown during visualization, which makes the displayed data more complete.
Optionally, after defining the display attributes of each node, the method further includes: adding a click event to each node, which collapses or expands the lower-level nodes when the node is a parent node.
In this implementation, the click events allow the model to be viewed from different angles and dimensions, so more content can be included without affecting the overall simplicity of the visualization, further improving the completeness of the displayed data.
An embodiment of the application provides a tree-shaped machine learning model visualization apparatus, comprising: a model file acquisition module, configured to obtain a model file; a data extraction module, configured to extract tree structure data from the model file and combine it into data in a specified JSON format, wherein the specified JSON data comprises node data of the tree structure data, and the node data indicates whether each node is a leaf node, the node name, and the node rule; and a visualization rendering module, configured to display and render the specified JSON data as a tree on a page through D3.js.
In this implementation, the tree structure of the model is parsed into JSON data, and the JSON data is loaded with D3 to visualize the model. This improves visualization efficiency, helps users understand the model better, and improves its interpretability. Because whether each node is a leaf node, its name, and its rule are displayed on the page, the displayed information about the model is also more complete.
Optionally, the model file acquisition module is specifically configured to: obtain model training data in CSV format; set model parameters according to the model type; and train the model based on the model training data and the model parameters to obtain the model file.
In this implementation, the model can be trained automatically in the subsequent Python environment from the CSV training data and the model parameters, which improves training efficiency.
Optionally, the model file acquisition module is specifically configured to: read the model training data with the Python file reading function read_csv; convert the model training data into DataFrame data and extract the features X and the label column Y from it; obtain the model parameters and construct a model based on them; and substitute the features X and the label column Y into the model.
In this implementation, Python is used to read the training data, the features and the label column are extracted from the DataFrame data, and they are then substituted into the model.
Optionally, the data extraction module is specifically configured to: obtain the tree list of the model in the model file by using models_; recursively parse each tree in the model based on the tree list to obtain model data, wherein the model data comprises tree information, features, labels, and/or tree indexes; and assemble the model data into the tree structure data.
In this implementation, the model file is parsed to determine the information required for displaying the tree structure, such as tree information, features, labels, and/or tree indexes, and this information is assembled into basic tree structure data, which provides the data basis for the subsequent tree visualization.
Optionally, the visualization rendering module is specifically configured to: apply the D3.js tree mode to a container on the page through the D3.js tree layout; construct D3.js nodes and links based on the specified JSON data; define display attributes for each node; and display and render the tree on the page based on the D3.js nodes, links, and display attributes.
In this implementation, the specified JSON data is rendered as a tree through D3.js, so the model together with its internal structure and internal parameters can be displayed. This realizes visualization of tree-shaped machine learning models based on the specified JSON data and makes the displayed information more complete.
Optionally, the visualization rendering module is specifically configured to: add mouseover and mouseout events to each node, which are used to expand and display a node's content when it is too long.
In this implementation, the mouseover and mouseout events allow more detailed information about each node of the model, such as its decision principle, threshold, features, and samples, to be shown during visualization, which makes the displayed data more complete.
Optionally, the visualization rendering module is specifically configured to: add a click event to each node, which collapses or expands the lower-level nodes when the node is a parent node.
In this implementation, the click events allow the model to be viewed from different angles and dimensions, so more content can be included without affecting the overall simplicity of the visualization, further improving the completeness of the displayed data.
An embodiment of the present application further provides an electronic device comprising a memory and a processor. The memory stores program instructions, and the processor performs the steps of any of the above implementations when reading and executing the program instructions.
An embodiment of the present application further provides a readable storage medium storing computer program instructions which, when read and executed by a processor, perform the steps of any of the above implementations.
Drawings
To explain the technical solutions of the embodiments of the present application more clearly, the drawings used in the embodiments are briefly described below. It should be understood that the following drawings show only some embodiments of the present application and should not be regarded as limiting its scope; those skilled in the art can derive other related drawings from them without inventive effort.
Fig. 1 is a schematic flowchart of a tree-shaped machine learning model visualization method according to an embodiment of the present application.
Fig. 2 is a schematic flowchart of obtaining a model file through model training according to an embodiment of the present application.
Fig. 3 is a schematic flowchart of a model training step according to an embodiment of the present application.
Fig. 4 is a schematic flowchart of a step of parsing model file data according to an embodiment of the present application.
Fig. 5 is a schematic diagram of the principle of a visualization step according to an embodiment of the present application.
Fig. 6 is a schematic flowchart of a visualization step according to an embodiment of the present application.
Fig. 7 is a schematic diagram of the display attributes of random forest nodes according to an embodiment of the present application.
Fig. 8 is a schematic block diagram of a tree-shaped machine learning model visualization apparatus according to an embodiment of the present application.
Reference numerals: 20 - tree-shaped machine learning model visualization apparatus; 21 - model file acquisition module; 22 - data extraction module; 23 - visualization rendering module.
Detailed Description
The technical solution in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
An embodiment of the present application provides a tree-shaped machine learning model visualization method. Referring to Fig. 1, which is a flowchart of the method provided in the embodiment of the present application, the specific steps of the method may be as follows:
Step S12: obtain a model file.
Optionally, the model file to be visualized in this embodiment may be an already trained tree-shaped machine learning model, or it may be obtained by training a tree-shaped machine learning model.
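The two options above (load an existing model file, or train one first) can be sketched as follows. This is a minimal sketch that assumes the model file is a Python pickle; joblib files are also common for scikit-learn models. The function names `obtain_model` and `dummy_train` and the file name `model.pkl` are hypothetical, and only model files from trusted sources should be unpickled.

```python
import os
import pickle
import tempfile


def obtain_model(model_path, train_fn):
    """Return a trained model, loading it from `model_path` when the file exists.

    Assumption: the model file is a pickle. Only unpickle trusted files.
    """
    if os.path.exists(model_path):
        with open(model_path, "rb") as f:
            return pickle.load(f)  # already-trained model file
    model = train_fn()             # no file yet: train a model first
    with open(model_path, "wb") as f:
        pickle.dump(model, f)      # persist it as the model file
    return model


# Usage with a hypothetical stand-in trainer that records how often it runs.
calls = []


def dummy_train():
    calls.append(1)
    return {"kind": "decision_tree", "depth": 3}


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")
    first = obtain_model(path, dummy_train)   # trains and saves
    second = obtain_model(path, dummy_train)  # loads the saved file
```

Because the trainer is only called when no file exists, a model can be trained once and then visualized repeatedly from the saved file.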
Referring to Fig. 2, which is a schematic flowchart of obtaining a model file through model training according to an embodiment of the present application, the steps may be as follows:
Step S121: obtain model training data in CSV format.
A CSV (Comma-Separated Values, also called character-separated values) file stores tabular data (numbers and text) in plain text. Using CSV training data allows the data to be loaded directly into the Python environment during subsequent training, which improves training efficiency.
Step S122: set model parameters according to the model type.
Mainstream tree models currently include decision trees, random forests, XGBoost, GBDT, and CatBoost. Different tree models have different parameter formats and forms, so different model parameters can be set for different models. For example, a decision tree has parameters such as criterion (the feature selection criterion), splitter (the strategy for choosing the split at each node), max_depth (the maximum depth of the tree), min_impurity_decrease (the minimum impurity decrease required to split a node), min_samples_split (the minimum number of samples required to split an internal node), min_samples_leaf (the minimum number of samples at a leaf node), max_leaf_nodes (the maximum number of leaf nodes), min_impurity_split (the impurity threshold for early stopping of tree growth; deprecated in later scikit-learn releases), min_weight_fraction_leaf (the minimum weighted fraction of samples at a leaf node), and class_weight (the class weights).
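Setting parameters per model type can be sketched as a table of defaults keyed by model type. In this minimal sketch the default values and the type keys `decision_tree` / `random_forest` are hypothetical. The parameter names follow scikit-learn's constructor arguments. `min_impurity_split` is left out because it was deprecated and is not accepted by recent scikit-learn releases.

```python
import copy

# Hypothetical default values; keys are scikit-learn constructor argument names.
MODEL_PARAMS = {
    "decision_tree": {
        "criterion": "gini",
        "splitter": "best",
        "max_depth": 5,
        "min_samples_split": 2,
        "min_samples_leaf": 1,
        "min_weight_fraction_leaf": 0.0,
        "max_leaf_nodes": None,
        "min_impurity_decrease": 0.0,
        "class_weight": None,
    },
    "random_forest": {
        "n_estimators": 100,
        "max_depth": 5,
        "min_samples_split": 2,
        "random_state": 0,
    },
}


def set_model_params(model_type, **overrides):
    """Return a fresh parameter dict for `model_type`, with user overrides applied."""
    if model_type not in MODEL_PARAMS:
        raise ValueError(f"unsupported model type: {model_type!r}")
    params = copy.deepcopy(MODEL_PARAMS[model_type])  # never mutate the defaults
    params.update(overrides)
    return params


params = set_model_params("decision_tree", max_depth=3, class_weight="balanced")
```

The resulting dict can then be unpacked into the matching constructor, for example `DecisionTreeClassifier(**params)`.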
Step S123: train the model based on the model training data and the model parameters to obtain a model file.
A model file exists only after the model has been trained. Referring to Fig. 3, which is a schematic flowchart of the model training step provided in the embodiment of the present application, the training step may specifically be as follows:
Step S1231: read the model training data with the Python file reading function read_csv.
Optionally, the training data may be obtained in advance and stored locally, in which case this step reads it from local storage; alternatively, the training data may reside in a database on another server, in which case this step reads it over the network.
Step S1232: convert the model training data into DataFrame data, and extract the features X and the label column Y from it.
A DataFrame is a data set organized into named columns. In Spark, a DataFrame is distributed and can be constructed from a structured data file, a Hive table, an external database, or an existing RDD (Resilient Distributed Dataset); in Python, pandas' read_csv returns a DataFrame directly. Because DataFrame data can be accessed by rows and columns, the format makes it easy to extract the features X and the label column Y.
Here, X generally denotes the model's input feature data, and Y is generally the label column of the training set.
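Steps S1231 and S1232 can be sketched with pandas as follows. This is a minimal sketch: the column names and the label column name `label` are hypothetical, and an in-memory buffer stands in for the CSV file. `pandas.read_csv` also accepts a local path or a URL, which covers the local and network cases mentioned above.

```python
import io

import pandas as pd


def load_training_data(source, label_column):
    """Read CSV training data and split it into features X and label column Y.

    `source` may be a local file path, a URL, or a file-like object,
    all of which pandas.read_csv accepts.
    """
    df = pd.read_csv(source)             # DataFrame: one named column per CSV field
    X = df.drop(columns=[label_column])  # model input features
    Y = df[label_column]                 # training-set labels
    return X, Y


# Hypothetical sample data; "label" is an assumed name for the label column.
sample_csv = io.StringIO(
    "age,income,overdue_days,label\n"
    "25,3000,0,0\n"
    "40,8000,12,1\n"
    "33,5000,3,0\n"
)
X, Y = load_training_data(sample_csv, label_column="label")
```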
Step S1233: obtain the model parameters, and construct a model based on them.
Specifically, a model can be constructed from the model parameters with the following example code:
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier(max_depth=max_depth,
                               min_samples_split=min_samples_split, random_state=0)
Step S1234: substitute the features X and the label column Y into the model.
Here, model.fit fits the model; substituting the features X and the label column Y into it is written as model.fit(X, Y).
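Steps S1231 to S1234 can be sketched end to end as follows. This is a minimal sketch assuming pandas and scikit-learn. The toy data frame and its column names are hypothetical. `n_estimators=10` is chosen only to keep the example small.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy training data; column names are illustrative only.
df = pd.DataFrame({
    "age": [25, 40, 33, 51, 29, 45],
    "income": [3000, 8000, 5000, 12000, 4000, 7000],
    "overdue_days": [0, 12, 3, 30, 1, 15],
    "label": [0, 1, 0, 1, 0, 1],
})
X = df.drop(columns=["label"])  # features X
Y = df["label"]                 # label column Y

max_depth, min_samples_split = 3, 2  # example parameter values
model = RandomForestClassifier(n_estimators=10, max_depth=max_depth,
                               min_samples_split=min_samples_split, random_state=0)
model.fit(X, Y)  # substitute X and Y into the model

print(len(model.estimators_))  # number of fitted trees in the forest
```

After `fit`, the forest keeps its fitted trees in `estimators_`. This is the kind of tree list that the parsing step below works from.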
After training, the model file is a black box: without a visualization tool, it is difficult to understand what the model is and how it makes decisions, so the model offers no interpretability. Before the model file is visualized, the model must be parsed, its tree structure extracted, and the result assembled into a specific JSON format that is convenient to visualize. Step S14 is therefore executed next.
Step S14: extract tree structure data from the model file, and combine the tree structure data into data in the specified JSON format.
Specifically, referring to Fig. 4, which is a schematic flowchart of the step of parsing model file data provided in the embodiment of the present application, this step may specifically be as follows:
Step S141: use models_ to obtain the tree list of the model in the model file.
The tree list is the list of tree structures contained in the model file.
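One way to obtain such a tree list, assuming scikit-learn models, is sketched below. Here a fitted ensemble exposes its trees as `estimators_`, and each fitted tree exposes its structure as `tree_`. The helper name `get_tree_list` is hypothetical. XGBoost and CatBoost expose their trees through their own APIs and are not covered. Mock objects replace fitted models so that the sketch runs without training.

```python
from types import SimpleNamespace


def get_tree_list(model):
    """Return the low-level tree structures of a fitted scikit-learn tree model.

    Assumes scikit-learn conventions: a single decision tree exposes `tree_`;
    random forests expose `estimators_` as a list of fitted trees; gradient
    boosting exposes `estimators_` as a 2-D array (one row of trees per stage).
    """
    estimators = getattr(model, "estimators_", None)
    if estimators is None:              # single decision tree
        return [model.tree_]
    trees = []
    for item in estimators:
        if hasattr(item, "tree_"):      # random forest: one tree per entry
            trees.append(item.tree_)
        else:                           # gradient boosting: iterate over the row
            trees.extend(sub.tree_ for sub in item)
    return trees


# Mock objects stand in for fitted models.
single = SimpleNamespace(tree_="t0")
forest = SimpleNamespace(estimators_=[SimpleNamespace(tree_="a"), SimpleNamespace(tree_="b")])
boosted = SimpleNamespace(estimators_=[[SimpleNamespace(tree_="s0")], [SimpleNamespace(tree_="s1")]])
```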
Step S142: recursively parse each tree in the model based on the tree list to obtain model data.
The model data includes tree information, features, labels, and/or tree indexes.
Specifically, the parsing code in this embodiment may be as follows:
def analysisTree(tree, features, labels, agg, node_index=0):
    # Merge one fitted scikit-learn tree (a tree_ object) into the aggregate dict `agg`.
    # `labels` is kept in the signature but is not used here.
    if tree.children_left[node_index] == -1:  # -1 marks a leaf node
        agg['name'] = agg.get('name', {})
        agg['name']['leaf'] = agg['name'].get('leaf', 0) + 1  # count leaves at this position
        # Set isLeaf to True only if no earlier tree already marked this position as internal
        agg.setdefault('isLeaf', True)
    else:
        feature = features[tree.feature[node_index]]
        threshold = tree.threshold[node_index]  # split threshold (not stored in this aggregated view)
        agg['name'] = agg.get('name', {})
        agg['name'][feature] = agg['name'].get(feature, 0) + 1  # count splits on this feature
        agg['isLeaf'] = False
        left_index = tree.children_left[node_index]
        right_index = tree.children_right[node_index]
        children = agg.get('children', [{}, {}])
        # Left child (feature <= threshold) first, then right child
        agg['children'] = [analysisTree(tree, features, labels, children[0], left_index),
                           analysisTree(tree, features, labels, children[1], right_index)]
    return agg
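The parsing code aggregates feature counts across trees. The specified JSON format instead describes the nodes of a tree with `isLeaf`, `name`, and `rule`. A minimal sketch for converting one tree into that format follows. It assumes the tree exposes scikit-learn's `tree_` arrays (`children_left`, `children_right`, `feature`, `threshold`, `value`). The helper name `tree_to_json` is hypothetical, and so are two choices: naming a leaf after its majority class, and setting `rule` to `None` for leaves. A `SimpleNamespace` mock stands in for a fitted tree.

```python
import json
from types import SimpleNamespace


def tree_to_json(tree, features, class_names, node_index=0):
    """Convert one fitted tree into nested {isLeaf, name, rule, children} dicts."""
    left = tree.children_left[node_index]
    if left == -1:  # leaf node
        class_values = list(tree.value[node_index][0])  # per-class values at this leaf
        label = class_names[class_values.index(max(class_values))]
        return {"isLeaf": True, "name": str(label), "rule": None}
    feature = features[tree.feature[node_index]]
    threshold = float(tree.threshold[node_index])
    return {
        "isLeaf": False,
        "name": feature,
        "rule": [feature, threshold],  # samples with feature <= threshold go left
        "children": [
            tree_to_json(tree, features, class_names, left),
            tree_to_json(tree, features, class_names, tree.children_right[node_index]),
        ],
    }


# Mock tree: root splits on "feature A" at 2.5 into two leaves.
mock = SimpleNamespace(
    children_left=[1, -1, -1],
    children_right=[2, -1, -1],
    feature=[0, -2, -2],
    threshold=[2.5, -2.0, -2.0],
    value=[[[5, 5]], [[5, 0]], [[0, 5]]],
)
result = tree_to_json(mock, ["feature A"], ["good", "bad"])
print(json.dumps(result, indent=2))
```

With a real model, `tree_to_json(model.estimators_[0].tree_, feature_names, model.classes_)` would convert the first tree of a fitted random forest.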
Step S143: assemble the model data into tree structure data.
The tree structure data may be used as model data in this embodiment.
Illustratively, the assembled tree structure data may be as follows (each {…} stands for a subtree omitted here):
{
  "children": [
    {
      "children": [
        {…},
        {…}
      ],
      "isLeaf": false,
      "name": "feature A",
      "rule": [
        "feature A",
        4.8500001430511475
      ]
    },
    {…}
  ],
  "isLeaf": false,
  "name": "feature B",
  "rule": [
    "feature B",
    0.75
  ]
}
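D3's hierarchy reads child nodes from a `children` property by default. A misspelled key such as `chidren` would therefore quietly turn a subtree into a leaf. A small recursive check can catch such errors before the data reaches the page. This is a minimal sketch, and the validation rules in it are assumptions inferred from the node data described above: every node has `isLeaf`, `name`, and `rule`, and non-leaf nodes have children. The helper name `validate_node` is hypothetical.

```python
import json


def validate_node(node, path="root"):
    """Recursively check a node against the specified JSON format.

    Assumed rules: every node carries "isLeaf" (a bool), "name" and "rule";
    a node with isLeaf == false must have a non-empty "children" list.
    """
    for key in ("isLeaf", "name", "rule"):
        if key not in node:
            raise ValueError(f"{path}: missing key {key!r}")
    if not isinstance(node["isLeaf"], bool):
        raise ValueError(f"{path}: isLeaf must be true or false")
    children = node.get("children", [])
    if not node["isLeaf"] and not children:
        raise ValueError(f"{path}: non-leaf node has no children")
    for i, child in enumerate(children):
        validate_node(child, f"{path}.children[{i}]")
    return True


good = json.loads(
    '{"isLeaf": false, "name": "feature B", "rule": ["feature B", 0.75],'
    ' "children": [{"isLeaf": true, "name": "class 0", "rule": null},'
    ' {"isLeaf": true, "name": "class 1", "rule": null}]}'
)
typo = {"isLeaf": False, "name": "feature B", "rule": ["feature B", 0.75], "chidren": []}
```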
Without visualization, a trained model, including a tree model, is a black box with no interpretability: its decision logic is unknown, and the model file is merely a collection of parameters whose meaning and function are not apparent. A good means is therefore needed to visualize the model and display its internal structure and internal parameters, which helps explain the model. In this embodiment, the model is visualized through step S16.
Step S16: display and render the data in the specified JSON format as a tree on the page through D3.js.
Referring to Fig. 5 and Fig. 6, where Fig. 5 is a schematic diagram of the principle of the visualization step and Fig. 6 is a schematic flowchart of the visualization step provided in the embodiment of the present application, step S16 may specifically be as follows:
Step S161: apply the D3.js tree mode to the page container of the page through the D3.js tree structure layout.
D3 (Data-Driven Documents, also called D3.js) is a JavaScript library for visualizing data using web standards. The D3.js tree structure layout may be the layout corresponding to a tree (tree mode), i.e. the tree layout that D3 provides as d3.tree in version 4 and later.
In the HTML page, the tree contained in the data in the specified JSON format (i.e. data.js in Fig. 5) is displayed using this tree layout.
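Fig. 5 shows the specified JSON data delivered to the page as data.js. One way to produce such a file from Python is sketched below. It assumes the page loads the file with a script tag and reads one global variable. The helper name `write_data_js` and the variable name `treeData` are hypothetical, and the D3 rendering script must read the same variable name.

```python
import json
import os
import tempfile
from pathlib import Path


def write_data_js(tree, path, var_name="treeData"):
    """Write the tree as a JavaScript file that defines one global variable.

    ensure_ascii=False keeps Chinese feature names readable in the file.
    """
    payload = json.dumps(tree, ensure_ascii=False, indent=2)
    Path(path).write_text(f"var {var_name} = {payload};\n", encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, "data.js")
    write_data_js({"isLeaf": True, "name": "特征A", "rule": None}, out)
    content = Path(out).read_text(encoding="utf-8")
```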
Step S162: construct D3.js nodes and links based on the data in the specified JSON format.
The nodes are D3.js nodes and the links are the D3.js connecting lines between them; D3.js automatically calculates the final display position of each node.
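D3.js computes node positions itself. A simplified layered layout in Python illustrates the kind of computation involved. It is a stand-in, not D3's own tidy-tree algorithm: leaves take consecutive vertical slots, each parent is centred between its first and last child, and depth sets the horizontal offset. The helper name `layered_layout`, the added `x`/`y` keys, and the pixel spacings are hypothetical.

```python
def layered_layout(node, depth=0, next_slot=None, layer_gap=180, slot_gap=40):
    """Assign x (by depth) and y (by leaf order) positions to a nested tree in place."""
    if next_slot is None:
        next_slot = [0]  # mutable leaf counter shared across the recursion
    children = node.get("children") or []
    for child in children:
        layered_layout(child, depth + 1, next_slot, layer_gap, slot_gap)
    if children:
        # Centre the parent between its first and last child
        node["y"] = (children[0]["y"] + children[-1]["y"]) / 2
    else:
        node["y"] = next_slot[0] * slot_gap
        next_slot[0] += 1
    node["x"] = depth * layer_gap
    return node


tree = {"name": "root", "children": [
    {"name": "a"},
    {"name": "b", "children": [{"name": "c"}, {"name": "d"}]},
]}
layered_layout(tree)
```

Collapsed subtrees (children moved to `_children`) are skipped by this sketch, much as D3 only lays out visible children.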
Step S163: define display attributes for each node.
Specifically, the display attributes of a node may include custom display values, descriptions, styles, and the like. For example, each node of a decision tree may display its feature, split threshold, number of classified samples, and label. As another example, Fig. 7 is a schematic diagram of the display attributes of random forest nodes according to an embodiment of the present application; there, each node of the random forest displays its feature list, classification label, feature threshold, and so on.
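The display values for a decision-tree node (split rule, sample count, label) can be built as text lines from the tree arrays. In this minimal sketch, the scikit-learn `tree_` attributes `children_left`, `feature`, `threshold`, `n_node_samples`, and `value` are assumed inputs. The helper name `node_display_text` and the line layout are hypothetical choices, and a mock replaces a fitted tree.

```python
from types import SimpleNamespace


def node_display_text(tree, node_index, features, class_names):
    """Return display lines for one node: split rule, sample count, majority class."""
    class_values = list(tree.value[node_index][0])
    lines = []
    if tree.children_left[node_index] != -1:  # internal node: show the split rule
        feature = features[tree.feature[node_index]]
        lines.append(f"{feature} <= {tree.threshold[node_index]:.3f}")
    lines.append(f"samples = {tree.n_node_samples[node_index]}")
    lines.append(f"class = {class_names[class_values.index(max(class_values))]}")
    return lines


mock = SimpleNamespace(
    children_left=[1, -1, -1],
    feature=[0, -2, -2],
    threshold=[4.85, -2.0, -2.0],
    n_node_samples=[10, 6, 4],
    value=[[[6, 4]], [[6, 0]], [[0, 4]]],
)
root_lines = node_display_text(mock, 0, ["feature A"], ["good", "bad"])
leaf_lines = node_display_text(mock, 2, ["feature A"], ["good", "bad"])
```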
Optionally, in this embodiment, mouseover and mouseout events may also be added to a node so that overly long content is expanded on demand and shown as a tooltip, for example the decision principle, threshold, features, and samples of each node of the model.
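The data side of this behaviour can be sketched as splitting each node's content into a short on-node label and the full tooltip text. The mouseover handler would show the tooltip text, and the mouseout handler would hide it. The helper name `label_and_tooltip` and the 12-character limit are hypothetical.

```python
def label_and_tooltip(text, max_chars=12, ellipsis="..."):
    """Return (label shown on the node, full text for the tooltip or None)."""
    if len(text) <= max_chars:
        return text, None  # short enough: no tooltip needed
    return text[: max_chars - len(ellipsis)] + ellipsis, text


short_label, short_tip = label_and_tooltip("leaf")
long_label, long_tip = label_and_tooltip("income <= 5000.000, samples = 120")
```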
Optionally, in this embodiment, a click event may be added to a node so that, when the node is a parent node, its lower-level nodes can be collapsed or expanded, making it convenient to view the model from different angles and dimensions. For example, the ellipse radius of the child nodes may be set as follows: traverse the nodes of each layer and compute the maximum radius for that layer and the final radius of each node, where final radius = min(radius, text length / 2). To obtain the text length, each Chinese character is counted as two English characters, and length = str.length × 14 (assuming a 14px font). Click-to-expand/collapse may be implemented by renaming the node's children property to _children when it is clicked, and back again on the next click.
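The radius rule and the click toggle described above can be sketched as follows. East Asian wide and fullwidth characters (for example Chinese) count as two characters. The pixel width follows the text's length × 14 rule. The per-layer maximum radius is passed in by the caller, because the text does not fix how it is computed. All function names are hypothetical.

```python
import unicodedata


def display_length(text):
    """Count East Asian wide/fullwidth characters (e.g. Chinese) as 2, others as 1."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1 for ch in text)


def node_radius(text, max_radius, px_per_char=14):
    """Final radius = min(layer maximum radius, text width / 2), width = length x 14."""
    width = display_length(text) * px_per_char
    return min(max_radius, width / 2)


def toggle(node):
    """Collapse or expand a parent node by swapping 'children' and '_children'."""
    if node.get("children"):
        node["_children"] = node.pop("children")
    elif node.get("_children"):
        node["children"] = node.pop("_children")
    return node


parent = {"name": "feature A", "children": [{"name": "leaf"}]}
toggle(parent)  # collapse: children hidden under _children
collapsed = "children" not in parent and "_children" in parent
toggle(parent)  # expand again
```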
Step S164: display and render the tree on the page based on the D3.js nodes, links, and display attributes.
Specifically, in this embodiment, the nodes, links, and display attributes of the tree are rendered using the D3.js tree layout.
To work in cooperation with the tree-shaped machine learning model visualization method, an embodiment of the present application further provides a tree-shaped machine learning model visualization apparatus 20.
Referring to Fig. 8, Fig. 8 is a schematic block diagram of the tree-shaped machine learning model visualization apparatus according to an embodiment of the present application.
The tree-shaped machine learning model visualization apparatus 20 includes:
a model file acquisition module 21, configured to obtain a model file;
a data extraction module 22, configured to extract tree structure data from the model file and combine it into data in a specified JSON format, where the data in the specified JSON format includes node data of the tree structure data, and the node data indicates whether each node is a leaf node, the node name, and the node rule; and
a visualization rendering module 23, configured to display and render the data in the specified JSON format as a tree on a page through D3.js.
Optionally, the model file obtaining module 21 is specifically configured to: obtaining Csv format model training data; setting model parameters according to the model type; model training is performed based on the model training data and the model parameters to obtain a model file.
Optionally, the model file obtaining module 21 is specifically configured to: reading model training data by using a file reading function read _ csv of Python; converting the model training data into dataFrame format data, and extracting a characteristic X and a label column Y in the dataFrame format data; obtaining model parameters, and constructing a model based on the model parameters; and substituting the characteristic X and the label column Y into a model.
Optionally, the data extraction module 22 is specifically configured to: obtain, from the model object, the list of trees of the model in the model file; parse each tree in the model recursively based on the tree list to obtain model data, where the model data includes the tree information, features, labels and/or tree indexes; and assemble the model data into tree structure data.
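The recursive parsing step might look like the sketch below. It assumes each tree is exposed as parallel arrays indexed by node id, in the style of scikit-learn's `tree_` attributes (`children_left`, `children_right`, `feature`, `threshold`, `n_node_samples`, `value`), with -1 marking a missing child. The `TreeArrays` container, the output field names, and the rule-string format are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

LEAF = -1  # "no child" marker, following scikit-learn's TREE_LEAF convention


@dataclass
class TreeArrays:
    """Hypothetical container for one tree stored as parallel per-node arrays."""
    children_left: List[int]
    children_right: List[int]
    feature: List[int]
    threshold: List[float]
    n_node_samples: List[int]
    value: List[Any]


def parse_node(tree: TreeArrays, node_id: int, feature_names: List[str]) -> Dict[str, Any]:
    """Recursively convert the subtree rooted at node_id into nested dicts."""
    left = tree.children_left[node_id]
    right = tree.children_right[node_id]
    node: Dict[str, Any] = {"name": f"node_{node_id}", "samples": tree.n_node_samples[node_id]}
    if left == LEAF:
        # Leaf: the rule shows the stored prediction value.
        node["isLeaf"] = True
        node["rule"] = f"value = {tree.value[node_id]}"
        return node
    # Internal node: the rule is the split condition; recurse into both children.
    feat = feature_names[tree.feature[node_id]]
    node["isLeaf"] = False
    node["rule"] = f"{feat} <= {tree.threshold[node_id]:.3f}"
    node["children"] = [
        parse_node(tree, left, feature_names),
        parse_node(tree, right, feature_names),
    ]
    return node


def parse_forest(trees: List[TreeArrays], feature_names: List[str]) -> List[Dict[str, Any]]:
    """Parse every tree in the tree list, keeping its index."""
    return [{"treeIndex": i, "root": parse_node(t, 0, feature_names)} for i, t in enumerate(trees)]


stump = TreeArrays(
    children_left=[1, LEAF, LEAF],
    children_right=[2, LEAF, LEAF],
    feature=[0, -2, -2],
    threshold=[30.5, -2.0, -2.0],
    n_node_samples=[10, 6, 4],
    value=[[5, 5], [5, 1], [0, 4]],
)
result = parse_forest([stump], ["age", "income"])
```

For a scikit-learn random forest, for example, the tree list would be `model.estimators_` and each element's `tree_` attribute supplies the arrays; other libraries expose their trees differently.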
Optionally, the visualization rendering module 23 is specifically configured to: apply the tree layout of D3.js to a page container of the page; construct D3.js nodes and links based on the data in the designated Json format; define the display attributes of each node; and display and render the tree on the page based on the D3.js nodes, links, and display attributes.
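D3.js's `d3.tree()` layout assigns each node an (x, y) position using the Reingold-Tilford "tidy" algorithm, and links are then drawn between parent and child positions. The Python below is not that algorithm; it is a deliberately simpler stand-in (leaves placed in consecutive slots, each parent centered over its first and last child, y proportional to depth), meant only to show what the layout step hands to the drawing step: one coordinate pair per node plus a list of parent-child links. The function name `layout` is hypothetical.

```python
from typing import Any, Dict, List, Tuple


def layout(
    root: Dict[str, Any], x_step: float = 1.0, y_step: float = 1.0
) -> Tuple[Dict[str, Tuple[float, float]], List[Tuple[str, str]]]:
    """Simplified tree layout (not D3's tidy algorithm): returns node positions and links."""
    positions: Dict[str, Tuple[float, float]] = {}
    links: List[Tuple[str, str]] = []
    next_leaf_x = 0.0

    def place(node: Dict[str, Any], depth: int) -> float:
        nonlocal next_leaf_x
        children = node.get("children", [])
        if not children:
            # Leaves take consecutive horizontal slots, left to right.
            x = next_leaf_x
            next_leaf_x += x_step
        else:
            child_xs = [place(child, depth + 1) for child in children]
            links.extend((node["name"], child["name"]) for child in children)
            # Parent is centered over its first and last child.
            x = (child_xs[0] + child_xs[-1]) / 2
        positions[node["name"]] = (x, depth * y_step)
        return x

    place(root, 0)
    return positions, links


tree = {
    "name": "root",
    "children": [
        {"name": "a"},
        {"name": "b", "children": [{"name": "c"}, {"name": "d"}]},
    ],
}
positions, links = layout(tree)
```

In the browser, D3 performs the equivalent step itself; the resulting coordinates are what the node shapes, labels, and link paths are bound to.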
Optionally, the visualization rendering module 23 is specifically configured to: add mouse-enter and mouse-leave events to each node, which expand and display the full content of a node when that content is too long to show.
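The overlong-content behaviour can be sketched as plain data logic, independent of D3.js: each node keeps a truncated display string plus its full text, and the mouse-enter / mouse-leave handlers switch between them. The 12-character limit, the ellipsis, and the helper names `make_label` and `text_to_show` are assumptions.

```python
from typing import Dict, Optional


def make_label(text: str, max_chars: int = 12) -> Dict[str, Optional[str]]:
    """Build a node label: a truncated display string plus the full text if it was cut."""
    if max_chars < 2:
        raise ValueError("max_chars must be at least 2")
    if len(text) <= max_chars:
        return {"display": text, "full": None}
    # Keep max_chars - 1 characters and append an ellipsis so the label fits.
    return {"display": text[: max_chars - 1] + "\u2026", "full": text}


def text_to_show(label: Dict[str, Optional[str]], hovered: bool) -> str:
    """Mimic the hover handlers: mouse-enter shows the full text, mouse-leave restores it."""
    if hovered and label["full"] is not None:
        return label["full"]
    return label["display"]


long_label = make_label("income <= 5000.000")
short_label = make_label("age <= 30.5")
```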
Optionally, the visualization rendering module 23 is specifically configured to: add a click event to each node, which collapses or expands the node's child nodes when the node is a parent node.
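A common way to implement collapse/expand in D3.js collapsible-tree examples is to move a node's `children` array into a hidden `_children` field on click, move it back on the next click, and then re-run the layout. The Python below sketches that toggle on nested node dictionaries; the `_children` convention is borrowed from those examples rather than taken from the description, and `toggle` / `visible_names` are hypothetical helpers.

```python
from typing import Any, Dict, List


def toggle(node: Dict[str, Any]) -> Dict[str, Any]:
    """Click handler sketch: hide visible children, or restore hidden ones."""
    if node.get("children"):
        node["_children"] = node.pop("children")  # collapse
    elif node.get("_children"):
        node["children"] = node.pop("_children")  # expand
    # Leaves have neither field, so clicking them changes nothing.
    return node


def visible_names(node: Dict[str, Any]) -> List[str]:
    """Names of nodes that would be drawn (hidden _children are skipped)."""
    names = [node["name"]]
    for child in node.get("children", []):
        names.extend(visible_names(child))
    return names


tree = {"name": "root", "children": [{"name": "a"}, {"name": "b"}]}
```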
An embodiment of the present application further provides an electronic device including a memory and a processor. The memory stores program instructions, and when the processor reads and runs them, the processor performs the steps of any of the tree-shaped machine learning model visualization methods provided in this embodiment.
It should be understood that the electronic device may be a Personal Computer (PC), a tablet PC, a smart phone, a Personal Digital Assistant (PDA), or other electronic device having a logical computing function.
An embodiment of the present application further provides a readable storage medium storing computer program instructions which, when read and run by a processor, perform the steps of the tree-shaped machine learning model visualization method.
To sum up, the embodiments of the present application provide a tree-shaped machine learning model visualization method and apparatus, an electronic device, and a storage medium. The method includes: obtaining a model file; extracting tree structure data from the model file and combining it into data in a designated Json format, where the designated Json format data includes node data of the tree structure data, and the node data includes whether each node is a leaf node, a node name, and a node rule; and displaying and rendering the data in the designated Json format as a tree on a page by means of D3.js.
In this method, the model file of a trained model is parsed recursively to extract the fields to be visualized, such as whether each node is a leaf, its threshold, label, sample count, and feature. These fields are assembled into a Json file in node order, and the front end parses the data and displays it using the tree layout of D3.js, thereby visualizing the model. The scheme addresses the poor visualization of tree-shaped machine learning models: instead of a single static image, the model is presented as an interactive web page that supports click-and-drag, and hovering the mouse over a node automatically displays that node's attributes. This gives good interactivity and improves the completeness and efficiency of model visualization.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. The apparatus embodiments described above are merely illustrative, and for example, the block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of devices according to various embodiments of the present application. In this regard, each block in the block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams, and combinations of blocks in the block diagrams, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Therefore, the present embodiment further provides a readable storage medium storing computer program instructions that, when read and executed by a processor, perform the steps of any of the tree-shaped machine learning model visualization methods. Based on such understanding, the technical solution of the present application, or the portions thereof that contribute to the prior art, may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application. It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
It is noted that, in this document, relational terms are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

Claims (10)

1. A method for tree-based machine learning model visualization, the method comprising:
obtaining a model file;
extracting tree structure data of the model file, and combining the tree structure data into data in a designated Json format, wherein the designated Json format data comprises node data of the tree structure data, and the node data comprises whether each node is a leaf node, a node name and a node rule;
and performing tree display and rendering on the data in the designated Json format on a page through D3.js.
2. The method of claim 1, wherein obtaining the model file comprises:
obtaining model training data in CSV format;
setting model parameters according to the model type;
and performing model training based on the model training data and the model parameters to obtain the model file.
3. The method of claim 2, wherein the model training based on the model training data and the model parameters comprises:
reading the model training data by using the Python (pandas) file reading function read_csv;
converting the model training data into DataFrame format data, and extracting features X and a label column Y from the DataFrame format data;
obtaining the model parameters, and constructing a model based on the model parameters;
and fitting the model with the features X and the label column Y.
4. The method of claim 1, wherein extracting the tree structure data of the model file comprises:
obtaining, from the model object, the tree list of the model in the model file;
parsing each tree in the model recursively based on the tree list to obtain model data, wherein the model data comprises tree information, features, labels and/or tree indexes;
and assembling the model data into the tree structure data.
5. The method of claim 1, wherein the tree display and rendering of the data in the designated Json format on a page through D3.js comprises:
applying the tree layout of D3.js to a page container of the page;
constructing D3.js nodes and connecting lines based on the designated Json format data;
defining a display attribute of each node;
and performing tree display and rendering on the page based on the D3.js nodes, connecting lines and display attributes.
6. The method of claim 5, wherein after said defining the display attributes for each node, the method further comprises:
and adding mouse-enter and mouse-leave events to each node, wherein the mouse-enter and mouse-leave events are used for expanding and displaying the content of the node when the content is too long.
7. The method of claim 5, wherein after said defining the display attributes for each node, the method further comprises:
and adding a click event to each node, wherein the click event is used for collapsing or expanding the child nodes when the node is a parent node.
8. An apparatus for tree machine learning model visualization, the apparatus comprising:
the model file acquisition module is used for acquiring a model file;
the data extraction module is used for extracting tree structure data of the model file and combining the tree structure data into data in a designated Json format, the designated Json format data comprises node data of the tree structure data, and the node data comprises whether each node is a leaf node, a node name and a node rule;
and the visual rendering module is used for performing tree display and rendering on the data in the designated Json format on a page through D3.js.
9. An electronic device, comprising a memory and a processor, the memory having program instructions stored therein, wherein the processor, when reading and running the program instructions, performs the steps of the method of any of claims 1-7.
10. A storage medium having stored thereon computer program instructions for executing the steps of the method according to any one of claims 1 to 7 when executed by a processor.
CN202110462855.6A 2021-04-28 2021-04-28 Tree-shaped machine learning model visualization method and device, electronic equipment and storage medium Pending CN112883242A

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110462855.6A CN112883242A 2021-04-28 2021-04-28 Tree-shaped machine learning model visualization method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110462855.6A CN112883242A 2021-04-28 2021-04-28 Tree-shaped machine learning model visualization method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112883242A 2021-06-01

Family

ID=76040690

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110462855.6A Pending CN112883242A 2021-04-28 2021-04-28 Tree-shaped machine learning model visualization method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112883242A

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114936052A (en) * 2021-09-16 2022-08-23 华为技术有限公司 Model visualization method, system and related equipment
WO2024045128A1 (en) * 2022-09-01 2024-03-07 西门子股份公司 Artificial intelligence model display method and apparatus, electronic device and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104834595A (en) * 2015-02-15 2015-08-12 网易(杭州)网络有限公司 Visual automatic test method and system
CN106953765A (en) * 2017-03-31 2017-07-14 焦点科技股份有限公司 A kind of interconnection path run-off data generation and exhibiting method
CN111338629A (en) * 2020-03-13 2020-06-26 京东数字科技控股有限公司 Data processing method and device for building tree diagram
CN111915710A (en) * 2020-07-10 2020-11-10 杭州渲云科技有限公司 Building rendering method based on real-time rendering technology

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
SINAT_34102046: "Reading table data in Python for machine learning: a simple decision tree example, an easy code introduction", HTTPS://BLOG.CSDN.NET/SINAT_34102046/ARTICLE/DETAILS/105016829 *
小飞侠-2: "D3.js study notes 15: expanding and collapsing a D3.js tree diagram (Tree)", HTTPS://BLOG.CSDN.NET/QQ_26562641/ARTICLE/DETAILS/77480767 *
菠萝Y: "Recursively converting a tree-type table in a database into JSON format", HTTPS://BLOG.CSDN.NET/YUEAINI10000/ARTICLE/DETAILS/53501762 *

Similar Documents

Publication Publication Date Title
Hofmann et al. Text mining and visualization: Case studies using open-source tools
US8943016B2 (en) Systems and methods for semantic concept definition and semantic concept relationship synthesis utilizing existing domain definitions
CN106778878B (en) Character relation classification method and device
US11487844B2 (en) System and method for automatic detection of webpage zones of interest
US20190370274A1 (en) Analysis Method Using Graph Theory, Analysis Program, and Analysis System
CN110427614B (en) Construction method and device of paragraph level, electronic equipment and storage medium
US11379536B2 (en) Classification device, classification method, generation method, classification program, and generation program
Cuesta Practical data analysis
Shigarov et al. TabbyPDF: Web-based system for PDF table extraction
KR102078627B1 (en) Method and system for providing real-time feedback information associated with user-input contents
CN112883242A (en) Tree-shaped machine learning model visualization method and device, electronic equipment and storage medium
US20170132484A1 (en) Two Step Mathematical Expression Search
Cuesta et al. Practical data analysis
Nguyen et al. Web document analysis based on visual segmentation and page rendering
EP4172811A1 (en) System and method for automatic detection of webpage zones of interest
Yano et al. Labeling feature-oriented software clusters for software visualization application
John et al. Visual analysis of character and plot information extracted from narrative text
CN115373658A (en) Method and device for automatically generating front-end code based on Web picture
KR102570477B1 (en) Method for obtaining automatically user identification object in web page
US20210342531A1 (en) Method, apparatus, and computer-readable medium for transforming a hierarchical document object model to filter non-rendered elements
CN111028067A (en) E-commerce commodity searching method, device and equipment
Akhter Information extraction and interactive visualization of road accident related news
US20230306194A1 (en) Extensible framework for generating accessible captions for data visualizations
US20240126978A1 (en) Determining attributes for elements of displayable content and adding them to an accessibility tree
CN110990671B (en) Page type discrimination device and method and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210601