CN111259988A - Interactive random forest integration method and device and readable storage medium - Google Patents


Info

Publication number
CN111259988A
CN111259988A
Authority
CN
China
Prior art keywords
random forest
decision tree
interactive
data set
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010115968.4A
Other languages
Chinese (zh)
Inventor
林冰垠
卓本刚
唐兴兴
王跃
Current Assignee
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Priority date
Application filed by WeBank Co Ltd filed Critical WeBank Co Ltd
Priority to CN202010115968.4A priority Critical patent/CN111259988A/en
Publication of CN111259988A publication Critical patent/CN111259988A/en
Pending legal-status Critical Current

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 — Pattern recognition
    • G06F18/20 — Analysing
    • G06F18/24 — Classification techniques
    • G06F18/243 — Classification techniques relating to the number of classes
    • G06F18/24323 — Tree-organised classifiers
    • G06F18/21 — Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Abstract

The invention discloses an interactive random forest integration method, a device, and a readable storage medium. The interactive random forest integration method comprises the following steps: acquiring a sample data set, inputting the sample data set into an interactive random forest component, and outputting a parameter configuration interface; receiving configuration parameters input through the parameter configuration interface, running the interactive random forest component on the sample data set with the configuration parameters, and generating an initial random forest model; and receiving an adjustment instruction triggered on the initial random forest model, adjusting the initial random forest model according to the instruction, and generating a standard random forest model. An interactive random forest model is obtained by acquiring a training data set and a verification data set, inputting them into the random forest component, and running the component with configuration parameters entered by the user. This reduces the difficulty of random forest modeling and improves modeling efficiency.

Description

Interactive random forest integration method and device and readable storage medium
Technical Field
The invention relates to the technical field of machine learning of financial technology (Fintech), in particular to an interactive random forest integration method, equipment and a readable storage medium.
Background
With the continuous development of financial technology (Fintech), particularly internet finance, more and more technologies are being applied in the financial field, and the financial industry in turn places higher requirements on those technologies. With the parallel development of computer software and artificial intelligence, machine learning modeling is being applied ever more widely.
In machine learning modeling for various business scenarios, an ensemble learning model is sometimes used in place of a single decision tree model in order to obtain a more representative model, and the random forest is one of the most commonly used ensemble learning models. In traditional random forest modeling, the modeler must sample the data set multiple times with the bootstrap method to obtain multiple samples, generate a complete decision tree model from each sample, and finally integrate these models into a random forest model. This modeling procedure is complex: it requires the modeler to be familiar with the structure of various loss functions and to have strong coding skills, making it difficult and inefficient.
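The bootstrap-sample-then-integrate procedure described above can be sketched in a few lines of Python. This is a minimal stdlib illustration, not the patent's implementation: `majority_label` stands in for a full decision tree learner, and all names are illustrative.

```python
import random

def bootstrap_sample(dataset, rng):
    """Draw a sample of the same size as the dataset, with replacement."""
    return [rng.choice(dataset) for _ in dataset]

def majority_label(sample):
    """Stand-in for a full decision tree: predict the majority class of its sample."""
    labels = [label for _, label in sample]
    return max(set(labels), key=labels.count)

def forest_vote(dataset, n_trees=25, seed=0):
    """Train n_trees 'trees' on bootstrap samples and return their majority vote."""
    rng = random.Random(seed)
    votes = [majority_label(bootstrap_sample(dataset, rng)) for _ in range(n_trees)]
    return max(set(votes), key=votes.count)
```

Each "tree" sees a different resampled view of the data; integrating them by majority vote is the ensemble step the patent automates.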
Disclosure of Invention
The invention mainly aims to provide an interactive random forest integration method, equipment and a readable storage medium, and aims to solve the technical problems of high difficulty and low efficiency of random forest modeling in the prior art.
In order to achieve the above object, an embodiment of the present invention provides an interactive random forest integration method, where the interactive random forest integration method is applied to an interactive random forest integration device, and the interactive random forest integration method includes:
acquiring a sample data set, inputting the sample data set into an interactive random forest component, and outputting a parameter configuration interface;
receiving configuration parameters input based on the parameter configuration interface, operating an interactive random forest assembly containing the sample data set and the configuration parameters, and generating an initial random forest model;
and receiving an adjusting instruction triggered based on the initial random forest model, adjusting the initial random forest model according to the adjusting instruction, and generating a standard random forest model.
Optionally, the step of acquiring a sample data set, inputting the sample data set into an interactive random forest component, and outputting a parameter configuration interface includes:
calling an input port of an interactive random forest component, and connecting the input port with the sample data set;
and acquiring a training data set from the sample data set through the input port, inputting the acquired training data set into the interactive random forest component, and outputting a parameter configuration interface.
Optionally, the step of receiving configuration parameters input based on the parameter configuration interface, operating an interactive random forest component including the sample data set and the configuration parameters, and generating an initial random forest model includes:
receiving configuration parameters input based on the parameter configuration interface, extracting characteristic information in the configuration parameters, and acquiring a training data set in the sample data set;
and generating a plurality of decision tree models according to the characteristic information and the training data set, and integrating all the decision tree models to obtain an initial random forest model.
Optionally, after the step of receiving configuration parameters input based on the parameter configuration interface, operating an interactive random forest component including the sample data set and the configuration parameters, and generating an initial random forest model, the method includes:
when a viewing instruction input based on the initial random forest model is received, outputting a total view of the initial random forest model;
when a decision tree sorting instruction input based on the general view is received, a sorting index corresponding to the decision tree sorting instruction is obtained;
and redrawing the general view according to the sorting index, and outputting the redrawn general view.
Optionally, the step of outputting an overall view of the initial random forest model when receiving a viewing instruction input based on the initial random forest model is followed by:
when a decision tree searching instruction is received, a decision tree identifier in the decision tree searching instruction is obtained;
judging whether an object decision tree corresponding to the decision tree identification exists in the initial random forest model or not;
and if the object decision tree exists, outputting a decision tree diagram of the object decision tree.
Optionally, after the step of outputting the overall view of the initial random forest model when receiving the viewing instruction input based on the initial random forest model, the method further includes:
when a viewing instruction input based on a target decision tree in the total view is received, outputting a decision tree diagram of the target decision tree;
and if a decision tree reservation instruction input based on the target decision tree is received, setting the state of the target decision tree as a reservation state.
Optionally, after the step of outputting the overall view of the initial random forest model when receiving the viewing instruction input based on the initial random forest model, the method further includes:
when a decision tree screening instruction input based on the general view is received, screening parameters in the decision tree screening instruction are obtained, and a decision tree which accords with the screening parameters is used as a first decision tree;
when an abandon instruction is received, a decision tree identifier in the abandon instruction is obtained, and it is judged whether an object decision tree corresponding to the decision tree identifier exists in the first decision tree;
if the object decision tree exists in the first decision tree, deleting the object decision tree in the first decision tree;
and when the first decision tree is reserved, deleting the decision tree which does not accord with the screening parameters and the object decision tree existing in the first decision tree.
Optionally, the step of receiving an adjustment instruction triggered based on the initial random forest model, adjusting the initial random forest model according to the adjustment instruction, and generating a standard random forest model includes:
receiving an adjusting instruction based on the initial random forest model, and acquiring a second decision tree corresponding to the adjusting instruction;
and after the second decision tree is deleted, adjusting the initial random forest model to generate a standard random forest model.
The invention also provides an interactive random forest integration device, which comprises a memory, a processor, and a program for the interactive random forest integration method stored on the memory and executable on the processor; when executed by the processor, the program implements the steps of the interactive random forest integration method described above.
The invention also provides a readable storage medium, wherein a program for realizing the interactive random forest integration method is stored on the readable storage medium, and when the program for realizing the interactive random forest integration method is executed by a processor, the steps of the interactive random forest integration method are realized.
According to the method and the device, a training data set and a verification data set are obtained and input into the interactive random forest component; the component is then run based on random forest configuration parameters input by a user, and after the run completes, the interactive random forest model is output. By acquiring a pre-collected sample data set and inputting it into the interactive random forest component, a modeler can set the parameter information of the random forest through the parameter configuration interface, run the component to generate an initial random forest model, and then adjust that model as required. The modeling process is thus simplified: the modeler need only set parameter information to generate a random forest model, which reduces the modeler's workload, lowers the skill requirements placed on the modeler, and reduces errors introduced during modeling.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below; other drawings can evidently be obtained from these drawings by those skilled in the art without inventive effort.
FIG. 1 is a schematic flow chart of a first embodiment of an interactive random forest integration method according to the present invention;
FIG. 2 is a schematic diagram of data set input in the interactive random forest integration method of the present invention;
FIG. 3 is a schematic diagram of parameter configuration in the interactive random forest integration method of the present invention;
FIG. 4 is a schematic diagram of decision tree elimination and retention in the interactive random forest integration method of the present invention;
FIG. 5 is a flowchart illustrating a second embodiment of an interactive random forest integration method according to the present invention;
FIG. 6 is a schematic diagram of a general view and a decision tree diagram in the interactive random forest integration method of the present invention;
FIG. 7 is a schematic diagram of decision tree search in the interactive random forest integration method of the present invention;
FIG. 8 is a schematic diagram of decision tree viewing in the interactive random forest integration method of the present invention;
FIG. 9 is a schematic diagram of decision tree retention in the interactive random forest integration method of the present invention;
FIG. 10 is a schematic diagram of model reuse in the interactive random forest integration method of the present invention;
fig. 11 is a schematic device structure diagram of a hardware operating environment according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention provides an interactive random forest integration method applied to an interactive random forest integration device. In a first embodiment of the method, referring to fig. 1, the interactive random forest integration method comprises the following steps:
and step S10, acquiring a sample data set, inputting the sample data set into the interactive random forest component, and outputting a parameter configuration interface.
The sample data set in this embodiment includes a training data set and a validation data set. The training data set is used to train the model; the validation data set is used to compute the overall evaluation indexes of the interactive random forest model and the evaluation indexes of each single tree, including but not limited to the AUC (Area Under Curve) index, the KS (Kolmogorov-Smirnov) index, and the LOSS index.
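The AUC and KS indexes named above can be computed directly from model scores on the validation set. The following stdlib-Python sketch is the author's illustration (function name is arbitrary, and it assumes distinct scores; tied scores would need grouped ROC steps): AUC is the trapezoidal area under the ROC curve, and KS is the maximum TPR-FPR gap.

```python
def auc_and_ks(scores, labels):
    """Evaluate model scores against binary labels; returns (AUC, KS).

    Assumes distinct scores; ties would require grouped ROC steps."""
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    pos = sum(labels)
    neg = len(labels) - pos
    tp = fp = 0
    auc = ks = prev_tpr = prev_fpr = 0.0
    for _, label in ranked:
        if label == 1:
            tp += 1
        else:
            fp += 1
        tpr, fpr = tp / pos, fp / neg
        auc += (fpr - prev_fpr) * (tpr + prev_tpr) / 2  # trapezoidal ROC area
        ks = max(ks, tpr - fpr)                          # max TPR-FPR gap
        prev_tpr, prev_fpr = tpr, fpr
    return auc, ks
```

A perfectly separating scorer yields AUC = 1.0 and KS = 1.0; random scores hover around AUC = 0.5 and KS = 0.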
In the current big-data era, data sets are large, largely unstructured, sparse in features, and imbalanced in class, so rich interaction modes are needed during data preprocessing and data set segmentation. When dividing the training and validation sets, the business meaning of the data (binary classification, multi-classification, or regression) and its time-series character (for example news, video, or audio resources with strong temporal ordering) are taken into account to ensure that both sets follow the same distribution (i.e., that the split is unbiased and random). After the division is completed, the training data set and the verification data set are input together into the interactive random forest component, and the interactive random forest integration program outputs a parameter configuration interface, completing the initial work of interactive random forest modeling. The interactive random forest component receives the input training and verification data sets and outputs an interactive random forest model (called simply "the model" below).
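One common way to keep training and validation sets "in the same distribution" for a classification task is a stratified split: each class contributes the same proportion to both sets. The sketch below is a hypothetical stdlib-Python illustration of that idea, not the patent's segmentation component; all names are the author's.

```python
import random
from collections import defaultdict

def stratified_split(rows, label_of, valid_frac=0.3, seed=42):
    """Split rows into (train, validation), preserving each class's proportion."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)
    rng = random.Random(seed)
    train, valid = [], []
    for cls_rows in by_class.values():
        rng.shuffle(cls_rows)                     # randomize within the class
        cut = int(len(cls_rows) * valid_frac)     # class-wise validation share
        valid.extend(cls_rows[:cut])
        train.extend(cls_rows[cut:])
    return train, valid
```

For strongly time-ordered resources, a chronological split would replace the shuffle; the class-proportion idea stays the same.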
Specifically, the step refined in step S10 further includes:
step a1, calling an input port of the interactive random forest component, and connecting the input port with the sample data set.
A2, acquiring a training data set from the sample data set through the input port, inputting the acquired training data set into the interactive random forest component, and outputting a parameter configuration interface.
The interactive random forest component is thus a simple encapsulation of data and methods, and a component may have multiple input/output interfaces. In this embodiment, the component's input interface receives the training data set and the verification data set, and its output interface outputs the interactive random forest model.
A large number of data samples are used in model training. Before training, the data is usually divided into several parts, and the divided parts are distinguished as sets; two of these parts are called the training data set and the verification data set, stored respectively in a training data table and a verification data table. Two input ports of the interactive random forest component are obtained and connected to the training data table and the verification data table respectively, so that the training data set and the verification data set can be input into the component through these ports. As shown in fig. 2, the "random forest" in fig. 2 is the interactive random forest model component: its top connects to the arrowed ends of the input ports, the un-arrowed ends connect to the training data table and the verification data table respectively, and its bottom is the output port. After the training data set and the verification data set are input into the component, the interactive random forest integration program outputs the parameter configuration interface shown in fig. 3.
And step S20, receiving configuration parameters input based on the parameter configuration interface, operating an interactive random forest assembly containing the sample data set and the configuration parameters, and generating an initial random forest model.
The modeling process comprises obtaining data samples, dividing them, inputting the divided samples into the interactive random forest model component, adjusting model parameters, and so on. The component in this embodiment supports manual parameter adjustment by the user: as shown in fig. 3, after the user adjusts the various parameters as needed and then saves and runs the component, the component generates a corresponding interactive random forest model from the parameters entered.
Tuning an interactive random forest model rests not only on understanding the algorithm but also on accumulated experience. Taking the number of trees as an example: too few trees may cause under-fitting, while too many may cause over-fitting, and the practically useful range must be learned through extensive modeling practice by those skilled in the art. In fig. 3 the settable range for the number of trees is 1 to 10000, which is the range within which the user may adjust this parameter; the other parameters in fig. 3 are set in the same manner.
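The settable-range behaviour described for fig. 3 can be illustrated with a small validation routine. Only the 1-10000 range for the number of trees comes from the text above; the `max_depth` range and all names here are hypothetical.

```python
# Hypothetical settable ranges; only the tree-count range 1-10000 is from fig. 3.
PARAM_RANGES = {"number_of_trees": (1, 10000), "max_depth": (1, 50)}

def validate_parameters(config):
    """Reject any configuration value outside its settable range,
    as the parameter configuration interface would."""
    for name, value in config.items():
        low, high = PARAM_RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside [{low}, {high}]")
    return config
```

Enforcing ranges at the interface is what lets non-expert modelers avoid configurations that could never train.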
Specifically, the step S20 is a step of refining, including:
step b1, receiving configuration parameters input based on the parameter configuration interface, extracting characteristic information in the configuration parameters, and acquiring a training data set in the sample data set.
And b2, generating a plurality of decision tree models according to the characteristic information and the training data set, and integrating all the decision tree models to obtain an initial random forest model.
The training data set for random forest modeling is drawn from the sample data set; during modeling, the data samples needed to construct each decision tree can be obtained by sampling. The configuration parameters entered by the user on the parameter configuration interface (fig. 3) include the characteristic information needed to construct a decision tree, such as the maximum depth and the maximum number of branches. Once the characteristic information and the data samples are available, decision trees can be constructed: a single decision tree is built from a sampled set of data, and the constructed trees are integrated as required to obtain the random forest model.
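A hedged sketch of how the configuration parameters (tree count, maximum depth) and bootstrap sampling might combine when generating the plurality of decision trees: the per-tree record structure and the sqrt-feature heuristic are illustrative assumptions, not the patent's data model.

```python
import random

def build_forest(train_set, n_features, config, seed=7):
    """Sketch: each 'tree' records its bootstrap sample, a random feature
    subset, and the configured depth limit (hypothetical structure)."""
    rng = random.Random(seed)
    k = max(1, int(n_features ** 0.5))   # common heuristic: sqrt of feature count
    forest = []
    for tree_id in range(config["number_of_trees"]):
        forest.append({
            "id": tree_id,
            "sample": [rng.choice(train_set) for _ in train_set],  # bootstrap
            "features": sorted(rng.sample(range(n_features), k)),
            "max_depth": config["max_depth"],
        })
    return forest
```

Each record holds everything a real tree learner would need, so integration is just collecting the list.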
And step S30, receiving an adjusting instruction triggered based on the initial random forest model, adjusting the initial random forest model according to the adjusting instruction, and generating a standard random forest model.
In this embodiment, the standard random forest model is the new random forest model generated after the interactive random forest integration program adjusts the initial random forest model according to an adjustment instruction triggered by the user. If the adjustment involves only a change to the model view, the model itself does not change, and "standard random forest model" merely distinguishes the model before and after adjustment; if the adjustment changes model information, the standard random forest model is a genuinely new model generated by the adjustment. The interactive random forest model generated by the interactive random forest component supports not only visual inspection but also user-driven adjustment of the whole model. Each adjustment by the user is accompanied by the generation of a new model (i.e., a model update), whether the adjustment originates from the views or from the data in the information table.
Take decision tree deletion as a concrete case: when the user filters all trees in the model general view by the filtering parameters (including loss, KS, and AUC), trees that do not meet the filtering conditions may be deleted. When a decision tree is deleted, the corresponding portion of the visual display area changes: the model general view is redrawn, the deleted tree either disappears from the display area or remains visible but unselectable, and the information table to the right of the display area is updated accordingly.
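The metric-based screening just described (filtering trees by loss, KS, and AUC) amounts to a threshold pass over per-tree metrics. A minimal illustrative sketch, assuming each tree carries precomputed `auc`, `ks`, and `loss` values; the function name and record layout are the author's, not the patent's.

```python
def screen_trees(forest, min_auc=0.0, min_ks=0.0, max_loss=float("inf")):
    """Keep only trees whose per-tree metrics satisfy the screening parameters."""
    return [tree for tree in forest
            if tree["auc"] >= min_auc
            and tree["ks"] >= min_ks
            and tree["loss"] <= max_loss]
```

Redrawing the general view then reduces to re-rendering the returned list and refreshing the information table beside it.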
Specifically, the step S30 is a step of refining, including:
and c1, receiving an adjusting instruction based on the initial random forest model, and acquiring a second decision tree corresponding to the adjusting instruction.
And c2, after the second decision tree is deleted, adjusting the initial random forest model to generate a random forest model.
The adjustment made to the initial random forest model in this embodiment results from deleting some of its decision trees; besides changing the model general view, it also changes the data in the information table related to the model. The second decision tree here means a deleted tree, comprising both trees that fail the filtering condition and trees additionally rejected by the user. The second decision tree may be determined during decision tree filtering and single-tree viewing, as shown in fig. 4, based on the filtering condition the user enters; alternatively, the user may designate it by manually entering a decision tree identifier in the "additionally rejected trees" input box. Trees entered in that box are unaffected by the filtering condition: they are deleted whether or not they satisfy it.
After the second decision tree is deleted, the initial random forest model is adjusted; the adjusted content comprises the views and the data in the information table, and the adjusted model is the generated random forest model.
In this embodiment, a training data set and a verification data set are obtained and input into the interactive random forest component; the component is then run with the random forest configuration parameters entered by the user, and after the run completes successfully, the interactive random forest model is output. The method supports visual viewing of and interactive operation on the model: when the user operates on any output part in the visual display area, the display area changes correspondingly. It also supports adjusting the whole model, including screening out trees that do not meet the conditions; every deletion or retention of a decision tree changes not only the model general view but also the data in the information table, and the data change in the information table reflects the change to the whole model. By inputting the acquired training and verification data sets into the interactive random forest component, configuring the parameters, and running the component, an interactive random forest model is obtained, realizing fast and simple modeling.
Further, referring to fig. 5, a second embodiment of the interactive random forest integration method of the present invention is proposed on the basis of the above-described embodiment of the present invention.
This embodiment is a step after step S20 in the first embodiment, and the present embodiment is different from the above-described embodiments of the present invention in that:
and step S40, when a viewing instruction input based on the initial random forest model is received, outputting a total view of the initial random forest model.
Note that the general-view viewing instruction based on the random forest model in this embodiment may be either actively input by the user or generated automatically. When the user has configured the model's parameters and runs the random forest component, the general-view viewing instruction is actively input; after the component runs successfully, the user can visually inspect the model result, and as shown in fig. 6, the general view of the model is output in the visual display area. When a viewing operation the user performs on the display area affects the general view (for example, sorting all trees by different decision tree indexes), the sorting operation automatically generates a general-view viewing instruction, i.e., every sort is accompanied by the generation of a new general view. There are many further cases in which the general view is updated, which this embodiment does not detail.
Specifically, steps subsequent to step S40 include:
and d1, when a decision tree search instruction is received, obtaining a decision tree identifier in the decision tree search instruction.
And d2, judging whether an object decision tree corresponding to the decision tree identification exists in the initial random forest model.
And d3, if the object decision tree exists, outputting a decision tree diagram of the object decision tree.
Each decision tree in the generated interactive random forest model has a unique identification number (ID). Besides manually clicking a decision tree in the model general view, the user can manually enter an identification number in an input box at a dedicated part of the visual display area, as shown in fig. 7. When the model contains no decision tree corresponding to the entered identification number, the prompt "the searched identification number does not exist" is displayed in the visual display area (not shown in fig. 7). This may happen not only because the identification number was entered incorrectly, but also because the user screened all the model's trees before searching: trees failing the screening condition are deleted, so if the entered number belongs to a previously deleted tree, the same prompt is displayed. The identification numbers in fig. 7 are illustrative only; their specific content is not limited or detailed in this embodiment. Supporting manual entry of the identification number effectively improves the efficiency of locating a specific decision tree when the trees are numerous and the user searches purposefully.
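The identification-number search, including the "does not exist" prompt, reduces to a lookup over the surviving trees. A minimal sketch — record layout and function names are illustrative; only the prompt string is quoted from the description above.

```python
def find_tree(forest, tree_id):
    """Look up a decision tree by its identification number; returns None
    when it is absent (never existed, or was deleted by earlier screening)."""
    for tree in forest:
        if tree["id"] == tree_id:
            return tree
    return None

def search_message(forest, tree_id):
    """Produce the text the visual display area would show for a search."""
    tree = find_tree(forest, tree_id)
    if tree is None:
        return "the searched identification number does not exist"
    return f"tree {tree_id} found"
```

Because deleted trees are simply absent from the list, a search for a previously rejected tree naturally yields the same prompt as a mistyped ID.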
Specifically, the steps subsequent to step S40 further include:
and e1, when receiving a viewing instruction input based on the target decision tree in the total view, outputting a decision tree diagram of the target decision tree.
The model overview contains all decision trees of the model. When the model is first generated, the decision trees are sorted by a default index; the user can select a different index to re-sort the trees, and the overview is redrawn after each sorting. A user can view a single decision tree from the model overview in two ways. The first is to click a decision tree directly in the overview: the corresponding position in the visualization display area then shows the view of that decision tree (i.e., the decision tree diagram) together with its information (including the decision tree identification number and parameter information). The second is to enter a decision tree identification number in a search box in the visualization area: if a decision tree with that identification number exists in the model, its decision tree diagram is displayed in the visualization display area. Note that, as shown in fig. 8, for the first method, if the model has already been screened and some decision trees have been deleted, and the user has not clicked the "display screening" button, the deleted decision trees remain visible in the model overview (deleted and retained trees are distinguished by light and dark shading in fig. 8); clicking a deleted (lighter-colored) decision tree does not display its view or information in the visualization display area.
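The click path described above can be sketched as a small handler. This is a hypothetical sketch, assuming a per-tree `deleted` flag and a `show_screened_out` toggle standing in for the "display screening" button; the actual patented UI is not specified at this level.

```python
def on_tree_click(tree, show_screened_out=False):
    """Return the detail-panel contents for a clicked tree, or None.

    A screened-out (lighter-colored) tree yields no detail view unless
    the "display screening" toggle is active.
    """
    if tree["deleted"] and not show_screened_out:
        return None  # deleted tree: no diagram, no info panel
    return {"id": tree["id"], "params": tree["params"]}

# Hypothetical overview state: tree_02 was screened out earlier.
trees = [
    {"id": "tree_01", "deleted": False, "params": {"depth": 4}},
    {"id": "tree_02", "deleted": True,  "params": {"depth": 6}},
]
```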
Step e2: if a decision tree reservation instruction input based on the target decision tree is received, setting the state of the target decision tree to a reserved state.
As shown in fig. 9, when the user views a single decision tree, a "keep the tree" button appears at the upper right of the decision tree diagram. When the user clicks this button, the state of the decision tree is set to an unfiltered state: when the user later sets a screening condition, the decision tree is retained whether or not it meets the condition. If the user does not click the "keep the tree" button, the tree is deleted when it fails a screening condition set by the user. As shown in fig. 4, a decision tree can also be retained during screening by manually entering its identification number in the "additionally retain the following trees" input box; if a decision tree with the entered identification number exists in the model, its state is likewise set to the unfiltered state, and it is retained regardless of whether it meets the screening condition.
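The interaction between the "keep the tree" flag and a screening pass can be sketched as below. This is an illustrative sketch only; the `keep` flag, `deleted` flag, and `apply_screening` helper are assumed names, and the filter is passed in as a predicate standing for whatever condition the user configured.

```python
def apply_screening(trees, passes_filter):
    """Mark trees deleted unless they pass the filter or are flagged keep.

    Trees with keep=True (the "keep the tree" button, or an entry in the
    "additionally retain" box) survive regardless of the filter result.
    """
    for t in trees:
        t["deleted"] = not (t.get("keep") or passes_filter(t))
    return [t for t in trees if not t["deleted"]]

# Hypothetical forest: tree_02 fails the KS threshold but was flagged kept.
trees = [
    {"id": "tree_01", "ks": 0.40, "keep": False},
    {"id": "tree_02", "ks": 0.10, "keep": True},
    {"id": "tree_03", "ks": 0.12, "keep": False},
]
kept = apply_screening(trees, lambda t: t["ks"] >= 0.3)
```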
Specifically, the steps subsequent to step S40 further include:
Step f1: when a decision tree screening instruction input based on the overall view is received, obtaining the screening parameters in the decision tree screening instruction, and taking the decision trees conforming to the screening parameters as a first decision tree.
Step f2: when a discard instruction is received, obtaining a decision tree identifier in the discard instruction, and determining whether an object decision tree corresponding to the decision tree identifier exists in the first decision tree.
Step f3: if the object decision tree exists in the first decision tree, deleting the object decision tree from the first decision tree.
Step f4: when the first decision tree is retained, deleting the decision trees that do not conform to the screening parameters together with any object decision tree present in the first decision tree.
The user can click the "screening" icon in the upper right corner of the model overview to open a screening window and further screen the decision trees in the model. As shown in fig. 4, the screening parameters include KS, AUC, and loss: KS, AUC, and loss all apply to screening a binary classification model, while only loss applies to screening a multi-class model or a regression model. When entering screening conditions based on these parameters, the user can also list additionally retained trees and additionally rejected trees below the screening window. After the user enters the conditions and clicks to confirm, the trees that fail the screening condition are deleted, with one exception: if any of the user's additionally retained trees appear among the failing trees, those trees are not deleted, while the remaining failing trees are.
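Steps f1 to f4 can be sketched as a single screening function. This is a hypothetical sketch: the threshold arguments, record fields, and `screen_forest` name are assumptions, and the thresholds stand in for whatever KS/AUC/loss conditions the user enters in the screening window.

```python
def screen_forest(trees, min_ks=None, min_auc=None, max_loss=None,
                  extra_keep=(), extra_drop=()):
    """Keep trees that pass the thresholds, then apply the two override
    lists: extra_keep survives a failed filter, extra_drop is removed
    even if it passed."""
    def passes(t):
        if min_ks is not None and t["ks"] < min_ks:
            return False
        if min_auc is not None and t["auc"] < min_auc:
            return False
        if max_loss is not None and t["loss"] > max_loss:
            return False
        return True

    return [t for t in trees
            if t["id"] not in extra_drop
            and (passes(t) or t["id"] in extra_keep)]

# Hypothetical forest: t2 fails KS but is additionally retained;
# t3 passes but is additionally rejected.
trees = [
    {"id": "t1", "ks": 0.45, "auc": 0.82, "loss": 0.30},
    {"id": "t2", "ks": 0.20, "auc": 0.60, "loss": 0.90},
    {"id": "t3", "ks": 0.50, "auc": 0.85, "loss": 0.25},
]
kept = screen_forest(trees, min_ks=0.4, extra_keep={"t2"}, extra_drop={"t3"})
```

Checking `extra_drop` before the keep override mirrors steps f2-f3, where a discard instruction removes an object decision tree even from the retained first decision tree.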
In this embodiment, the interactive random forest model output after the interactive random forest component runs supports visual display. Beyond viewing the information of the whole model and of individual decision trees, a user can adjust the whole model or a single decision tree as needed, based on the displayed model information; the visualization display area provides the corresponding operation controls, which makes operating on the model convenient while satisfying user requirements.
As shown in fig. 10, the interactive random forest model in this embodiment also supports model reuse: the model parameters of an upstream random forest component can be reused by a downstream random forest component, so that the downstream component can perform model training with the upstream model's method and inspect the run result without configuring parameters. Alternatively, the random forest model can be output as the input of a cross-validation or prediction component, to cross-validate the trained model or to run model prediction on new data, respectively. To reuse a model, the output port of the upstream random forest component is connected to the input port of the downstream random forest component to complete the model transmission channel, and the downstream random forest component is then connected to a data table to complete the model-reuse component assembly.
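The port-wiring described above can be sketched as follows. The `RandomForestComponent` class, its `connect_to` method, and the stubbed training are all hypothetical; a real component would fit actual trees, but the sketch shows how connecting output port to input port lets the downstream component run with no manual parameter configuration.

```python
class RandomForestComponent:
    """Minimal component with an input and output port (assumed names)."""

    def __init__(self, params=None):
        self.params = params  # configuration; None until connected or set
        self.model = None

    def run(self, data_table):
        # Train (stubbed here) with whatever params the component holds.
        self.model = {"params": self.params, "rows": len(data_table)}
        return self.model

    def connect_to(self, downstream):
        # Output port -> input port: the downstream component reuses
        # this component's parameters through the transmission channel.
        downstream.params = self.params
        return downstream

upstream = RandomForestComponent(params={"n_trees": 100, "max_depth": 6})
downstream = upstream.connect_to(RandomForestComponent())  # no manual config
result = downstream.run([{"x": 1}, {"x": 2}])  # connect a data table and run
```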
Step S50, when a decision tree sorting instruction input based on the overall view is received, obtaining a sorting index corresponding to the decision tree sorting instruction.
Step S60: redrawing the overall view according to the sorting index, and outputting the redrawn overall view for the user to view.
For a decision tree that has not been deleted, when the user clicks it among all the trees of the interactive random forest model, the tree is highlighted, its view (i.e., the decision tree diagram) is displayed below the visualization display area, and its related information, including the tree ID and tree indexes, is shown at the upper right of the decision tree diagram. Each time the user views a different decision tree, the corresponding position of the visualization display area is updated. When the user chooses to sort all trees in the model overview by a different decision tree index (Tree for short), including Tree size, KS, and AUC, the model overview is updated (i.e., redrawn) according to the index selected by the user.
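The re-sorting step can be sketched as below. This is an illustrative sketch with assumed field names; the overview redraw reduces to re-sorting the tree records by the chosen index and rendering them in the new order.

```python
def redraw_overview(trees, index="ks", descending=True):
    """Re-sort the overview by the chosen index ('size', 'ks', or 'auc')
    and return the new display order of tree IDs."""
    return [t["id"] for t in
            sorted(trees, key=lambda t: t[index], reverse=descending)]

# Hypothetical per-tree indexes.
trees = [
    {"id": "t1", "size": 15, "ks": 0.45, "auc": 0.82},
    {"id": "t2", "size": 31, "ks": 0.20, "auc": 0.60},
    {"id": "t3", "size": 7,  "ks": 0.50, "auc": 0.85},
]
```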
After the random forest model is generated, the method and device also support visually viewing the model and its related data, which makes model adjustment simpler, reduces the workload of modeling personnel, and improves modeling efficiency.
Referring to fig. 11, fig. 11 is a schematic device structure diagram of a hardware operating environment according to an embodiment of the present invention.
As shown in fig. 11, the interactive random forest integration apparatus may include: a processor 1001, such as a CPU, a memory 1005, and a communication bus 1002. The communication bus 1002 is used for realizing connection communication between the processor 1001 and the memory 1005. The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a memory device separate from the processor 1001 described above.
Optionally, the interactive random forest integration device may further include a user interface, a network interface, a camera, a Radio Frequency (RF) circuit, a sensor, an audio circuit, a WiFi module, and the like. The user interface may include a display screen (Display) and an input sub-module such as a keyboard (Keyboard), and may optionally also include a standard wired interface and a wireless interface. The network interface may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface).
Those skilled in the art will appreciate that the structure of the interactive random forest integration apparatus shown in fig. 11 does not constitute a limitation of the interactive random forest integration apparatus, and may include more or fewer components than those shown, or some components in combination, or a different arrangement of components.
As shown in fig. 11, a memory 1005, which is a kind of computer storage medium, may include therein an operating system, a network communication module, and an interactive random forest integration program. The operating system is a program for managing and controlling hardware and software resources of the interactive random forest integration equipment and supports the operation of the interactive random forest integration program and other software and/or programs. The network communication module is used for realizing communication among components in the memory 1005 and communication with other hardware and software in the interactive random forest integration system.
In the interactive random forest integration apparatus shown in fig. 11, the processor 1001 is configured to execute the interactive random forest integration program stored in the memory 1005 to implement the steps of any embodiment of the interactive random forest integration method described above.
The specific implementation of the interactive random forest integration equipment is basically the same as that of each embodiment of the interactive random forest integration method, and details are not repeated here.
The invention also provides an interactive random forest integration device, which comprises:
the data input module is used for acquiring a training data set and a verification data set and inputting the training data set and the verification data set into the interactive random forest component;
the component operation module is used for operating the random forest component after receiving random forest configuration parameters input by a user and outputting a data set and an interactive random forest model;
and the model output module is used for outputting the updated random forest model based on an interactive random forest model updating instruction input by a user.
Optionally, the data input module includes:
the port connecting unit is used for calling an input port of the interactive random forest component and connecting the input port with the sample data set;
and the data set acquisition unit is used for acquiring a training data set from the sample data set through the input port, inputting the acquired training data set into the interactive random forest component and outputting a parameter configuration interface.
Optionally, the component operating module includes:
the characteristic extraction unit is used for receiving configuration parameters input based on the parameter configuration interface, extracting characteristic information in the configuration parameters and acquiring a training data set in the sample data set;
and the integration unit is used for generating a plurality of decision tree models according to the characteristic information and the training data set, and integrating all the decision tree models to obtain an initial random forest model.
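The feature extraction and integration units above can be sketched together. This is a hypothetical sketch of the flow only: real components would fit actual decision trees (e.g., via a library such as scikit-learn), whereas here each "tree" is a stub recording its bootstrap sample size and random feature subset, which is the random forest construction pattern the units describe.

```python
import random

def train_initial_forest(training_set, feature_names, n_trees=5, seed=0):
    """Build n_trees stub trees from bootstrap samples and random
    feature subsets, and integrate them into an initial forest."""
    rng = random.Random(seed)
    forest = []
    for i in range(n_trees):
        bootstrap = [rng.choice(training_set) for _ in training_set]
        features = rng.sample(feature_names, max(1, len(feature_names) // 2))
        # A real integration unit would fit a decision tree model here.
        forest.append({"id": f"tree_{i:02d}",
                       "features": sorted(features),
                       "n_samples": len(bootstrap)})
    return forest

# Hypothetical training data set extracted from the sample data set.
data = [{"a": 1, "b": 0, "y": 1}, {"a": 0, "b": 1, "y": 0}, {"a": 1, "b": 1, "y": 1}]
forest = train_initial_forest(data, ["a", "b"], n_trees=3)
```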
Optionally, the interactive random forest integration apparatus further includes:
a total view output unit, configured to output a total view of the initial random forest model when receiving a viewing instruction input based on the initial random forest model;
the decision tree sorting unit is used for acquiring a sorting index corresponding to a decision tree sorting instruction when the decision tree sorting instruction input based on the general view is received;
and the redrawing unit is used for redrawing the general view according to the sorting index and outputting the redrawn general view for a user to view.
Optionally, the interactive random forest integration apparatus further includes:
the decision tree searching unit is used for acquiring a decision tree identifier in a decision tree searching instruction when the decision tree searching instruction is received;
the judging unit is used for judging whether an object decision tree corresponding to the decision tree identification exists in the initial random forest model or not;
and the output unit is used for outputting the decision tree diagram of the object decision tree if the object decision tree exists.
Optionally, the interactive random forest integration apparatus further includes:
a decision tree graph output unit, configured to output a decision tree graph of a target decision tree when a viewing instruction input based on the target decision tree in the total view is received;
and the decision tree retaining unit is used for setting the state of the target decision tree into a retaining state if receiving a decision tree retaining instruction input based on the target decision tree.
Optionally, the interactive random forest integration apparatus further includes:
the screening unit is used for acquiring screening parameters in the decision tree screening instruction when the decision tree screening instruction input based on the general view is received, and taking the decision tree which accords with the screening parameters as a first decision tree;
a decision tree identifier obtaining unit, configured to, when a discard instruction is received, obtain a decision tree identifier in the discard instruction, and determine whether an object decision tree corresponding to the decision tree identifier exists in the first decision tree;
an object decision tree deleting unit, configured to delete an object decision tree in the first decision tree if the object decision tree exists in the first decision tree;
and the first decision tree retaining unit is used for deleting the decision trees which do not accord with the screening parameters and the object decision trees existing in the first decision tree when the first decision tree is retained.
Optionally, the model output module includes:
the model adjusting unit is used for receiving an adjusting instruction based on the initial random forest model and acquiring a second decision tree corresponding to the adjusting instruction;
and the second decision tree deleting unit is used for adjusting the initial random forest model after deleting the second decision tree and generating a random forest model.
The specific implementation of the interactive random forest integration device of the invention is basically the same as that of each embodiment of the interactive random forest integration method, and is not described herein again.
The invention provides a readable storage medium storing one or more programs, the one or more programs being further executable by one or more processors for implementing the steps of the interactive random forest integration method according to any one of the preceding claims.
The specific implementation of the medium of the present invention is basically the same as that of each embodiment of the above-mentioned interactive random forest integration method, and is not described herein again.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the present specification and drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. An interactive random forest integration method, characterized in that the interactive random forest integration method comprises:
acquiring a sample data set, inputting the sample data set into an interactive random forest component, and outputting a parameter configuration interface;
receiving configuration parameters input based on the parameter configuration interface, operating an interactive random forest assembly containing the sample data set and the configuration parameters, and generating an initial random forest model;
and receiving an adjusting instruction triggered based on the initial random forest model, adjusting the initial random forest model according to the adjusting instruction, and generating a standard random forest model.
2. The interactive random forest integration method of claim 1, wherein the steps of obtaining a sample data set, inputting the sample data set into an interactive random forest component, and outputting a parameter configuration interface, comprise:
calling an input port of an interactive random forest component, and connecting the input port with the sample data set;
and acquiring a training data set from the sample data set through the input port, inputting the acquired training data set into the interactive random forest component, and outputting a parameter configuration interface.
3. A method for interactive random forest integration according to claim 1, wherein the step of receiving configuration parameters input based on the parameter configuration interface, operating an interactive random forest component containing the set of sample data and the configuration parameters, and generating an initial random forest model comprises:
receiving configuration parameters input based on the parameter configuration interface, extracting characteristic information in the configuration parameters, and acquiring a training data set in the sample data set;
and generating a plurality of decision tree models according to the characteristic information and the training data set, and integrating all the decision tree models to obtain an initial random forest model.
4. A method for interactive random forest integration according to claim 1, wherein the step of receiving configuration parameters input based on the parameter configuration interface, running an interactive random forest component containing the set of sample data and the configuration parameters, and generating an initial random forest model comprises, after the step of:
when a viewing instruction input based on the initial random forest model is received, outputting a total view of the initial random forest model;
when a decision tree sorting instruction input based on the general view is received, a sorting index corresponding to the decision tree sorting instruction is obtained;
and redrawing the general view according to the sorting index, and outputting the redrawn general view.
5. An interactive random forest integration method as claimed in claim 4 wherein the step of outputting an overall view of the initial random forest model when a view instruction based on the initial random forest model input is received is followed by:
when a decision tree searching instruction is received, a decision tree identifier in the decision tree searching instruction is obtained;
judging whether an object decision tree corresponding to the decision tree identification exists in the initial random forest model or not;
and if the object decision tree exists, outputting a decision tree diagram of the object decision tree.
6. An interactive random forest integration method as recited in claim 4, wherein the step of outputting an overall view of the initial random forest model upon receiving a view instruction input based on the initial random forest model further comprises, after the step of:
when a viewing instruction input based on a target decision tree in the total view is received, outputting a decision tree diagram of the target decision tree;
and if a decision tree reservation instruction input based on the target decision tree is received, setting the state of the target decision tree as a reservation state.
7. An interactive random forest integration method as recited in claim 4, wherein the step of outputting an overall view of the initial random forest model upon receiving a view instruction input based on the initial random forest model further comprises, after the step of:
when a decision tree screening instruction input based on the general view is received, screening parameters in the decision tree screening instruction are obtained, and a decision tree which accords with the screening parameters is used as a first decision tree;
when a discard instruction is received, a decision tree identifier in the discard instruction is obtained, and whether an object decision tree corresponding to the decision tree identifier exists in the first decision tree is judged;
if the object decision tree exists in the first decision tree, deleting the object decision tree in the first decision tree;
and when the first decision tree is reserved, deleting the decision tree which does not accord with the screening parameters and the object decision tree existing in the first decision tree.
8. An interactive random forest integration method as claimed in any one of claims 1 to 7 wherein the step of receiving adjustment instructions triggered based on the initial random forest model, adjusting the initial random forest model in accordance with the adjustment instructions, and generating a standard random forest model comprises:
receiving an adjusting instruction based on the initial random forest model, and acquiring a second decision tree corresponding to the adjusting instruction;
and after the second decision tree is deleted, adjusting the initial random forest model to generate a standard random forest model.
9. An interactive random forest integration apparatus, comprising: the device comprises a memory, a processor and a program stored on the memory for implementing the interactive random forest integration method, wherein the memory is used for storing the program for implementing the interactive random forest integration method;
the processor is configured to execute a program implementing the interactive random forest integration method to implement the steps of the interactive random forest integration method according to any one of claims 1 to 8.
10. A readable storage medium, characterized in that the readable storage medium has stored thereon a program for implementing an interactive random forest integration method, the program for implementing the interactive random forest integration method being executed by a processor for implementing the steps of the interactive random forest integration method according to any one of claims 1 to 8.
CN202010115968.4A 2020-02-24 2020-02-24 Interactive random forest integration method and device and readable storage medium Pending CN111259988A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010115968.4A CN111259988A (en) 2020-02-24 2020-02-24 Interactive random forest integration method and device and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010115968.4A CN111259988A (en) 2020-02-24 2020-02-24 Interactive random forest integration method and device and readable storage medium

Publications (1)

Publication Number Publication Date
CN111259988A true CN111259988A (en) 2020-06-09

Family

ID=70951181

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010115968.4A Pending CN111259988A (en) 2020-02-24 2020-02-24 Interactive random forest integration method and device and readable storage medium

Country Status (1)

Country Link
CN (1) CN111259988A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111914881A (en) * 2020-06-18 2020-11-10 北京百度网讯科技有限公司 Random forest generation method and device, electronic equipment and storage medium
CN112287191A (en) * 2020-07-31 2021-01-29 北京九章云极科技有限公司 Model display method and device and electronic equipment
CN112799658A (en) * 2021-04-12 2021-05-14 北京百度网讯科技有限公司 Model training method, model training platform, electronic device, and storage medium
CN113095432A (en) * 2021-04-27 2021-07-09 电子科技大学 Visualization system and method based on interpretable random forest


Similar Documents

Publication Publication Date Title
CN111259988A (en) Interactive random forest integration method and device and readable storage medium
CN111400186B (en) Performance test method and system
CN106528769A (en) Data acquisition method and apparatus
CN106155884A (en) A kind of log analysis method and system
CN104933044B (en) Using the classification method and sorter of unloading reason
CN103324728A (en) Mobile terminal application program searching method and apparatus
CN115033894B (en) Software component supply chain safety detection method and device based on knowledge graph
Greene et al. Visualizing and exploring software version control repositories using interactive tag clouds over formal concept lattices
CN114021156A (en) Method, device and equipment for organizing vulnerability automatic aggregation and storage medium
CN114254950A (en) Telecommunication resource data processing method and device, electronic equipment and storage medium
CN113409555A (en) Real-time alarm linkage method and system based on Internet of things
CN110909888A (en) Method, device and equipment for constructing generic decision tree and readable storage medium
CN109389972B (en) Quality testing method and device for semantic cloud function, storage medium and equipment
CN110442782B (en) Cloud resource retrieval method and device
CN108182142A (en) Test resource integration method, system and function test method, system
CN110895470A (en) Applet management apparatus and management method
CN116089490A (en) Data analysis method, device, terminal and storage medium
CN115495362A (en) Method, device, storage medium and computer equipment for generating test code
CN110727565A (en) Network equipment platform information collection method and system
US20180067837A1 (en) Framework for detecting source code anomalies
CN112783775A (en) Special character input testing method and device
CN113918534A (en) Policy processing system and method
CN116383095B (en) Smoking test method and system based on RPA robot and readable storage medium
CN110377504A (en) Test method, device, equipment and the storage medium of application program smoothness degree
CN117519702B (en) Search page design method and system based on low code collocation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination