WO2024021555A1 - Resource approval method and device, and random forest model training method and device - Google Patents

Resource approval method and device, and random forest model training method and device

Info

Publication number
WO2024021555A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
resource
decision tree
approval
training set
Prior art date
Application number
PCT/CN2023/074133
Other languages
English (en)
Chinese (zh)
Inventor
常三强
胡成倩
张麒
韩冬
Original Assignee
京东科技信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 京东科技信息技术有限公司
Publication of WO2024021555A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/103 Workflow collaboration or project management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database

Definitions

  • the present disclosure relates to the field of cloud computing technology, and in particular to resource approval methods, random forest model training methods and devices, and computer-readable storage media.
  • A private cloud provides services to organizations within an enterprise and can offer a variety of cloud products to enterprise users, forming a complex cloud ecosystem. It is characterized by high data security and strong controllability of the IT infrastructure.
  • Enterprise-level users usually have complex multi-layered internal organizational structures.
  • users at different levels within the enterprise will be given differentiated permissions, and the resources that users with different permissions can use also have different specifications.
  • when users need to use resources beyond their own permissions, they need to submit an application and wait for the application to be approved before they can use the product normally.
  • a resource approval method including:
  • the approval result is predicted, where the approval result indicates whether to release the resource requested by the user;
  • the method of synthesizing the approval results of each decision tree to determine whether to release the resources requested by the user includes:
  • determining whether to release the resource requested by the user based on the proportion of the number of decision trees that generate the same approval result to the total number of decision trees and the first preset threshold includes:
  • when the ratio of the number of decision trees that generate the same approval result to the total number of decision trees does not exceed the first preset threshold, determining whether to release the resource requested by the user based on the multiple characteristics.
  • the user's request for resource usage also includes the user's historical usage request for the resource.
  • the characteristics of the user's resource usage request include at least one of: the type of resource the user requests to use, the specifications of the resource the user requests to use, the number of resources the user requests to use, the user's resource usage permissions, and the user's reason for requesting the resource.
  • a training method for a random forest model including:
  • the training set includes samples of user requests for resource use, and the samples also include labels indicating whether to release the resources requested by the user;
  • Each decision tree is trained based on the values of its candidate features, as well as the labels of the samples.
  • training each decision tree based on the value of the candidate feature of each decision tree and the label of the sample includes:
  • according to the value of the feature corresponding to the current node in the samples of the training set corresponding to the child node of the current node, and the labels of the samples, selecting the feature corresponding to the child node of the current node from the remaining candidate features;
  • when the child nodes of the current node include a first child node and a second child node of the current node, determining the training set corresponding to each child node of the current node according to the value of the feature corresponding to the current node in the samples of the training set corresponding to the current node, and the labels of the samples, includes:
  • according to the value of the feature corresponding to the current node in the samples of the training set corresponding to the current node, and the labels of the samples, selecting a feature value from the value range of the feature corresponding to the current node as the split point that divides the training set corresponding to the first child node of the current node from the training set corresponding to the second child node of the current node;
  • according to the split point, determining whether each sample in the training set corresponding to the current node is divided into the training set of the first child node or the training set of the second child node.
  • the cutoff conditions include at least one of: there being no remaining candidate features; the number of samples in the training set corresponding to the current node being less than a second preset threshold; and the Gini coefficient of the training set corresponding to the current node being less than a third preset threshold.
  • training each decision tree based on the value of the candidate feature of each decision tree and the label of the sample includes:
  • the decision tree is trained based on the values of the candidate features corresponding to the decision tree in the samples in the training set of the decision tree and the labels indicating whether to release the resources requested by the user.
  • determining the multiple characteristics of the user's resource usage request includes:
  • the value of the missing feature of the sample is determined.
  • a resource approval device including:
  • the acquisition module is configured to obtain the user's request for resource use
  • a first determination module configured to determine a plurality of characteristics of the user's request for resource use
  • the selection module is configured to, for each decision tree in the random forest model, select the feature corresponding to the decision tree from multiple features;
  • the prediction module is configured to predict the approval result based on the value of the feature corresponding to each decision tree, where the approval result indicates whether to release the resources requested by the user;
  • the second determination module is configured to synthesize the approval results of each decision tree and determine whether to release the resources requested by the user.
  • a training device for a random forest model including:
  • the acquisition module is configured to obtain a training set, where the training set includes samples of user requests for resource use, and the samples also include labels indicating whether to release the resources requested by the user;
  • a determining module configured to determine a plurality of characteristics of the user's request for resource usage
  • the extraction module is configured to extract some features from multiple features for each decision tree in the random forest model as candidate features for the decision tree;
  • the training module is configured to train each decision tree based on the values of the candidate features of each decision tree and the labels of the samples.
  • an electronic device including:
  • a processor coupled to the memory, the processor being configured to execute, based on instructions stored in the memory, the resource approval method according to any embodiment of the present disclosure or the training method of the random forest model according to any embodiment of the present disclosure.
  • a computer-storable medium is provided, with computer program instructions stored thereon.
  • when the instructions are executed by a processor, the resource approval method according to any embodiment of the present disclosure, or the training method of the random forest model according to any embodiment of the present disclosure, is implemented.
  • Figure 1 shows a flow chart of a resource approval method according to some embodiments of the present disclosure
  • Figure 2 shows a schematic diagram of a decision tree according to some embodiments of the present disclosure
  • Figure 3 shows a schematic diagram of a random forest model determining whether to release resources according to some embodiments of the present disclosure
  • Figure 4 shows a flowchart of resource approval according to some embodiments of the present disclosure
  • Figure 5 shows a flow chart of a training method of a random forest model according to some embodiments of the present disclosure
  • Figure 6 shows a schematic diagram of pruning a decision tree according to some embodiments of the present disclosure
  • Figure 7 shows a block diagram of a resource approval device according to some embodiments of the present disclosure
  • Figure 8 shows a block diagram of a training device for a random forest model according to some embodiments of the present disclosure
  • Figure 9 shows a block diagram of an electronic device according to other embodiments of the present disclosure.
  • Figure 10 illustrates a block diagram of a computer system for implementing some embodiments of the present disclosure.
  • any specific values are to be construed as illustrative only and not as limiting. Accordingly, other examples of the exemplary embodiments may have different values.
  • the nodes in an approval flow are often the responsible persons at various levels of an enterprise or institution. Each node needs to approve the resource usage requests of many users, so erroneous approval operations inevitably occur, reducing the accuracy of approval.
  • the approval process often passes through multiple nodes, and the approval time at each node depends on the availability of the person in charge of that node. An obstruction at any node stalls the entire approval flow, reducing the efficiency with which approvals are completed.
  • some embodiments of the present disclosure provide a resource approval method, a random forest training method and device, and a computer-readable storage medium.
  • Figure 1 shows a flow chart of a resource approval method according to some embodiments of the present disclosure.
  • the resource approval method includes steps S110 to S150.
  • the following resource approval method is executed by the resource approval device.
  • resource approval devices include input and output devices and processors.
  • the resource approval method includes: obtaining the user's resource usage request through an input/output device (such as an interactive panel); using the processor to determine multiple characteristics of the user's resource usage request; using the processor, for each decision tree in the random forest model, to select the feature corresponding to the decision tree from the multiple features; using a decision tree algorithm, executed by the processor, to predict the approval result based on the value of the feature corresponding to each decision tree, where the approval result indicates whether to release the resource requested by the user; and using the processor to synthesize the approval results of the decision trees to determine whether to release the resource requested by the user.
  • Steps S110 to S150 will be introduced in detail below.
  • step S110 the user's request for resource use is obtained.
  • when the user needs to use resources beyond their own permissions, the user is prompted, and the resource usage request filled in by the user on the page is obtained.
  • step S120 multiple characteristics of the user's request for resource usage are determined.
  • the characteristics of the user's request to use resources include at least one of: the type of resource the user requests to use, the specifications of the resource the user requests to use, the number of resources the user requests to use, the user's resource usage permissions, and the user's reason for requesting the resource.
  • multiple features such as the user's position, rank, and responsible work are extracted as multiple features of the user's resource use request.
  • step S130 for each decision tree in the random forest model, a feature corresponding to the decision tree is selected from a plurality of features.
  • a random forest model consists of multiple decision trees, each of which makes predictions based on only a subset of the multiple features of the usage request.
  • Figure 2 shows a schematic diagram of a decision tree according to some embodiments of the present disclosure.
  • each decision tree includes multiple nodes, and each node of the decision tree corresponds to one feature. Therefore, the features corresponding to the decision tree are also the features corresponding to the multiple nodes of the decision tree.
  • the characteristics corresponding to the decision tree include: whether the user has made the same historical usage request, the user's rank, the permission group to which the user belongs, the value of the resources the user applies for, and the number of resources the user applies for.
  • a permission group is the set of the scope and degree of decision-making power over a certain matter that the position holder must have in order to ensure the effective performance of duties.
  • since each decision tree generates approval results based on only a part of the features (that is, the multiple features are distributed across multiple decision trees for processing), the number of features each decision tree needs to process is smaller than the total number of features in the usage request. The model can therefore handle high-dimensional (that is, many-featured) usage requests without prior feature selection or feature dimensionality reduction, which solves the approval problem for complex multi-feature user requests and improves the accuracy and efficiency of approving resource usage requests.
  • step S140 the approval result is predicted based on the value of the feature corresponding to each decision tree, where the approval result indicates whether to release the resource requested by the user.
  • for example, the values of the multiple features corresponding to the decision tree include: no identical historical usage request exists; the user belongs to a high-authority group; the user's rank is PY; the value of the resource requested by the user is C; and the number of resources requested by the user is D, where rank PY is higher than rank PX, C > A, and D > B.
  • the input of the decision tree is the characteristic values of multiple features corresponding to the decision tree.
  • the decision tree first judges whether the same historical usage request exists. If the judgment result is no, it enters the "does the user belong to the high-authority group?" node. At that node, if the judgment result is yes, it enters the "is the resource value greater than A?" judgment. At the "is the resource value greater than A?" node, if the judgment result is yes, the final approval result of this decision tree is to release the resources to the user.
  • the approval results are predicted based on the values of the features corresponding to each decision tree, including multiple decision trees generating the approval results in parallel.
  • the multiple decision trees of the random forest can make predictions independently and in parallel, thereby increasing the speed of approval, as sketched below.
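  • As an illustrative sketch of this parallel prediction (assuming, hypothetically, that each trained tree object exposes a predict method over its own feature subset; this is not the patent's implementation):

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, List

def predict_parallel(trees: List[Any], request_features: Dict[str, Any]) -> List[bool]:
    """Let every decision tree of the forest vote independently and in
    parallel on one usage request; each tree reads only the subset of
    features it was trained on."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda tree: tree.predict(request_features), trees))
```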
  • step S150 the approval results of each decision tree are integrated to determine whether to release the resources requested by the user.
  • Figure 3 shows a schematic diagram of a random forest model determining whether to release resources according to some embodiments of the present disclosure.
  • each decision tree generates an approval result based on some characteristics of the user's request for resource use.
  • the results are then jointly decided by multiple trees, and based on the majority voting mechanism, it is decided whether to release the resources requested by the user.
  • synthesizing the approval results of each decision tree to determine whether to release the resources requested by the user includes: determining whether to release the resources requested by the user based on the proportion of the number of decision trees that generate the same approval result to the total number of decision trees, and a first preset threshold.
  • determining whether to release the resource requested by the user based on the proportion of the number of decision trees that generate the same approval result to the total number of decision trees and the first preset threshold includes: when the ratio of the number of decision trees that generate the same approval result to the total number of decision trees exceeds the first preset threshold, determining whether to release the resources requested by the user based on that approval result.
  • the random forest algorithm predicts results by constructing a large number of independent decision trees.
  • the number of decision trees predicting the same result needs to reach a preset threshold before the result will be accepted.
  • the voting mechanism can be a one-vote veto system, simple majority rule, a weighted majority, and so on. When the vote passes, the approval flow automatically releases the resources and immediately ends the approval.
  • for example, let the first preset threshold be 0.8, let the random forest consist of n decision trees, and suppose the approval result of m decision trees is "release resources to the user". If m/n exceeds 0.8, the random forest determines the final result to be releasing the resources requested by the user. A sketch of this vote follows.
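  • A minimal sketch of this thresholded vote, assuming the per-tree results are booleans where True means "release resources" (the function name and default threshold are illustrative):

```python
from typing import List, Optional

def aggregate_votes(tree_results: List[bool], threshold: float = 0.8) -> Optional[bool]:
    """Thresholded majority vote over the trees' approval results.

    Returns True (release) or False (do not release) when the share of
    trees producing the same result exceeds `threshold`, and None when
    neither result reaches the threshold, signalling that the approval
    form should be handed over to another approval method.
    """
    n = len(tree_results)
    m = sum(tree_results)          # number of trees voting "release"
    if m / n > threshold:
        return True
    if (n - m) / n > threshold:
        return False
    return None
```

  • For instance, aggregate_votes([True] * 9 + [False]) returns True, since 9/10 = 0.9 exceeds the threshold 0.8.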
  • determining whether to release the resource requested by the user based on the proportion of the number of decision trees that generate the same approval result to the total number of decision trees and the first preset threshold includes: when the ratio of the number of decision trees that generate the same approval result to the total number of decision trees does not exceed the first preset threshold, determining whether to release the resources requested by the user based on the multiple characteristics.
  • in this case, the random forest cannot automatically determine whether the approval passes, and the approval form is transferred to other approval methods; for example, an approver decides whether to release the resources requested by the user according to the type of resource the user requests to use, the specifications of the resource, the number of resources, the user's resource usage permissions, the user's reason for requesting the resource, and so on.
  • This disclosure integrates the results of all decision trees to determine whether to release the resources requested by the user, which reduces the impact of a single decision tree's errors on the final result and improves the accuracy of approval.
  • Figure 4 illustrates a flowchart of resource approval according to some embodiments of the present disclosure.
  • the approval flow starts. At this time, the approval flow automatically enters the Listen state, waiting for the user to enter approval information on the front end.
  • the user's request for resource usage also includes the user's historical usage request for the resource.
  • information such as the specifications of the resources the user applies to use is automatically obtained and pre-filled with the specifications previously selected by the user or the specifications the user commonly uses. The applicant only needs to add the scenario and reason for the requested resources.
  • the Listen state of the approval flow lasts for 30 minutes. If the user does not fill out and submit the approval form within 30 minutes, the approval form is automatically closed.
  • after the user fills in the approval form information, the first node reached by the approval flow is the process engine.
  • a certain number of preset rules are built into the process engine, and these rules support customization.
  • the platform administrator can customize, according to the needs of the company's organizational structure, which resource calls by which permission groups (such as particular departments and positions) can be exempted from approval.
  • the process engine stipulates that testers can be exempted from approval when they apply for cloud hosts that exceed the specified specifications within a certain range for link stress testing.
  • Some rules are preset based on hard conditions such as security requirements, company rules and regulations, and management requirements.
  • the process engine uses these rules to directly filter out the usage requests that need to enter other approval methods, such as some cross-border requests.
  • these special resource usage requests are diverted to other approval methods, while the remaining resource usage requests enter the processing flow of the random forest algorithm.
  • the system captures the relevant features of the applicant's usage request, such as the applicant's position, rank, responsible work, and other factors that affect the approval result. These features are input into the decision trees as the basis for prediction. Finally, the random forest formed by the multiple decision trees determines whether to pass the user's request.
  • the results of random forest predictions need to reach a certain threshold before they will be accepted. If the prediction results do not reach the threshold, it will not be possible to automatically determine whether the approval has been passed, and the approval form will be transferred to other approval methods. For resource usage requests that pass the approval, the resources will be automatically released to the user and the approval process will be completed. For resource usage requests that fail, the Listen state will be returned, waiting for the user to modify the information.
  • Resource usage requests that enter other approval methods will also have two statuses: passed and failed. For resource usage requests that pass the approval, the resources will be released to the user and the approval process will end. For resource usage requests that fail, the Listen state will be returned, waiting for the user to modify the information.
  • Figure 5 shows a flow chart of a training method of a random forest model according to some embodiments of the present disclosure.
  • the training method of the random forest model includes steps S210-S240.
  • the training method of the random forest model is executed by a training device of the random forest model.
  • in step S210, a training set is obtained, where the training set includes samples of user requests for resource use. The samples also include labels indicating whether to release the resources requested by the user.
  • a usage request sample includes a user-filled request for resource usage, as well as a label indicating whether to release the resource requested by the user.
  • step S220 multiple characteristics of the user's resource usage request are determined.
  • multiple features such as the user's position, rank, and responsible work are extracted as multiple features of the user's resource use request, and the values of these features are determined.
  • step S230 for each decision tree in the random forest model, some features are extracted from multiple features as candidate features of the decision tree.
  • a part of features are randomly extracted and used as candidate features for decision tree training.
  • for example, if a sample has Y features, T (T < Y) features are randomly selected from all features of the sample as candidate features for one decision tree.
  • Random sampling with replacement makes the probability of each sample being drawn conform to a uniform distribution.
  • training each decision tree according to the values of its candidate features and the labels of the samples includes: for each decision tree, extracting multiple samples from the training set as the training set of the decision tree; and training the decision tree based on the values of the candidate features corresponding to the decision tree in the samples of its training set and the labels indicating whether to release the resources requested by the user.
  • random sampling with replacement is also used to extract samples from the training set.
  • for example, if the data set contains X samples, S (S ≤ X) samples are randomly drawn from the data set with replacement as the training set of one decision tree.
  • the training set is divided into multiple subsets.
  • Each decision tree is constructed using a subset as a training set.
  • multiple trained decision trees form a forest.
  • rows and columns are randomly selected, which can truly randomly divide the entire data table into multiple parts, and use one part for each decision tree.
  • when the number of decision trees is large enough, there is always a decision tree that can capture the value of the data set to the greatest extent, thereby improving the accuracy of resource approval of the random forest model. A sketch of this sampling follows.
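  • A sketch of this row-and-column sampling (illustrative names; only Python's standard random module is used):

```python
import random
from typing import Any, List, Sequence, Tuple

def sample_for_tree(samples: Sequence[Any], features: Sequence[str],
                    S: int, T: int) -> Tuple[List[Any], List[str]]:
    """Draw the training material for one decision tree: S rows sampled
    with replacement (each draw uniform over the data set) and T candidate
    feature columns drawn without replacement (T < total feature count)."""
    rows = [random.choice(samples) for _ in range(S)]   # rows: with replacement
    cols = random.sample(list(features), T)             # columns: without replacement
    return rows, cols
```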
  • step S240 each decision tree is trained according to the value of the candidate feature of each decision tree and the label of the sample.
  • the following describes the training method of a single decision tree.
  • training each decision tree according to the values of its candidate features and the labels of the samples includes: taking the root node of the decision tree as the current node, and selecting the feature corresponding to the root node from the candidate features according to the training set; according to the value of the feature corresponding to the current node in the samples of the training set corresponding to the current node, and the labels of the samples, determining the training set corresponding to each child node of the current node; according to the value of the feature corresponding to the current node in the samples of the training set corresponding to each child node, and the labels of the samples, selecting the feature corresponding to the child node from the remaining candidate features; and taking the child nodes of the current node as the new current nodes, looping through the above determination of child-node training sets and features.
  • when the child nodes of the current node include a first child node and a second child node of the current node, determining the training set corresponding to each child node of the current node according to the value of the feature corresponding to the current node in the samples of the training set corresponding to the current node, and the labels of the samples, includes: selecting, based on those values and labels, a feature value from the value range of the feature corresponding to the current node as the split point that divides the training set corresponding to the first child node from the training set corresponding to the second child node; and determining, according to the split point, whether each sample in the training set corresponding to the current node is divided into the training set of the first child node or the training set of the second child node.
  • the S samples, with the T extracted features, are used to train the decision tree model.
  • first, the feature corresponding to the root node is selected.
  • for example, each decision tree is a CART (classification and regression tree). CART is a binary tree that splits recursively downward from the root node; that is, each of its nodes offers only two choices, 'yes' and 'no'. By continuously dividing the feature space into a finite number of units, the predicted probability distribution over these units is determined.
  • the Gini coefficient represents the impurity of the model. The smaller the Gini coefficient, the lower the impurity, and the better the feature.
  • the purity of data set D can be measured by the Gini value. Assuming the samples in the set belong to K classes, and letting p_k denote the proportion of samples belonging to the k-th class, the Gini coefficient is calculated as follows: Gini(D) = 1 - Σ_k p_k^2, where k ranges from 1 to K.
  • Gini(D) reflects the probability that two samples randomly selected from the data set D have inconsistent class labels. Therefore, the smaller Gini(D), the higher the purity of data set D.
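  • A direct transcription of this formula as an illustrative helper (reused by the split-point sketch further below):

```python
from collections import Counter
from typing import Hashable, Sequence

def gini(labels: Sequence[Hashable]) -> float:
    """Gini value of a label set: 1 - sum_k p_k^2, i.e. the probability
    that two samples drawn at random from the set carry different labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())
```

  • For example, gini(["release", "release", "release", "reject"]) = 1 - (0.75^2 + 0.25^2) = 0.375.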
  • calculate the Gini coefficient of each value of each candidate feature of the current node with respect to data set D. For example, first determine the value range of feature A based on the samples of the training set. Taking the current node to be the root node as an example, the training set corresponding to the root node is D. Select a value a from the value range of A. According to whether feature A takes the value a, the training set D is divided into two parts: a sample whose feature A takes the value a is divided into training set D_1; otherwise, it is divided into training set D_2.
  • the Gini coefficient of data set D under the division by feature A at cut point a is calculated as follows: Gini(D, A=a) = (|D_1|/|D|)·Gini(D_1) + (|D_2|/|D|)·Gini(D_2), where Gini(D_1) represents the Gini coefficient of data set D_1 and |D| denotes the number of samples in a set.
  • the decision tree can handle both continuous and discrete values. For continuous values, assuming the continuous feature A takes m values among the m samples, arranged from small to large, CART takes the average of each pair of adjacent values as a candidate dividing point, giving m-1 dividing points in total, and separately calculates the Gini coefficient when each of these m-1 points is used as the binary classification point. The point with the smallest Gini coefficient is selected as the cut point of the continuous feature. For example, if the point with the smallest Gini coefficient is a, values less than a form category 1 and values greater than a form category 2, which discretizes the continuous feature. A sketch of this scan follows.
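  • A sketch of this midpoint scan, reusing the gini helper defined above (names are illustrative):

```python
from typing import Optional, Sequence, Tuple

def best_cut_point(values: Sequence[float],
                   labels: Sequence[str]) -> Tuple[Optional[float], float]:
    """Scan the m-1 midpoints between adjacent sorted values of a
    continuous feature and return the cut point a with the smallest
    weighted Gini coefficient Gini(D, A=a)."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_a, best_g = None, float("inf")
    for i in range(n - 1):
        if pairs[i][0] == pairs[i + 1][0]:
            continue                                # equal values admit no cut here
        a = (pairs[i][0] + pairs[i + 1][0]) / 2     # midpoint candidate
        left = [label for v, label in pairs if v <= a]
        right = [label for v, label in pairs if v > a]
        g = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
        if g < best_g:
            best_a, best_g = a, g
    return best_a, best_g
```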
  • for discrete values, CART uses a cyclic bisection method. For a feature A with values {a_1, a_2, a_3}, CART considers three binary divisions: ({a_1}, {a_2, a_3}), ({a_1, a_2}, {a_3}), and ({a_2}, {a_1, a_3}), finds the combination with the smallest Gini coefficient, such as ({a_2}, {a_1, a_3}), and then establishes a binary tree node: one branch holds the samples corresponding to a_2, and the other holds the samples corresponding to a_1 and a_3. Because the values of feature A are not completely separated by this division, feature A may participate again in later divisions.
  • after determining the split point that divides the training set corresponding to the first child node of the current node from the training set corresponding to the second child node, it is determined, based on the split point, whether each sample in the training set corresponding to the current node is divided into the first child node or the second child node, thereby generating the training sets of the first and second child nodes.
  • taking each child node as the current node, the above steps of determining the feature of the current node based on its training set and determining the training sets of its child nodes are repeated until the cutoff condition is reached; the decision subtree is then returned, the current node stops recursing, and finally the entire decision tree is built.
  • the interaction between different features can also be measured. For example, if the training set in a decision tree is split into two child nodes according to a certain feature M, and it then becomes easier to split on feature J, features M and J interact.
  • the cutoff conditions include at least one of: there being no remaining candidate features; the number of samples in the training set corresponding to the current node being less than a second preset threshold; and the Gini coefficient of the training set corresponding to the current node being less than a third preset threshold.
  • when a cutoff condition is met, the decision subtree is returned and the current node stops recursing.
  • This disclosure automatically determines the importance of the features in a user's resource usage request based on the Gini coefficient. In addition, it can measure the interaction between different features, build a decision tree, and generate resource approval results without dimensionality reduction or feature selection, which improves the accuracy and efficiency of approving resource usage requests.
  • multiple decision trees are constructed to finally form a random forest model.
  • determining the multiple characteristics of the user's resource usage request includes: when the value of a feature of a sample of the user's resource usage request is missing, calculating the similarity between the path of the sample through the nodes of the decision tree and the paths of other samples, and determining the value of the missing feature of the sample based on this similarity.
  • for example, first preset estimates for the missing values in the sample: for a numeric variable, select the median or mode of the remaining data as the estimate of the current missing value. Then, based on the estimated values, build a random forest, put all the data into the random forest, and run it again. Record the step-by-step classification path of each group of data in the decision trees, determine which groups of data are most similar to the data with the missing value, and introduce a similarity matrix to record the similarity between the data; for example, if there are N groups of data, the similarity matrix has size N*N. If the missing value is a numeric variable, a new estimate is obtained through a weighted average; if it is a categorical variable, a new estimate is obtained through weighted voting; and so on, until a stable estimate is obtained.
  • the filled-in data retains randomness and uncertainty, and can better reflect the true distribution of the unknown data.
  • because each node uses a random subset of features instead of all the features of the training set, the method applies well to filling in high-dimensional data. Therefore, the present disclosure can reduce the interference of missing values with resource approval and improve the accuracy of approving resource usage requests. A sketch of one refinement round follows.
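  • One refinement round of this proximity-based filling can be sketched as follows (assuming the N*N similarity matrix has already been accumulated from the trees' classification paths; the function name is illustrative):

```python
import numpy as np

def refine_numeric_estimate(i: int, values: np.ndarray,
                            similarity: np.ndarray) -> float:
    """Replace the current estimate of the missing numeric value in row i
    by the proximity-weighted average over all other rows, where
    similarity[i, j] records how often rows i and j followed the same
    paths through the forest. Iterated until the estimate stabilizes;
    a categorical variable would use a weighted vote instead."""
    weights = similarity[i].astype(float).copy()
    weights[i] = 0.0                    # a row does not vote for itself
    return float(np.dot(weights, values) / weights.sum())
```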
  • training each decision tree includes pruning the decision tree based on the values of the candidate features of each decision tree and the labels of the samples.
  • Figure 6 shows a schematic diagram of pruning a decision tree according to some embodiments of the present disclosure.
  • the post-pruning method is used: a decision tree is first generated, then all pruned CART subtrees are generated from the generated decision tree, and cross-validation is used to test the effect of pruning; the pruning strategy with the best generalization performance is selected.
  • the loss function of the subtree T_t rooted at an internal node t is: C_α(T_t) = C(T_t) + α·|T_t|, and the loss function after pruning T_t to its root node is: C_α(t) = C(t) + α, where α is the regularization parameter (playing the same role as the regularization term in linear regression), C(T_t) is the prediction error on the validation data (that is, the Gini coefficient of the validation data), and |T_t| is the number of leaf nodes of the subtree T_t. The resulting pruning test is sketched below.
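  • Under these two loss functions, the per-node pruning test can be sketched as follows (illustrative; C_t and C_Tt stand for the validation errors C(t) of the collapsed node and C(T_t) of the subtree):

```python
def should_prune(C_t: float, C_Tt: float, num_leaves: int, alpha: float) -> bool:
    """Compare C_alpha(t) = C(t) + alpha with
    C_alpha(T_t) = C(T_t) + alpha * |T_t| and prune the subtree to a
    single leaf when doing so does not increase the regularized loss."""
    return C_t + alpha <= C_Tt + alpha * num_leaves
```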
  • each decision tree is trained based on the values of the candidate features of each decision tree and the labels of the samples, including training multiple decision trees in parallel. For example, training multiple decision trees of a random forest in parallel and independently can improve the training speed of the random forest model.
  • Figure 7 shows a block diagram of a resource approval device according to some embodiments of the present disclosure.
  • the resource approval device 7 includes an acquisition module 71 , a first determination module 72 , a selection module 73 , a prediction module 74 , and a second determination module 75 .
  • the obtaining module 71 is configured to obtain the user's request for resource use, for example, performing step S110 as shown in Figure 1 .
  • the first determination module 72 is configured to determine multiple characteristics of the user's resource usage request, for example, perform step S120 as shown in FIG. 1 .
  • the selection module 73 is configured to, for each decision tree in the random forest model, select the feature corresponding to the decision tree from multiple features, for example, perform step S130 as shown in FIG. 1 .
  • the prediction module 74 is configured to predict the approval result according to the value of the feature corresponding to each decision tree, where the approval result indicates whether to release the resource requested by the user, for example, perform step S140 as shown in FIG. 1 .
  • the second determination module 75 is configured to synthesize the approval results of each decision tree and determine whether to release the resources requested by the user, for example, performing step S150 as shown in FIG. 1 .
  • Figure 8 shows a block diagram of a training device for a random forest model according to some embodiments of the present disclosure.
  • the training device of the random forest model includes an acquisition module 81, a determination module 82, an extraction module 83, and a training module 84.
  • the acquisition module 81 is configured to acquire a training set, where the training set includes samples of user requests for resource use, and the samples also include labels indicating whether to issue the resources requested by the user. For example, step S210 shown in Figure 5 is performed.
  • the determination module 82 is configured to determine multiple characteristics of the user's resource usage request, for example, perform step S220 as shown in FIG. 5 .
  • the extraction module 83 is configured to, for each decision tree in the random forest model, extract some features from multiple features as candidate features of the decision tree, for example, perform step S230 as shown in FIG. 5 .
  • the training module 84 is configured to train each decision tree according to the value of the candidate feature of each decision tree and the label of the sample, for example, perform step S240 as shown in FIG. 5 .
  • Figure 9 shows a block diagram of an electronic device according to other embodiments of the present disclosure.
  • the electronic device 9 includes a memory 91; and a processor 92 coupled to the memory 91.
  • the memory 91 is used to store instructions for executing corresponding embodiments of the resource approval method or the training method of the random forest model.
  • the processor 92 is configured to execute the resource approval method or the random forest model training method in any embodiment of the present disclosure based on instructions stored in the memory 91 .
  • Figure 10 illustrates a block diagram of a computer system for implementing some embodiments of the present disclosure.
  • Computer system 100 may be embodied in the form of a general purpose computing device.
  • Computer system 100 includes memory 1010, a processor 1020, and a bus 1000 that connects various system components.
  • the memory 1010 may include, for example, system memory, non-volatile storage media, and the like.
  • System memory stores, for example, operating systems, applications, boot loaders, and other programs.
  • System memory may include volatile storage media such as random access memory (RAM) and/or cache memory.
  • RAM random access memory
  • the non-volatile storage medium stores, for example, instructions for executing corresponding embodiments of at least one of the resource approval methods or the random forest model training methods in any embodiments of the present disclosure.
  • Non-volatile storage media includes but is not limited to disk storage, optical storage, flash memory, etc.
  • the processor 1020 may be implemented as a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gates or transistors and other discrete hardware components.
  • DSP digital signal processor
  • ASIC application specific integrated circuit
  • FPGA field programmable gate array
  • each module, such as the judgment module and the determination module, can be implemented by a central processing unit (CPU) executing instructions stored in memory that carry out the corresponding steps, or by dedicated circuits executing the corresponding steps.
  • Bus 1000 may use any of a variety of bus structures.
  • bus structures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, and Peripheral Component Interconnect (PCI) bus.
  • ISA Industry Standard Architecture
  • MCA Micro Channel Architecture
  • PCI Peripheral Component Interconnect
  • the computer system 100 may also include an input/output interface 1030, a network interface 1040, a storage interface 1050, and the like. These interfaces 1030, 1040, 1050, the memory 1010 and the processor 1020 may be connected through a bus 1000.
  • the input and output interface 1030 can provide a connection interface for input and output devices such as a monitor, mouse, and keyboard.
  • Network interface 1040 provides connection interfaces for various networked devices.
  • the storage interface 1050 provides a connection interface for external storage devices such as floppy disks, USB disks, and SD cards.
  • These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable device to produce a machine, such that execution of the instructions by the processor produces a device that implements the functions specified in one or more blocks of the flowcharts and/or block diagrams.
  • Computer-readable program instructions may also be stored in a computer-readable memory, causing the computer to operate in a specific manner to produce an article of manufacture, including instructions that implement the functions specified in one or more blocks of the flowcharts and/or block diagrams.
  • the disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment that combines software and hardware aspects.

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Databases & Information Systems (AREA)
  • Tourism & Hospitality (AREA)
  • Operations Research (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Data Mining & Analysis (AREA)
  • Quality & Reliability (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present disclosure relates to the technical field of cloud computing, and concerns a resource approval method and device, and a random forest model training method and device. The resource approval method comprises: acquiring a resource usage request of a user; determining a plurality of features of the resource usage request of the user; for each decision tree in a random forest model, selecting, from the plurality of features, a feature corresponding to the decision tree; predicting an approval result according to the value of the feature corresponding to each decision tree, the approval result indicating whether to release a resource requested by the user; and, on the basis of the approval results of the decision trees, determining whether to release the resource requested by the user.
PCT/CN2023/074133 2022-07-29 2023-02-01 Resource approval method and device, and random forest model training method and device WO2024021555A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210905742.3A CN115147092A (zh) 2022-07-29 2022-07-29 资源审批方法、随机森林模型的训练方法及装置
CN202210905742.3 2022-07-29

Publications (1)

Publication Number Publication Date
WO2024021555A1 true WO2024021555A1 (fr) 2024-02-01

Family

ID=83413509

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/074133 WO2024021555A1 (fr) 2023-02-01 Resource approval method and device, and random forest model training method and device

Country Status (2)

Country Link
CN (1) CN115147092A (fr)
WO (1) WO2024021555A1 (fr)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115147092A (zh) * 2022-07-29 2022-10-04 京东科技信息技术有限公司 资源审批方法、随机森林模型的训练方法及装置
CN115616204A (zh) * 2022-12-21 2023-01-17 金发科技股份有限公司 一种聚对苯二甲酸乙二醇酯再生料的鉴别方法及系统
CN116739719B (zh) * 2023-08-14 2023-11-03 南京大数据集团有限公司 一种交易平台的流程配置系统及方法

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110264342A (zh) * 2019-06-19 2019-09-20 深圳前海微众银行股份有限公司 一种基于机器学习的业务审核方法及装置
CN111709828A (zh) * 2020-06-12 2020-09-25 中国建设银行股份有限公司 一种资源处理方法、装置、设备及系统
WO2021077011A1 (fr) * 2019-10-18 2021-04-22 Solstice Initiative, Inc. Systèmes et procédés pour accessibilité de service public partagé
CN113505936A (zh) * 2021-07-26 2021-10-15 平安信托有限责任公司 项目审批结果的预测方法、装置、设备及存储介质
CN115147092A (zh) * 2022-07-29 2022-10-04 京东科技信息技术有限公司 资源审批方法、随机森林模型的训练方法及装置


Also Published As

Publication number Publication date
CN115147092A (zh) 2022-10-04


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23844774

Country of ref document: EP

Kind code of ref document: A1