CN111275547B - Wind control system and method based on isolated forest - Google Patents

Publication number
CN111275547B
Authority
CN
China
Prior art keywords
data
abnormal
calculating
proportion
tree
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010196415.6A
Other languages
Chinese (zh)
Other versions
CN111275547A (en)
Inventor
毕艳亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Fumin Bank Co Ltd
Original Assignee
Chongqing Fumin Bank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Chongqing Fumin Bank Co Ltd
Priority to CN202010196415.6A
Publication of CN111275547A
Application granted
Publication of CN111275547B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00: Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03: Credit; Loans; Processing thereof
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04: INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S: SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00: Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50: Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Abstract

The invention relates to the technical field of computers, and in particular to a risk control system and method based on an isolation forest. The method comprises the following steps: S1, collecting historical customer data and determining the effective features of a data source from the historical data; S2, estimating the rejection proportion for the data source's features from the pass rate of the current project and the number of data sources used; S3, feeding the effective feature values into an isolation forest algorithm, determining the proportion of anomalous points in combination with the rejection proportion, and dividing the samples into abnormal samples and normal samples; S4, drawing box plots of the abnormal samples and the normal samples, and formulating a policy that meets the expected rejection proportion according to the distributions shown in the box plots. The advantages of the invention are: first, the pass rate of the attribute policies related to a single data source can be determined quantitatively; second, the policy of any new project can be adjusted without waiting for loan performance data to accumulate, which shortens the policy iteration cycle.

Description

Risk control system and method based on isolation forest
Technical Field
The invention relates to the technical field of computers, and in particular to a risk control system and method based on an isolation forest.
Background
Risk control refers to the measures and methods a manager takes to eliminate or reduce the likelihood of risk events, or to reduce the losses they cause. For banks, the main risk is credit risk, of which loan risk is the principal part. At present, most banks adopt a T+1 risk control mode: each day's transaction data is stored in a database, and the risk control system extracts the transaction details for risk assessment at the start of the next working day. This approach cannot control the risk of each transaction in real time and cannot effectively detect abnormal data.
Document CN109345137A discloses an outlier detection method based on agricultural big data, comprising: a data acquisition step of collecting agricultural production data, agricultural soil data and agricultural meteorological resource data and integrating them into a training data set; an iTree construction step of selecting m sample points from the training data set and repeatedly choosing a random splitting attribute and splitting point until a termination condition is reached; an isolation forest model construction step of initializing the number t of iTrees in the forest and the subsample size m drawn when building each iTree, then building the independent iTrees in a loop, the set of all iTrees forming the isolation forest model; and an outlier judging step of calculating an anomaly score s(x) and judging from s(x) whether the test data x is an outlier. That invention applies the isolation forest model to outlier detection in agricultural big data and can effectively improve the detection of outliers in such data.
The core of financial risk control is deciding whether to approve a user according to the user's specific features. In past practice there have been two situations. First, a newly started project borrows the policies used by other projects. In many cases, however, the new project differs significantly from the earlier ones and the borrowed policy does not fit, which creates risk (either the threshold is too low and too many users are rejected, or the threshold is too high and too few users are rejected). Second, when a new data source is introduced, it is difficult to formulate a reasonable policy because little is known about the data source or about how well it matches the project it is being added to.
Disclosure of Invention
The invention provides a risk control method based on an isolation forest, which solves two technical problems: policies borrowed from other projects do not suit new projects, and reasonable policies are difficult to formulate when a new data source is used.
The basic scheme provided by the invention is as follows. The risk control method based on an isolation forest comprises the following steps: S1, collecting historical customer data and determining the effective features of a data source from the historical data; S2, estimating the rejection proportion for the data source's features from the pass rate of the current project and the number of data sources used; S3, feeding the effective feature values into an isolation forest algorithm, determining the proportion of anomalous points in combination with the rejection proportion, and dividing the samples into abnormal samples and normal samples; S4, drawing box plots of the abnormal samples and the normal samples, and formulating a policy that meets the expected rejection proportion according to the distributions shown in the box plots.
The working principle of the invention is as follows. First, the effective feature values are fed into the isolation forest algorithm, the proportion of anomalous points is determined in combination with the rejection proportion, and the samples are divided into abnormal samples and normal samples. Box plots of the abnormal samples and the normal samples are then drawn, and a policy that meets the expected rejection proportion is formulated according to the distributions shown in the box plots. The advantages of the invention are as follows. First, the pass rate of the attribute policies related to a single data source can be determined quantitatively. Second, the policy of any new project can be adjusted without waiting for loan performance data to accumulate, which shortens the policy iteration cycle. Third, a problematic data source can be found in time: for example, if the distributions of a data source's features do not differ between the abnormal group and the normal group, the data source can be judged ineffective.
In summary, the invention quantitatively determines the pass rate of the attribute policies related to a single data source from the distributions shown in the box plots. The policy of any new project can be adjusted without waiting for loan performance data, the policy iteration cycle is shortened, and problematic data sources are found in time.
Further, step S1 includes: S11, extracting historical customer data; S12, extracting the corresponding features of the customer data; S13, converting the extracted feature information into numerical data and performing a clustering operation; S14, calculating the spatial distance between each cluster center and the other points; and S15, presenting the calculated distances as two-dimensional data and giving points far from the coordinate origin a correspondingly larger weight score. Beneficial effects: this reduces the volume of data actually processed during anomaly detection, saves computing resources and improves detection efficiency. In addition, the feature extraction and data analysis in this step mitigate some overfitting problems in anomaly detection and make the anomaly detection algorithm more robust.
Further, step S3 includes: S31, training a single tree; S32, integrating the results of all isolation trees; S33, calculating an anomaly score s(x) and judging from s(x) whether the test data x is an outlier. Beneficial effects: the isolation forest algorithm has linear time complexity and is an ensemble method, so it can be applied to data sets containing massive amounts of data. In general, the more trees there are, the more stable the algorithm. Because each tree is generated independently of the others, the algorithm can be deployed on a large-scale distributed system to speed up computation.
Further, step S31 includes: S31a, randomly selecting ψ points from the training data as a subsample and placing them in the root node of an isolation tree; S31b, randomly choosing a dimension and randomly generating a cut point P within the data range of the current node; S31c, placing points whose value in the chosen dimension is smaller than P on the left branch of the current node and points whose value is greater than or equal to P on the right branch; S31d, recursively constructing new nodes on the left and right branches until a node contains only one data point or the tree reaches the set height. Beneficial effects: this reduces the masking and swamping effects of anomalies during outlier detection; moreover, because the time complexity is linear and no distance or density needs to be computed to find abnormal data, high-dimensional and massive data can be processed efficiently.
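Steps S31a to S31d can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `build_itree`, `path_length` and `count_points`, the dictionary node layout, the subsample size ψ = 64, the height limit of 6 and the synthetic data are all hypothetical choices made for the example.

```python
import random

def build_itree(points, height, max_height, rng):
    """Recursively build one isolation tree (steps S31b to S31d)."""
    # Stop when the node holds at most one point or the height limit is reached.
    if len(points) <= 1 or height >= max_height:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))            # S31b: random dimension
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:                                   # simplification: stop if this dimension cannot be split
        return {"size": len(points)}
    cut = rng.uniform(lo, hi)                      # S31b: random cut point P in the node's range
    left = [p for p in points if p[dim] < cut]     # S31c: values below P go left
    right = [p for p in points if p[dim] >= cut]   # S31c: values >= P go right
    return {"dim": dim, "cut": cut,
            "left": build_itree(left, height + 1, max_height, rng),
            "right": build_itree(right, height + 1, max_height, rng)}

def path_length(tree, x, depth=0):
    """Number of edges x traverses before it reaches a leaf."""
    if "cut" not in tree:
        return depth
    branch = "left" if x[tree["dim"]] < tree["cut"] else "right"
    return path_length(tree[branch], x, depth + 1)

def count_points(tree):
    """Total number of training points stored in the leaves."""
    if "cut" not in tree:
        return tree["size"]
    return count_points(tree["left"]) + count_points(tree["right"])

rng = random.Random(0)
data = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(200)]
subsample = rng.sample(data, 64)                   # S31a: psi = 64 points (assumed value)
tree = build_itree(subsample, 0, 6, rng)
depth_of_outlier = path_length(tree, (8.0, 8.0))   # a point far from the bulk of the data
```

A point that lies far from the rest of the data tends to be separated after only a few cuts, so its path length is usually short; a single random tree is noisy, which is why the method averages over many trees.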
Further, step S32 includes: S32a, repeating the cutting process from scratch; S32b, calculating the average of the results of all cuts. Beneficial effects: because the cutting process is completely random, an ensemble method is needed to make the results converge, that is, the cutting is repeated from scratch many times.
Further, step S33 includes: S33a, calculating the depth h(x) of the test data x in each tree; S33b, calculating the average depth E(h(x)) over all trees; S33c, calculating the anomaly score; S33d, judging from the anomaly score s(x) whether the test data x is an outlier. Beneficial effects: if the anomaly score is close to 1, the point is almost certainly an anomaly; if the score is much smaller than 0.5, the point is almost certainly not an anomaly; if the scores of all points are around 0.5, the sample most likely contains no anomalies. Whether the test data is an outlier can therefore be judged intuitively from the anomaly score.
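The text does not state the scoring formula for step S33c. In the published isolation forest algorithm the score is s(x) = 2^(-E(h(x)) / c(ψ)), where c(ψ) = 2H(ψ - 1) - 2(ψ - 1)/ψ is the average path length of an unsuccessful binary search tree lookup among ψ points and H(i) is approximated by ln(i) + 0.5772. The sketch below assumes that formula; the function names `c` and `anomaly_score` and the value ψ = 256 are illustrative choices, not taken from the patent.

```python
import math

EULER_GAMMA = 0.5772156649015329

def c(n):
    """Average path length of an unsuccessful BST search among n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA       # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def anomaly_score(mean_depth, psi):
    """s(x) = 2 ** (-E(h(x)) / c(psi)); shorter average paths give higher scores."""
    return 2.0 ** (-mean_depth / c(psi))

psi = 256                                          # assumed subsample size
score_shallow = anomaly_score(2.0, psi)            # isolated quickly
score_typical = anomaly_score(c(psi), psi)         # average depth equals c(psi)
score_deep = anomaly_score(20.0, psi)              # hard to isolate
```

With this formula an average depth equal to c(ψ) gives a score of 0.5, shallower points score above 0.5, and the score tends to 1 as the average depth tends to 0, which matches the three cases listed above.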
The invention also discloses a risk control system based on an isolation forest, comprising: an acquisition module for collecting historical customer data and determining the effective features of a data source from the historical data; an evaluation module for estimating the rejection proportion for the data source's features from the pass rate of the current project and the number of data sources used; an algorithm module for feeding the effective feature values into an isolation forest algorithm, determining the proportion of anomalous points in combination with the rejection proportion, and dividing the samples into abnormal samples and normal samples; and a decision module for drawing box plots of the abnormal samples and the normal samples and formulating a policy that meets the expected rejection proportion according to the distributions shown in the box plots.
The invention determines the pass rate of the attribute policies related to a single data source from the distributions shown in the box plots, and can shorten the policy iteration cycle.
Further, the algorithm module includes: a training unit for training a single tree; an integrating unit for integrating the results of all isolation trees; and a judging unit for calculating the anomaly score s(x) and judging from s(x) whether the test data x is an outlier. Beneficial effects: the trees are independent of one another and can be deployed on a large-scale distributed system to speed up computation.
Further, the step in which the training unit trains a single tree comprises: S51, randomly selecting ψ points from the training data as a subsample and placing them in the root node of an isolation tree; S52, randomly choosing a dimension and randomly generating a cut point P within the data range of the current node; S53, placing points whose value in the chosen dimension is smaller than P on the left branch of the current node and points whose value is greater than or equal to P on the right branch; S54, recursively constructing new nodes on the left and right branches until a node contains only one data point or the tree reaches the set height. Beneficial effects: this reduces the masking and swamping effects of anomalies during outlier detection and also allows high-dimensional and massive data to be processed efficiently.
Further, the step in which the judging unit calculates the anomaly score s(x) and judges from s(x) whether the test data x is an outlier includes: S61, calculating the depth h(x) of the test data x in each tree; S62, calculating the average depth E(h(x)) over all trees; S63, calculating the anomaly score; S64, judging from the anomaly score s(x) whether the test data x is an outlier. Beneficial effects: whether the test data is an outlier can be judged intuitively from the anomaly score.
Drawings
FIG. 1 is a flow chart of an embodiment of the isolation-forest-based risk control method of the present invention.
Detailed Description
The following is a further detailed description of the embodiments:
Example 1
An embodiment of the risk control method based on an isolation forest is substantially as shown in FIG. 1. The method comprises the following steps: S1, collecting historical customer data and determining the effective features of a data source from the historical data; S2, estimating the rejection proportion for the data source's features from the pass rate of the current project and the number of data sources used; S3, feeding the effective feature values into an isolation forest algorithm, determining the proportion of anomalous points in combination with the rejection proportion, and dividing the samples into abnormal samples and normal samples; S4, drawing box plots of the abnormal samples and the normal samples, and formulating a policy that meets the expected rejection proportion according to the distributions shown in the box plots.
When a customer applies to a bank for a loan, the bank audits the borrower's repayment capacity, repayment record and repayment willingness, the profitability of the loan project, the loan guarantee, the legal liability for repaying the loan and so on, and the customer provides the corresponding materials.
First, it is determined which fields of the data source are effective and can be used. There are two ways to do this. The first is past experience. For example, experience shows that borrowers whose repayment records show on-schedule repayment with no late payment or default, and whose loan projects have good profitability and stable cash flow, are consistently able to repay on schedule without risk of becoming overdue. Experience therefore indicates that the "borrower repayment record" and the "loan project profitability" are effective for judging a customer's credit. The second way is to use a clustering algorithm. Step one: extract historical customer data, such as past credit data for the five categories of customers, namely normal, special-mention, substandard, doubtful and loss. Step two: extract the corresponding features of the customer data, such as repayment record, repayment capacity and cash flow. Step three: convert the extracted feature information into numerical data and perform a clustering operation. Step four: calculate the spatial distance between each cluster center and the other points. Step five: present the calculated distances as two-dimensional data and give points far from the coordinate origin a correspondingly larger weight score.
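The clustering route (steps three to five) can be illustrated with a small Python sketch. It is one possible reading under stated assumptions: the patent does not name a clustering algorithm, so plain k-means with two clusters is assumed here; the two features, the sample values, the starting centers and the helper names `centroid`, `kmeans` and `distance_weights` are hypothetical, and the weight is taken to be the distance to the nearest cluster center scaled to the range 0 to 1.

```python
import math

def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(sum(col) / len(points) for col in zip(*points))

def kmeans(points, centers, iterations=10):
    """Plain Lloyd iterations starting from the given centers."""
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        # Keep the previous center if a cluster ends up empty.
        centers = [centroid(g) if g else old for g, old in zip(groups, centers)]
    return centers

def distance_weights(points, centers):
    """Distance of each point to its nearest center, scaled to [0, 1]."""
    dists = [min(math.dist(p, ctr) for ctr in centers) for p in points]
    largest = max(dists) or 1.0
    return [d / largest for d in dists]

# Hypothetical numeric features: (on-time repayment ratio, cash-flow stability)
customers = [(0.95, 0.90), (0.92, 0.88), (0.97, 0.93),
             (0.40, 0.35), (0.45, 0.30), (0.10, 0.95)]
centers = kmeans(customers, centers=[customers[0], customers[3]])
weights = distance_weights(customers, centers)
```

In this toy data the last customer sits far from both cluster centers and therefore receives the largest weight, which is the behaviour step five describes for points far from the origin of the distance plot.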
Next, the rejection rate of each data source is determined from the pass rate the project requires and the number of data sources to be used. For example, a loan project requires a pass rate of eighty percent, and there are only three data sources on which policies can be set. The twenty percent to be rejected can be allocated roughly among the three data sources according to their quality. For example, the first data source has the highest quality and is allocated ten percent; the second data source has the next highest quality and is allocated six percent; the third data source has low quality and is allocated four percent. In this way the rejection rate of each individual data source is determined initially.
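The allocation arithmetic in this example can be written out as a short Python sketch. The patent only says the split follows data source quality; the quality weights 5 : 3 : 2 below are an assumption chosen because they reproduce the ten, six and four percent figures, and the function name `allocate_rejection` and the source names are hypothetical.

```python
def allocate_rejection(pass_rate, quality_weights):
    """Split the total rejection share among data sources in proportion to a quality weight."""
    total_reject = 1.0 - pass_rate
    weight_sum = sum(quality_weights.values())
    return {name: total_reject * weight / weight_sum
            for name, weight in quality_weights.items()}

# Assumed quality weights; a higher weight means a higher-quality data source.
shares = allocate_rejection(0.80, {"source_1": 5, "source_2": 3, "source_3": 2})

for name, share in shares.items():
    print(f"{name}: reject about {share:.0%}")
```

The three shares always sum to the total rejection share, so changing the weights redistributes the twenty percent without changing the project's overall pass rate.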
Next, the effective fields of the data source are extracted, their null values are filled in, and the fields are fed into the isolation forest algorithm. Training a single tree proceeds as follows. Step one: ψ points are randomly selected from the training data as a subsample and placed in the root node of an isolation tree. For example, five points are selected as the subsample: the borrower's repayment capacity, the borrower's repayment record, the borrower's repayment willingness, the profitability of the loan project and the loan guarantee. Step two: a dimension is chosen at random, and a cut point P is generated at random within the data range of the current node. For example, the specified dimension is four. Step three: points whose value in the chosen dimension is smaller than P are placed on the left branch of the current node, and points whose value is greater than or equal to P are placed on the right branch. For example, the borrower's repayment capacity, repayment record and repayment willingness and the loan project's profitability are placed on the left branch of the current node, and the loan guarantee is placed on the right branch. Step four: new nodes are constructed recursively on the left and right branches until a node contains only one data point or the tree reaches the set height. For example, the borrower's repayment capacity, repayment record and repayment willingness are placed on the left branch of the current node, and the loan project's profitability is placed on the right branch. … The borrower's repayment capacity and repayment record are placed on the left branch of the current node, and the borrower's repayment willingness is placed on the right branch.
… The cutting is repeated from scratch according to the above steps, the results of all isolation trees are integrated, and the average of the cutting results is calculated. Then the anomaly score s(x) is calculated, and whether the test data x is an outlier is judged from s(x). Step one: the depth h(x) on each tree is calculated. According to the isolation forest algorithm, the depth of the loan guarantee is 4, the depth of the loan project's profitability is 3, the depth of the borrower's repayment willingness is 2, and the depths of the borrower's repayment capacity and repayment record are both 1. Step two: the average depth E(h(x)) over all trees is calculated; here the average depth is (4 + 3 + 2 + 1 × 2) / 5 = 2.2. Step three: the anomaly score is calculated; the formula can be taken from the prior art on the isolation forest algorithm. Step four: whether the test data x is an outlier is judged from its anomaly score s(x). For example, the loan guarantee has a score of 0.85 (close to 1), so the loan guarantee data is an outlier; the loan project's profitability has a score of 0.05 (much less than 0.5), so that data is not an outlier. Finally, the proportion of anomalous points is determined according to the rejection proportion, and the samples are divided into abnormal samples and normal samples.
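The final division into abnormal and normal samples can be sketched as follows. The patent does not spell out how the rejection proportion is turned into a cut-off, so this sketch assumes the simplest reading: rank the samples by anomaly score and flag the top fraction. The function name `split_by_ratio`, the ten scores and the ten percent ratio are hypothetical.

```python
def split_by_ratio(scores, anomaly_ratio):
    """Flag the highest-scoring fraction of samples as abnormal."""
    n_abnormal = round(len(scores) * anomaly_ratio)
    # Indices sorted from the highest anomaly score to the lowest.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    abnormal = set(ranked[:n_abnormal])
    return [i in abnormal for i in range(len(scores))]

# Hypothetical anomaly scores for ten applicants.
scores = [0.42, 0.85, 0.47, 0.39, 0.51, 0.44, 0.63, 0.40, 0.45, 0.43]
flags = split_by_ratio(scores, anomaly_ratio=0.10)

abnormal_samples = [s for s, flagged in zip(scores, flags) if flagged]
normal_samples = [s for s, flagged in zip(scores, flags) if not flagged]
```

Tying the cut-off to the rejection proportion means the size of the abnormal group is fixed in advance by the business target rather than by an absolute score threshold.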
Next, a box plot is drawn for each feature of the abnormal samples and of the normal samples, and a reasonable policy threshold for each attribute is determined from the differences between the box plots. If the box plots of every attribute of a data source differ little between the two groups, the data source can be judged ineffective for the project and should be taken offline promptly. For example, the normal group has a first quartile of 4 and a third quartile of 9; by box plot theory, the normal range of this data is 0 to 16.5. The abnormal group has a first quartile of 13 and a third quartile of 20; by box plot theory, the normal range of the abnormal group is 0 to 30, so the attribute threshold can be set to 17 (plus or minus 1). In the end, the overall rejection rate is brought within the predetermined interval.
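The range quoted for the normal group is consistent with the usual box plot whisker rule, in which the fences lie 1.5 interquartile ranges beyond the quartiles. The sketch below assumes that rule and assumes the lower fence is floored at 0 because the attribute cannot be negative; the function name `tukey_fences` and the `floor` parameter are illustrative, and only the normal group's quartiles from the example are used.

```python
def tukey_fences(q1, q3, floor=0.0):
    """Whisker limits of a box plot: 1.5 interquartile ranges beyond the quartiles."""
    iqr = q3 - q1
    lower = max(q1 - 1.5 * iqr, floor)   # assumed: the attribute cannot fall below the floor
    upper = q3 + 1.5 * iqr
    return lower, upper

# Normal group from the example: first quartile 4, third quartile 9.
normal_low, normal_high = tukey_fences(4, 9)

# A candidate threshold just above the normal group's upper fence.
threshold = 17
passes_most_normal_samples = threshold > normal_high
```

Placing the threshold just above the normal group's upper fence keeps almost all normal samples below it while still cutting into the abnormal group, whose quartiles sit higher.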
When the data source has accumulated about twenty thousand queries, the process returns to the box plot step and the same analysis is performed on the data source to determine each policy threshold. Repeating this allows the whole policy to be iterated and corrected quickly: the policy can be adjusted without waiting for loan performance data, while its effectiveness is still ensured.
Example 2
On the basis of Example 1, a risk control system based on an isolation forest is also disclosed, comprising: an acquisition module for collecting historical customer data and determining the effective features of a data source from the historical data; an evaluation module for estimating the rejection proportion for the data source's features from the pass rate of the current project and the number of data sources used; an algorithm module for feeding the effective feature values into an isolation forest algorithm, determining the proportion of anomalous points in combination with the rejection proportion, and dividing the samples into abnormal samples and normal samples; and a decision module for drawing box plots of the abnormal samples and the normal samples and formulating a policy that meets the expected rejection proportion according to the distributions shown in the box plots. The pass rate of the attribute policies related to a single data source is determined from the distributions shown in the box plots, and the policy iteration cycle is shortened.
The algorithm module comprises: a training unit for training a single tree; an integrating unit for integrating the results of all isolation trees; and a judging unit for calculating the anomaly score s(x) and judging from s(x) whether the test data x is an outlier. The trees are independent of one another and can be deployed on a large-scale distributed system to speed up computation.
The step in which the training unit trains a single tree comprises: S51, randomly selecting ψ points from the training data as a subsample and placing them in the root node of an isolation tree; S52, randomly choosing a dimension and randomly generating a cut point P within the data range of the current node; S53, placing points whose value in the chosen dimension is smaller than P on the left branch of the current node and points whose value is greater than or equal to P on the right branch; S54, recursively constructing new nodes on the left and right branches until a node contains only one data point or the tree reaches the set height. This reduces the masking and swamping effects of anomalies during outlier detection and also allows high-dimensional and massive data to be processed efficiently.
The step in which the judging unit calculates the anomaly score s(x) and judges from s(x) whether the test data x is an outlier includes: S61, calculating the depth h(x) of the test data x in each tree; S62, calculating the average depth E(h(x)) over all trees; S63, calculating the anomaly score; S64, judging from the anomaly score s(x) whether the test data x is an outlier. Whether the test data is an outlier can be judged intuitively from the anomaly score.
Example 3
On the basis of Example 2, when a customer applies to a bank for a loan, the customer submits the various data that need to be audited. While data such as the borrower's repayment capacity, repayment record and repayment willingness and the profitability of the loan project are being extracted, the user's repeated operations are also collected.
After the user's repeated operations are collected, the number of repeated operations is counted and compared with a preset operation threshold. When the number of repeated operations is below the preset operation threshold, processing proceeds as in Example 1; when the number reaches or exceeds the threshold, the application is sent for manual review. This effectively reduces the delay or lag experienced by the user and improves the user experience.
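The routing rule of this example amounts to a single comparison, sketched below in Python. The function name `route_application`, the two route labels and the threshold value of 3 are hypothetical; the patent only says the threshold is preset.

```python
def route_application(repeat_count, operation_threshold):
    """Send the application to manual review once repeated operations reach the threshold."""
    if repeat_count >= operation_threshold:
        return "manual_review"
    return "automated_policy"      # handled by the isolation-forest flow of Example 1

OPERATION_THRESHOLD = 3            # assumed value for illustration

routes = [route_application(count, OPERATION_THRESHOLD) for count in (0, 2, 3, 7)]
```

Using "reaches or exceeds" as the condition means an application with exactly the threshold number of repeated operations already goes to manual review.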
The foregoing is merely an embodiment of the present invention. Specific structures and characteristics that are common knowledge in the art are not described in detail here. A person of ordinary skill in the art knows all the common technical knowledge in the field before the filing date or the priority date, has access to all the prior art in the field, and is able to apply the routine experimental means available before that date; in light of this application, such a person can complete and implement the present scheme using his or her own abilities, and typical known structures or known methods should not be an obstacle to implementing this application. It should be noted that those skilled in the art may make modifications and improvements without departing from the structure of the present invention; these also fall within the scope of protection of the present invention and do not affect the effect of implementing the invention or the utility of the patent.

Claims (5)

1. The wind control method based on the isolated forest is characterized by comprising the following steps of: the method comprises the following steps:
s1, collecting historical data of a client, and determining effective characteristics of a data source according to the historical data;
s2, evaluating rejection proportion of the data source characteristics according to the passing rate of the current project and the number of the used data sources;
s3, putting the effective characteristic values into an isolated forest algorithm, determining the proportion of abnormal points by combining the rejection proportion, and dividing an abnormal sample and a normal sample;
s4, making the abnormal sample and the normal sample into a box diagram, and making a strategy conforming to the expected rejection proportion according to the distribution condition of the box diagram;
wherein, step S1 includes:
s11, extracting historical data of a client;
s12, extracting corresponding characteristics of the client data;
s13, the extracted characteristic information is dataized, and clustering operation is carried out;
s14, calculating the spatial position distance between the clustering center point and other points;
s15, the calculated distance is presented as two-dimensional data, and a point far from the origin of coordinates is given a corresponding larger weight fraction;
wherein, step S3 includes:
s31, training a single tree;
s32, integrating the results of all the isolated trees;
s33, calculating an abnormal score S (x), and judging whether the test data x is an abnormal value or not according to the abnormal score S (x);
wherein, step S31 includes:
s31a, randomly selecting psi points from training data to serve as subsamples, and putting the subsamples into a root node of an isolated tree;
s31b, randomly designating a dimension, and randomly generating a cutting point P in the current node data range;
s31c, placing a point smaller than P in the currently selected dimension on the left branch of the current node, and placing a point larger than or equal to P on the right branch of the current node; s31d, recursing steps S31b and S31c on the left branch and the right branch of the node, and continuously constructing new leaf nodes until only one data or tree grows to the set height on the leaf nodes;
wherein step S32 includes:
S32a, repeating the cutting process from the beginning;
S32b, calculating the average value of the segmentation results;
wherein step S33 includes:
S33a, calculating the depth h(x) of the test data x in each tree;
S33b, calculating the average depth E(h(x)) over all the trees;
S33c, calculating the abnormal score s(x);
S33d, judging whether the test data x is an abnormal value according to its abnormal score s(x).
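Steps S3 and S4 tie the proportion of abnormal points to the rejection proportion and then compare the abnormal and normal samples with box plots. The following is a minimal Python sketch of that idea, assuming every sample already has an abnormal score. The function names, the example scores, and the example feature values are hypothetical and do not come from the patent.

```python
import statistics


def split_by_rejection_proportion(scores, rejection_proportion):
    """Flag the highest-scoring fraction of samples as abnormal.

    Assumption: a higher score means a more abnormal sample, and the
    share of abnormal points is set equal to the rejection proportion.
    """
    n_abnormal = round(len(scores) * rejection_proportion)
    # Sample indices ordered from most to least abnormal.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    abnormal = set(order[:n_abnormal])
    return [i in abnormal for i in range(len(scores))]


def box_stats(values):
    """Five-number summary that a box plot of `values` would display."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median,
            "q3": q3, "max": max(values)}


# Hypothetical abnormal scores and one hypothetical feature per sample.
scores = [0.42, 0.45, 0.71, 0.40, 0.48, 0.68, 0.44, 0.46, 0.43, 0.47]
feature = [1, 2, 9, 1, 3, 8, 2, 2, 1, 3]

flags = split_by_rejection_proportion(scores, rejection_proportion=0.2)
abnormal_values = [v for v, f in zip(feature, flags) if f]
normal_values = [v for v, f in zip(feature, flags) if not f]

abnormal_box = box_stats(abnormal_values)
normal_box = box_stats(normal_values)
```

Comparing `abnormal_box` with `normal_box` for each feature shows where the two groups separate, which is the kind of evidence a rule threshold could be based on. How the patent turns the box plots into a concrete strategy is not spelled out in the claims, so this part remains an illustration only.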
2. A wind control system based on isolated forest, characterized by comprising:
an acquisition module for acquiring historical data of clients and determining the effective features of the data sources according to the historical data;
an evaluation module for evaluating the rejection proportion of the data source features according to the pass rate of the current project and the number of data sources used;
an algorithm module for feeding the effective feature values into an isolated forest algorithm, determining the proportion of abnormal points in combination with the rejection proportion, and dividing the samples into abnormal samples and normal samples;
a decision module for plotting the abnormal samples and the normal samples as box plots, and formulating a strategy that meets the expected rejection proportion according to the distributions shown in the box plots.
3. The isolated forest based wind control system of claim 2, wherein the algorithm module comprises:
a training unit for training a single isolated tree;
an integrating unit for integrating the results of all the isolated trees;
a judging unit for calculating the abnormal score s(x) and judging whether the test data x is an abnormal value according to the abnormal score s(x).
4. The isolated forest based wind control system of claim 3, wherein the step of training a single isolated tree by the training unit comprises:
S51, randomly selecting ψ points from the training data as a subsample, and placing the subsample in the root node of an isolated tree;
S52, randomly selecting a dimension, and randomly generating a cutting point P within the data range of the current node;
S53, placing points smaller than P in the selected dimension on the left branch of the current node, and points greater than or equal to P on the right branch of the current node;
S54, recursively constructing new leaf nodes on the left branch and the right branch of the node until a leaf node holds only one data point or the tree has grown to the set height.
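The tree-training steps S51 to S54 can be sketched in Python as below. This is an illustrative sketch under stated assumptions, not the patented implementation: the dictionary node layout, the function names, the seed handling, and the choice to stop when all values in the chosen dimension are identical are hypothetical.

```python
import random


def build_isolation_tree(points, height_limit, rng, depth=0):
    """Recursively split `points`, a list of equal-length numeric tuples."""
    # S54: stop when the node holds at most one point or the set height is reached.
    if len(points) <= 1 or depth >= height_limit:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))           # S52: random dimension
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:                                  # identical values cannot be cut
        return {"size": len(points)}
    cut = rng.uniform(lo, hi)                     # S52: random cutting point P
    left = [p for p in points if p[dim] < cut]    # S53: smaller than P
    right = [p for p in points if p[dim] >= cut]  # S53: greater than or equal to P
    return {
        "dim": dim,
        "cut": cut,
        "left": build_isolation_tree(left, height_limit, rng, depth + 1),
        "right": build_isolation_tree(right, height_limit, rng, depth + 1),
    }


def train_tree(data, psi, height_limit, seed=None):
    """S51: draw psi points at random and grow one tree from them."""
    rng = random.Random(seed)
    subsample = rng.sample(data, min(psi, len(data)))
    return build_isolation_tree(subsample, height_limit, rng)


def count_points(node):
    """Total number of points stored in the leaves below `node`."""
    if "cut" not in node:
        return node["size"]
    return count_points(node["left"]) + count_points(node["right"])


def tree_height(node):
    """Number of splits on the longest root-to-leaf path."""
    if "cut" not in node:
        return 0
    return 1 + max(tree_height(node["left"]), tree_height(node["right"]))
```

A forest is obtained by calling `train_tree` repeatedly with different seeds. The claims only speak of a "set height"; a common choice in the isolation forest literature is the ceiling of log2(ψ), which is mentioned here as an assumption rather than as the patent's setting.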
5. The isolated forest based wind control system of claim 4, wherein the step of the judging unit calculating the abnormal score s(x) and judging whether the test data x is an abnormal value according to the abnormal score s(x) comprises:
S61, calculating the depth h(x) of the test data x in each tree;
S62, calculating the average depth E(h(x)) over all the trees;
S63, calculating the abnormal score s(x);
S64, judging whether the test data x is an abnormal value according to its abnormal score s(x).
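The claim does not state how the abnormal score is computed from the average depth. The sketch below assumes the score commonly used for isolation forests, s(x) = 2^(-E(h(x)) / c(ψ)), where c(ψ) is the average path length of an unsuccessful binary search tree lookup over ψ points. The function names and the fixed threshold of 0.6 are hypothetical; in the claimed method the cut-off would follow from the rejection proportion.

```python
import math

EULER_GAMMA = 0.5772156649015329


def average_path_length(n):
    """c(n): normalizing constant for trees grown from n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    # H(n - 1) is approximated by ln(n - 1) + Euler's constant.
    harmonic = math.log(n - 1) + EULER_GAMMA
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(depths, psi):
    """S62 and S63: average the per-tree depths h(x), then map to (0, 1]."""
    mean_depth = sum(depths) / len(depths)  # E(h(x))
    return 2.0 ** (-mean_depth / average_path_length(psi))


def is_abnormal(depths, psi, threshold=0.6):
    """S64: flag x when its score exceeds a (hypothetical) threshold."""
    return anomaly_score(depths, psi) > threshold
```

Under this formula a point that is isolated after very few cuts scores close to 1, while a point whose average depth equals c(ψ) scores 0.5. For example, with ψ = 256 the depths `[2, 3, 2, 3]` are flagged as abnormal at the 0.6 threshold and the depths `[12, 11, 13, 12]` are not.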

Priority Applications (1)

Application Number Publication Number Priority Date Filing Date Title
CN202010196415.6A CN111275547B (en) 2020-03-19 2020-03-19 Wind control system and method based on isolated forest

Publications (2)

Publication Number Publication Date
CN111275547A (en) 2020-06-12
CN111275547B (en) 2023-07-18

Family

ID=71000780

Family Applications (1)

Application Number Status Publication Number Priority Date Filing Date Title
CN202010196415.6A Active CN111275547B (en) 2020-03-19 2020-03-19 Wind control system and method based on isolated forest

Country Status (1)

Country Link
CN (1) CN111275547B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112738088B (en) * 2020-12-28 2023-03-21 上海观安信息技术股份有限公司 Behavior sequence anomaly detection method and system based on unsupervised algorithm
CN117077067B (en) * 2023-10-18 2023-12-22 北京亚康万玮信息技术股份有限公司 Information system automatic deployment planning method based on intelligent matching
CN117555892B (en) * 2024-01-10 2024-04-02 江苏省生态环境大数据有限公司 Atmospheric pollutant multimode fusion accounting model post-treatment method

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110009359A (en) * 2019-01-22 2019-07-12 阿里巴巴集团控股有限公司 Training method, update method and the device of unsupervised risk prevention system model

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6665850B1 (en) * 2002-05-22 2003-12-16 Lsi Logic Corporation Spanning tree method for K-dimensional space
MY166960A (en) * 2013-11-27 2018-07-26 Mimos Berhad A system and method for detecting anomalies in computing resources
CN108777873B (en) * 2018-06-04 2021-03-02 江南大学 Wireless sensor network abnormal data detection method based on weighted mixed isolated forest
CN108921440B (en) * 2018-07-11 2022-08-05 平安科技(深圳)有限公司 Pollutant abnormity monitoring method, system, computer equipment and storage medium
CN109345137A (en) * 2018-10-22 2019-02-15 广东精点数据科技股份有限公司 A kind of rejecting outliers method based on agriculture big data
CN109300029A (en) * 2018-10-25 2019-02-01 北京芯盾时代科技有限公司 Borrow or lend money fraud detection model training method, debt-credit fraud detection method and device
CN110046781B (en) * 2018-12-04 2020-07-07 阿里巴巴集团控股有限公司 Merchant risk prevention and control method and device
CN109859029A (en) * 2019-01-04 2019-06-07 深圳壹账通智能科技有限公司 Abnormal application detection method, device, computer equipment and storage medium
CN110046665A (en) * 2019-04-17 2019-07-23 成都信息工程大学 Based on isolated two abnormal classification point detecting method of forest, information data processing terminal
CN110414555B (en) * 2019-06-20 2023-10-03 创新先进技术有限公司 Method and device for detecting abnormal sample
CN110322349B (en) * 2019-06-25 2023-08-22 创新先进技术有限公司 Data processing method, device and equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
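Business security risk control platform based on graph database and machine learning (基于图数据库与机器学习的业务安全风控平台); Fang Guoqiang (方国强); Data Security and Cloud Computing (《数据安全与云计算》), No. 2; 67-69 *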

Also Published As

Publication number Publication date
CN111275547A (en) 2020-06-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant