CN115496393A - Abnormality detection method, abnormality detection device, electronic apparatus, and computer-readable medium - Google Patents


Publication number
CN115496393A
Authority
CN
China
Prior art keywords
service
index
detection
value
service index
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211232672.6A
Other languages
Chinese (zh)
Inventor
刘国华
程琬芸
高振勇
马丽
宫元瑞
李营
覃春钰
袁野
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Construction Bank Corp
CCB Finetech Co Ltd
Original Assignee
China Construction Bank Corp
CCB Finetech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Construction Bank Corp, CCB Finetech Co Ltd
Priority to CN202211232672.6A
Publication of CN115496393A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Educational Administration (AREA)
  • Operations Research (AREA)
  • Marketing (AREA)
  • Game Theory and Decision Science (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an anomaly detection method and device, an electronic device, and a computer-readable medium, and relates to the technical field of big data processing. One embodiment of the method comprises: acquiring the index value of each service index of each detection entity; calculating the cumulative probability of each service index of each detection entity from those index values; inputting the cumulative probabilities into an unsupervised model for unsupervised training, thereby outputting the abnormal detection entities and the importance ranking of the service indexes corresponding to each abnormal detection entity; and sending the abnormal detection entities and the corresponding importance rankings to a target user so as to issue an anomaly alarm to the target user. This embodiment addresses the technical problems that existing approaches place certain requirements on data volume and lack detection accuracy.

Description

Abnormality detection method, abnormality detection device, electronic apparatus, and computer-readable medium
Technical Field
The present invention relates to the field of big data processing technologies, and in particular, to an anomaly detection method and apparatus, an electronic device, and a computer-readable medium.
Background
When a bank conducts credit business, every index must be closely monitored before, during, and after lending, so that anomalies are discovered in time and corresponding strategies are formulated to correct deviations. Banks can monitor changes in business indexes and risk indexes by deploying an early warning system. At present, such systems deploy either machine learning models, for example anomaly detection based on unsupervised algorithms, or indexes based on expert judgment. Traditional anomaly detection models based on unsupervised algorithms place certain requirements on data volume and depend on the accuracy of the algorithm: on the one hand, the data volume must meet the requirements of the algorithm; on the other hand, the hypothesis space of the algorithm must fit the business scenario to achieve a good detection effect. The latter approach sets indexes and thresholds subjectively from experience and lacks theoretical basis and accuracy.
Disclosure of Invention
In view of this, embodiments of the present invention provide an anomaly detection method, an anomaly detection device, an electronic device, and a computer-readable medium, so as to solve the technical problems that existing approaches place certain requirements on data volume and lack detection accuracy.
To achieve the above object, according to an aspect of an embodiment of the present invention, there is provided an abnormality detection method including:
acquiring index values of all service indexes of all detection entities;
calculating the cumulative probability of each service index of each detection entity according to the index value of each service index of each detection entity;
inputting the cumulative probability of each service index of each detection entity into an unsupervised model for unsupervised training, thereby outputting abnormal detection entities and importance sequences of each service index corresponding to the abnormal detection entities;
and sending the abnormal detection entity and the importance sequence of each service index corresponding to the abnormal detection entity to a target user so as to alarm the target user for the abnormality.
Optionally, calculating the cumulative probability of each service index of each detection entity according to the index value of each service index of each detection entity includes:
acquiring index values of all service indexes of all reference entities in a historical time period;
calculating the statistic of each service index according to the index value of each service index of each reference entity and the index value of each service index of each detection entity;
and respectively calculating the cumulative probability of each service index of each entity according to the statistics.
Optionally, calculating statistics of each service index according to the index value of each service index of each reference entity and the index value of each service index of each detection entity, including:
for each service index, calculating the average value and the standard deviation of the service index according to the index value of the service index of each reference entity, thereby calculating the statistic of the service index according to the average value and the standard deviation of the service index and the index value of the service index of each detection entity.
Optionally, the following formula is used to calculate the statistic of the service index:
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$
wherein $Z$ is the statistic of the service index, $\bar{X}$ is the average value of the service index over the detection entities, $n$ is the number of detection entities, $\mu$ is the average value of the service index over the reference entities, and $\sigma$ is the standard deviation of the service index over the reference entities.
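As an illustration only, the statistic can be sketched in Python. The sketch assumes that the formula referenced above is the standard one-sample Z statistic, Z = (X̄ − μ) / (σ / √n), which matches the symbol definitions given for Z, X, n, mu, and sigma. The function name `z_statistic`, the use of the population standard deviation, and the sample numbers are all hypothetical and do not come from the patent.

```python
import math
from statistics import mean, pstdev


def z_statistic(reference_values, detection_values):
    """Z statistic of one service index (assumed form: one-sample Z test)."""
    mu = mean(reference_values)       # average over the reference entities
    sigma = pstdev(reference_values)  # population std; this choice is an assumption
    if sigma == 0:
        raise ValueError("reference values have zero variance")
    n = len(detection_values)         # number of detection entities
    x_bar = mean(detection_values)    # average over the detection entities
    return (x_bar - mu) / (sigma / math.sqrt(n))


# Hypothetical index values for one service index.
reference = [100, 110, 90, 105, 95, 100]
detection = [120, 130, 125, 125]
z = z_statistic(reference, detection)
print(f"Z = {z:.4f}")
```

A large positive Z indicates that the detection entities sit well above the reference mean, and a large negative Z indicates the opposite.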
Optionally, calculating the cumulative probability of each service index of each entity according to the statistics, respectively, includes:
for each service index, if the average value of the service indexes of the detection entities is larger than the average value of the service indexes of the reference entities, calculating the right tail cumulative probability of the service indexes of the entities according to the statistic;
and for each service index, if the average value of the service indexes of the detection entities is less than or equal to the average value of the service indexes of the reference entities, calculating the left-tail cumulative probability of the service indexes of the entities according to the statistic.
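The tail selection described above can be sketched as follows. The sketch assumes that the statistic follows a standard normal distribution, so that the cumulative probability can be computed with the complementary error function from the Python standard library. The function names are hypothetical.

```python
import math


def left_tail(z):
    """P(Z <= z) for a standard normal variable."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def right_tail(z):
    """P(Z >= z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def cumulative_probability(z, detection_mean, reference_mean):
    """Pick the tail according to the rule in the text:
    right tail if the detection mean exceeds the reference mean,
    left tail otherwise."""
    if detection_mean > reference_mean:
        return right_tail(z)
    return left_tail(z)


# Hypothetical values: detection mean above the reference mean.
p_high = cumulative_probability(1.96, detection_mean=125.0, reference_mean=100.0)
# Hypothetical values: detection mean below the reference mean.
p_low = cumulative_probability(-1.96, detection_mean=80.0, reference_mean=100.0)
print(p_high, p_low)
```

With this rule, a probability close to 0 always means "far from the reference group", regardless of the direction of the deviation.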
Optionally, before calculating the cumulative probability of each service index of each entity according to the index value of each service index of each entity, the method further includes:
if the missing value rate of the service index is greater than the missing value rate threshold value, deleting the index value of the service index;
if the single value rate of the service index is greater than the single value rate threshold value, deleting the index value of the service index;
and if the correlation of the service index is greater than the correlation threshold value, deleting the index value of the service index.
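The first two filters can be sketched as below. The text does not define the single value rate precisely; this sketch assumes it is the share of the most frequent non-missing value. The threshold values, function names, and the sample table are hypothetical.

```python
from collections import Counter


def missing_rate(values):
    """Fraction of entries that are missing (represented here as None)."""
    return sum(v is None for v in values) / len(values)


def single_value_rate(values):
    """Share of the most frequent non-missing value (assumed definition)."""
    present = [v for v in values if v is not None]
    if not present:
        return 1.0
    return Counter(present).most_common(1)[0][1] / len(present)


def filter_indexes(table, max_missing=0.5, max_single=0.9):
    """Keep only the service indexes that pass both thresholds."""
    kept = {}
    for name, values in table.items():
        if missing_rate(values) > max_missing:
            continue  # too many missing values
        if single_value_rate(values) > max_single:
            continue  # almost constant, carries little information
        kept[name] = values
    return kept


# Hypothetical table: one list of index values per service index.
table = {
    "loan_applications": [12, 15, None, 14, 13],
    "mostly_missing": [None, None, None, 3, None],
    "constant_flag": [1, 1, 1, 1, 1],
}
kept = filter_indexes(table)
print(list(kept))
```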
Optionally, if the correlation of the service index is greater than the correlation threshold, deleting the index value of the service index, including:
and for any two service indexes, calculating a Pearson correlation coefficient between the any two service indexes, and deleting one of the any two service indexes according to a service scene if the Pearson correlation coefficient between the any two service indexes is greater than a correlation threshold.
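A minimal sketch of the correlation filter follows. Two points are assumptions rather than statements from the patent: the absolute value of the Pearson coefficient is compared with the threshold, and the "service scene" decision about which index to delete is represented by a hypothetical `priority` list that orders the indexes from most to least business-relevant. The sketch also assumes that constant columns have already been removed, since the coefficient is undefined for them.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def drop_correlated(table, threshold, priority):
    """Walk the indexes in priority order and keep an index only if it is
    not highly correlated with any index that has already been kept."""
    kept = []
    for name in priority:
        if all(abs(pearson(table[name], table[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical index values.
table = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10.5],  # almost a multiple of "a"
    "c": [5, 1, 4, 2, 3],
}
kept = drop_correlated(table, threshold=0.8, priority=["a", "b", "c"])
print(kept)
```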
In addition, according to another aspect of the embodiments of the present invention, there is provided an abnormality detection apparatus including:
the acquisition module is used for acquiring index values of all the service indexes of all the detection entities;
the calculation module is used for calculating the cumulative probability of each service index of each detection entity according to the index value of each service index of each detection entity;
the detection module is used for inputting the cumulative probability of each service index of each detection entity into an unsupervised model for unsupervised training, so that abnormal detection entities and importance sequences of each service index corresponding to the abnormal detection entities are output;
and the alarm module is used for sending the abnormal detection entity and the importance sequence of each service index corresponding to the abnormal detection entity to a target user so as to alarm the target user for the abnormality.
Optionally, the computing module is further configured to:
acquiring index values of all service indexes of all reference entities in a historical time period;
calculating the statistic of each service index according to the index value of each service index of each reference entity and the index value of each service index of each detection entity;
and respectively calculating the cumulative probability of each service index of each entity according to the statistics.
Optionally, the computing module is further configured to:
for each service index, calculating the average value and the standard deviation of the service index according to the index value of the service index of each reference entity, thereby calculating the statistic of the service index according to the average value and the standard deviation of the service index and the index value of the service index of each detection entity.
Optionally, the computing module is further configured to:
calculating the statistic of the service index by adopting the following formula:
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$
wherein $Z$ is the statistic of the service index, $\bar{X}$ is the average value of the service index over the detection entities, $n$ is the number of detection entities, $\mu$ is the average value of the service index over the reference entities, and $\sigma$ is the standard deviation of the service index over the reference entities.
Optionally, the computing module is further configured to:
for each service index, if the average value of the service indexes of all the detection entities is larger than the average value of the service indexes of all the reference entities, calculating the right tail cumulative probability of the service indexes of all the entities according to the statistic;
and for each service index, if the average value of the service indexes of the detection entities is less than or equal to the average value of the service indexes of the reference entities, calculating the left-tail cumulative probability of the service indexes of the entities according to the statistic.
Optionally, the computing module is further configured to:
before calculating the cumulative probability of each service index of each entity according to the index value of each service index of each entity,
if the missing value rate of the service index is greater than the missing value rate threshold value, deleting the index value of the service index;
if the single-value rate of the service index is greater than the single-value rate threshold value, deleting the index value of the service index;
and if the correlation of the service index is greater than the correlation threshold value, deleting the index value of the service index.
Optionally, the computing module is further configured to:
and for any two service indexes, calculating a Pearson correlation coefficient between the any two service indexes, and deleting one of the any two service indexes according to a service scene if the Pearson correlation coefficient between the any two service indexes is greater than a correlation threshold.
According to another aspect of the embodiments of the present invention, there is also provided an electronic device, including:
one or more processors;
a storage device for storing one or more programs,
when the one or more programs are executed by the one or more processors, the one or more processors implement the method of any of the embodiments described above.
According to another aspect of the embodiments of the present invention, there is also provided a computer readable medium, on which a computer program is stored, which when executed by a processor implements the method of any of the above embodiments.
According to another aspect of the embodiments of the present invention, there is also provided a computer program product comprising a computer program which, when executed by a processor, implements the method of any of the above embodiments.
One embodiment of the above invention has the following advantages or benefits: the cumulative probability of each service index of each detection entity is calculated from its index values, and the cumulative probabilities are then input into an unsupervised model for unsupervised training so as to output the abnormal detection entities and the importance ranking of the service indexes corresponding to each abnormal detection entity; this overcomes the prior-art problems of data volume requirements and insufficient detection accuracy. The embodiment applies the statistical methodology of hypothesis testing to the development of anomaly detection models, in particular to early warning mechanisms based on anomaly detection. This fits the idea of anomaly detection more closely, reduces the difficulty of algorithm tuning and development during model development, packs as much anomaly information as possible into the feature engineering, and at the same time makes the early warnings for anomalous points easier to interpret. The embodiment therefore applies hypothesis testing in practice to feature engineering, increases the information carried by the features of the unsupervised model, and reduces the requirements on algorithm tuning and data volume.
Further effects of the above optional implementations are described below in connection with the specific embodiments.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort. Wherein:
FIG. 1 is a flow chart of an anomaly detection method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the clustering results of a density-based unsupervised clustering algorithm;
FIG. 3 is a flowchart of an abnormality detection method according to a referential embodiment of the present invention;
FIG. 4 is a flowchart of an abnormality detection method according to another referential embodiment of the present invention;
FIG. 5 is a schematic diagram of an anomaly detection apparatus according to an embodiment of the present invention;
FIG. 6 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;
FIG. 7 is a schematic block diagram of a computer system suitable for use in implementing a terminal device or server of an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the technical solution of the present invention, the acquisition, storage, use, and processing of data all comply with relevant national laws and regulations.
The embodiment of the invention first uses business logic to construct the indexes that enter the model, then uses hypothesis testing to calculate the P value (cumulative probability) of each index, that is, the degree to which it deviates from the normal distribution, and finally obtains statistically grounded features that carry more business logic. By adding hypothesis-test-based measures of abnormal behavior to the feature engineering, the embodiment makes the features contain more anomaly information and frees the model from dependence on algorithm tuning and algorithm selection. In addition, because a hypothesis testing method is used, the features are easier to understand and closer to business logic, which addresses the poor business interpretability of unsupervised models. Constructing features with hypothesis testing therefore fits the core idea of anomaly detection better, removes the dependence on the algorithm, and reduces the requirement on data volume.
Fig. 1 is a flowchart of an abnormality detection method according to an embodiment of the present invention. As an embodiment of the present invention, as shown in fig. 1, the abnormality detection method may include:
step 101, obtaining the index value of each service index of each detection entity.
First, the service indexes to be monitored are set according to business requirements. They may be set by technical personnel based on experience or obtained by other analysis methods. These service indexes ultimately form the features that enter the model. In addition, the model-input indexes of the unsupervised model should be selected to match the business objective of the model. The index values of each service index of each detection entity are then obtained from the database; a detection entity may be a bank, an institution, or any other organization whose business needs to be monitored.
Step 102, calculating the cumulative probability of each service index of each detection entity according to the index value of each service index of each detection entity.
Before step 102, character-type index values need to be converted into numerical index values. For example, each character-type value may be replaced by its percentage share of the sample data, e.g., "male" may be replaced by 40% and "female" by 60%; an encoding scheme may also be used, e.g., "male" and "female" may be converted to (0, 1) and (1, 0), respectively.
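Both conversions can be sketched with the Python standard library. The function names and the sample column are hypothetical, and the ordering of the one-hot positions (alphabetical by category) is an arbitrary choice made for this sketch.

```python
from collections import Counter


def frequency_encode(values):
    """Replace each category by its share of the sample."""
    counts = Counter(values)
    total = len(values)
    return [counts[v] / total for v in values]


def one_hot_encode(values):
    """Replace each category by a 0/1 tuple, one position per category."""
    categories = sorted(set(values))  # fixed, reproducible order
    return [tuple(1 if v == c else 0 for c in categories) for v in values]


# Hypothetical character-type index values: 2 of 5 are "male".
genders = ["male", "female", "female", "male", "female"]
freq = frequency_encode(genders)
onehot = one_hot_encode(genders)
print(freq)
print(onehot)
```

Frequency encoding keeps a single numeric column per index, while one-hot encoding adds one column per category.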
Optionally, step 102 may comprise: acquiring index values of all service indexes of all reference entities in a historical time period; calculating the statistic of each service index according to the index value of each service index of each reference entity and the index value of each service index of each detection entity; and calculating the cumulative probability of each service index of each entity according to the statistics. To determine whether the sample data (the index values obtained in step 101) is significantly abnormal, a reference group (the index values of each service index of each reference entity in a historical time period) needs to be found; the sample data is compared with the reference group to construct a statistic, and the cumulative probability of the sample data is then calculated from that statistic.
The reference group may be selected according to specific business requirements. For example, suppose a bank has 10 branches in city A. To monitor whether the monthly number of loan-applying customers at the city's branches changes abnormally (increases or decreases significantly), the monitoring index is set to the average monthly number of loan-applying customers per branch in the city. The numbers of loan-applying customers at the 10 branches over the past 6 months can then be selected as the reference group. Here the historical data of the past 6 months is regarded as the stable population data, while the data of the previous month is the sample data.
Step 103, inputting the cumulative probability of each service index of each detection entity into an unsupervised model for unsupervised training, thereby outputting the abnormal detection entities and the importance ranking of each service index corresponding to the abnormal detection entities.
Anomaly detection uses data mining methods to find abnormal data that is inconsistent with the overall data distribution. Such anomalies merely represent a difference in distribution; they are not necessarily good or bad, and in the credit domain they are not necessarily related to fraud. The anomalous points found by anomaly detection therefore do not depend on manual definition and are a good supplement for abnormal business that supervised algorithms cannot cover.
Existing anomaly detection algorithms based on unsupervised learning include isolation forest, distance-based K-means clustering, and density-based DBSCAN clustering. The embodiment of the invention uses DBSCAN, a classical algorithm in anomaly detection, to describe in detail how hypothesis testing is applied to the development of an early warning model; the same method is also applicable to other algorithms, which are not described again.
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a density-based unsupervised clustering algorithm. Its core definitions and principle are as follows. The algorithm has two hyper-parameters: the cluster radius ε and the density threshold minPts (the number of data points within the radius). As shown in FIG. 2: a core point is a point that has at least minPts data points within the radius ε centered on it. A boundary point lies within the radius of a core point but has fewer than minPts data points within the radius centered on itself. An outlier does not belong to any cluster; such points lie in low-density regions, far from core points and boundary points.
The principle of the algorithm is as follows. Given the density threshold minPts and the radius ε, each point of the data set is traversed once. If the number of points within the circle of radius ε centered on a point is not less than minPts, these points belong to one cluster, satisfying direct density-reachability or density-reachability; otherwise, another cluster is generated. Finally, once every point of the data set has been traversed, any point that cannot be assigned to a cluster, that is, any point that is not density-reachable, is defined as an outlier and treated as an anomalous point. DBSCAN computes the radius within a hypersphere (based on Euclidean distance or another distance), so the model-input features usually need to be normalized. In embodiments of the present invention, feature normalization may be omitted because all model-input features are P values of hypothesis tests (probability values between 0 and 1).
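The traversal described above can be sketched as a small, unoptimized DBSCAN in plain Python. This is a textbook formulation written for illustration, not the implementation used in the patent. It counts a point as a core point when its ε-neighborhood (including the point itself) contains at least `min_pts` points. The feature rows are hypothetical P values, so no normalization step is applied.

```python
import math

NOISE = -1  # label used for outliers


def region_query(points, i, eps):
    """Indexes of all points within distance eps of points[i] (itself included)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster number (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already visited
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a boundary point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # boundary point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # points[j] is also a core point
    return labels


# Hypothetical feature rows: one row of P values per detection entity.
features = [
    (0.50, 0.52),
    (0.48, 0.50),
    (0.52, 0.49),
    (0.51, 0.51),
    (0.02, 0.97),  # far from the others
]
labels = dbscan(features, eps=0.1, min_pts=3)
print(labels)
```

Entities labeled `NOISE` correspond to the outliers in the text, that is, the abnormal detection entities.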
After feature engineering is complete, model development can begin. For the initial model, parameter values can be set from expert experience in light of the business scenario, and the model can be tuned later. Although the unsupervised model does not use the target variable (i.e., the dependent variable, the Y value to be predicted) during training, a small number of Y values can still be used during tuning to improve the performance of the model.
It should be noted that when a general-purpose early warning model is developed in the field of bank credit, the embodiment of the invention can improve the quality of feature engineering. In particular, the screening of model-input indexes and the features created with hypothesis testing fit the characteristics of general-purpose early warning well: they incorporate business logic and, technically, also emphasize the handling of anomaly information. As a result, after feature engineering a simple algorithm can be chosen, model tuning is relatively simple, and a good detection effect can still be achieved.
The model outputs the cluster number to which each detection entity belongs and whether the entity is an anomalous point (abnormal detection entity); it can also output the importance ranking of the service indexes corresponding to each anomalous point.
Step 104, sending the abnormal detection entity and the importance ranking of each service index corresponding to the abnormal detection entity to a target user so as to alarm the target user about the abnormality.
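The text does not state how the importance ranking is computed. Purely as an assumption for illustration, the sketch below ranks the service indexes of each anomalous entity by ascending cumulative probability, so that the index deviating most from the reference group comes first. The entity names, index names, label convention (-1 for an anomalous point), and values are all hypothetical.

```python
NOISE = -1  # assumed label for an anomalous point


def anomaly_report(labels, features):
    """Map each anomalous entity to its service indexes, most extreme first.

    labels:   dict entity -> cluster number or NOISE
    features: dict entity -> dict index name -> cumulative probability
    """
    report = {}
    for entity, label in labels.items():
        if label == NOISE:
            p_values = features[entity]
            # Smaller probability = stronger deviation (assumed ranking rule).
            report[entity] = sorted(p_values, key=p_values.get)
    return report


labels = {"branch_A": 0, "branch_B": NOISE}
features = {
    "branch_A": {"applications": 0.45, "overdue_rate": 0.52},
    "branch_B": {"applications": 0.30, "overdue_rate": 0.001},
}
report = anomaly_report(labels, features)
print(report)
```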
After the abnormal detection entity is detected, the abnormal detection entity and the importance sequence of each service index corresponding to the abnormal detection entity are sent to a target user (such as an administrator, operation and maintenance personnel, service supervision personnel and the like), and an abnormality alarm is given to the target user, so that the target user can quickly find the abnormal entity and the reason of the abnormality.
From the embodiments described above, it can be seen that the cumulative probability of each service index of each detection entity is calculated from its index values, and the cumulative probabilities are then input into the unsupervised model for unsupervised training so as to output the abnormal detection entities and the importance ranking of the service indexes corresponding to each abnormal detection entity; this overcomes the prior-art problems of data volume requirements and insufficient detection accuracy. The embodiment applies the statistical methodology of hypothesis testing to the development of anomaly detection models, in particular to early warning mechanisms based on anomaly detection. This fits the idea of anomaly detection more closely, reduces the difficulty of algorithm tuning and development during model development, packs as much anomaly information as possible into the feature engineering, and at the same time makes the early warnings for anomalous points easier to interpret. The embodiment therefore applies hypothesis testing in practice to feature engineering, increases the information carried by the features of the unsupervised model, and reduces the requirements on algorithm tuning and data volume.
Fig. 3 is a flowchart of an abnormality detection method according to a referential embodiment of the present invention. As still another embodiment of the present invention, as shown in fig. 3, the abnormality detecting method may include:
step 301, obtaining the index value of each service index of each detection entity.
Step 302, cleaning the index values of the various service indexes of the various detection entities.
If the missing value rate of the service index is greater than the missing value rate threshold value, deleting the index value of the service index; if the single value rate of the service index is greater than the single value rate threshold value, deleting the index value of the service index; and if the correlation of the service index is greater than the correlation threshold value, deleting the index value of the service index.
Specifically, the data cleaning mainly comprises the following steps:
1) Data quality inspection and processing
Check the data format and clean formats that the model cannot process, for example by converting time-format values into numerical values and deleting special characters.
2) Data missing value, single value rate, correlation processing
Delete service indexes with a high missing value rate;
delete service indexes with a high single value rate;
calculate the correlation between service indexes, apply business-logic judgment to service indexes with high correlation coefficients, and retain the indexes with stronger business logic.
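The format cleaning in item 1) can be sketched as follows. The date format, the choice of a UTC epoch timestamp as the numerical representation, and the set of characters treated as "special" are assumptions made for this sketch; the patent does not specify them.

```python
import re
from datetime import datetime, timezone


def time_to_numeric(text, fmt="%Y-%m-%d"):
    """Convert a date string to seconds since the Unix epoch (UTC)."""
    parsed = datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
    return parsed.timestamp()


def clean_number(text):
    """Delete everything except digits, the decimal point, and the minus sign,
    then parse the remainder as a float."""
    return float(re.sub(r"[^0-9.\-]", "", text))


# Hypothetical raw values as they might appear in a source table.
ts = time_to_numeric("1970-01-02")
amount = clean_number("¥1,234.50")
print(ts, amount)
```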
Optionally, if the correlation of the service index is greater than the correlation threshold, deleting the index value of the service index includes: and for any two service indexes, calculating a Pearson correlation coefficient between the any two service indexes, and deleting one of the any two service indexes according to a service scene if the Pearson correlation coefficient between the any two service indexes is greater than a correlation threshold.
Step 303, converting the character-type index value of each service index of each detection entity into a numerical index value.
Step 304, calculating the cumulative probability of each service index of each detection entity according to the converted index value of each service index of each detection entity.
Step 305, inputting the cumulative probability of each service index of each detection entity into an unsupervised model for unsupervised training, thereby outputting abnormal detection entities and importance ranking of each service index corresponding to the abnormal detection entities.
Step 306, the abnormal detection entity and the importance sequence of each service index corresponding to the abnormal detection entity are sent to a target user, so as to alarm the target user about the abnormality.
In addition, the specific implementation of the abnormality detection method in this embodiment has been described in detail above, and is therefore not repeated here.
Fig. 4 is a flowchart of an abnormality detection method according to another referential embodiment of the present invention. As another embodiment of the present invention, as shown in fig. 4, the abnormality detecting method may include:
Step 401, obtaining index values of each service index of each detection entity.
Step 402, obtaining the index value of each service index of each reference entity in the historical time period.
Step 403, calculating statistics of each service index according to the index value of each service index of each reference entity and the index value of each service index of each detection entity.
Optionally, calculating statistics of each service index according to the index value of each service index of each reference entity and the index value of each service index of each detection entity, including: for each service index, calculating the average value and the standard deviation of the service index according to the index value of the service index of each reference entity, thereby calculating the statistic of the service index according to the average value and the standard deviation of the service index and the index value of the service index of each detection entity.
An applicable statistical distribution is selected and the statistic is designed according to the service index to be tested and the data that can be obtained. In the above example, the historical data of the past 6 months can be treated as the population, so that the mean and variance of the population are known, and the arithmetic mean across the branches can be used to test for a change in the number of customers of a certain bank in a given city. Therefore, the normal distribution and the Z statistic can be selected to test for changes in the index. Besides the Z statistic, the t statistic, chi-square statistic, and F statistic may also be used to test for changes in the index, which are not described in detail in the embodiments of the present invention.
Optionally, the following formula is used to calculate the statistic of the service index:
$$Z = \frac{X - \mu}{\sigma / \sqrt{n}}$$
wherein Z is the statistic of the service index, X is the average value of the service index of each detection entity, n is the number of the detection entities, μ is the average value of the service index of each reference entity, and σ is the standard deviation of the service index of each reference entity.
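The statistic can be computed in a few lines. The sketch below assumes the usual one-sample form Z = (X − μ) / (σ / √n), which matches the symbol definitions given here; the function name `z_statistic` and the choice of the population standard deviation (`statistics.pstdev`, because the historical data is treated as the population) are assumptions made for illustration.

```python
import math
import statistics

def z_statistic(detection_values, reference_values):
    """Z statistic of one service index.

    detection_values: index values of the detection entities (current period).
    reference_values: index values of the reference entities (historical period).
    """
    x_mean = statistics.fmean(detection_values)   # X
    n = len(detection_values)                     # n
    mu = statistics.fmean(reference_values)       # mu
    sigma = statistics.pstdev(reference_values)   # sigma, population form (assumption)
    if sigma == 0:
        raise ValueError("reference values have zero standard deviation")
    return (x_mean - mu) / (sigma / math.sqrt(n))

# Hypothetical numbers: reference mean 100, standard deviation 10,
# four detection entities averaging 105.
z = z_statistic([105, 105, 105, 105], [90, 110])
```

With these hypothetical numbers the statistic is (105 − 100) / (10 / 2), that is, 1.0.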
Calculating the statistic is a standard step in hypothesis testing; after the statistic is calculated, a null hypothesis is designed. In the above example, the statistic may be calculated as follows:
$$Z = \frac{X - \mu}{\sigma / \sqrt{n}}$$
wherein X is the average number of loan-application customers per branch in the previous month, n is the number of branches, μ is the average number of loan-application customers per branch over the past 6 months, and σ is the standard deviation of the number of loan-application customers per branch over the past 6 months.
Detecting whether the number of customers is abnormal can be decomposed into testing whether the number of customers has significantly increased or significantly decreased, so there are two kinds of null hypothesis:
H0:X>μ;
H0:X≤μ。
Step 404, calculating the cumulative probability of each service index of each entity according to the statistics.
Optionally, calculating the cumulative probability of each service index of each entity according to the statistics, respectively, includes: for each service index, if the average value of the service indexes of the detection entities is larger than the average value of the service indexes of the reference entities, calculating the right tail cumulative probability of the service indexes of the entities according to the statistic; and for each service index, if the average value of the service indexes of the detection entities is less than or equal to the average value of the service indexes of the reference entities, calculating the left-tail cumulative probability of the service indexes of the entities according to the statistic.
According to the statistic and the null hypothesis obtained in the above steps, the left-tail, right-tail, or two-tailed cumulative probability of the statistic is selected and calculated.
In the above example:
H0: X > μ. This test checks for a significant increase in the number of customers, and the right-tail cumulative probability P is calculated.
H0: X ≤ μ. This test checks for a significant decrease in the number of customers, and the left-tail cumulative probability P is calculated.
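The tail selection described above can be sketched with the standard normal distribution from the Python standard library. The function name `tail_cumulative_probability` and its argument names are hypothetical; the rule itself (right tail when the detection mean exceeds the reference mean, left tail otherwise) follows the text.

```python
from statistics import NormalDist

def tail_cumulative_probability(z, detection_mean, reference_mean):
    """Cumulative probability P of a Z statistic under the standard normal.

    Right tail if the detection mean is greater than the reference mean,
    left tail otherwise.
    """
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    if detection_mean > reference_mean:
        return 1.0 - standard_normal.cdf(z)  # right-tail probability
    return standard_normal.cdf(z)            # left-tail probability

# Hypothetical values: a rise and a fall of the same size give the same P.
p_rise = tail_cumulative_probability(1.96, detection_mean=110, reference_mean=100)
p_fall = tail_cumulative_probability(-1.96, detection_mean=90, reference_mean=100)
```

A small P in either direction marks the index value as unusual relative to the historical period, which is what makes P usable as a feature for the unsupervised model.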
The same feature engineering can be applied to other representative service indexes: for early warning of high-risk reasons for loan application rejection, such as the proportion of rejections due to blacklisting, a chi-square frequency test can be used; for comparing the revenue of advertising and marketing channels, a two-sample test with the Z statistic or the t statistic can be used; and for testing the variance of employee performance across branches, for example whether the employees of a team differ significantly in performance, the F distribution can be used.
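As an illustration of the first of these cases only, a chi-square frequency test on a blacklist rejection ratio might look like the sketch below. It is not taken from the embodiment: the function name, the two-category setup (blacklist rejection versus any other outcome), and all counts are hypothetical. With two categories there is one degree of freedom, for which the right-tail probability has the closed form erfc(sqrt(chi2 / 2)), so no external library is needed.

```python
import math

def chi_square_two_category(observed_hits, observed_total, expected_rate):
    """Chi-square goodness-of-fit test with two categories (1 degree of freedom).

    observed_hits: e.g. number of applications rejected because of a blacklist.
    observed_total: total number of applications in the detection period.
    expected_rate: historical share of blacklist rejections (between 0 and 1).
    Returns the chi-square statistic and its right-tail probability.
    """
    expected_hits = observed_total * expected_rate
    expected_misses = observed_total - expected_hits
    observed_misses = observed_total - observed_hits
    chi2 = ((observed_hits - expected_hits) ** 2 / expected_hits
            + (observed_misses - expected_misses) ** 2 / expected_misses)
    # For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical counts: 30 blacklist rejections out of 100 applications,
# against a historical rate of 20 percent.
chi2, p_value = chi_square_two_category(30, 100, 0.2)
```

The resulting probability can be used as the feature for this index in the same way as the tail probability of the Z statistic.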
Step 405, inputting the cumulative probability of each service index of each detection entity into an unsupervised model for unsupervised training, thereby outputting abnormal detection entities and importance ranking of each service index corresponding to the abnormal detection entities.
Step 406, sending the abnormal detection entity and the importance ranking of each service index corresponding to the abnormal detection entity to a target user, so as to alarm the target user about the abnormality.
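Steps 405 and 406 take a table of cumulative probabilities and return flagged entities together with a ranking of their service indexes. The sketch below shows only that input and output shape. It deliberately uses a simple stand-in scorer (the sum of negative log probabilities per entity), which is an assumption made here and not the unsupervised model of the embodiment; the function name, the `top_fraction` parameter, and the sample data are likewise hypothetical.

```python
import math

def detect_anomalies(cumulative_probs, top_fraction=0.1, eps=1e-12):
    """Stand-in scorer, NOT the unsupervised model of the embodiment.

    cumulative_probs: entity -> {service index name -> cumulative probability}.
    Returns a list of (entity, index names ranked by contribution) for the
    entities with the highest total score.
    """
    # Small probabilities are surprising, so use -log(p) as the contribution.
    surprisal = {
        entity: {name: -math.log(max(p, eps)) for name, p in probs.items()}
        for entity, probs in cumulative_probs.items()
    }
    scores = {entity: sum(parts.values()) for entity, parts in surprisal.items()}
    n_flag = max(1, math.ceil(len(scores) * top_fraction))
    flagged = sorted(scores, key=scores.get, reverse=True)[:n_flag]
    return [
        (entity, sorted(surprisal[entity], key=surprisal[entity].get, reverse=True))
        for entity in flagged
    ]

# Hypothetical cumulative probabilities for three branches.
probs = {
    "branch_a": {"customers": 0.45, "revenue": 0.30},
    "branch_b": {"customers": 0.001, "revenue": 0.20},
    "branch_c": {"customers": 0.50, "revenue": 0.40},
}
findings = detect_anomalies(probs)
```

Here `findings` holds the single most unusual branch and lists its indexes from most to least surprising, which is the kind of payload step 406 would send to the target user.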
In addition, the specific implementation of the abnormality detection method in this embodiment has been described in detail above, and is therefore not repeated here.
Fig. 5 is a schematic diagram of an abnormality detection apparatus according to an embodiment of the present invention. As shown in fig. 5, the abnormality detection apparatus 500 includes an obtaining module 501, a calculating module 502, a detecting module 503 and an alarming module 504; the obtaining module 501 is configured to obtain index values of each service index of each detection entity; the calculating module 502 is configured to calculate the cumulative probability of each service index of each detection entity according to the index value of each service index of each detection entity; the detecting module 503 is configured to input the cumulative probability of each service index of each detection entity into an unsupervised model for unsupervised training, so as to output an abnormal detection entity and an importance ranking of each service index corresponding to the abnormal detection entity; the alarming module 504 is configured to send the abnormal detection entity and the importance ranking of each service index corresponding to the abnormal detection entity to a target user, so as to perform an abnormality alarm for the target user.
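The division into four modules can be sketched as a small pipeline object. This is only an illustration of how modules 501 to 504 hand data to each other; the class name, the type aliases, and the representation of each module as a callable are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# entity -> {service index name -> value}
IndexTable = Dict[str, Dict[str, float]]
# list of (abnormal entity, service indexes ranked by importance)
Findings = List[Tuple[str, List[str]]]

@dataclass
class AbnormalityDetectionApparatus:
    obtain: Callable[[], IndexTable]               # obtaining module 501
    calculate: Callable[[IndexTable], IndexTable]  # calculating module 502
    detect: Callable[[IndexTable], Findings]       # detecting module 503
    alarm: Callable[[Findings], None]              # alarming module 504

    def run(self) -> Findings:
        index_values = self.obtain()
        cumulative_probs = self.calculate(index_values)
        findings = self.detect(cumulative_probs)
        self.alarm(findings)
        return findings

# Hypothetical wiring with trivial stand-ins for each module.
sent = []
apparatus = AbnormalityDetectionApparatus(
    obtain=lambda: {"branch_b": {"customers": 3.0}},
    calculate=lambda values: {"branch_b": {"customers": 0.001}},
    detect=lambda probs: [("branch_b", ["customers"])],
    alarm=sent.append,
)
result = apparatus.run()
```

Keeping each module behind a plain callable makes it straightforward to swap the calculation or the detection model without touching the other modules.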
Optionally, the calculating module 502 is further configured to:
acquiring index values of all service indexes of all reference entities in a historical time period;
calculating the statistic of each service index according to the index value of each service index of each reference entity and the index value of each service index of each detection entity;
and respectively calculating the cumulative probability of each service index of each entity according to the statistics.
Optionally, the calculating module 502 is further configured to:
for each service index, calculating the average value and the standard deviation of the service index according to the index value of the service index of each reference entity, thereby calculating the statistic of the service index according to the average value and the standard deviation of the service index and the index value of the service index of each detection entity.
Optionally, the calculating module 502 is further configured to:
calculating the statistic of the service index by adopting the following formula:
$$Z = \frac{X - \mu}{\sigma / \sqrt{n}}$$
wherein Z is the statistic of the service index, X is the average value of the service index of each detection entity, n is the number of the detection entities, μ is the average value of the service index of each reference entity, and σ is the standard deviation of the service index of each reference entity.
Optionally, the calculating module 502 is further configured to:
for each service index, if the average value of the service indexes of the detection entities is larger than the average value of the service indexes of the reference entities, calculating the right tail cumulative probability of the service indexes of the entities according to the statistic;
and for each service index, if the average value of the service indexes of the detection entities is less than or equal to the average value of the service indexes of the reference entities, calculating the left-tail cumulative probability of the service indexes of the entities according to the statistic.
Optionally, the calculating module 502 is further configured to:
before calculating the cumulative probability of each service index of each entity according to the index value of each service index of each entity,
if the missing value rate of the service index is greater than the missing value rate threshold value, deleting the index value of the service index;
if the single value rate of the service index is greater than the single value rate threshold value, deleting the index value of the service index;
and if the correlation of the service index is greater than the correlation threshold value, deleting the index value of the service index.
Optionally, the calculating module 502 is further configured to:
for any two service indexes, calculating the Pearson correlation coefficient between the two service indexes, and, if the Pearson correlation coefficient is greater than the correlation threshold, deleting one of the two service indexes according to the service scenario.
It should be noted that the specific implementation of the abnormality detection apparatus of the present invention has been described in detail in the abnormality detection method above, and is therefore not repeated here.
Fig. 6 illustrates an exemplary system architecture 600 to which the anomaly detection method or anomaly detection apparatus of embodiments of the present invention may be applied.
As shown in fig. 6, the system architecture 600 may include terminal devices 601, 602, 603, a network 604, and a server 605. The network 604 serves to provide a medium for communication links between the terminal devices 601, 602, 603 and the server 605. The network 604 may include various types of connections, such as wired or wireless communication links, or fiber optic cables.
A user may use the terminal devices 601, 602, 603 to interact with the server 605 via the network 604 to receive or send messages or the like. The terminal devices 601, 602, 603 may have installed thereon various communication client applications, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).
The terminal devices 601, 602, 603 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 605 may be a server providing various services, such as a background management server (for example only) providing support for shopping websites browsed by users using the terminal devices 601, 602, 603. The background management server can analyze and process the received data such as the article information query request and feed back the processing result to the terminal equipment.
It should be noted that the abnormality detection method provided by the embodiment of the present invention is generally executed by the server 605, and accordingly, the abnormality detection apparatus is generally disposed in the server 605.
It should be understood that the number of terminal devices, networks, and servers in fig. 6 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 7, shown is a block diagram of a computer system 700 suitable for use with a terminal device implementing embodiments of the present invention. The terminal device shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 7, the computer system 700 includes a Central Processing Unit (CPU) 701, which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. The RAM 703 also stores various programs and data necessary for the operation of the system 700. The CPU 701, the ROM 702, and the RAM 703 are connected to each other via a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
The following components are connected to the I/O interface 705: an input portion 706 including a keyboard, a mouse, and the like; an output section 707 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card, a modem, or the like. The communication section 709 performs communication processing via a network such as the internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 710 as necessary, so that a computer program read out therefrom is mounted into the storage section 708 as necessary.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flowchart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711. When executed by the Central Processing Unit (CPU) 701, the computer program performs the above-described functions defined in the system of the present invention.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer programs according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor includes an acquisition module, a calculation module, a detection module, and an alarm module, where the names of the modules do not in some cases constitute a limitation on the modules themselves.
As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments; or may be separate and not assembled into the device. The computer readable medium carries one or more programs which, when executed by a device, implement the method of: acquiring index values of all service indexes of all detection entities; calculating the cumulative probability of each service index of each detection entity according to the index value of each service index of each detection entity; inputting the cumulative probability of each service index of each detection entity into an unsupervised model for unsupervised training, thereby outputting abnormal detection entities and importance sequences of each service index corresponding to the abnormal detection entities; and sending the abnormal detection entity and the importance sequence of each service index corresponding to the abnormal detection entity to a target user so as to alarm the target user for the abnormality.
As another aspect, an embodiment of the present invention further provides a computer program product, which includes a computer program, and when the computer program is executed by a processor, the computer program implements the method described in any of the above embodiments.
According to the technical solution of the embodiment of the invention, the cumulative probability of each service index of each detection entity is calculated from the index value of that service index, and the cumulative probabilities are then input into an unsupervised model for unsupervised training, so that the abnormal detection entities and the importance ranking of the service indexes corresponding to each abnormal detection entity are output. This solves the technical problems in the prior art of requiring a certain amount of data and lacking detection accuracy. The embodiment of the invention applies the hypothesis testing methodology of statistics to the development of the anomaly detection model, and in particular to an early warning mechanism based on anomaly detection. This better fits the idea of anomaly detection, reduces the difficulty of algorithm optimization and development during model development, captures as much anomaly information as possible in the feature engineering, and at the same time makes the early warnings for anomalous points understandable. Therefore, the embodiment of the invention applies the practical operation of hypothesis testing to the feature engineering, increases the amount of information carried by the features in the unsupervised model, and reduces the requirements on algorithm optimization and data volume.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (17)

1. An abnormality detection method characterized by comprising:
acquiring index values of all service indexes of all detection entities;
calculating the cumulative probability of each service index of each detection entity according to the index value of each service index of each detection entity;
inputting the cumulative probability of each service index of each detection entity into an unsupervised model for unsupervised training, thereby outputting abnormal detection entities and importance sequences of each service index corresponding to the abnormal detection entities;
and sending the abnormal detection entity and the importance sequence of each service index corresponding to the abnormal detection entity to a target user so as to alarm the target user for the abnormality.
2. The method of claim 1, wherein calculating the cumulative probability of each service index of each detection entity according to the index value of each service index of each detection entity comprises:
acquiring index values of all service indexes of all reference entities in a historical time period;
calculating the statistic of each service index according to the index value of each service index of each reference entity and the index value of each service index of each detection entity;
and respectively calculating the cumulative probability of each service index of each entity according to the statistics.
3. The method of claim 2, wherein calculating statistics of the service indicators according to the indicator value of the service indicators of the reference entities and the indicator value of the service indicators of the detection entities comprises:
for each service index, calculating the average value and the standard deviation of the service index according to the index value of the service index of each reference entity, thereby calculating the statistic of the service index according to the average value and the standard deviation of the service index and the index value of the service index of each detection entity.
4. The method of claim 3, wherein the statistics of the business metric are calculated using the following formula:
$$Z = \frac{X - \mu}{\sigma / \sqrt{n}}$$
wherein Z is the statistic of the service index, X is the average value of the service index of each detection entity, n is the number of the detection entities, μ is the average value of the service index of each reference entity, and σ is the standard deviation of the service index of each reference entity.
5. The method of claim 4, wherein calculating the cumulative probability of each of the business indicators of each of the entities based on the statistics comprises:
for each service index, if the average value of the service indexes of the detection entities is larger than the average value of the service indexes of the reference entities, calculating the right tail cumulative probability of the service indexes of the entities according to the statistic;
and for each service index, if the average value of the service indexes of the detection entities is less than or equal to the average value of the service indexes of the reference entities, calculating the left-tail cumulative probability of the service indexes of the entities according to the statistic.
6. The method of claim 1, wherein before calculating the cumulative probability of each business index of each entity according to the index value of each business index of each entity, further comprising:
if the missing value rate of the service index is greater than the missing value rate threshold value, deleting the index value of the service index;
if the single value rate of the service index is greater than the single value rate threshold value, deleting the index value of the service index;
and if the correlation of the service index is greater than the correlation threshold value, deleting the index value of the service index.
7. The method of claim 6, wherein deleting the indicator value of the service indicator if the correlation of the service indicator is greater than the correlation threshold comprises:
for any two service indexes, calculating the Pearson correlation coefficient between the two service indexes, and, if the Pearson correlation coefficient is greater than the correlation threshold, deleting one of the two service indexes according to the service scenario.
8. An abnormality detection device characterized by comprising:
the acquisition module is used for acquiring index values of all the service indexes of all the detection entities;
the calculation module is used for calculating the cumulative probability of each service index of each detection entity according to the index value of each service index of each detection entity;
the detection module is used for inputting the cumulative probability of each service index of each detection entity into an unsupervised model for unsupervised training, so that abnormal detection entities and importance sequences of each service index corresponding to the abnormal detection entities are output;
and the alarm module is used for sending the abnormal detection entity and the importance sequence of each service index corresponding to the abnormal detection entity to a target user so as to alarm the target user for the abnormality.
9. The apparatus of claim 8, wherein the computing module is further configured to:
acquiring index values of all service indexes of all reference entities in a historical time period;
calculating the statistic of each service index according to the index value of each service index of each reference entity and the index value of each service index of each detection entity;
and respectively calculating the cumulative probability of each service index of each entity according to the statistics.
10. The apparatus of claim 9, wherein the computing module is further configured to:
for each service index, calculating the average value and the standard deviation of the service index according to the index value of the service index of each reference entity, thereby calculating the statistic of the service index according to the average value and the standard deviation of the service index and the index value of the service index of each detection entity.
11. The apparatus of claim 10, wherein the computing module is further configured to:
calculating the statistic of the service index by adopting the following formula:
$$Z = \frac{X - \mu}{\sigma / \sqrt{n}}$$
wherein Z is the statistic of the service index, X is the average value of the service index of each detection entity, n is the number of the detection entities, μ is the average value of the service index of each reference entity, and σ is the standard deviation of the service index of each reference entity.
12. The apparatus of claim 11, wherein the computing module is further configured to:
for each service index, if the average value of the service indexes of all the detection entities is larger than the average value of the service indexes of all the reference entities, calculating the right tail cumulative probability of the service indexes of all the entities according to the statistic;
and for each service index, if the average value of the service indexes of the detection entities is less than or equal to the average value of the service indexes of the reference entities, calculating the left-tail cumulative probability of the service indexes of the entities according to the statistic.
13. The apparatus of claim 8, wherein the computing module is further configured to:
before calculating the cumulative probability of each service index of each entity according to the index value of each service index of each entity,
if the missing value rate of the service index is greater than the missing value rate threshold value, deleting the index value of the service index;
if the single-value rate of the service index is greater than the single-value rate threshold value, deleting the index value of the service index;
and if the correlation of the service index is greater than the correlation threshold value, deleting the index value of the service index.
14. The apparatus of claim 13, wherein the computing module is further configured to:
for any two service indexes, calculating the Pearson correlation coefficient between the two service indexes, and, if the Pearson correlation coefficient is greater than the correlation threshold, deleting one of the two service indexes according to the service scenario.
15. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
the one or more programs, when executed by the one or more processors, implement the method of any of claims 1-7.
16. A computer-readable medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method of any one of claims 1-7.
17. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the method of any one of claims 1-7.
CN202211232672.6A 2022-10-10 2022-10-10 Abnormality detection method, abnormality detection device, electronic apparatus, and computer-readable medium Pending CN115496393A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211232672.6A CN115496393A (en) 2022-10-10 2022-10-10 Abnormality detection method, abnormality detection device, electronic apparatus, and computer-readable medium

Publications (1)

Publication Number Publication Date
CN115496393A true CN115496393A (en) 2022-12-20

Family

ID=84473887

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211232672.6A Pending CN115496393A (en) 2022-10-10 2022-10-10 Abnormality detection method, abnormality detection device, electronic apparatus, and computer-readable medium

Country Status (1)

Country Link
CN (1) CN115496393A (en)

Similar Documents

Publication Publication Date Title
CN107809331B (en) Method and device for identifying abnormal flow
CN110119413A (en) The method and apparatus of data fusion
CN111340611B (en) Risk early warning method and device
CN111427974A (en) Data quality evaluation management method and device
CN111861487A (en) Financial transaction data processing method, and fraud monitoring method and device
CN113360359A (en) Index abnormal data tracing method, device, equipment and storage medium
CN111369344A (en) Method and device for dynamically generating early warning rule
WO2023134188A1 (en) Index determination method and apparatus, and electronic device and computer-readable medium
CN111160847A (en) Method and device for processing flow information
CN112950359B (en) User identification method and device
CN114880482A (en) Graph embedding-based relation graph key personnel analysis method and system
US11227288B1 (en) Systems and methods for integration of disparate data feeds for unified data monitoring
CN116739605A (en) Transaction data detection method, device, equipment and storage medium
CN116862658A (en) Credit evaluation method, apparatus, electronic device, medium and program product
CN112734352A (en) Document auditing method and device based on data dimensionality
CN116228429A (en) Method and device for detecting transaction data
CN113987186B (en) Method and device for generating marketing scheme based on knowledge graph
CN115496393A (en) Abnormality detection method, abnormality detection device, electronic apparatus, and computer-readable medium
CN114581219A (en) Anti-telecommunication network fraud early warning method and system
CN115809818A (en) Multidimensional diagnosis and evaluation method and device for auxiliary equipment of pumped storage power station
CN115147195A (en) Bidding purchase risk monitoring method, apparatus, device and medium
CN114239985A (en) Exchange rate prediction method and device, electronic equipment and storage medium
CN113792749A (en) Time series data abnormity detection method, device, equipment and storage medium
CN113052509A (en) Model evaluation method, model evaluation apparatus, electronic device, and storage medium
CN113222632A (en) Object mining method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination