CN112927068B

CN112927068B - Method, device, equipment and storage medium for determining risk classification threshold of service data

Info

Publication number: CN112927068B
Application number: CN202110341559.0A
Authority: CN
Inventors: 曹若迪
Original assignee: Good Diagnosis Shanghai Information Technology Co ltd
Current assignee: Good Diagnosis Shanghai Information Technology Co ltd
Priority date: 2021-03-30
Filing date: 2021-03-30
Publication date: 2024-08-20
Anticipated expiration: 2041-03-30
Also published as: CN112927068A

Abstract

The embodiment of the specification provides a method, a device, equipment and a storage medium for determining a risk classification threshold of service data, wherein the method comprises the following steps: acquiring a risk probability predicted value of historical service data and a real occurrence probability of a dangerous event; dividing the risk probability prediction value into a plurality of interval groups according to the size; monotonicity processing is carried out on the plurality of interval groups so as to obtain a plurality of interval groups conforming to monotonicity; classifying the plurality of interval groups conforming to monotonicity according to the real occurrence probability of the dangerous event, and determining a risk classification threshold according to a classification result so as to be used for predicting business risks. The embodiment of the specification can improve the accuracy and efficiency of setting the risk classification threshold of the business data.

Description

Method, device, equipment and storage medium for determining risk classification threshold of service data

Technical Field

The present disclosure relates to the field of data processing technologies, and in particular, to a method, an apparatus, a device, and a storage medium for determining a risk classification threshold of service data.

Background

Risk classification and identification is involved in many business fields. Examples include safety production risk classification and identification in the industrial field; credit risk classification and identification in the financial field; network and information security risk classification and identification in the Internet field; and health risk classification and identification in the medical and insurance fields.

Before the business risk classification identification is performed, a business data risk classification threshold needs to be determined. In the prior art, the risk classification threshold of the service data is generally set manually according to experience. However, in many cases, the manually set risk classification threshold of the service data is not necessarily accurate, and when the risk classification recognition is performed on the service data by taking the inaccurate risk classification threshold of the service data as a reference, it is difficult to obtain the real risk of the service data. Moreover, for each application scene, the corresponding risk classification threshold needs to be set manually and independently, and the efficiency of setting the risk classification threshold of the service data is low.

Disclosure of Invention

An objective of the embodiments of the present disclosure is to provide a method, an apparatus, a device, and a storage medium for determining a risk classification threshold of service data, so as to improve accuracy and efficiency of setting the risk classification threshold of service data.

In order to achieve the above objective, in one aspect, an embodiment of the present disclosure provides a method for determining a risk classification threshold of service data, including:

Acquiring a risk probability predicted value of historical service data and a real occurrence probability of a dangerous event;

dividing the risk probability prediction value into a plurality of interval groups according to the size;

monotonicity processing is carried out on the plurality of interval groups so as to obtain a plurality of interval groups conforming to monotonicity;

Classifying the plurality of interval groups conforming to monotonicity according to the real occurrence probability of the dangerous event, and determining a risk classification threshold according to a classification result so as to be used for predicting business risks.

In an embodiment of the present disclosure, after the obtaining of the plurality of interval groups conforming to monotonicity, the method further includes:

performing hypothesis testing processing on the plurality of interval groups conforming to monotonicity to obtain a plurality of interval groups passing hypothesis testing;

Correspondingly, the classifying the plurality of interval groups conforming to monotonicity according to the real occurrence probability of the dangerous event includes:

and classifying the interval groups passing the hypothesis test according to the real occurrence probability of the dangerous event.

In an embodiment of the present disclosure, the dividing the risk probability prediction value into a plurality of interval groups according to a size includes:

dividing the risk probability prediction value into M interval groups according to the size; M is a positive integer greater than 1;

determining, among the M interval groups, the home interval group of the real occurrence probability of the dangerous event;

further dividing the home interval group into N interval groups according to the real occurrence probability of the dangerous event; N is a positive integer greater than 1;

and using the N interval groups obtained by further dividing the part that overlaps the M interval groups, together with the non-overlapping interval groups among the M interval groups, as the plurality of interval groups.

In an embodiment of the present disclosure, the monotonically processing the plurality of interval groups includes:

Determining the real occurrence probability of dangerous events in each interval group;

Judging whether the real occurrence probability of the dangerous event in each two adjacent interval groups accords with monotonic increment or not;

When the real occurrence probability of the intra-group dangerous event of each two adjacent interval groups does not accord with the monotonic increment, the interval merging processing is carried out and the monotonic processing is carried out again until the real occurrence probability of the intra-group dangerous event of each two adjacent interval groups accords with the monotonic increment.

In an embodiment of the present disclosure, the performing a section merging process includes:

Determining the actual occurrence probability of the dangerous event in the sliding window of each interval group;

For each interval group, determining the magnitude relation between the actual occurrence probability of the dangerous event in the sliding window and the actual occurrence probability of the dangerous event;

according to the size relation, correspondingly determining the sliding window risk category of each interval group;

for each interval group which does not accord with monotonicity, when the sliding window risk category is the same as the sliding window risk category of the following interval group, merging the interval group with the following interval group;

for each interval group which does not accord with monotonicity, when the sliding window risk category is different from the sliding window risk category of the following interval group, merging the two interval groups after the interval group.

In an embodiment of the present disclosure, the performing a hypothesis testing process on the plurality of interval groups conforming to monotonicity includes:

performing hypothesis testing processing on the plurality of interval groups conforming to monotonicity;

When there is an interval group that fails the hypothesis test, the interval group is merged with the following interval group, and the hypothesis testing processing is performed again after the merging until every interval group passes the hypothesis test.

In an embodiment of the present disclosure, the classifying the plurality of interval groups passing the hypothesis test according to the probability of occurrence of the dangerous event includes:

determining the real occurrence probability of dangerous events in groups of each interval group passing hypothesis test;

for each interval group passing hypothesis test, determining the magnitude relation between the real occurrence probability of the dangerous event in the group and the real occurrence probability of the dangerous event;

and correspondingly determining the risk category of each interval group passing the hypothesis test according to the size relation.

In an embodiment of the present disclosure, the determining a risk classification threshold according to a classification result includes:

And combining all interval groups with the same risk category into one interval group to obtain a risk classification threshold.
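The merging of same-category interval groups can be sketched as follows. This is a minimal illustration, not the patented implementation: the tuple layout `(lower, upper, category)` and the function names `merge_by_category` and `thresholds` are hypothetical, and the sketch assumes that interval groups of the same risk category are adjacent (which is what is expected after monotonicity processing).

```python
from itertools import groupby


def merge_by_category(groups):
    """Merge consecutive interval groups that share a risk category.

    groups: list of (lower, upper, category) tuples, sorted by lower bound
    and contiguous (hypothetical layout chosen for this sketch).
    """
    merged = []
    for category, run in groupby(groups, key=lambda g: g[2]):
        run = list(run)
        # The merged group spans from the first lower bound to the last upper bound.
        merged.append((run[0][0], run[-1][1], category))
    return merged


def thresholds(merged):
    """Return the interior endpoints between neighbouring merged groups."""
    return [upper for _, upper, _ in merged[:-1]]


groups = [
    (0.0, 0.1, 1),
    (0.1, 0.2, 1),
    (0.2, 0.5, 2),
    (0.5, 0.8, 3),
    (0.8, 1.0, 3),
]
merged = merge_by_category(groups)
cut_points = thresholds(merged)
```

The interior endpoints of the merged groups are what the sketch treats as the risk classification thresholds.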

On the other hand, the embodiment of the specification also provides a service data risk classification threshold determining device, which comprises:

the input data acquisition module is used for acquiring a risk probability prediction value of the historical service data and the actual occurrence probability of the dangerous event;

the interval grouping division module is used for dividing the risk probability prediction value into a plurality of interval groups according to the size;

the monotonicity processing module is used for performing monotonicity processing on the plurality of interval groups so as to obtain a plurality of interval groups conforming to monotonicity;

and the classification threshold determining module is used for classifying the plurality of interval groups conforming to monotonicity according to the real occurrence probability of the dangerous event and determining a risk classification threshold according to a classification result so as to be used for predicting business risks.

In an embodiment of the present specification, the apparatus further includes:

the hypothesis testing module is used for carrying out hypothesis testing processing on the interval groups conforming to the monotonicity so as to obtain a plurality of interval groups passing the hypothesis testing;

Correspondingly, the classification threshold determining module classifies the plurality of interval groups conforming to monotonicity according to the real occurrence probability of the dangerous event, and comprises the following steps:

and the classification threshold determining module classifies the plurality of interval groups passing the hypothesis test according to the real occurrence probability of the dangerous event.

In another aspect, embodiments of the present disclosure also provide a computer storage medium having stored thereon a computer program which, when executed by a processor of a computer device, performs instructions of the above method.

The technical scheme provided by the embodiment of the specification can automatically calculate the risk classification threshold according to the historical service data, and compared with the mode of manually setting the risk classification threshold, the mode greatly improves the efficiency of setting the risk classification threshold of the service data. Furthermore, in the embodiment of the specification, the monotonicity processing is performed on the plurality of initial interval groups, so that the initial interval groups meet the monotonicity requirement; and classifying the interval groups conforming to monotonicity according to the real occurrence probability of the dangerous event, and determining a risk classification threshold according to a classification result for service risk prediction, so that the service data risk classification threshold determined by the embodiment of the specification is more accurate, and further the accuracy of subsequent service risk identification is improved.

Drawings

In order to more clearly illustrate the embodiments of the present description or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some of the embodiments described in the present description, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art. In the drawings:

FIG. 1 is a schematic diagram of an application scenario of a method for determining a risk classification threshold of service data according to an embodiment of the present disclosure;

FIG. 2 is a flow chart illustrating a method of determining a risk classification threshold for service data in some embodiments of the present description;

FIG. 3 shows a flow chart of interval grouping in one embodiment of the present description;

FIG. 4 shows a flow diagram of monotonicity processing in an embodiment of the present specification;

FIG. 5 is a flowchart showing the interval merging process in the monotonicity processing in one embodiment of the present specification;

FIG. 6 shows a flow chart of interval classification in an embodiment of the present description;

FIG. 7 is a schematic diagram illustrating a process for determining a risk classification threshold of service data according to an embodiment of the present disclosure;

FIG. 8 is a diagram showing the determination of the risk classification threshold (before packet adjustment) of the probability of illness in an embodiment of the present disclosure;

FIG. 9 is a diagram showing the determination of the risk classification threshold of the probability of illness (after group adjustment) in one embodiment of the present disclosure;

FIG. 10 is a block diagram illustrating a business data risk classification threshold determination device in some embodiments of the present disclosure;

FIG. 11 illustrates a block diagram of a computer device in some embodiments of the present description.

[ Reference numerals description ]

10. A business data risk classification threshold determining device;

20. A database;

30. A business data risk identification device;

40. A business system;

101. an input data acquisition module;

102. a section grouping and dividing module;

103. A monotonicity processing module;

104. A classification threshold determining module;

1102. A computer device;

1104. A processor;

1106. A memory;

1108. A driving mechanism;

1110. an input/output module;

1112. an input device;

1114. an output device;

1116. a presentation device;

1118. A graphical user interface;

1120. A network interface;

1122. a communication link;

1124. A communication bus.

Detailed Description

In order to make the technical solutions in the present specification better understood by those skilled in the art, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only some embodiments of the present specification, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are intended to be within the scope of the present disclosure.

The specification relates to a technology for determining a risk classification threshold of service data, which can be applied to any business scenario requiring business risk identification. Such scenarios may include, but are not limited to: safety production risk classification and identification in the industrial field; credit risk classification and identification in the financial field; network and information security risk classification and identification in the Internet field; health risk classification and identification in the insurance field; health risk classification and identification in the medical field; driving risk classification and identification in the field of automatic driving; and so on.

In view of the problem that the accuracy of a manually set risk classification threshold of service data is low, the embodiment of the specification provides a method capable of automatically determining the risk classification threshold of service data. Referring to FIG. 1, the method for determining a risk classification threshold of service data according to the embodiments of the present disclosure may be used in the service data risk classification threshold determining device 10. The service data risk classification threshold determining device 10 may obtain historical service data from the database 20; automatically calculate a risk classification threshold according to the historical service data; and provide the risk classification threshold to the service data risk identification device 30, so that the service data risk identification device 30 can perform risk identification on the service data provided by the service system 40 according to the risk classification threshold. In implementation, the service data risk classification threshold determining device 10 and the service data risk identification device 30 may be configured on different computer devices or on the same computer device, as selected according to actual needs.

Referring to fig. 2, in some embodiments of the present disclosure, the method for determining a risk classification threshold of service data may include the following steps:

S201, acquiring a risk probability prediction value of historical service data and a real occurrence probability of a dangerous event.

S202, dividing the risk probability prediction value into a plurality of interval groups according to the size.

S203, monotonicity processing is carried out on the plurality of interval groups so as to obtain a plurality of interval groups conforming to monotonicity.

S204, classifying the interval groups conforming to monotonicity according to the real occurrence probability of the dangerous event, and determining a risk classification threshold according to a classification result so as to be used for predicting business risks.

According to the embodiment of the specification, the risk classification threshold can be automatically calculated according to the historical service data, and compared with the manual setting of the risk classification threshold, the risk classification threshold setting efficiency of the service data is greatly improved. Furthermore, in the embodiment of the specification, the monotonicity processing is performed on the plurality of initial interval groups, so that the initial interval groups meet the monotonicity requirement; and classifying the interval groups conforming to monotonicity according to the real occurrence probability of the dangerous event, and determining a risk classification threshold according to a classification result for service risk prediction, so that the service data risk classification threshold determined by the embodiment of the specification is more accurate, and further the accuracy of subsequent service risk identification is improved.

In the embodiment of the present disclosure, determining the risk classification threshold means dividing the risk range into classification intervals, that is, determining the number of risk classification intervals and the endpoint values of each interval. For example, in an exemplary embodiment, the value range of the risk probability is [0,1]. If the method for determining the risk classification threshold of the service data divides [0,1] into risk classification intervals such as [0,0.2), [0.2,0.5), [0.5,0.8), and [0.8,1], then these risk classification intervals are the risk classification thresholds to be determined.
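Once the interval endpoints are known, assigning a new prediction to a risk classification interval is a simple lookup. The following is a minimal sketch under the assumption that intervals are closed on the left and that the last interval also includes 1; the function name `risk_interval_index` and the cut points are illustrative and reuse the endpoints from the example above.

```python
from bisect import bisect_right


def risk_interval_index(probability, cut_points):
    """Return the 0-based index of the risk interval containing `probability`.

    cut_points: sorted interior endpoints, e.g. [0.2, 0.5, 0.8] for the
    intervals [0, 0.2), [0.2, 0.5), [0.5, 0.8), [0.8, 1].
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    # bisect_right sends a value equal to a cut point into the upper interval.
    return bisect_right(cut_points, probability)


cut_points = [0.2, 0.5, 0.8]
indices = [risk_interval_index(p, cut_points) for p in (0.1, 0.2, 0.79, 0.8, 1.0)]
```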

In some embodiments of the present description, the real occurrence probability of a dangerous event in the historical service data is the ratio of the number of samples in which the dangerous event has actually occurred to the total number of samples in the historical service data. For example, if the total number of samples of the historical service data in the last three years is 1000000, of which the number of samples in which the dangerous event has actually occurred is 1000, the real occurrence probability of the dangerous event in the historical service data is:

$$P = \frac{1000}{1000000} = 0.001$$
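As a minimal sketch of this ratio, assuming each historical sample carries a boolean flag indicating whether the dangerous event actually occurred (the function name `real_occurrence_probability` and the flag representation are illustrative, not taken from the specification):

```python
def real_occurrence_probability(event_flags):
    """Share of samples in which the dangerous event actually occurred."""
    flags = list(event_flags)
    if not flags:
        raise ValueError("at least one historical sample is required")
    return sum(1 for flag in flags if flag) / len(flags)


# 1000 samples with an event out of 1000000 samples in total.
flags = [True] * 1000 + [False] * 999000
p = real_occurrence_probability(flags)
```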

In some embodiments of the present disclosure, the risk probability prediction value of each sample in the historical service data may be calculated according to any suitable risk prediction model, and specifically, the risk prediction model may be selected according to an actual application scenario, which is not limited in this disclosure. For example, for autopilot risk, a suitable autopilot risk prediction model may be used for prediction; for the leakage risk of the oil and gas pipeline, a proper oil and gas pipeline leakage risk prediction model can be used for prediction; for financial institution credit risk, a suitable credit risk prediction model may be used for prediction; for risk of illness, the prediction can be made using a suitable risk prediction model for illness.

The risk probability prediction values obtained in the previous step are divided into a plurality of interval groups according to their size, so that a suitable risk classification threshold can be derived on this basis. However, in the same application scenario, different risk prediction models may output different risk probability prediction values for the same historical service data sample, although these values generally lie around the real occurrence probability of the dangerous event. Specifically, two kinds of value are involved: one is the risk probability prediction value of a single sample, which concerns only that individual sample; the other is the real occurrence probability of the dangerous event within the population of samples in which the individual sample is located (i.e., the intra-group real occurrence probability of the dangerous event), which is closer to the real occurrence probability of the dangerous event. Simply dividing the value range [0,1] of the risk probability equally into a plurality of interval groups cannot serve both cases, so additional interval groups need to be added around the real occurrence probability of the dangerous event.

Thus, in some embodiments of the present disclosure, referring to fig. 3, the dividing the risk probability prediction value into a plurality of interval groups according to the size may include the following steps:

S301, dividing the risk probability prediction value into M interval groups according to the size; M is a positive integer greater than 1.

The size of M can be set according to actual needs. For example, in an exemplary embodiment, the value range of the risk probability is [0,1]; if [0,1] is equally divided into 20 parts (here, M=20) in steps of 0.05, 20 equally divided interval groups are obtained. On this basis, the home interval group of each risk probability prediction value among the 20 equally divided interval groups can be determined according to its magnitude. For example, a risk probability prediction value of 0.008 falls within the fourth interval group (0.0005,0.1) of the 20 equally divided interval groups, so the fourth interval group (0.0005,0.1) is the home interval group of the risk probability prediction value 0.008.

S302, determining, among the M interval groups, the home interval group of the real occurrence probability of the dangerous event.

Again taking the fourth interval group (0.0005,0.1) of the 20 equally divided interval groups as an example, since the real occurrence probability of the dangerous event in the historical service data, 0.001, lies within the range of the fourth interval group (0.0005,0.1), the fourth interval group (0.0005,0.1) is the home interval group of the real occurrence probability 0.001.

S303, further dividing the home interval group into N interval groups according to the real occurrence probability of the dangerous event; N is a positive integer greater than 1.

The size of N can be set according to actual needs, and generally 1 < N < M. For example, the home interval group (0.0005,0.1) of the real occurrence probability 0.001 of the dangerous event may be divided into three interval groups (0.0005,0.001), (0.001,0.01) and (0.01,0.1).

S304, using the N interval groups obtained by further dividing the part that overlaps the M interval groups, together with the non-overlapping interval groups among the M interval groups, as the plurality of interval groups.

For example, among the 20 equally divided interval groups, the 19 interval groups other than the fourth interval group (0.0005,0.1), together with the three interval groups obtained by dividing the fourth interval group (0.0005,0.1), may be used as the plurality of interval groups.
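Steps S301 to S304 can be sketched as follows. This is an illustration under stated assumptions rather than the embodiment itself: the function name `build_interval_edges` is hypothetical, the inner cut points used to subdivide the home interval group are supplied by the caller (the specification does not fix how they are chosen), and the numbers assume a strictly equal division of [0,1], so the home interval group here is the first one rather than the fourth.

```python
from bisect import bisect_right


def build_interval_edges(m, p, inner_cuts):
    """Return (edges, home_index) for M equal groups with the home group subdivided.

    m: number of equal interval groups over [0, 1] (M in the text).
    p: real occurrence probability of the dangerous event.
    inner_cuts: cut points strictly inside the home group; N = len(inner_cuts) + 1.
    """
    edges = [i / m for i in range(m + 1)]
    # Index of the equal group whose range contains p (S302).
    home = min(bisect_right(edges, p) - 1, m - 1)
    lower, upper = edges[home], edges[home + 1]
    cuts = sorted(inner_cuts)
    if any(not lower < c < upper for c in cuts):
        raise ValueError("inner cuts must lie strictly inside the home group")
    # S303/S304: replace the home group by its N sub-groups, keep the others.
    return edges[: home + 1] + cuts + edges[home + 1 :], home


edges, home = build_interval_edges(m=20, p=0.001, inner_cuts=[0.001, 0.01])
group_count = len(edges) - 1  # 19 untouched groups + 3 sub-groups
```

Representing the groups by their sorted edge list keeps the later merging steps simple, because merging two adjacent groups amounts to dropping one shared edge.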

Each value in an interval group represents the risk probability prediction value of an individual sample, and the larger the intra-group real occurrence probability of the dangerous event of an interval group, the higher the risk probability prediction values of the individual samples in that interval group. However, fluctuations may occur near the boundaries of the interval groups, so that the intra-group real occurrence probability of the dangerous event does not increase monotonically from one interval group to the next. It is therefore necessary to determine to which interval group the fluctuating portion should belong, so that the intra-group real occurrence probability of the dangerous event increases monotonically across all interval groups.

Thus, referring to fig. 4, in some embodiments of the present disclosure, the monotonically processing the plurality of interval groups may include the steps of:

S401, determining the real occurrence probability of the dangerous event in each interval group.

S402, judging whether the actual occurrence probability of the dangerous event in each two adjacent interval groups accords with monotonic increment.

Judging whether the real occurrence probability of the dangerous event in the group of every two adjacent interval groups accords with monotonic increment or not, namely judging whether the real occurrence probability of the dangerous event in the group of one interval group is larger than the real occurrence probability of the dangerous event in the group of the previous interval group.

For example, in the embodiment shown in FIG. 7, there are sixteen interval groups numbered 0 to 15. Starting from interval group 0, it is judged whether the intra-group real occurrence probabilities of the dangerous event of interval group 0 and interval group 1 conform to monotonic increase, that is, whether the intra-group real occurrence probability of the dangerous event of interval group 1 is greater than that of interval group 0. If it is greater, interval group 0 and interval group 1 are considered to conform to monotonic increase; otherwise, they do not conform to monotonic increase. As shown in the first row of interval groups in FIG. 7, the judgment confirms that interval groups 0 and 1 do not conform to monotonic increase, that interval groups 12 and 13 do not conform to monotonic increase, and that every other pair of adjacent interval groups conforms to monotonic increase.
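The adjacent-pair check of S402 can be sketched as follows, assuming each interval group is summarized by its number of dangerous-event samples and its total number of samples. The function name `non_monotonic_pairs` and the sample counts are illustrative and are not the data of FIG. 7.

```python
def non_monotonic_pairs(event_counts, sample_counts):
    """Return indices i where group i+1 is not strictly riskier than group i."""
    if any(n <= 0 for n in sample_counts):
        raise ValueError("every interval group needs at least one sample")
    rates = [e / n for e, n in zip(event_counts, sample_counts)]
    # Monotonic increase requires rate[i + 1] > rate[i] for every adjacent pair.
    return [i for i in range(len(rates) - 1) if rates[i + 1] <= rates[i]]


event_counts = [2, 1, 3, 8, 7]
sample_counts = [100, 100, 100, 100, 100]
violations = non_monotonic_pairs(event_counts, sample_counts)
```

An empty result means all adjacent interval groups already conform to monotonic increase and no merging is needed.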

S403, when the real occurrence probability of the intra-group dangerous event of the two adjacent interval groups does not accord with the monotonic increment, the interval merging processing is carried out and the monotonic processing is carried out again until the real occurrence probability of the intra-group dangerous event of each two adjacent interval groups accords with the monotonic increment.

As shown in fig. 5, in some embodiments of the present specification, the section merging process performed in step S403 may further include the following steps:

S501, determining the real occurrence probability of the dangerous event in the sliding window of each interval group.

In the embodiment of the present disclosure, the probability of actually occurring a dangerous event in the sliding window is used to determine the merging range of the interval packet subsequently. The real occurrence probability of the dangerous event in the sliding window can overcome the problem of abnormal risk probability possibly caused by abnormal local sample distribution in a single interval group; therefore, the real occurrence probability of the dangerous event in the sliding window is used for grouping and combining, so that the business data risk classification threshold determined by the embodiment of the specification is more accurate; thereby being beneficial to further improving the accuracy of the subsequent business risk identification.

The sliding window size used for the real occurrence probability of the dangerous event in the sliding window can be selected according to actual needs; for example, in an embodiment of the present disclosure, the sliding window may span three interval groups. In this case, for the current i-th interval group, the real occurrence probability of the dangerous event in the sliding window can be calculated according to the formula

$$\mathrm{ave} = \frac{\mathrm{sum}(i-1) + \mathrm{sum}(i) + \mathrm{sum}(i+1)}{\mathrm{count}(i-1) + \mathrm{count}(i) + \mathrm{count}(i+1)}$$

where ave is the real occurrence probability of the dangerous event in the sliding window of the i-th interval group; sum(i-1), sum(i) and sum(i+1) are the numbers of risk individuals (i.e., individuals for whom the dangerous event occurred) in the (i-1)-th, i-th and (i+1)-th interval groups, respectively; and count(i-1), count(i) and count(i+1) are the numbers of samples in the (i-1)-th, i-th and (i+1)-th interval groups, respectively. In an embodiment of the present disclosure, for the first interval group among the plurality of interval groups, the number of samples and the number of risk individuals of the (nonexistent) previous interval group may be taken as zero by default. In another embodiment of the present disclosure, for the last interval group among the plurality of interval groups, since there is no subsequent interval group with which it could be merged, the real occurrence probability of the dangerous event in its sliding window need not be calculated.
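A minimal sketch of the three-group sliding window follows, pooling the risk individuals and samples of the previous, current and next interval groups. The function name `sliding_window_probability` is hypothetical. Treating a missing neighbour as contributing zero counts matches the default described for the first interval group; applying the same rule to the last interval group is an assumption of this sketch.

```python
def sliding_window_probability(event_counts, sample_counts, i):
    """Pooled event rate over interval groups i-1, i and i+1.

    A neighbour that does not exist contributes zero events and zero samples.
    """
    lo = max(i - 1, 0)
    hi = min(i + 1, len(sample_counts) - 1)
    events = sum(event_counts[lo : hi + 1])
    samples = sum(sample_counts[lo : hi + 1])
    if samples == 0:
        raise ValueError("the sliding window contains no samples")
    return events / samples


event_counts = [1, 2, 6, 9]
sample_counts = [100, 100, 100, 100]
ave_first = sliding_window_probability(event_counts, sample_counts, 0)  # 3 / 200
ave_second = sliding_window_probability(event_counts, sample_counts, 1)  # 9 / 300
```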

Those skilled in the art will appreciate that the above method of calculating the real occurrence probability of the dangerous event within the sliding window is merely exemplary. In other embodiments of the present disclosure, other data smoothing algorithms, such as weighted moving average, SG (Savitzky-Golay) filtering, or alpha (α) mean filtering, may be used according to actual needs, which is not limited in the present disclosure.

S502, for each interval group, determining the magnitude relation between the real occurrence probability of the dangerous event within the sliding window and the real occurrence probability of the dangerous event.

In the embodiments of the present specification, the interval range of each sliding window risk category may be preset according to the real occurrence probability of the dangerous event in the historical service data, for example as shown in Table 1 below. In Table 1, ave_i is the real occurrence probability of the dangerous event within the sliding window of the i-th interval group, and P is the real occurrence probability of the dangerous event.

TABLE 1

For the i-th interval group, once the real occurrence probability ave_i of the dangerous event within the sliding window has been calculated, its magnitude relation to P can be determined by looking up Table 1.

S503, correspondingly determining the sliding window risk category of each interval group according to the magnitude relation.

As can be seen from Table 1 above, each sliding window risk category has a unique corresponding interval range. Therefore, after the real occurrence probability of the dangerous event within the sliding window of an interval group is determined, the sliding window risk category of that interval group can be determined from the interval range in Table 1 into which the probability falls. For example, assuming that P = 0.0001 and the real occurrence probability of the dangerous event within the sliding window of a certain interval group is 0.001, it can be confirmed by referring to Table 1 that the sliding window risk category of the interval group is class 3 risk.
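The table lookup in S502 and S503 amounts to finding which preset range, expressed relative to P, a probability falls into. The sketch below shows one way to do this. Note that the contents of Table 1 are not reproduced in this text, so the multipliers `(1, 5, 20)` are purely hypothetical placeholders; they were chosen only so that the worked example above (P = 0.0001, window probability 0.001, class 3) is reproduced.

```python
from bisect import bisect_right
from typing import Sequence

# HYPOTHETICAL category boundaries as multiples of P; the real ranges are
# those of Table 1, which is not available here.
ASSUMED_MULTIPLIERS = (1.0, 5.0, 20.0)


def risk_category(rate: float, p: float,
                  multipliers: Sequence[float] = ASSUMED_MULTIPLIERS) -> int:
    """Return the 1-based category whose range [lower, upper) contains `rate`."""
    bounds = [m * p for m in multipliers]   # ascending cut points
    return bisect_right(bounds, rate) + 1   # number of cut points <= rate, plus one


print(risk_category(0.001, 0.0001))  # window probability 0.001 with P = 0.0001
```

The same lookup can be reused for the intra-group probabilities once the ranges of the relevant table are substituted for the placeholder multipliers.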

S504, for each interval group that does not conform to monotonicity, when the sliding window risk category of the interval group is the same as that of the interval group following it, merging the interval group with the following interval group; otherwise, merging the two interval groups that follow the interval group.

For example, for an i-th interval group that does not conform to monotonicity, if its sliding window risk category is class 1 and the sliding window risk category of the (i+1)-th interval group is also class 1, the i-th and (i+1)-th interval groups may be merged. If instead the sliding window risk category of the i-th interval group is class 1 and that of the (i+1)-th interval group is class 2, the (i+1)-th and (i+2)-th interval groups may be merged.

For example, in the embodiment shown in fig. 7, the intra-group real occurrence probabilities of the dangerous event of interval group No. 0 and interval group No. 1 do not conform to monotonic increase, and the two groups belong to the same sliding window risk category (this is assumed here for the sake of example), so interval group No. 0 and interval group No. 1 may be merged into one interval group. Similarly, the intra-group real occurrence probabilities of the dangerous event of interval group No. 12 and interval group No. 13 do not conform to monotonic increase, and the two groups belong to the same sliding window risk category (again assumed for the sake of example), so interval group No. 12 and interval group No. 13 may be merged into one interval group. The sequence numbers of the interval groups are then adjusted accordingly, which yields the interval groups shown in the second row of fig. 7, and the monotonicity judgment is performed again on these interval groups.

For the interval groups shown in the second row of fig. 7, the judgment confirms that the intra-group real occurrence probabilities of the dangerous event of interval groups No. 1 and No. 2 do not conform to monotonic increase, that those of interval groups No. 11 and No. 12 do not conform to monotonic increase, and that those of every other pair of adjacent interval groups do conform to monotonic increase. Therefore, interval groups No. 1 and No. 2 may be merged into one interval group, and interval groups No. 11 and No. 12 may be merged into one interval group (in both cases assuming that the two groups belong to the same sliding window risk category). The sequence numbers of the interval groups are then adjusted accordingly, which yields the interval groups shown in the third row of fig. 7, and the monotonicity judgment is performed again on these interval groups.

For the interval groups shown in the third row of fig. 7, the judgment confirms that the intra-group real occurrence probabilities of the dangerous event of interval groups No. 10 and No. 11 do not conform to monotonic increase, and that those of every other pair of adjacent interval groups do conform to monotonic increase. Therefore, interval groups No. 10 and No. 11 may be merged into one interval group (assuming that they belong to the same sliding window risk category). The sequence numbers of the interval groups are then adjusted accordingly, which yields the interval groups shown in the fourth row of fig. 7, and the monotonicity judgment is performed again on these interval groups.

For the interval groups shown in the fourth row of fig. 7, the judgment confirms that the intra-group real occurrence probabilities of the dangerous event of every pair of adjacent interval groups conform to monotonic increase, so the monotonicity processing of the plurality of interval groups is complete. Correspondingly, the ten interval groups shown in the fourth row of fig. 7 are the plurality of interval groups obtained after the monotonicity processing.
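The merge-and-recheck loop of S504 and the fig. 7 walkthrough can be sketched as follows. This is a simplified illustration under several assumptions: groups are represented as `(lower, upper, count, events)` tuples; a violation means the rate of group i is strictly greater than that of group i+1; only the first violating pair is handled per pass (the figure handles all violations of a round together); and when no (i+2)-th group exists, the sketch falls back to merging groups i and i+1. The `window_category` callback is a hypothetical stand-in for the sliding window risk category lookup.

```python
from typing import Callable, List, Sequence, Tuple

# (lower bound, upper bound, sample count, dangerous-event count)
Group = Tuple[float, float, int, int]


def rate(g: Group) -> float:
    """Intra-group real occurrence probability of the dangerous event."""
    return g[3] / g[2] if g[2] else 0.0


def merge_pair(groups: List[Group], j: int) -> None:
    """Replace groups j and j+1 by their union, in place."""
    a, b = groups[j], groups[j + 1]
    groups[j:j + 2] = [(a[0], b[1], a[2] + b[2], a[3] + b[3])]


def make_monotonic(groups: Sequence[Group],
                   window_category: Callable[[List[Group], int], int]) -> List[Group]:
    """Merge adjacent groups until the intra-group rates are non-decreasing."""
    groups = list(groups)
    while True:
        # Find the first adjacent pair that breaks monotonic increase.
        i = next((k for k in range(len(groups) - 1)
                  if rate(groups[k]) > rate(groups[k + 1])), None)
        if i is None:
            return groups
        same = window_category(groups, i) == window_category(groups, i + 1)
        if same or i + 2 >= len(groups):
            merge_pair(groups, i)        # same category (or no i+2): merge i with i+1
        else:
            merge_pair(groups, i + 1)    # different category: merge i+1 with i+2
```

Every pass removes exactly one group, so the loop terminates after at most one fewer passes than there are groups.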

The interval groups obtained after the monotonicity processing guarantee a monotonic increase in risk probability, but they do not guarantee that each interval group is credible (i.e., consistent with the facts). Therefore, in some embodiments of the present specification, after step S203, a hypothesis testing process may further be performed on the plurality of interval groups conforming to monotonicity, to obtain a plurality of interval groups passing the hypothesis test. In this case, classifying the plurality of interval groups according to the real occurrence probability of the dangerous event may include: classifying the interval groups passing the hypothesis test according to the real occurrence probability of the dangerous event.

In some embodiments of the present specification, any suitable hypothesis testing method may be used to perform the hypothesis testing process on the plurality of interval groups conforming to monotonicity; this is not limited in the present specification, and the method may be selected as needed. The theoretical basis of hypothesis testing is the small-probability principle of probability theory, which holds that a small-probability event should not occur in a single observation. In other words, if an event presumed to have a small probability does occur in a single observation, one should conclude that it is not in fact a small-probability event but a large-probability event.

For example, in an exemplary embodiment, the hypothesis testing process may be performed on each interval group conforming to monotonicity based on a hypothesis testing method such as a t-test, Z-test, chi-square test, or F-test. When all of the interval groups conforming to monotonicity pass the hypothesis test, step S205 may be performed. However, when an interval group fails the hypothesis test, that interval group is merged with the interval group following it, and the hypothesis testing process is performed again after the merge, until every interval group passes the hypothesis test.

For example, in the embodiment shown in fig. 7, assuming that interval group No. 5 among the ten interval groups shown in the fourth row does not pass the hypothesis test, interval group No. 5 and interval group No. 6 may be merged into a new interval group, the sequence numbers of the interval groups are adjusted accordingly, and another round of hypothesis testing is performed. When all interval groups pass the hypothesis test, the interval groups remaining at that point are the plurality of interval groups obtained after the hypothesis testing process.
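The test-and-merge loop described above can be sketched with a pluggable pass/fail predicate. The specification does not state which hypothesis is tested, so the predicate used here, a two-sided two-proportion Z-test asking whether a group's event rate differs significantly from that of the following group, is only one assumed possibility. The significance level of 0.05, the `(lower, upper, count, events)` tuple layout, and the fallback of merging a failing last group with its predecessor are likewise assumptions.

```python
import math
from typing import Callable, List, Sequence, Tuple

# (lower bound, upper bound, sample count, dangerous-event count)
Group = Tuple[float, float, int, int]


def neighbor_z_test(groups: List[Group], i: int, alpha: float = 0.05) -> bool:
    """Assumed criterion: group i passes if its rate differs from group i+1's."""
    if i == len(groups) - 1:
        return True                      # last group has no following neighbour
    n1, x1 = groups[i][2], groups[i][3]
    n2, x2 = groups[i + 1][2], groups[i + 1][3]
    if n1 == 0 or n2 == 0:
        return False
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return False
    z = (x1 / n1 - x2 / n2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail
    return p_value < alpha


def merge_until_all_pass(groups: Sequence[Group],
                         passes: Callable[[List[Group], int], bool]) -> List[Group]:
    """Merge each failing group with the following one until every group passes."""
    groups = list(groups)
    while len(groups) > 1:
        failing = next((i for i in range(len(groups)) if not passes(groups, i)), None)
        if failing is None:
            break
        # Merge with the following group; for a failing last group, fall back
        # to the preceding group (an assumption, not stated in the text).
        j = failing if failing + 1 < len(groups) else failing - 1
        a, b = groups[j], groups[j + 1]
        groups[j:j + 2] = [(a[0], b[1], a[2] + b[2], a[3] + b[3])]
    return groups
```

Any other test named in the text (t-test, chi-square test, F-test) could be substituted by passing a different predicate with the same signature.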

Referring to fig. 6, in some embodiments of the present specification, classifying the plurality of interval groups passing the hypothesis test according to the real occurrence probability of the dangerous event may include the following steps:

S601, determining the intra-group real occurrence probability of the dangerous event for each interval group passing the hypothesis test.

S602, for each interval group passing the hypothesis test, determining the magnitude relation between its intra-group real occurrence probability of the dangerous event and the real occurrence probability of the dangerous event.

In the embodiments of the present specification, the interval range of each risk category may be preset according to the real occurrence probability of the dangerous event in the historical service data, for example as shown in Table 2 below. In Table 2, mean_i is the intra-group real occurrence probability of the dangerous event of the i-th interval group passing the hypothesis test, and P is the real occurrence probability of the dangerous event in the historical service data.

TABLE 2

For the i-th interval group passing the hypothesis test, once its intra-group real occurrence probability mean_i of the dangerous event has been calculated, its magnitude relation to P can be determined by looking up Table 2.

S603, correspondingly determining the risk category of each interval group passing the hypothesis test according to the magnitude relation.

As can be seen from Table 2 above, each risk category has a unique corresponding interval range. Therefore, after the intra-group real occurrence probability of the dangerous event of an interval group passing the hypothesis test is determined, the risk category of that interval group can be determined from the interval range in Table 2 into which the probability falls. For example, assuming that P = 0.0005 and the intra-group real occurrence probability of the dangerous event of a certain interval group passing the hypothesis test is 0.0008, it can be confirmed by referring to Table 2 that the risk category of the interval group is class 2 risk.

In some embodiments of the present disclosure, the determining the risk classification threshold according to the classification result may include: and combining all interval groups with the same risk category into one interval group, thereby obtaining a risk classification threshold.

For example, taking the embodiment shown in fig. 7, after the intra-group real occurrence probability of the dangerous event (mean_1 to mean_10) of each interval group in the 4th row of fig. 7 is determined and compared with the real occurrence probability P of the dangerous event in the historical service data, the risk category of each interval group in the 4th row of fig. 7 can be obtained (see the risk category identifier below each interval group in the 4th row of fig. 7). Since the risk categories of interval groups No. 0 to No. 3 in the 4th row of fig. 7 are all class 1, they can be merged into one interval group. Similarly, since the risk categories of interval groups No. 4 to No. 6 are all class 2, they can be merged into one interval group, and since the risk categories of interval groups No. 7 to No. 9 are all class 3, they can be merged into one interval group. Only interval group No. 10 has the risk category class 4, so it is not merged with any other interval group. The four interval groups shown in the 5th row of fig. 7 are thus obtained. Assuming that these four interval groups are [0,0.2), [0.2,0.5), [0.5,0.8) and [0.8,1], the risk classification threshold shown in Table 3 below is obtained.

TABLE 3

Interval grouping	Risk category
[0,0.2)	Class 1 risk (i.e., low risk)
[0.2,0.5)	Class 2 risk (i.e., medium risk)
[0.5,0.8)	Class 3 risk (i.e., high risk)
[0.8,1]	Class 4 risk (i.e., ultra-high risk)

Subsequently, risk identification can be performed on service data corresponding to the historical service data according to the risk classification threshold shown in Table 3. For example, if the preset risk prediction model yields a risk prediction value of 0.3 for a piece of service data, then since 0.3 lies in the interval [0.2,0.5), the risk category of the service data is class 2 risk.
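Merging adjacent interval groups of the same risk category and then classifying a new prediction value against the resulting thresholds can be sketched as below. The eleven fine-grained edges in `EDGES` are hypothetical; they were picked only so that the merged result matches the four intervals of Table 3. The function names are likewise illustrative.

```python
from bisect import bisect_right
from itertools import groupby
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]

# HYPOTHETICAL fine-grained group edges and the risk category of each group.
EDGES = [0.0, 0.05, 0.1, 0.15, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 1.0]
CATEGORIES = [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4]


def merge_by_category(intervals: Sequence[Interval],
                      categories: Sequence[int]) -> List[Tuple[float, float, int]]:
    """Collapse each run of adjacent same-category groups into one interval."""
    merged = []
    for cat, run in groupby(zip(intervals, categories), key=lambda pair: pair[1]):
        run = list(run)
        merged.append((run[0][0][0], run[-1][0][1], cat))
    return merged


def classify(value: float, merged: Sequence[Tuple[float, float, int]]) -> int:
    """Look up the category of a prediction value; intervals are [lower, upper)."""
    cuts = [upper for _, upper, _ in merged[:-1]]   # interior thresholds only
    return merged[bisect_right(cuts, value)][2]


intervals = list(zip(EDGES[:-1], EDGES[1:]))
merged = merge_by_category(intervals, CATEGORIES)
print(merged)                 # the four merged intervals with their categories
print(classify(0.3, merged))  # category for a prediction value of 0.3
```

Using `bisect_right` on the interior cut points makes each interval closed on the left and open on the right, with the final interval also covering its upper bound, which matches the bracket notation of Table 3.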

The following takes disease risk identification as an exemplary application scenario.

In the present exemplary application scenario, a disease risk prediction model (hereinafter referred to as the model) is constructed based on a Linear Discriminant Analysis (LDA) algorithm.

Assume that the currently selected step length is 0.05. Over the value range 0 to 1, the prediction results output by the model can then be divided into 20 basic groups with a step of 0.05. Depending on the algorithm ultimately employed within the model, the output disease probability sometimes clusters around the real disease probability value, and sometimes is the disease probability of the individual as produced by the algorithm logic. The present exemplary application scenario needs to account for the case where the output clusters around the real disease probability value, so groups based on the real disease probability are added. As shown in fig. 8, the real disease probability of the disease in question within the historical data is 0.001997252647941293, so the first of the 20 groups can be subdivided into 4 groups (i.e., groups 0 to 3 in fig. 8) according to the real disease probability value, thereby forming groups 0 to 22 in fig. 8.
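The base grouping with a fixed step, plus extra cut points near the real disease probability, can be sketched as follows. The exact positions at which fig. 8 subdivides the first group are not given in the text, so the three extra cut points used in the usage example are hypothetical; only the step of 0.05 and the resulting count of 23 groups come from the scenario above.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def build_edges(step: float, extra_cuts: Sequence[float] = ()) -> List[float]:
    """Edges 0, step, 2*step, ..., 1 plus any extra cut points inside (0, 1)."""
    n = round(1 / step)
    edges = {round(k * step, 10) for k in range(n + 1)}   # rounding avoids 0.15000000000000002
    edges.update(c for c in extra_cuts if 0 < c < 1)
    return sorted(edges)


def group_stats(edges: Sequence[float], preds: Sequence[float],
                labels: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Per-group sample counts and dangerous-event counts.

    Groups are [edges[i], edges[i+1]); the last group also includes 1.0.
    """
    n_groups = len(edges) - 1
    counts = [0] * n_groups
    events = [0] * n_groups
    for p, y in zip(preds, labels):
        i = min(bisect_right(edges, p) - 1, n_groups - 1)
        counts[i] += 1
        events[i] += y
    return counts, events


# HYPOTHETICAL cut points splitting the first base group into four parts.
edges = build_edges(0.05, extra_cuts=[0.002, 0.01, 0.02])
print(len(edges) - 1)  # number of groups
```

The counts and event counts returned by `group_stats` are the per-group quantities from which the intra-group real occurrence probabilities are computed.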

With continued reference to fig. 8, a round of monotonicity judgment finds that some pairs of adjacent groups do not conform to monotonic increase (in the column labelled "monotonic" in fig. 8, true indicates that monotonic increase is met and false indicates that it is not). For example, monotonic increase is not met between groups 0 and 1, between groups 1 and 2, between groups 4 and 5, between groups 9 and 10, between groups 10 and 11, and between groups 17 and 18. After group merging adjustment, every pair of adjacent groups finally conforms to monotonic increase, as shown in fig. 9. In this process, original groups 0, 1 and 2 are merged; original groups 4 and 5 are merged; original groups 9, 10 and 11 are merged; and original groups 17 and 18 are merged. That is, the 23 groups in fig. 8 become the 17 groups shown in fig. 9 after the group merging adjustment.

On this basis, the risk category of each group in fig. 9 can be determined by calculating the intra-group real occurrence probability of the dangerous event of each group and determining its magnitude relation to the real disease probability value (see the corresponding value in the last column of fig. 9). Groups with the same risk category are then merged to obtain the final classification result. That is, the final result of the present exemplary application scenario is: [0,0.2] is low risk, (0.2,0.45) is medium risk, and (0.45,1) is high risk, and the risk prediction value of the corresponding disease can be classified according to this risk classification threshold.

While the process flows described above include a plurality of operations occurring in a particular order, it should be apparent that the processes may include more or fewer operations, which may be performed sequentially or in parallel (e.g., using a parallel processor or a multi-threaded environment).

Corresponding to the above method for determining the risk classification threshold of the service data, the present disclosure further provides an embodiment of the device for determining the risk classification threshold of the service data. Referring to fig. 10, in some embodiments of the present disclosure, the service data risk classification threshold determining apparatus may include: an input data acquisition module 101, an interval grouping division module 102, a monotonicity processing module 103 and a classification threshold determination module 104. Wherein:

The input data acquisition module 101 may be configured to acquire a risk probability prediction value of historical service data and a real occurrence probability of a dangerous event;

The interval grouping division module 102 may be configured to divide the risk probability prediction values into a plurality of interval groups according to magnitude;

The monotonicity processing module 103 may be configured to perform monotonicity processing on the plurality of interval groups, so as to obtain a plurality of interval groups conforming to monotonicity;

The classification threshold determining module 104 may be configured to classify the plurality of interval groups conforming to monotonicity according to the real occurrence probability of the dangerous event, and to determine a risk classification threshold according to the classification result, for use in business risk prediction.

In some embodiments of the present specification, the service data risk classification threshold determining apparatus may further include a hypothesis testing module, which may be configured to perform a hypothesis testing process on the plurality of interval groups conforming to monotonicity, to obtain a plurality of interval groups passing the hypothesis test. Correspondingly, the classification threshold determining module 104 classifying the plurality of interval groups conforming to monotonicity according to the real occurrence probability of the dangerous event may include: the classification threshold determining module 104 classifying the plurality of interval groups passing the hypothesis test according to the real occurrence probability of the dangerous event.

For convenience of description, the above devices are described as being functionally divided into various units, respectively. Of course, the functions of each element may be implemented in one or more software and/or hardware elements when implemented in the present specification.

Embodiments of the present description also provide a computer device. As shown in fig. 11, in some embodiments of the present description, the computer device 1102 may include one or more processors 1104, such as one or more Central Processing Units (CPUs) or Graphics Processors (GPUs), each of which may implement one or more hardware threads. The computer device 1102 may also comprise any memory 1106 for storing any kind of information, such as code, settings, data, etc., and in a particular embodiment a computer program on the memory 1106 and executable on the processor 1104, which when executed by the processor 1104, may execute instructions according to the methods described above. For example, and without limitation, memory 1106 may comprise any one or more of the following combinations: any type of RAM, any type of ROM, flash memory devices, hard disks, optical disks, etc. More generally, any memory may store information using any technique. Further, any memory may provide volatile or non-volatile retention of information. Further, any memory may represent fixed or removable components of the computer device 1102. In one case, when the processor 1104 executes associated instructions stored in any memory or combination of memories, the computer device 1102 may perform any of the operations of the associated instructions. The computer device 1102 also includes one or more drive mechanisms 1108 for interacting with any memory, such as a hard disk drive mechanism, optical disk drive mechanism, and the like.

The computer device 1102 may also include an input/output module 1110 (I/O) for receiving various inputs (via an input device 1112) and for providing various outputs (via an output device 1114). One particular output mechanism may include a presentation device 1116 and an associated graphical user interface 1118 (GUI). In other embodiments, the input/output module 1110 (I/O), the input device 1112, and the output device 1114 may be omitted, and the computer device 1102 may serve merely as one computer device in a network. The computer device 1102 may also include one or more network interfaces 1120 for exchanging data with other devices via one or more communication links 1122. One or more communication buses 1124 couple together the components described above.

The communication link 1122 may be implemented in any manner, for example, through a local area network, a wide area network (e.g., the internet), a point-to-point connection, etc., or any combination thereof. Communication link 1122 may include any combination of hardwired links, wireless links, routers, gateway functions, name servers, etc. governed by any protocol or combination of protocols.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to some embodiments of the specification. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processor to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processor, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processor to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processor to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In a typical configuration, a computer device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or nonvolatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Disks (DVD) or other optical storage, magnetic cassettes, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computer device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.

It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification embodiments may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present description embodiments may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.

The present embodiments may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The embodiments of the specification may also be practiced in distributed computing environments where tasks are performed by remote processors that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

In this specification, the embodiments are described in a progressive manner; for parts that are identical or similar across embodiments, reference may be made from one embodiment to another, and each embodiment focuses on its differences from the others. In particular, since the system embodiments are substantially similar to the method embodiments, their description is relatively brief, and for relevant details reference may be made to the corresponding description of the method embodiments. In the description of the present specification, a description referring to the terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the embodiments of the present specification. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the different embodiments or examples described in this specification, and the features of the different embodiments or examples, provided they do not contradict one another.

The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the application are to be included in the scope of the claims of the present application.

Claims

1. A method for determining a risk classification threshold of service data, comprising:

Classifying the plurality of interval groups conforming to monotonicity according to the real occurrence probability of the dangerous event, and determining a risk classification threshold according to a classification result so as to be used for predicting business risks;

Wherein the monotonicity processing for the plurality of interval groups includes:

When the real occurrence probability of the intra-group dangerous event of each two adjacent interval groups does not accord with the monotonic increment, the interval merging processing is carried out and the monotonic processing is carried out again until the real occurrence probability of the intra-group dangerous event of each two adjacent interval groups accords with the monotonic increment; the section merging processing includes:

2. The traffic data risk classification threshold determination method according to claim 1, further comprising, after said obtaining a plurality of interval packets conforming to monotonicity:

3. The traffic data risk classification threshold determination method according to claim 1, wherein said dividing the risk probability prediction value into a plurality of interval groups according to a size includes:

4. The traffic data risk classification threshold determination method according to claim 2, wherein said performing a hypothesis testing process on the plurality of monotonically compliant interval packets comprises:

5. The traffic data risk classification threshold determination method according to claim 2, wherein said classifying said plurality of interval packets passing said hypothesis test according to said probability of occurrence of a dangerous event, comprises:

6. The method for determining risk classification threshold of service data according to claim 5, wherein determining risk classification threshold according to classification result comprises:

7. A traffic data risk classification threshold determining apparatus, comprising:

The classification threshold determining module is used for classifying the plurality of interval groups conforming to monotonicity according to the real occurrence probability of the dangerous event and determining a risk classification threshold according to a classification result so as to be used for predicting business risks;

8. The traffic data risk classification threshold determination apparatus of claim 7, further comprising:

9. A computer device comprising a memory, a processor, and a computer program stored on the memory, characterized in that the computer program, when being executed by the processor, performs the instructions of the method according to any of claims 1-6.

10. A computer storage medium having stored thereon a computer program, which, when executed by a processor of a computer device, performs the instructions of the method according to any of claims 1-6.