CN112927068A - Method, device and equipment for determining risk classification threshold of business data and storage medium

Info

Publication number: CN112927068A
Application number: CN202110341559.0A
Authority: CN
Inventors: 曹若迪
Original assignee: Good Diagnosis Shanghai Information Technology Co ltd
Current assignee: Good Diagnosis Shanghai Information Technology Co ltd
Priority date: 2021-03-30
Filing date: 2021-03-30
Publication date: 2021-06-08
Anticipated expiration: 2041-03-30
Also published as: CN112927068B

Abstract

An embodiment of the present specification provides a method, an apparatus, a device and a storage medium for determining a risk classification threshold of service data, where the method includes: acquiring a risk probability predicted value of historical service data and a real occurrence probability of a dangerous event; dividing the risk probability predicted value into a plurality of interval groups according to the size; performing monotonicity processing on the plurality of interval groups to obtain a plurality of interval groups which accord with monotonicity; and classifying the plurality of interval groups which accord with monotonicity according to the real occurrence probability of the dangerous event, and determining a risk classification threshold according to a classification result so as to predict business risk. The embodiment of the specification can improve the accuracy and efficiency of setting the business data risk classification threshold.

Description

Method, device and equipment for determining risk classification threshold of business data and storage medium

Technical Field

The present disclosure relates to the field of data processing technologies, and in particular, to a method, an apparatus, a device, and a storage medium for determining a risk classification threshold of service data.

Background

Risk classification identification is involved in many business areas. For example, safe production risk classification identification in the industrial field; classifying and identifying credit loan risks in the financial field; network and information security risk classification and identification in the internet field; health risk classification identification in the medical and insurance fields, and the like.

Before the business risk classification identification is carried out, a business data risk classification threshold needs to be determined. In the prior art, the service data risk classification threshold is generally set manually according to experience. However, in many cases, the manually set service data risk classification threshold is not necessarily accurate, and when risk classification identification is performed on service data with the inaccurate service data risk classification threshold as a reference, it is difficult to obtain the true risk of the service data. Moreover, for each application scenario, a corresponding risk classification threshold needs to be manually and individually set, and the efficiency of setting the business data risk classification threshold is low.

Disclosure of Invention

An object of the embodiments of the present specification is to provide a method, an apparatus, a device, and a storage medium for determining a risk classification threshold of service data, so as to improve accuracy and efficiency of setting the risk classification threshold of service data.

In order to achieve the above object, in one aspect, an embodiment of the present specification provides a method for determining a risk classification threshold of service data, including:

acquiring a risk probability predicted value of historical service data and a real occurrence probability of a dangerous event;

dividing the risk probability predicted value into a plurality of interval groups according to the size;

performing monotonicity processing on the plurality of interval groups to obtain a plurality of interval groups which accord with monotonicity;

and classifying the plurality of interval groups which accord with monotonicity according to the real occurrence probability of the dangerous event, and determining a risk classification threshold according to a classification result so as to predict business risk.

In an embodiment of the present specification, after obtaining the plurality of interval groups conforming to monotonicity, the method further includes:

performing hypothesis testing processing on the interval groups conforming to monotonicity to obtain a plurality of interval groups passing hypothesis testing;

correspondingly, the classifying the monotonicity-compliant interval groups according to the real occurrence probability of the dangerous event includes:

and classifying the plurality of interval groups passing the hypothesis test according to the true occurrence probability of the dangerous event.

In an embodiment of the present specification, the dividing the risk probability prediction value into a plurality of interval groups according to size includes:

dividing the risk probability predicted value into M interval groups according to the size; M is a positive integer greater than 1;

determining, among the M interval groups, the home interval group to which the true occurrence probability of the dangerous event belongs;

further dividing the home interval group into N interval groups according to the true occurrence probability of the dangerous event; N is a positive integer greater than 1;

taking the N interval groups obtained by further dividing the home interval group, together with the remaining interval groups of the M interval groups that were not divided, as the plurality of interval groups.

In an embodiment of the present specification, the performing monotonicity processing on the plurality of interval groups includes:

determining the in-group real occurrence probability of the dangerous event for each interval group;

judging whether the in-group real occurrence probabilities of the dangerous event for every two adjacent interval groups are monotonically increasing;

and when the in-group real occurrence probabilities of the dangerous event for two adjacent interval groups are not monotonically increasing, performing interval merging processing and performing monotonicity processing again, until the in-group real occurrence probabilities of the dangerous event for every two adjacent interval groups are monotonically increasing.

In an embodiment of the present specification, the performing interval merging processing includes:

determining the real occurrence probability of dangerous events in the sliding window of each interval group;

for each interval group, determining the magnitude relation between the real occurrence probability of the dangerous event in the sliding window and the real occurrence probability of the dangerous event;

correspondingly determining the sliding window risk category of each interval group according to the size relation;

for each interval group which does not conform to monotonicity, when the sliding window risk category of the interval group is the same as that of the next interval group, merging the interval group with the next interval group;

and for each interval group which does not conform to monotonicity, when the sliding window risk category of the interval group is different from that of the next interval group, merging the two interval groups following the interval group.

In an embodiment of the present specification, the performing hypothesis testing processing on the plurality of interval groups conforming to monotonicity includes:

performing hypothesis test processing on the plurality of interval groups which accord with monotonicity;

when the interval group does not pass the hypothesis test, the interval group is combined with the next interval group, and the hypothesis test processing is carried out again after the combination until each interval group passes the hypothesis test.

In an embodiment of the present specification, the classifying the plurality of interval groups subjected to hypothesis testing according to the true occurrence probability of the dangerous event includes:

determining the in-group real occurrence probability of the dangerous event for each interval group passing the hypothesis test;

for each interval group passing the hypothesis test, determining the magnitude relation between the in-group real occurrence probability of the dangerous event and the real occurrence probability of the dangerous event;

and correspondingly determining the risk category of each interval grouping passing the hypothesis test according to the size relation.

In an embodiment of the present specification, the determining a risk classification threshold according to a classification result includes:

and combining all the interval groups with the same risk category into one interval group to obtain a risk classification threshold.

On the other hand, an embodiment of the present specification further provides a device for determining a risk classification threshold of service data, including:

the input data acquisition module is used for acquiring a risk probability predicted value of historical service data and a real occurrence probability of a dangerous event;

the interval grouping and dividing module is used for dividing the risk probability predicted value into a plurality of interval groups according to the size;

the monotonicity processing module is used for performing monotonicity processing on the plurality of interval groups to obtain a plurality of interval groups which accord with monotonicity;

and the classification threshold determining module is used for classifying the plurality of interval groups which accord with monotonicity according to the real occurrence probability of the dangerous event and determining a risk classification threshold according to a classification result so as to predict the service risk.

In an embodiment of the present specification, the apparatus further includes:

a hypothesis testing module, configured to perform hypothesis testing processing on the plurality of interval groups that conform to monotonicity to obtain a plurality of interval groups that pass hypothesis testing;

correspondingly, the classification threshold determining module classifying the plurality of interval groups conforming to monotonicity according to the real occurrence probability of the dangerous event includes:

the classification threshold determining module classifies the plurality of interval groups passing the hypothesis test according to the real occurrence probability of the dangerous event.

In another aspect, the present specification further provides a computer storage medium on which a computer program is stored, where the computer program, when executed by a processor of a computer device, executes the instructions of the above method.

According to the technical scheme provided by the embodiment of the specification, the embodiment of the specification can automatically calculate the risk classification threshold according to historical service data, and compared with the mode of manually setting the risk classification threshold, the mode greatly improves the efficiency of setting the risk classification threshold of the service data. Moreover, in the embodiments of the present description, a plurality of initial interval groups are subjected to monotonicity processing, so that the initial interval groups can meet the monotonicity requirement; and then classifying the interval groups conforming to monotonicity according to the real occurrence probability of the dangerous events, and determining a risk classification threshold according to a classification result for service risk prediction, so that the service data risk classification threshold determined by the embodiment of the specification is more accurate, and the accuracy of subsequent service risk identification is improved.

Drawings

In order to more clearly illustrate the embodiments of the present specification or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only some embodiments described in the present specification, and for those skilled in the art, other drawings can be obtained according to the drawings without any creative effort. In the drawings:

FIG. 1 is a schematic diagram illustrating an application scenario of a method for determining a risk classification threshold of business data in an embodiment of the present specification;

FIG. 2 illustrates a flow chart of a method for determining a business data risk classification threshold in some embodiments of the present description;

FIG. 3 is a flow chart illustrating interval grouping in one embodiment of the present specification;

FIG. 4 shows a flow diagram of monotonicity processing in an embodiment of the specification;

FIG. 5 is a flowchart showing the interval merging process in the monotonicity processing in an embodiment of the present specification;

FIG. 6 is a flow chart illustrating interval classification in one embodiment of the present specification;

FIG. 7 is a schematic diagram illustrating a process of determining a risk classification threshold of business data according to an embodiment of the present disclosure;

FIG. 8 is a schematic diagram of the determination of the risk classification threshold of prevalence probability (before group adjustment) in one embodiment of the present specification;

FIG. 9 is a schematic diagram of the determination of the risk classification threshold of prevalence probability (after group adjustment) in one embodiment of the present specification;

FIG. 10 is a block diagram illustrating an architecture of a business data risk classification threshold determination apparatus in some embodiments of the present disclosure;

FIG. 11 is a block diagram illustrating the architecture of a computer device in some embodiments of the present description.

[Description of reference numerals]

10. a service data risk classification threshold determination device;

20. a database;

30. a business data risk identification device;

40. a business system;

101. an input data acquisition module;

102. an interval grouping and dividing module;

103. a monotonicity processing module;

104. a classification threshold determination module;

1102. a computer device;

1104. a processor;

1106. a memory;

1108. a drive mechanism;

1110. an input/output module;

1112. an input device;

1114. an output device;

1116. a presentation device;

1118. a graphical user interface;

1120. a network interface;

1122. a communication link;

1124. a communication bus.

Detailed Description

In order to make those skilled in the art better understand the technical solutions in the present specification, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only a part of the embodiments of the present specification, and not all of the embodiments. All other embodiments obtained by a person skilled in the art based on the embodiments in the present specification without any inventive step should fall within the scope of protection of the present specification.

The present specification relates to a technology for determining a risk classification threshold of service data, which can be applied to any service scenario requiring service risk identification. Examples include, but are not limited to: a safety production risk classification identification scene in the industrial field; a credit loan risk classification identification scene in the financial field; a network and information security risk classification identification scene in the Internet field; a health risk classification identification scene in the insurance field; a health risk classification identification scene in the medical field; and a driving risk classification identification scene in the automatic driving field.

In view of the problem that the accuracy of the manually set business data risk classification threshold is low, the embodiment of the present specification provides a method that can automatically determine a business data risk classification threshold. Referring to FIG. 1, the method for determining a business data risk classification threshold according to the embodiment of the present disclosure may be used in a business data risk classification threshold determining apparatus 10. The service data risk classification threshold determination device 10 may obtain historical service data from the database 20, automatically calculate a risk classification threshold from the historical service data, and provide the risk classification threshold to the service data risk identification device 30, so that the service data risk identification device 30 can perform risk identification on the service data provided by the service system 40 according to the risk classification threshold. In implementation, the service data risk classification threshold determining device 10 and the service data risk identifying device 30 may be configured on different computer devices or on the same computer device, as selected according to actual needs.

Referring to fig. 2, in some embodiments of the present disclosure, the method for determining a risk classification threshold of service data may include the following steps:

S201, acquiring a risk probability predicted value of historical business data and a real occurrence probability of a dangerous event.

S202, dividing the risk probability predicted value into a plurality of interval groups according to the size.

S203, performing monotonicity processing on the plurality of interval groups to obtain a plurality of interval groups conforming to monotonicity.

S204, classifying the plurality of interval groups which accord with monotonicity according to the real occurrence probability of the dangerous event, and determining a risk classification threshold according to a classification result for service risk prediction.

The embodiment of the specification can automatically calculate the risk classification threshold according to historical service data, and compared with the method of manually setting the risk classification threshold, the method greatly improves the efficiency of setting the risk classification threshold of the service data. Moreover, in the embodiments of the present description, a plurality of initial interval groups are subjected to monotonicity processing, so that the initial interval groups can meet the monotonicity requirement; and then classifying the interval groups conforming to monotonicity according to the real occurrence probability of the dangerous events, and determining a risk classification threshold according to a classification result for service risk prediction, so that the service data risk classification threshold determined by the embodiment of the specification is more accurate, and the accuracy of subsequent service risk identification is improved.

In the embodiments of the present specification, determining the risk classification threshold means determining the intervals that divide the risk classes, specifically the number of risk classification intervals and the endpoint values of each interval. For example, in an exemplary embodiment, the value range of the risk probability is [0,1]; if, based on the method for determining the risk classification threshold of the service data, [0,1] can be divided into five risk classification intervals, namely [0,0.2], [0.2,0.5], [0.5,0.8], [0.8,1], then these risk classification intervals are the risk classification threshold to be determined.
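As an illustration of how classified interval groups could be turned into threshold endpoints, the following minimal sketch collapses adjacent interval groups that share a risk category, in the spirit of the step of combining all interval groups with the same risk category into one interval group. The function name `thresholds_from_categories`, the edge values and the category labels are hypothetical and only serve the example; the sketch also assumes that groups of the same category are adjacent.

```python
def thresholds_from_categories(edges, categories):
    """Collapse adjacent interval groups with the same risk category.

    edges      -- sorted interval endpoints; len(edges) == len(categories) + 1
    categories -- risk category of each interval group (hypothetical labels)
    Returns the endpoints of the merged risk classification intervals.
    """
    cuts = [edges[0]]
    for k in range(1, len(categories)):
        # A threshold is kept only where the risk category changes.
        if categories[k] != categories[k - 1]:
            cuts.append(edges[k])
    cuts.append(edges[-1])
    return cuts


# Illustrative values only: five interval groups, three distinct categories.
example_cuts = thresholds_from_categories(
    [0.0, 0.2, 0.3, 0.5, 0.8, 1.0], [1, 1, 2, 3, 3]
)
```

In this example the first two groups and the last two groups each collapse into one interval, leaving three risk classification intervals.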

In some embodiments of the present specification, the true occurrence probability of a dangerous event of historical business data refers to the ratio of the number of samples in the historical business data in which a dangerous event has actually occurred to the total number of samples in the historical business data. For example, if the total number of samples of the historical business data of the last three years is 1000000, of which the number of samples in which a dangerous event actually occurred is 1000, then the true occurrence probability of the dangerous event in the historical business data is:

$$P = \frac{1000}{1000000} = 0.001$$

In some embodiments of the present description, the risk probability prediction value of each sample in the historical service data may be calculated according to any suitable risk prediction model. The risk prediction model may be selected according to the actual application scenario, and this is not specifically limited in this description. For example, for automatic driving risk, a suitable automatic driving risk prediction model may be used for prediction; for the leakage risk of an oil and gas pipeline, a suitable oil and gas pipeline leakage risk prediction model may be used; for financial institution credit risk, a suitable credit risk prediction model may be used; and for the risk of disease, a suitable disease risk prediction model may be used.

The risk probability predicted values obtained in the previous step are first divided into a plurality of interval groups according to size, so that a suitable risk classification threshold can be derived on that basis. However, in the same application scenario and for the same historical business data sample, different risk prediction models may output different risk probability prediction values, although these values basically lie around the real occurrence probability of the dangerous event. Specifically, two quantities need to be distinguished: one is the risk probability predicted value of a single sample, which concerns only that individual sample; the other is the real occurrence probability of the dangerous event within the population of samples in which the individual sample is located (i.e., the in-group real occurrence probability of the dangerous event), which is closer to the real occurrence probability of the dangerous event. Simply dividing the value range [0,1] of the risk probability equally into a plurality of interval groups is therefore not enough to cover both, so additional interval groups targeted at the real occurrence probability of the dangerous event need to be added.

Therefore, in some embodiments of the present specification, as shown in fig. 3, the dividing the risk probability prediction value into a plurality of interval groups according to size may include the following steps:

S301, dividing the risk probability predicted value into M interval groups according to the size; M is a positive integer greater than 1.

The size of M can be set according to actual needs. For example, in an exemplary embodiment, the risk probability has a value range of [0,1]; assuming that [0,1] is equally divided into 20 parts (where M is 20) in steps of 0.05, 20 equally divided interval groups are obtained. On this basis, the home interval group of each risk probability prediction value among the 20 equally divided interval groups can be determined according to its magnitude. For example, a certain risk probability prediction value is 0.008, which falls within the range of the fourth interval group (0.0005, 0.1) of the 20 equally divided interval groups, and thus the fourth interval group (0.0005, 0.1) is the home interval group of the risk probability prediction value 0.008.

S302, determining the home interval group of the real occurrence probability of the dangerous event among the M interval groups.

Also taking the fourth interval group (0.0005, 0.1) of the 20 equally divided interval groups as an example, since the real occurrence probability 0.001 of the dangerous event in the historical service data falls within the range of the fourth interval group (0.0005, 0.1), the fourth interval group (0.0005, 0.1) is the home interval group of the real occurrence probability 0.001 of the dangerous event.

S303, further dividing the home interval group into N interval groups according to the real occurrence probability of the dangerous event; N is a positive integer greater than 1.

The size of N may be set according to actual needs, and in general, 1 < N < M. For example, the home interval group (0.0005, 0.1) of the real occurrence probability 0.001 of the dangerous event can be split into three interval groups: (0.0005, 0.001), (0.001, 0.01) and (0.01, 0.1).

S304, taking the N interval groups obtained by further dividing the home interval group, together with the remaining interval groups of the M interval groups that were not divided, as the plurality of interval groups.

For example, in the case of the 20 equally divided interval groups, the other 19 interval groups excluding the fourth interval group (0.0005, 0.1), and the three interval groups obtained by splitting the fourth interval group (0.0005, 0.1), may together be taken as the plurality of interval groups.
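Steps S301 to S304 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: it assumes equal-width initial groups over [0, 1] that are closed on the right, and the function name `build_interval_edges` and the `split_points` argument (the interior cut points placed inside the home interval group) are hypothetical names introduced for the example.

```python
from bisect import bisect_left


def build_interval_edges(m, true_prob, split_points):
    """Return sorted interval endpoints over [0, 1].

    m            -- number of equal-width interval groups (M in the text)
    true_prob    -- real occurrence probability of the dangerous event
    split_points -- hypothetical interior cut points for the home group
    """
    # S301: M equal-width interval groups.
    edges = [i / m for i in range(m + 1)]
    # S302: locate the home group (lo, hi] that contains true_prob.
    k = min(max(bisect_left(edges, true_prob), 1), m)
    lo, hi = edges[k - 1], edges[k]
    # S303: keep only cut points that fall strictly inside the home group.
    inner = sorted(p for p in split_points if lo < p < hi)
    # S304: the split home group plus all untouched groups.
    return edges[:k] + inner + edges[k:]


# Illustrative values: M = 20, true probability 0.001, two assumed cut points,
# so the home group is split into N = 3 groups.
example_edges = build_interval_edges(20, 0.001, [0.001, 0.01])
```

With these inputs the result has 23 endpoints, that is, 19 untouched groups plus 3 groups obtained from the home group.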

Each value in the interval grouping represents a risk probability predicted value of an individual sample, and when the actual occurrence probability of the dangerous event in the group of the interval grouping is higher, the risk probability predicted value of each sample in the interval grouping is higher correspondingly. However, sometimes, there is fluctuation near the critical point of the interval grouping, so that the real occurrence probability of dangerous events in the group of the interval grouping is not monotonously increased. Therefore, it is necessary to determine to which section group the fluctuating portion should be attributed, so that the probability of true occurrence of the intra-group risk event of all the section groups increases with the section group.

Thus, referring to FIG. 4, in some embodiments of the present disclosure, the monotonicity processing of the plurality of interval groups may include:

S401, determining the in-group real occurrence probability of the dangerous event for each interval group.

S402, judging whether the in-group real occurrence probabilities of the dangerous event for every two adjacent interval groups are monotonically increasing.

Judging whether the in-group real occurrence probabilities of the dangerous event for every two adjacent interval groups are monotonically increasing means judging whether the in-group real occurrence probability of the dangerous event for an interval group is greater than that for the previous interval group.

For example, in the embodiment shown in FIG. 7, there are sixteen interval groups numbered 0 to 15. Starting from interval group No. 0, it is judged whether the in-group real occurrence probabilities of the dangerous event for interval group No. 0 and interval group No. 1 are monotonically increasing, that is, whether the in-group real occurrence probability for interval group No. 1 is greater than that for interval group No. 0. If it is greater, the two in-group real occurrence probabilities are considered monotonically increasing; otherwise, they are not. As shown in the first row of interval groups in FIG. 7, the judgment confirms that the in-group real occurrence probabilities of interval groups No. 0 and No. 1 are not monotonically increasing, those of interval groups No. 12 and No. 13 are not monotonically increasing, and those of every other pair of adjacent interval groups are monotonically increasing.

S403, when the in-group real occurrence probabilities of the dangerous event for two adjacent interval groups are not monotonically increasing, performing interval merging processing and performing monotonicity processing again, until the in-group real occurrence probabilities of the dangerous event for every two adjacent interval groups are monotonically increasing.
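The pairwise check described above can be sketched in a few lines. This is a minimal sketch, assuming each interval group is summarized by its number of risk individuals and its number of samples; the function name `non_monotonic_pairs` and the sample counts are hypothetical. Following the text, a pair counts as monotonically increasing only when the later group's in-group probability is strictly greater.

```python
def non_monotonic_pairs(risk_counts, sample_counts):
    """Return indices i where groups i and i+1 are not monotonically increasing.

    risk_counts   -- number of samples with a dangerous event, per interval group
    sample_counts -- total number of samples, per interval group
    """
    # In-group real occurrence probability; empty groups are treated as 0.0
    # (an assumption made for this sketch).
    rates = [r / n if n else 0.0 for r, n in zip(risk_counts, sample_counts)]
    # A pair violates monotonicity when the later rate is not strictly greater.
    return [i for i in range(len(rates) - 1) if rates[i + 1] <= rates[i]]


# Illustrative counts only: the first pair (groups 0 and 1) is not increasing.
example_violations = non_monotonic_pairs([5, 2, 6, 9], [100, 100, 100, 100])
```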

As shown in FIG. 5, in some embodiments of the present specification, the interval merging processing in step S403 may further include the following steps:

S501, determining the real occurrence probability of the dangerous event in the sliding window of each interval group.

In the embodiment of the present specification, the real occurrence probability of the dangerous event in the sliding window is used to subsequently determine the merging range of the interval groups. The real occurrence probability of the dangerous event in the sliding window can overcome the abnormal risk probability that may be caused by an abnormal local sample distribution in a single interval group. Using it to merge groups therefore makes the business data risk classification threshold determined by the embodiment of the specification more accurate, which in turn further improves the accuracy of subsequent business risk identification.

The size of the sliding window can be selected according to actual needs. For example, in an embodiment of the present specification, the size of the sliding window can be three interval groups. In this case, for the current ith interval group, the real occurrence probability of the dangerous event in the sliding window can be calculated according to the formula

$$ave_i = \frac{sum(i-1) + sum(i) + sum(i+1)}{count(i-1) + count(i) + count(i+1)}$$

where ave_i is the real occurrence probability of the dangerous event in the sliding window of the ith interval group; sum(i-1), sum(i) and sum(i+1) are the numbers of risk individuals (i.e., individuals for which a dangerous event occurred) in the (i-1)th, ith and (i+1)th interval groups, respectively; and count(i-1), count(i) and count(i+1) are the numbers of in-group samples in the (i-1)th, ith and (i+1)th interval groups, respectively. In an embodiment of the present specification, for the first interval group of the plurality of interval groups, the number of in-group samples and the number of risk individuals of the previous interval group may both default to zero. In another embodiment of the present specification, for the last interval group of the plurality of interval groups, since no interval group follows it to be merged with, the real occurrence probability of the dangerous event in the sliding window need not be calculated.

Those skilled in the art will appreciate that the above-described method of calculating the real occurrence probability of the dangerous event within a sliding window is merely exemplary. In other embodiments of this specification, a data smoothing algorithm such as a weighted moving average, SG (Savitzky-Golay) filtering, or alpha (α) mean filtering may be used instead according to actual needs, and this specification does not limit this.
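The sliding-window probability described above, that is, the pooled ratio of risk individuals to samples over the (i-1)th, ith and (i+1)th interval groups, can be sketched as follows. This is a minimal sketch with a window of three interval groups; the function name `sliding_window_rates` and the sample counts are hypothetical. As described in the text, the missing group before the first interval group contributes zero, and the last interval group is skipped.

```python
def sliding_window_rates(risk_counts, sample_counts):
    """Sliding-window occurrence probability for every group except the last.

    risk_counts   -- number of risk individuals per interval group
    sample_counts -- number of samples per interval group
    """
    rates = []
    # The last interval group is skipped, since nothing follows it to merge with.
    for i in range(len(sample_counts) - 1):
        lo = max(i - 1, 0)  # the nonexistent group before group 0 counts as zero
        window_risk = sum(risk_counts[lo:i + 2])
        window_samples = sum(sample_counts[lo:i + 2])
        # An empty window is treated as 0.0 (an assumption of this sketch).
        rates.append(window_risk / window_samples if window_samples else 0.0)
    return rates


# Illustrative counts only.
example_rates = sliding_window_rates([1, 2, 3, 4], [10, 10, 10, 10])
```

Pooling the counts, rather than averaging the three per-group rates, weights each group by its sample size, which matches the ratio-of-sums form of the definition.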

S502, for each interval group, determining the magnitude relation between the real occurrence probability of the dangerous event in the sliding window and the real occurrence probability of the dangerous event.

In the embodiment of the present specification, the interval range of each sliding window risk category may be preset according to the real occurrence probability of the dangerous event in the historical business data. For example, the interval ranges of the sliding window risk categories preset in this way may be as shown in Table 1 below. In Table 1, ave_i is the real occurrence probability of the dangerous event in the sliding window of the ith interval group, and P is the real occurrence probability of the dangerous event.

TABLE 1

For the ith interval group, once the real occurrence probability ave_i of the dangerous event in the sliding window has been calculated, the magnitude relation between ave_i and the real occurrence probability P of the dangerous event can be determined by looking up Table 1.

S503, correspondingly determining the sliding window risk category of each interval group according to the magnitude relation.

As can be seen from Table 1 above, each sliding window risk category has a unique corresponding interval range; once the real occurrence probability of the dangerous event in the sliding window of an interval group is determined, the sliding window risk category of that interval group can be determined from the interval range in Table 1 into which the probability falls. For example, assuming that P is 0.0001 and the real occurrence probability of the dangerous event in the sliding window of a certain interval group is 0.001, it can be confirmed by looking up Table 1 that the sliding window risk category of the interval group is class 3 risk.
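The table lookup can be sketched as a comparison of ave_i against multiples of P. The boundaries of Table 1 are not reproduced in this text, so the multipliers below (1, 5 and 20 times P) are purely hypothetical placeholders chosen only so that the example above (P = 0.0001, window probability 0.001) yields class 3; the function name `window_risk_category` is likewise an assumption.

```python
from bisect import bisect_right

# HYPOTHETICAL category boundaries, expressed as multiples of P.
# The real boundaries come from Table 1, which is not available here.
HYPOTHETICAL_MULTIPLIERS = (1.0, 5.0, 20.0)


def window_risk_category(ave_i, p, multipliers=HYPOTHETICAL_MULTIPLIERS):
    """Map a sliding-window probability to a risk category (1-based).

    Category k covers [multipliers[k-2] * p, multipliers[k-1] * p), with the
    first category open below and the last category open above.
    """
    bounds = [m * p for m in multipliers]
    return bisect_right(bounds, ave_i) + 1


if __name__ == "__main__":
    print(window_risk_category(0.001, 0.0001))
```

Because each category corresponds to exactly one half-open range, a binary search over the sorted boundaries returns a unique category for any probability.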

S504, for each interval group that does not conform to monotonicity, merging the interval group with the next interval group when the sliding window risk category of the interval group is the same as that of the next interval group; otherwise, merging the two interval groups that follow the interval group.

For example, for an ith interval group that does not conform to monotonicity, if its sliding window risk category is class 1 and the sliding window risk category of the (i+1)th interval group is also class 1, the ith interval group and the (i+1)th interval group may be merged. If instead the sliding window risk category of the ith interval group is class 1 and that of the (i+1)th interval group is class 2, the (i+1)th interval group and the (i+2)th interval group may be merged.
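The merge rule of step S504 can be written as a small decision function. This is a sketch only: the name `merge_target` is hypothetical, and the fallback used when no (i+2)th group exists is an assumption, since the text does not cover that boundary case.

```python
def merge_target(i, categories):
    """Return the pair of group indices to merge for non-monotonic group i.

    categories[k] is the sliding window risk category of the kth group.
    """
    if categories[i] == categories[i + 1]:
        # Same category: merge the group with its successor.
        return (i, i + 1)
    if i + 2 < len(categories):
        # Different category: merge the two groups that follow group i.
        return (i + 1, i + 2)
    # ASSUMPTION: with no (i+2)th group available, fall back to (i, i+1).
    return (i, i + 1)


if __name__ == "__main__":
    print(merge_target(0, [1, 1, 2]))  # same category
    print(merge_target(0, [1, 2, 2]))  # different category
```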

For example, in the embodiment shown in fig. 7, the real occurrence probabilities of the dangerous events in interval group No. 0 and interval group No. 1 do not increase monotonically, and the two groups are assumed, for the purposes of this example, to belong to the same sliding window risk category; interval group No. 0 and interval group No. 1 can therefore be merged into one interval group. Similarly, the real occurrence probabilities of the dangerous events in interval group No. 12 and interval group No. 13 do not increase monotonically, and the two groups are likewise assumed to belong to the same sliding window risk category, so interval group No. 12 and interval group No. 13 can also be merged into one interval group. The sequence numbers of the interval groups are then adjusted accordingly, which yields the interval groups shown in the second row of fig. 7, and these are checked for monotonicity again.

For the interval groups shown in the second row of fig. 7, the check confirms that the real occurrence probabilities of the dangerous events in interval group No. 1 and interval group No. 2 do not increase monotonically, that those in interval group No. 11 and interval group No. 12 do not increase monotonically, and that every other pair of adjacent interval groups does increase monotonically. Therefore, interval group No. 1 and interval group No. 2 may be merged into one interval group, and interval group No. 11 and interval group No. 12 may be merged into one interval group (in both cases assuming, for the purposes of this example, that the two groups belong to the same sliding window risk category). The sequence numbers of the interval groups are then adjusted accordingly, which yields the interval groups shown in the third row of fig. 7, and these are checked for monotonicity again.

For the interval groups shown in the third row of fig. 7, the check confirms that the real occurrence probabilities of the dangerous events in interval group No. 10 and interval group No. 11 do not increase monotonically, and that every other pair of adjacent interval groups does increase monotonically. Therefore, interval group No. 10 and interval group No. 11 can be merged into one interval group (assuming, for the purposes of this example, that they belong to the same sliding window risk category). The sequence numbers of the interval groups are then adjusted accordingly, which yields the interval groups shown in the fourth row of fig. 7, and these are checked for monotonicity again.

For the interval groups shown in the fourth row of fig. 7, the check confirms that the real occurrence probabilities of the dangerous events in every pair of adjacent interval groups increase monotonically, and the monotonicity processing of the interval groups is thus complete. Accordingly, the ten interval groups shown in the fourth row of fig. 7 are the plurality of interval groups obtained after the monotonicity processing.
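The round-by-round procedure walked through for fig. 7 can be sketched as a loop that repeats merge rounds until no adjacent pair violates monotonic increase. The sketch makes the same simplification as the walkthrough, namely that every offending pair shares a sliding window risk category and may therefore be merged directly. The names `rate`, `merge_round` and `make_monotonic`, the `(risk_count, sample_count)` tuple layout, and the choice to treat only a strict decrease as a violation are all assumptions.

```python
def rate(group):
    """In-group real occurrence probability of a (risk_count, sample_count) group."""
    risk, n = group
    return risk / n if n else 0.0


def merge_round(groups):
    """Perform one left-to-right merge round; return (new_groups, changed)."""
    merged, i, changed = [], 0, False
    while i < len(groups):
        if i + 1 < len(groups) and rate(groups[i]) > rate(groups[i + 1]):
            # Adjacent pair breaks monotonic increase: pool it into one group.
            a, b = groups[i], groups[i + 1]
            merged.append((a[0] + b[0], a[1] + b[1]))
            i += 2
            changed = True
        else:
            merged.append(groups[i])
            i += 1
    return merged, changed


def make_monotonic(groups):
    """Repeat merge rounds until every adjacent pair is non-decreasing."""
    changed = True
    while changed:
        groups, changed = merge_round(groups)
    return groups


if __name__ == "__main__":
    demo = [(2, 100), (1, 100), (3, 100), (5, 100), (4, 100)]
    print(make_monotonic(demo))
```

Each round that changes anything shortens the list by at least one group, so the loop always terminates; renumbering is implicit in the list positions.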

The interval groups obtained by monotonicity processing guarantee that the risk probability increases monotonically, but they do not guarantee that each interval group is credible (i.e., consistent with the facts). Therefore, in some embodiments of the present specification, after step S203, the plurality of interval groups conforming to monotonicity may further be subjected to hypothesis testing processing to obtain a plurality of interval groups passing the hypothesis test. In this case, classifying the monotonicity-compliant interval groups according to the real occurrence probability of the dangerous event may include: classifying the plurality of interval groups passing the hypothesis test according to the real occurrence probability of the dangerous event.

In some embodiments of the present specification, any suitable hypothesis testing method may be used to perform hypothesis testing processing on the plurality of interval groups conforming to monotonicity; the present specification does not limit this, and the method may be selected as needed. The theoretical basis of hypothesis testing is the small-probability principle of probability theory, which holds that a small-probability event should not occur in a single observation. In other words, if an event assumed to have small probability does occur in an observation, it should be concluded that the event is not in fact a small-probability event but a large-probability event.

For example, in an exemplary embodiment, the hypothesis testing process may be performed on each interval group conforming to monotonicity based on a hypothesis testing method such as a t-test, Z-test, chi-square test, or F-test. When all of the interval groups conforming to monotonicity pass the hypothesis test, step S205 may be performed. However, when an interval group fails the hypothesis test, that interval group is merged with the subsequent interval group, and the hypothesis testing process is performed again after the merging, until every interval group passes the hypothesis test.

For example, for the embodiment shown in fig. 7, assuming that interval group No. 5 among the ten interval groups shown in the fourth row does not pass the hypothesis test, interval group No. 5 and interval group No. 6 may be merged into a new interval group, the sequence numbers of the interval groups may be adjusted accordingly, and another round of hypothesis testing may then be performed. When all interval groups pass the hypothesis test, the remaining interval groups are the plurality of interval groups obtained after the hypothesis testing processing.
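The merge-and-retest loop can be sketched independently of the particular statistical test. In the sketch below the test is abstracted behind a hypothetical `passes_test` callback, since the specific null hypothesis is not given here; the name `merge_until_all_pass`, the `(risk_count, sample_count)` tuple layout, and the handling of a failing last group (merged with its predecessor) are assumptions. The sample-size predicate in the usage example is only a stand-in and is not itself a hypothesis test.

```python
def merge_until_all_pass(groups, passes_test):
    """Merge each failing group with its successor until all groups pass.

    groups      -- ordered list of (risk_count, sample_count) tuples
    passes_test -- callable taking one group and returning True or False
    """
    groups = list(groups)
    while len(groups) > 1:
        failing = next(
            (i for i, g in enumerate(groups) if not passes_test(g)), None
        )
        if failing is None:
            break  # every group passes
        # ASSUMPTION: a failing last group is merged with its predecessor.
        j = failing if failing + 1 < len(groups) else failing - 1
        a, b = groups[j], groups[j + 1]
        groups[j:j + 2] = [(a[0] + b[0], a[1] + b[1])]
    return groups


if __name__ == "__main__":
    # Stand-in predicate (NOT a real hypothesis test): require >= 200 samples.
    demo = [(1, 300), (1, 50), (2, 100), (5, 250)]
    print(merge_until_all_pass(demo, lambda g: g[1] >= 200))
```

In practice the callback would wrap whichever test is chosen (for example a t-test, Z-test, chi-square test or F-test) and compare its statistic or p-value against a chosen significance level.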

Referring to fig. 6, in some embodiments of the present specification, classifying the interval groups passing the hypothesis test according to the real occurrence probability of the dangerous event may include:

S601, determining the real occurrence probability of the dangerous event within each interval group passing the hypothesis test.

S602, for each interval group passing the hypothesis test, determining the magnitude relation between the real occurrence probability of the dangerous event within the group and the real occurrence probability of the dangerous event.

In the embodiment of the present specification, the interval range of each risk category may be preset according to the real occurrence probability of the dangerous event in the historical business data. For example, the interval ranges of the risk categories preset in this way may be as shown in Table 2 below. In Table 2, mean_i is the real occurrence probability of the dangerous event within the ith interval group passing the hypothesis test, and P is the real occurrence probability of the dangerous event.

TABLE 2

For the ith interval group passing the hypothesis test, once the real occurrence probability mean_i of the dangerous event within the group has been calculated, the magnitude relation between mean_i and the real occurrence probability P of the dangerous event can be determined by looking up Table 2.

S603, correspondingly determining the risk category of each interval group passing the hypothesis test according to the magnitude relation.

As can be seen from Table 2 above, each risk category has a unique corresponding interval range; once the real occurrence probability of the dangerous event within an interval group passing the hypothesis test is determined, the risk category of that interval group can be determined from the interval range in Table 2 into which the probability falls. For example, assuming that P is 0.0005 and the real occurrence probability of the dangerous event within a certain interval group passing the hypothesis test is 0.0008, the risk category of the interval group can be confirmed as class 2 risk by referring to Table 2.

In some embodiments of the present specification, determining a risk classification threshold according to the classification result may include: merging all interval groups with the same risk category into one interval group, thereby obtaining the risk classification threshold.

For example, taking the embodiment shown in fig. 7, after the real occurrence probability of the dangerous event within each interval group in the 4th row of fig. 7 (i.e., mean_1 to mean_10) is determined and compared with the real occurrence probability P of the dangerous event in the historical business data, the risk category of each interval group in the 4th row of fig. 7 can be obtained (see the risk category identifier below each interval group in the 4th row of fig. 7). Since the risk categories of interval group No. 0 to interval group No. 3 in the 4th row of fig. 7 are all class 1, they can be merged into one interval group. Similarly, since the risk categories of interval group No. 4 to interval group No. 6 are all class 2, they can be merged into one interval group; since the risk categories of interval group No. 7 to interval group No. 9 are all class 3, they can be merged into one interval group; and since only interval group No. 10 has risk category class 4, it is not merged with any other interval group. The four interval groups shown in the 5th row of fig. 7 are thus obtained. Assuming that the four interval groups shown in the 5th row of fig. 7 are [0,0.2), [0.2,0.5), [0.5,0.8) and [0.8,1], the risk classification threshold shown in Table 3 below can be obtained.
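Merging adjacent interval groups that share a risk category can be sketched with `itertools.groupby`, which collapses consecutive runs of equal keys. The `(lower, upper, category)` tuple layout, the function name `merge_by_category`, and the intermediate boundaries in the usage example are hypothetical; only the four resulting intervals match the example above.

```python
from itertools import groupby


def merge_by_category(groups):
    """Collapse consecutive groups with the same risk category.

    groups -- ordered list of (lower, upper, category) tuples whose
              intervals are contiguous and sorted by lower bound.
    """
    merged = []
    for category, run in groupby(groups, key=lambda g: g[2]):
        run = list(run)
        # The merged interval spans from the first lower bound to the last upper bound.
        merged.append((run[0][0], run[-1][1], category))
    return merged


if __name__ == "__main__":
    # Intermediate boundaries below are illustrative only.
    demo = [
        (0.0, 0.1, 1), (0.1, 0.2, 1),
        (0.2, 0.5, 2),
        (0.5, 0.65, 3), (0.65, 0.8, 3),
        (0.8, 1.0, 4),
    ]
    print(merge_by_category(demo))
```

Because the groups are already monotonic in risk, groups of the same category are adjacent, so collapsing consecutive runs is sufficient and no sorting is needed.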

TABLE 3

Interval group	Risk category
[0,0.2)	Class 1 risk (i.e., low risk)
[0.2,0.5)	Class 2 risk (i.e., medium risk)
[0.5,0.8)	Class 3 risk (i.e., high risk)
[0.8,1]	Class 4 risk (i.e., ultra-high risk)

Subsequently, risk identification can be performed on business data corresponding to the historical business data according to the risk classification threshold shown in Table 3. For example, if the risk prediction value of a piece of business data obtained from the preset risk prediction model is 0.3, then since 0.3 falls within the interval [0.2,0.5), the risk category of the business data is class 2 risk.
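Applying the thresholds of Table 3 to a new prediction amounts to finding which half-open interval contains the value. A minimal sketch, assuming the example cut points 0.2, 0.5 and 0.8 from Table 3 (the names `THRESHOLDS`, `LABELS` and `risk_class` are hypothetical):

```python
from bisect import bisect_right

# Cut points taken from the example intervals in Table 3.
THRESHOLDS = (0.2, 0.5, 0.8)
LABELS = ("low risk", "medium risk", "high risk", "ultra-high risk")


def risk_class(prediction):
    """Return the 1-based risk class for a prediction in [0, 1]."""
    if not 0.0 <= prediction <= 1.0:
        raise ValueError("prediction must lie in [0, 1]")
    # bisect_right places a value equal to a cut point in the upper interval,
    # which matches the half-open intervals [0,0.2), [0.2,0.5), [0.5,0.8), [0.8,1].
    return bisect_right(THRESHOLDS, prediction) + 1


if __name__ == "__main__":
    cls = risk_class(0.3)
    print(cls, LABELS[cls - 1])
```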

The identification of the risk of illness is exemplified below as an exemplary application scenario.

In the exemplary application scenario, the disease risk prediction model (hereinafter referred to as "model") is constructed based on a Linear Discriminant Analysis (LDA) algorithm.

Assuming that the currently selected step size is 0.05, the prediction results output by the model can be divided into 20 basic groups with a step size of 0.05 over the value range 0 to 1. Depending on the algorithm finally adopted in the model, the output disease probabilities sometimes cluster around the real disease probability value, and sometimes the algorithm logic outputs a disease probability specific to the individual. The present exemplary application scenario needs to take into account the case where the output results cluster around the real disease probability value, and therefore adds groups around the real disease probability. As shown in fig. 8, the real disease probability of the current disease in the historical data range is 0.001997252647941293; the first of the 20 groups can then be subdivided into 4 groups (i.e., group No. 0 to group No. 3 in fig. 8) according to the real disease probability value, thereby forming group No. 0 to group No. 22 in fig. 8.

Continuing with fig. 8, after a single round of monotonicity determination, it is found that some pairs of adjacent groups do not increase monotonically (in the column headed "monotone" in fig. 8, True indicates monotonic increase and False indicates no monotonic increase). For example, monotonic increase is not satisfied between group No. 0 and group No. 1, between group No. 1 and group No. 2, between group No. 4 and group No. 5, between group No. 9 and group No. 10, between group No. 10 and group No. 11, and between group No. 17 and group No. 18. Through group merging and adjustment, every pair of adjacent groups is finally brought into a monotonically increasing state, as shown in fig. 9. In this process, the original groups No. 0, No. 1 and No. 2 are merged, the original groups No. 4 and No. 5 are merged, the original groups No. 9, No. 10 and No. 11 are merged, and the original groups No. 17 and No. 18 are merged; that is, the 23 groups in fig. 8 become the 17 groups shown in fig. 9 after group merging and adjustment.

On this basis, by calculating the real occurrence probability of the risk event within each group in fig. 9 and determining the magnitude relation between that probability and the real disease probability value, the risk category of each group can be determined (see the corresponding value in the last column of fig. 9); the groups with the same risk category are then merged to obtain the final classification result. That is, the end result of this exemplary application scenario is: low risk is [0,0.2], medium risk is [0.2,0.45], and high risk is [0.45,1]. The risk prediction values of the corresponding disease can then be classified according to this risk classification threshold.

While the process flows described above include operations that occur in a particular order, it should be appreciated that the processes may include more or fewer operations, which may be performed sequentially or in parallel (e.g., using parallel processors or a multi-threaded environment).

Corresponding to the method for determining the business data risk classification threshold, the present specification also provides an embodiment of a device for determining the business data risk classification threshold. Referring to fig. 10, in some embodiments of the present specification, the business data risk classification threshold determining device may include: an input data acquisition module 101, an interval grouping and dividing module 102, a monotonicity processing module 103 and a classification threshold determination module 104. Wherein:

the input data acquisition module 101 may be configured to acquire a risk probability prediction value of historical service data and a true occurrence probability of a dangerous event;

an interval grouping and dividing module 102, configured to divide the risk probability prediction value into a plurality of interval groups according to size;

a monotonicity processing module 103, configured to perform monotonicity processing on the multiple interval packets to obtain multiple interval packets meeting monotonicity;

the classification threshold determining module 104 may be configured to classify the plurality of interval groups meeting the monotonicity according to the real occurrence probability of the dangerous event, and determine a risk classification threshold according to a classification result, so as to predict a business risk.

In some embodiments of the present specification, the apparatus for determining a risk classification threshold of service data may further include a hypothesis testing module, which may be configured to perform a hypothesis testing process on the monotonicity-compliant interval packets to obtain interval packets that pass the hypothesis testing. Correspondingly, the classifying threshold determining module 104 classifies the plurality of interval groups conforming to monotonicity according to the real occurrence probability of the dangerous event, which may include: the classification threshold determination module 104 classifies the plurality of interval groups passing hypothesis testing according to the true occurrence probability of the dangerous event.

For convenience of description, the above devices are described as being divided into various units by function, and are described separately. Of course, the functions of the various elements may be implemented in the same one or more software and/or hardware implementations of the present description.

Embodiments of the present description also provide a computer device. As shown in fig. 11, in some embodiments of the present description, the computer device 1102 may include one or more processors 1104, such as one or more Central Processing Units (CPUs) or Graphics Processing Units (GPUs), each of which may implement one or more hardware threads. The computer device 1102 may also include any memory 1106 for storing any kind of information, such as code, settings, data, etc., and in a particular embodiment a computer program that is stored on the memory 1106 and executable on the processor 1104, which computer program, when executed by the processor 1104, may perform instructions according to the above-described method. For example, and without limitation, memory 1106 may include any one or more of the following in combination: any type of RAM, any type of ROM, flash memory devices, hard disks, optical disks, etc. More generally, any memory may use any technology to store information. Further, any memory may provide volatile or non-volatile retention of information. Further, any memory may represent fixed or removable components of computer device 1102. In one case, when the processor 1104 executes the associated instructions, which are stored in any memory or combination of memories, the computer device 1102 can perform any of the operations of the associated instructions. The computer device 1102 also includes one or more drive mechanisms 1108, such as a hard disk drive mechanism, an optical disk drive mechanism, etc., for interacting with any memory.

Computer device 1102 can also include an input/output module 1110 (I/O) for receiving various inputs (via input device 1112) and for providing various outputs (via output device 1114). One particular output mechanism may include a presentation device 1116 and an associated graphical user interface 1118 (GUI). In other embodiments, the input/output module 1110 (I/O), input device 1112, and output device 1114 may be omitted, for example when the computer device serves only as one computer device in a network. Computer device 1102 can also include one or more network interfaces 1120 for exchanging data with other devices via one or more communication links 1122. One or more communication buses 1124 couple the above-described components together.

Communication link 1122 may be implemented in any manner, e.g., via a local area network, a wide area network (e.g., the Internet), a point-to-point connection, etc., or any combination thereof. Communications link 1122 may include any combination of hardwired links, wireless links, routers, gateway functions, name servers, etc., governed by any protocol or combination of protocols.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products of some embodiments of the specification. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processor to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processor, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processor to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processor to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In a typical configuration, a computer device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium which can be used to store information that can be accessed by a computer device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.

As will be appreciated by one skilled in the art, embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, embodiments of the present description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present description may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.

The embodiments of this specification may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The described embodiments may also be practiced in distributed computing environments where tasks are performed by remote processors that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

The embodiments in the present specification are described in a progressive manner; for the same or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the system embodiment is substantially similar to the method embodiment, its description is relatively brief, and for the relevant points reference may be made to the corresponding description of the method embodiment. In the description herein, references to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the specification. In this specification, such references do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined by one skilled in the art provided they do not contradict one another.

The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims

1. A method for determining a risk classification threshold of service data is characterized by comprising the following steps:

2. The method for determining traffic data risk classification threshold according to claim 1, further comprising, after obtaining the plurality of interval groups conforming to monotonicity:

3. The method for determining traffic data risk classification threshold according to claim 1, wherein the dividing the risk probability prediction value into a plurality of interval groups according to size comprises:

4. The method for determining a traffic data risk classification threshold according to claim 1, wherein the performing monotonicity processing on the plurality of interval groups comprises:

5. The method for determining traffic data risk classification threshold according to claim 4, wherein the performing interval combination processing includes:

6. The method for determining traffic data risk classification threshold according to claim 2, wherein the performing hypothesis testing process on the monotonicity-compliant interval packets includes:

7. The method for determining traffic data risk classification threshold according to claim 2, wherein the classifying the interval groups passing the hypothesis test according to the true occurrence probability of the dangerous event comprises:

8. The method for determining a risk classification threshold of service data according to claim 7, wherein the determining a risk classification threshold according to the classification result comprises:

9. A device for determining a risk classification threshold of service data, comprising:

10. The traffic data risk classification threshold determining apparatus of claim 9, further comprising:

11. A computer device comprising a memory, a processor, and a computer program stored on the memory, wherein the computer program, when executed by the processor, performs the instructions of the method of any one of claims 1-8.

12. A computer storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor of a computer device, executes instructions of a method according to any one of claims 1-8.