WO2022269818A1

WO2022269818A1 - Probability calculation device, probability calculation method, and probability calculation program

Info

Publication number: WO2022269818A1
Application number: PCT/JP2021/023834
Authority: WO
Inventors: 篤彦前田; 和昭尾花; 幸雄菊谷; 健一福田
Original assignee: 日本電信電話株式会社
Priority date: 2021-06-23
Filing date: 2021-06-23
Publication date: 2022-12-29
Also published as: JPWO2022269818A1

Abstract

A probability calculation device according to the present invention includes a first fluctuation calculation unit that calculates, for each target and on the basis of a basic occurrence probability and a conditional occurrence probability, a first fluctuation vector as a vector expressing a fluctuation corresponding to a specific explanatory variable. A clustering unit classifies the targets into clusters by clustering the respective fluctuation vectors of the targets. For each cluster, a second fluctuation calculation unit aggregates event occurrence data which corresponds to the targets belonging to said cluster, and calculates a second fluctuation vector as a vector expressing a fluctuation corresponding to the specific explanatory variable for said cluster. For each target, an integrated calculation unit uses the second fluctuation vector of the cluster to which the target belongs to calculate the event occurrence probability for the target in a case in which the specific explanatory variable serves as a condition.

Description

Probability calculation device, probability calculation method, and probability calculation program

The disclosed technique relates to a probability calculation device, a probability calculation method, and a probability calculation program.

There are cases where one wants to predict the occurrence probability of some kind of event, such as the occurrence of an accident or of a sick or injured person, based on past event occurrence data, while taking into account that the probability may vary for each subdivided small area depending on several conditions such as the day of the week or the time of day. That is, there are cases where it is desired to obtain conditional event occurrence probabilities. For example, there is a case where an emergency vehicle such as a police car or an ambulance is to be deployed in advance at an appropriate location based on the obtained event occurrence probability. Non-Patent Document 1 describes performing real-time emergency demand prediction to optimize the deployment of ambulances.

For the above applications, rather than obtaining the event occurrence probability for the entire area, it is necessary to subdivide the entire area into reasonably small areas and calculate the event occurrence probability for each small area. However, subdividing the entire area reduces the event occurrence probability in each small area.

The disclosed technology has been made in view of the above points, and aims to provide a probability calculation device, a probability calculation method, and a probability calculation program that can appropriately calculate the event occurrence probability of a target while suppressing errors, even when there is little observed data about the target.

A first aspect of the present disclosure is a probability calculation device including: a first variation calculation unit that calculates, for each of a plurality of targets, a first variation vector as a vector representing variation according to a specific explanatory variable, based on a basic occurrence probability, which is an occurrence probability obtained from past event occurrence data for each of the targets, and a conditional occurrence probability, which is an occurrence probability conditioned on the specific explanatory variable of the event occurrence data; a clustering unit that classifies the targets into clusters by clustering the variation vectors of the targets; a second variation calculation unit that, for each cluster, aggregates the event occurrence data associated with the targets belonging to the cluster and calculates a second variation vector as a vector representing variation according to the specific explanatory variable of the cluster; and an integrated calculation unit that calculates, for each target, the event occurrence probability of the target under the condition of the specific explanatory variable, using the second variation vector of the cluster to which the target belongs.

A second aspect of the present disclosure is a probability calculation method in which a computer executes a process including: calculating, for each of a plurality of targets, a first variation vector as a vector representing variation according to a specific explanatory variable, based on a basic occurrence probability, which is an occurrence probability obtained from past event occurrence data for each of the targets, and a conditional occurrence probability, which is an occurrence probability conditioned on the specific explanatory variable of the event occurrence data; classifying the targets into clusters by clustering the variation vectors of the targets; for each cluster, aggregating the event occurrence data associated with the targets belonging to the cluster and calculating a second variation vector as a vector representing variation according to the specific explanatory variable of the cluster; and for each target, calculating the event occurrence probability of the target under the condition of the specific explanatory variable, using the second variation vector of the cluster to which the target belongs.

A third aspect of the present disclosure is a probability calculation program that causes a computer to execute a process including: calculating, for each of a plurality of targets, a first variation vector as a vector representing variation according to a specific explanatory variable, based on a basic occurrence probability, which is an occurrence probability obtained from past event occurrence data for each of the targets, and a conditional occurrence probability, which is an occurrence probability conditioned on the specific explanatory variable of the event occurrence data; classifying the targets into clusters by clustering the variation vectors of the targets; for each cluster, aggregating the event occurrence data associated with the targets belonging to the cluster and calculating a second variation vector as a vector representing variation according to the specific explanatory variable of the cluster; and for each target, calculating the event occurrence probability of the target under the condition of the specific explanatory variable, using the second variation vector of the cluster to which the target belongs.

According to the disclosed technology, even if there is little observed data about the target, it is possible to appropriately calculate the event occurrence probability of the target while suppressing errors.

FIG. 1 is a diagram showing an example of a data string generated assuming subdivision into small areas.
FIG. 2 is a block diagram showing the hardware configuration of the probability calculation device of the embodiment of the present disclosure.
FIG. 3 is a block diagram showing the functional configuration of the probability calculation device of the embodiment of the present disclosure.
FIG. 4 is a diagram showing an example of a graph of results of obtaining first variation vectors for a plurality of small areas.
FIG. 5 is a diagram showing the result of clustering the first variation vectors of the small areas into two clusters using k-means and obtaining the second variation vector for each cluster.
FIG. 6 is a flowchart showing the flow of probability calculation processing by the probability calculation device of the embodiment of the present disclosure.

An example of an embodiment of the disclosed technology will be described below with reference to the drawings. In each drawing, the same or equivalent components and portions are given the same reference numerals. Also, the dimensional ratios in the drawings are exaggerated for convenience of explanation, and may differ from the actual ratios.

First, the outline of this disclosure will be explained. As described in the above problem, subdividing the entire area reduces the event occurrence probability in each small area. For convenience of explanation, the probability of occurrence of an event that is ultimately desired in this embodiment will be referred to as event occurrence probability, and the probability of occurrence of other events will simply be referred to as probability of occurrence.

If the probability of occurrence is small, more event occurrence data (observation data) is required, but there may be cases where the amount of data is insufficient. For example, suppose that the true probability of occurrence of sick or injured persons in the entire region is 0.1, and that an observation is simulated by having a computer repeatedly generate a binary random number according to this probability, treating each 1 as an occurrence. To stably recover the original probability of occurrence from the observation result (a data string consisting of 0s and 1s), approximately 100 observations are normally required, although this depends on the random number generation algorithm. Now assume that, as a result of subdividing the whole area into small areas, the true probability of occurrence in a certain small area is 0.01. In that case, about 1000 observations of event occurrence data are required. FIG. 1 is a diagram showing an example of a data string generated assuming subdivision into small areas. Even if 1000 observations are made, segments will appear many times in which observing only A1 in the data string gives an occurrence probability of 0/10 = 0, while observing only A2 gives 1/10 = 0.1. In actual observation, such a large amount of event occurrence data is unlikely to be available, so the occurrence probability of a small area must be obtained from the event occurrence data of a small number of observations. However, an occurrence probability obtained in this way tends either to be 0 in many cases or to be erroneously calculated as greater than the true occurrence probability.
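The small-sample effect described above can be reproduced with a short simulation. The following is only an illustrative sketch: the function name `window_estimates`, the window length of 10, and the fixed seed are assumptions chosen for the example and do not appear in the disclosure.

```python
import random


def window_estimates(true_p, n_obs, window, seed=0):
    """Simulate n_obs binary observations with occurrence probability true_p
    and estimate the probability separately from each consecutive window."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    data = [1 if rng.random() < true_p else 0 for _ in range(n_obs)]
    return [sum(data[i:i + window]) / window
            for i in range(0, n_obs, window)]


# True probability 0.01, 1000 observations, estimated from windows of 10.
estimates = window_estimates(true_p=0.01, n_obs=1000, window=10)
zero_share = sum(1 for e in estimates if e == 0) / len(estimates)
print(f"windows: {len(estimates)}")
print(f"share of windows estimating 0: {zero_share:.2f}")
print(f"largest window estimate: {max(estimates):.1f}")
```

Each window of 10 observations can only yield an estimate that is a multiple of 0.1, so every estimate is either 0 or at least ten times the true value of 0.01, which is the behavior the paragraph describes.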

If we try to obtain conditional event occurrence probabilities, it is assumed that event occurrence data will become increasingly scarce. For example, if it is desired to obtain the variation in the probability of occurrence of an injured or sick person depending on the day of the week, the data string of the event occurrence data must be divided into seven. In particular, assuming that the effects of conditions such as the day of the week only change the base probability of occurrence by about 20 to 30%, the probability of occurrence of 0.01 will change to 0.012 or 0.013. As such, the lack of data becomes more pronounced.

In the case of the above example, it is possible that some of the small subdivisions have similar urban functions, such as so-called residential areas or entertainment districts. For example, in all entertainment districts, there is a possibility that the number of acute alcoholics will increase on a specific day of the week and during a specific time period, and the probability of injury or illness will increase. In addition, in residential areas, there is a possibility that the probability of injury or illness will increase due to the time and date when many people are spending time at home.

In such small areas with similar urban functions, even if they are not adjacent, it can be assumed that the probability of occurrence fluctuates under the same influence from the conditions of a specific explanatory variable. It is therefore conceivable to aggregate the past event occurrence data of such small areas, thereby temporarily increasing the scale of the probability of occurrence, calculate the variation due to the specific explanatory variable from the aggregated data, and then use that variation in the calculation for each individual small area. As a result, probabilities close to the true occurrence probabilities of each of a plurality of small areas should be readily obtainable even from a relatively small amount of event occurrence data.

As for the method of aggregating small areas with similar urban functions, a vector representing the variation (first variation vector) is calculated for each of the small areas from the conditional occurrence probability conditioned on a specific explanatory variable in that small area, and the vectors are clustered. By clustering the small areas based on the vectors representing the variation of each small area, the observation errors can be canceled out even when the variation of an individual subdivided small area cannot be calculated accurately because of the small amount of observation data.

The reason for clustering using vectors that represent fluctuations is that the absolute amount of occurrence (probability) differs for each small area. If clustering is performed by the above method, for example, an entertainment district in the city center with a particularly high population density and an entertainment district in a small area adjacent to the city center can be classified into the same cluster.

In the above explanation, to explain the assumed application to small areas simply, the image of a typical district such as an entertainment district or a residential area was applied to each small area. In reality, however, residences, restaurants, and the like are distributed within a small area, which should therefore have complex attributes. Consequently, it is not easy to determine in advance that one small area and another show the same variation, so it is important to cluster after calculating a vector representing the variation for each small area.

In addition to the above, it is also possible to use temperature as a specific explanatory variable for each of multiple small areas with differences in elevation above sea level. In this case, if it is possible to cluster small areas with similar altitudes, it will be easier to predict fluctuations in the number of occurrences of heatstroke and the like from limited event occurrence data. Each of the plurality of small areas as described above is an example of each of the plurality of targets of the technology of the present disclosure.

In this embodiment, a case where the target for which the event occurrence probability is to be obtained is a small area will be described as an example, but the scope of application of this embodiment is not limited to small areas in real space. As long as the event occurrence data can be aggregated, the method of the present embodiment can be applied to any target, such as the Internet space or a network environment.

Based on the above, the above idea is realized by configuring the probability calculation device according to the embodiment of the present disclosure as follows.

The configuration of this embodiment will be described below.

FIG. 2 is a block diagram showing the hardware configuration of the probability calculation device 100 according to the embodiment of the present disclosure.

As shown in FIG. 2, the probability calculation device 100 includes a CPU (Central Processing Unit) 11, a ROM (Read Only Memory) 12, a RAM (Random Access Memory) 13, a storage 14, an input unit 15, a display unit 16, and a communication interface (I/F) 17. The components are communicatively connected to each other via a bus 19.

The CPU 11 is a central processing unit that executes various programs and controls each section. That is, the CPU 11 reads a program from the ROM 12 or the storage 14 and executes the program using the RAM 13 as a work area. The CPU 11 controls each component and performs various arithmetic processing according to programs stored in the ROM 12 or the storage 14. In this embodiment, the ROM 12 or the storage 14 stores a probability calculation program.

The ROM 12 stores various programs and various data. The RAM 13 temporarily stores programs or data as a work area. The storage 14 is configured by a storage device such as a HDD (Hard Disk Drive) or SSD (Solid State Drive), and stores various programs including an operating system and various data.

The input unit 15 includes a pointing device such as a mouse and a keyboard, and is used for various inputs.

The display unit 16 is, for example, a liquid crystal display, and displays various information. The display unit 16 may employ a touch panel system and function as the input unit 15.

The communication interface 17 is an interface for communicating with other devices such as terminals. The communication uses, for example, a wired communication standard such as Ethernet (registered trademark) or FDDI, or a wireless communication standard such as 4G, 5G, or Wi-Fi (registered trademark).

Next, each functional configuration of the probability calculation device 100 will be described. FIG. 3 is a block diagram showing the functional configuration of the probability calculation device 100 according to the embodiment of the present disclosure. Each functional configuration is realized by the CPU 11 reading out the probability calculation program stored in the ROM 12 or the storage 14, loading it into the RAM 13, and executing it.

As shown in FIG. 3, the probability calculation device 100 includes an event storage unit 102, a basic probability calculation unit 110, a conditional probability calculation unit 112, a first fluctuation calculation unit 114, a clustering unit 116, a second fluctuation calculation unit 118, and an integrated calculation unit 120.

The probability calculation device 100 receives event occurrence data as input and stores it in the event storage unit 102. For each of the plurality of targets over a certain past period, the event occurrence data records the date and time at which each event occurred, the day of the week, the small area where the event occurred, and the like.

For simplicity of explanation, the following explanation assumes that the event occurrence probability is constant from the past to the future if the occurrence time zone, day of the week, and small area are the same. In other words, it is possible to calculate the event occurrence probability under the same conditions in the future by simply accumulating sufficient event occurrence data under the same conditions. However, it is assumed that sufficient event occurrence data is not accumulated for the scale of the event occurrence probability to be obtained.

Based on the event occurrence data stored in the event storage unit 102, the basic probability calculation unit 110 calculates the occurrence probability (basic occurrence probability) for each time zone for each small area. The basic occurrence probability can be calculated by counting the events that occurred in a specific small area and a specific time zone during a fixed past period of the event occurrence data and dividing the count by the number of days in that period. For example, if the period is 350 days and a total of 35 events occurred in a small area in the 10:00 time zone, the basic occurrence probability is 35/350 = 0.1.
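The counting-and-dividing step for the basic occurrence probability can be sketched as follows. The record layout `(area, day, time_slot)`, the function name `basic_probability`, and the sample data are hypothetical and chosen only for illustration; the disclosure does not specify a data format.

```python
from collections import Counter
from datetime import date, timedelta


def basic_probability(events, n_days):
    """events: iterable of (area, day, time_slot) tuples, one per event.
    Returns {(area, time_slot): event count / n_days}, ignoring the weekday.
    Pairs with no events are simply absent (their probability is 0)."""
    counts = Counter((area, slot) for area, _day, slot in events)
    return {key: n / n_days for key, n in counts.items()}


# Hypothetical data: 35 events in small area "A" in the 10:00 slot
# spread over a 350-day period.
start = date(2021, 1, 3)
events = [("A", start + timedelta(days=10 * i), 10) for i in range(35)]

basic = basic_probability(events, n_days=350)
print(basic[("A", 10)])
```

With these inputs the result corresponds to the 35/350 example in the text.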

Based on the event occurrence data stored in the event storage unit 102, the conditional probability calculation unit 112 calculates the occurrence probability (conditional occurrence probability) for each day of the week and each time zone for each small area. The conditional occurrence probability can be calculated by counting the events that occurred in a specific small area, on a specific day of the week, and in a specific time zone during the fixed past period of the event occurrence data, and dividing the count by the number of days in that period that fall on that day of the week. For instance, a period of 350 days contains 50 Sundays, so a conditional occurrence probability of 0.06 for Sunday in the 10:00 time zone would correspond to 3 events over those 50 Sundays (3/50 = 0.06). Setting the day of the week as a condition is an example of setting a specific explanatory variable as a condition in the technology of the present disclosure. Note that the probability calculation device 100 may receive the basic occurrence probability and the conditional occurrence probability calculated in advance by another device or the like. In that case, the basic probability calculation unit 110 and the conditional probability calculation unit 112 may be omitted from the configuration.
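A conditional occurrence probability with the day of the week as the condition can be sketched in the same style. This sketch assumes that the count is divided by the number of days in the period falling on that weekday; the record layout, the function name `conditional_probability`, and the figure of 3 Sunday events are hypothetical, picked so that the result matches the value 0.06 used in the variation example that follows.

```python
from collections import Counter
from datetime import date, timedelta


def conditional_probability(events, start, n_days):
    """events: iterable of (area, day, time_slot) tuples, one per event.
    Returns {(area, weekday, time_slot): count / number of such weekdays},
    where weekday follows date.weekday() (Monday=0 ... Sunday=6)."""
    days_per_weekday = Counter(
        (start + timedelta(days=i)).weekday() for i in range(n_days))
    counts = Counter((area, day.weekday(), slot) for area, day, slot in events)
    return {key: n / days_per_weekday[key[1]] for key, n in counts.items()}


start = date(2021, 1, 3)  # a Sunday; 350 days from here contain 50 Sundays
# Hypothetical data: 3 events in area "A", 10:00 slot, all on Sundays.
events = [("A", start + timedelta(days=7 * i), 10) for i in (0, 5, 20)]

cond = conditional_probability(events, start, n_days=350)
print(cond[("A", 6, 10)])
```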

The first fluctuation calculation unit 114 receives the basic occurrence probability from the basic probability calculation unit 110 and the conditional occurrence probability from the conditional probability calculation unit 112. The first fluctuation calculation unit 114 calculates a first variation vector for each small area based on the basic occurrence probability and the conditional occurrence probability. The first variation vector is a vector representing variation according to a specific explanatory variable, here the day of the week. It is obtained by calculating, for each small area, the ratio of the conditional occurrence probability for each day of the week to the basic occurrence probability, which ignores the day of the week. For example, if the basic occurrence probability in the 10:00 time zone is 0.1 and the conditional occurrence probability for Sunday in that time zone is 0.06, the variation for Sunday in the 10:00 time zone is 0.06/0.1 = 0.6 times. These ratios are obtained for all combinations of time zones and days of the week and used as the first variation vector. FIG. 4 is a diagram showing an example of a graph of results of obtaining first variation vectors for a plurality of small areas. In the example of FIG. 4, the amount of variation for each day of the week and each time zone is obtained as the first variation vector for each of the small areas A, B, C, and D.
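The ratio calculation for one small area can be sketched as below. The function name `first_variation_vector`, the dictionary layouts, and the sample values are hypothetical. The handling of a time zone with no events at all is not covered by the description, so the fallback value of 1.0 is purely an assumption made to keep the sketch runnable.

```python
def first_variation_vector(basic, conditional, weekdays, slots):
    """basic: {slot: probability}; conditional: {(weekday, slot): probability}.
    Returns the ratios conditional/basic in a fixed (weekday, slot) order."""
    vector = []
    for weekday in weekdays:
        for slot in slots:
            base = basic.get(slot, 0.0)
            cond = conditional.get((weekday, slot), 0.0)
            # Assumption (not from the source): a slot with no events at all
            # carries no information, so it is treated as "no variation".
            vector.append(cond / base if base > 0 else 1.0)
    return vector


# Hypothetical values for one small area: two weekdays, two time slots.
basic_a = {10: 0.1, 11: 0.0}
conditional_a = {("Sun", 10): 0.06, ("Mon", 10): 0.12}
vec_a = first_variation_vector(basic_a, conditional_a, ["Sun", "Mon"], [10, 11])
print([round(v, 3) for v in vec_a])
```

Because every small area uses the same ordering of weekday and time zone combinations, the resulting vectors are directly comparable and can be passed to a clustering step.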

The clustering unit 116 receives the first variation vector of each small area from the first variation calculation unit 114. The clustering unit 116 classifies the small areas into clusters by clustering the first variation vectors of the small areas. A typical clustering method such as k-means may be used. The number of clusters may, for example, be treated as a manually adjusted parameter and tuned by trial so that the prediction results for future data are the most suitable.
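As an illustration of the clustering step, the following is a minimal, dependency-free k-means (Lloyd's algorithm) applied to hypothetical variation vectors for four small areas. The function names, the random initialization with a fixed seed, and the sample vectors are assumptions made for this sketch; in practice an existing library implementation could be used instead.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means. Returns one cluster index per input vector."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each vector goes to its nearest center.
        new_labels = [min(range(k), key=lambda c: sq_dist(v, centers[c]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical first variation vectors for small areas A, B, C, D.
areas = ["A", "B", "C", "D"]
vectors = [[0.6, 1.0, 1.4], [0.7, 1.0, 1.3], [1.4, 1.0, 0.6], [1.3, 1.1, 0.6]]
labels = kmeans(vectors, k=2)
print(dict(zip(areas, labels)))
```

The cluster indices themselves are arbitrary; only the grouping matters. Here A and B show a similar variation pattern and end up together, as do C and D.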

The second variation calculation unit 118 receives, as the clustering result from the clustering unit 116, the classification of each small area into a cluster. For each cluster, the second variation calculation unit 118 aggregates the event occurrence data associated with the small areas belonging to the cluster, and calculates a second variation vector. Specifically, for each cluster, the second variation calculation unit 118 acquires the event occurrence data of the small areas included in the cluster from the event storage unit 102 and aggregates it. Using the event occurrence data aggregated for the cluster, the second variation calculation unit 118 obtains the basic occurrence probability of the cluster for each time zone and the conditional occurrence probability of the cluster for each day of the week and each time zone. It then calculates the second variation vector based on the basic occurrence probability of the cluster and the conditional occurrence probability of the cluster. As a result, the second variation vector can be calculated as a vector representing the variation according to the specific explanatory variable of the cluster.
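The aggregate-then-recompute step for one cluster and one time zone can be sketched as follows. The function name `second_variation_vector`, the count dictionaries, and the sample numbers are hypothetical; the sketch assumes that the event counts of the member areas are simply summed before the basic and conditional occurrence probabilities of the cluster are formed.

```python
WEEKDAYS = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]


def second_variation_vector(area_counts, members, days_per_condition):
    """area_counts: {area: {weekday: event count}} for one time slot.
    members: areas belonging to the cluster.
    days_per_condition: {weekday: number of such days in the period}.
    Returns {weekday: cluster conditional prob / cluster basic prob}."""
    n_days = sum(days_per_condition.values())
    # Pool the raw event counts of all member areas first.
    pooled = {c: sum(area_counts[a].get(c, 0) for a in members)
              for c in days_per_condition}
    total = sum(pooled.values())
    if total == 0:
        raise ValueError("cluster has no events in this time slot")
    basic = total / n_days
    return {c: (pooled[c] / days) / basic
            for c, days in days_per_condition.items()}


# Hypothetical counts over 350 days (50 of each weekday), one time slot.
area_counts = {
    "A": {"Sun": 3, "Mon": 5, "Tue": 5, "Wed": 5, "Thu": 5, "Fri": 6, "Sat": 6},
    "B": {"Fri": 2, "Sat": 3},
}
days = {d: 50 for d in WEEKDAYS}
v2 = second_variation_vector(area_counts, ["A", "B"], days)
print({d: round(x, 3) for d, x in v2.items()})
```

Area B alone has too few events to give a usable weekday pattern, but after pooling with area A the cluster-level ratios are defined for every weekday.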

FIG. 5 is a diagram showing the result of clustering the first variation vectors of the small areas into two clusters by k-means and obtaining the second variation vector for each cluster. When graphed, the second variation vectors are generally smoother than the original first variation vectors of the individual small areas. In addition, because the second variation vector is recalculated after aggregating the event occurrence data of the clustered small areas, rather than being obtained directly as the average of the variation vectors of the small areas, it is more strongly influenced by small areas with many event occurrences. This effectively weakens the influence of small areas with large observation errors. Widening the time zone width in the original small areas might give a similar result, but if there are places where the true probability of occurrence rises and falls from hour to hour, the method of the present disclosure is more appropriate.
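The weighting effect of recalculating from aggregated data can be made explicit with a short derivation. The notation is introduced here only for illustration and is not part of the disclosure: for a cluster $C$ and a fixed time zone, let $n_a$ be the number of events in small area $a$ over the $D$ days of the period, let $n_a(c)$ be the number of those events under condition $c$ (for example, a particular day of the week), and let $D_c$ be the number of days satisfying $c$. Assuming every member area has $n_a > 0$:

```latex
\[
v_a(c) = \frac{n_a(c)/D_c}{n_a/D},
\qquad
v_C(c) = \frac{\sum_{a \in C} n_a(c)/D_c}{\sum_{a \in C} n_a/D}
       = \sum_{a \in C} \frac{n_a}{\sum_{b \in C} n_b}\, v_a(c).
\]
```

Under these assumptions, the variation of the cluster is the average of the per-area variations weighted by each area's share of events, whereas a plain average would weight every area equally. Areas with few events, whose ratios are the noisiest, therefore contribute the least.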

The integrated calculation unit 120 receives the second variation vector of each cluster from the second variation calculation unit 118. The integrated calculation unit 120 calculates the event occurrence probability for each small area based on the basic occurrence probability for each time zone of the small area and the variation value in the second variation vector of the cluster to which the small area belongs. The event occurrence probability is obtained as a conditional occurrence probability for each small area. For example, to obtain the event occurrence probability of each small area for Sunday in the 10:00 time zone, the basic occurrence probability in the 10:00 time zone, regardless of the day of the week, is first obtained for each small area. Then, the variation value for Sunday in the 10:00 time zone is extracted from the second variation vector of the cluster to which the small area belongs, and the conditional occurrence probability for Sunday in the 10:00 time zone is calculated for each small area as the event occurrence probability.
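The final combination step can be sketched as below. The description states only that the event occurrence probability is based on the basic occurrence probability and the variation value; this sketch assumes the two are multiplied, which is the inverse of the ratio used to define the variation. The function name `event_probability`, the dictionary layouts, and the sample values are hypothetical.

```python
def event_probability(area, weekday, slot, basic, cluster_of, second_vectors):
    """basic: {area: {slot: basic occurrence probability}}
    cluster_of: {area: cluster id}
    second_vectors: {cluster id: {(weekday, slot): variation value}}
    Assumption: probability = area's basic probability x cluster variation."""
    base = basic[area][slot]
    variation = second_vectors[cluster_of[area]][(weekday, slot)]
    return base * variation


# Hypothetical inputs: two small areas in the same cluster.
basic = {"A": {10: 0.1}, "B": {10: 0.02}}
cluster_of = {"A": 0, "B": 0}
second_vectors = {0: {("Sun", 10): 0.6}}

p_a = event_probability("A", "Sun", 10, basic, cluster_of, second_vectors)
p_b = event_probability("B", "Sun", 10, basic, cluster_of, second_vectors)
print(round(p_a, 4), round(p_b, 4))
```

Both areas share the cluster-level variation for Sunday in the 10:00 time zone, while each keeps its own absolute level through its basic occurrence probability.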

Next, the operation of the probability calculation device 100 will be described.

FIG. 6 is a flowchart showing the flow of probability calculation processing by the probability calculation device 100 of the embodiment of the present disclosure. The CPU 11 reads out the probability calculation program from the ROM 12 or the storage 14, develops it in the RAM 13, and executes it, thereby performing the probability calculation process. The CPU 11 executes the following processes as each part of the probability calculation device 100 .

In step S100, the CPU 11, as the basic probability calculation unit 110, calculates the occurrence probability (basic occurrence probability) for each time period for each small area based on the event occurrence data stored in the event storage unit 102.

In step S102, the CPU 11, as the conditional probability calculation unit 112, calculates the occurrence probability (conditional occurrence probability) for each day of the week and for each time period for each small area based on the event occurrence data stored in the event storage unit 102.

In step S104, the CPU 11, as the first fluctuation calculator 114, calculates a first fluctuation vector for each small area based on the basic occurrence probability and the conditional occurrence probability. The first variation vector is a vector representing variation according to a specific explanatory variable, here the day of the week.

In step S106, the CPU 11, as the clustering unit 116, classifies the small regions into clusters by clustering the first variation vectors of each of the small regions.

In step S108, the CPU 11, as the second fluctuation calculation unit 118, aggregates the event occurrence data associated with the small areas belonging to the cluster for each cluster, and calculates the second fluctuation vector.

In step S110, the CPU 11, as the integrated calculation unit 120, calculates the event occurrence probability for each small area based on the basic occurrence probability for each time period of the small area and the variation value in the second variation vector of the cluster to which the small area belongs.

As described above, according to the probability calculation device 100 of the present embodiment, even when there is little data observed for subdivided small areas, it is possible to appropriately calculate the event occurrence probability of the small areas by suppressing errors.

It should be noted that the probability calculation processing, which in the above embodiment is executed by the CPU reading software (a program), may be executed by various processors other than the CPU. Examples of such processors include a PLD (Programmable Logic Device) whose circuit configuration can be changed after manufacturing, such as an FPGA (Field-Programmable Gate Array), and a dedicated electric circuit, which is a processor having a circuit configuration specially designed to execute specific processing, such as an ASIC (Application Specific Integrated Circuit). Further, the probability calculation processing may be executed by one of these various processors, or by a combination of two or more processors of the same or different types (for example, multiple FPGAs, or a combination of a CPU and an FPGA). More specifically, the hardware structure of these various processors is an electric circuit in which circuit elements such as semiconductor elements are combined.

Also, in the above embodiment, a mode in which the probability calculation program is pre-stored (installed) in the storage 14 has been described, but the present invention is not limited to this. The program may be provided in a form stored in a non-transitory storage medium such as a CD-ROM (Compact Disk Read Only Memory), a DVD-ROM (Digital Versatile Disk Read Only Memory), or a USB (Universal Serial Bus) memory. The program may also be downloaded from an external device via a network.

Regarding the above embodiments, the following additional remarks are disclosed.

(Appendix 1)
A probability calculation device including:
a memory; and
at least one processor connected to the memory,
wherein the processor is configured to:
calculate, for each of a plurality of targets, a first variation vector as a vector representing variation according to a specific explanatory variable, based on a basic occurrence probability, which is an occurrence probability obtained from past event occurrence data for each of the targets, and a conditional occurrence probability, which is an occurrence probability conditioned on the specific explanatory variable of the event occurrence data;
classify the targets into clusters by clustering the variation vectors of the targets;
for each cluster, aggregate the event occurrence data associated with the targets belonging to the cluster and calculate a second variation vector as a vector representing the variation according to the specific explanatory variable of the cluster; and
for each target, calculate the event occurrence probability of the target under the condition of the specific explanatory variable, using the second variation vector of the cluster to which the target belongs.

(Appendix 2)
A non-transitory storage medium storing a program executable by a computer to perform a probability calculation process, the probability calculation process comprising:
calculating, for each of a plurality of targets, a first variation vector representing variation according to a specific explanatory variable, based on a basic occurrence probability, which is an occurrence probability obtained from past event occurrence data, and a conditional occurrence probability, which is an occurrence probability conditioned on the specific explanatory variable of the event occurrence data;
classifying the targets into clusters by clustering the first variation vectors of the targets;
for each cluster, aggregating the event occurrence data associated with the targets belonging to the cluster, and calculating a second variation vector representing variation according to the specific explanatory variable for the cluster; and
for each target, calculating the event occurrence probability of the target conditioned on the specific explanatory variable, using the second variation vector of the cluster to which the target belongs.
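
The first two steps recited above (first variation vectors, then clustering) can be illustrated with a minimal Python sketch. Several points are assumptions made only for illustration and are not stated in this section: the variation value is taken to be the ratio of the conditional occurrence probability to the basic occurrence probability, the clustering method is a small deterministic k-means, and the data layout (`DATA`, with event counts and observed slots per value of the explanatory variable) and all function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical event occurrence data: target -> (event counts, observed slots),
# one entry per value of the explanatory variable (here: weekday, weekend).
DATA: Dict[str, Tuple[List[int], List[int]]] = {
    "A": ([10, 30], [100, 100]),
    "C": ([30, 10], [100, 100]),
    "B": ([1, 3], [100, 100]),
    "D": ([2, 0], [100, 100]),
}


def first_variation_vector(events: List[int], slots: List[int]) -> List[float]:
    """Assumed definition: conditional occurrence probability / basic occurrence probability."""
    basic = sum(events) / sum(slots)
    if basic == 0.0:
        # Assumption: a target with no events shows no observable variation.
        return [1.0] * len(events)
    return [(e / s) / basic for e, s in zip(events, slots)]


def squared_distance(u: List[float], v: List[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans_labels(vectors: List[List[float]], k: int, iterations: int = 20) -> List[int]:
    """Tiny k-means; the first k vectors serve as initial centroids (deterministic)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each vector to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(v, centroids[j]))
            for v in vectors
        ]
        # Move each centroid to the mean of its members (empty clusters stay put).
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


vectors = {name: first_variation_vector(ev, sl) for name, (ev, sl) in DATA.items()}
labels = dict(zip(vectors, kmeans_labels(list(vectors.values()), k=2)))
print(labels)
```

In this toy data, targets A and B have very different event volumes but the same relative pattern across the explanatory variable, so they fall into the same cluster; clustering on the variation vector rather than on raw counts is what makes that grouping possible.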

100 probability calculation device
102 event storage unit
110 basic probability calculation unit
112 conditional probability calculation unit
114 first fluctuation calculation unit
116 clustering unit
118 second fluctuation calculation unit
120 integrated calculation unit

Claims

1. A probability calculation device comprising:
a first variation calculation unit that calculates, for each of a plurality of targets, a first variation vector representing variation according to a specific explanatory variable, based on a basic occurrence probability, which is an occurrence probability obtained from past event occurrence data, and a conditional occurrence probability, which is an occurrence probability conditioned on the specific explanatory variable of the event occurrence data;
a clustering unit that classifies the targets into clusters by clustering the first variation vectors of the targets;
a second variation calculation unit that, for each cluster, aggregates the event occurrence data associated with the targets belonging to the cluster and calculates a second variation vector representing variation according to the specific explanatory variable for the cluster; and
an integrated calculation unit that calculates, for each target, the event occurrence probability of the target conditioned on the specific explanatory variable, using the second variation vector of the cluster to which the target belongs.
2. The probability calculation device according to claim 1, wherein the integrated calculation unit calculates, for each target, the event occurrence probability based on the basic occurrence probability of the target and a variation value in the second variation vector of the cluster to which the target belongs.
3. The probability calculation device, wherein, for each cluster, the second variation calculation unit obtains the basic occurrence probability of the cluster using the event occurrence data aggregated for the cluster, obtains the conditional occurrence probability of the cluster conditioned on the specific explanatory variable, and calculates the second variation vector based on the basic occurrence probability of the cluster and the conditional occurrence probability of the cluster.
4. A probability calculation method in which a computer executes processing comprising:
calculating, for each of a plurality of targets, a first variation vector representing variation according to a specific explanatory variable, based on a basic occurrence probability, which is an occurrence probability obtained from past event occurrence data, and a conditional occurrence probability, which is an occurrence probability conditioned on the specific explanatory variable of the event occurrence data;
classifying the targets into clusters by clustering the first variation vectors of the targets;
for each cluster, aggregating the event occurrence data associated with the targets belonging to the cluster, and calculating a second variation vector representing variation according to the specific explanatory variable for the cluster; and
for each target, calculating the event occurrence probability of the target conditioned on the specific explanatory variable, using the second variation vector of the cluster to which the target belongs.
5. A probability calculation program that causes a computer to execute processing comprising:
calculating, for each of a plurality of targets, a first variation vector representing variation according to a specific explanatory variable, based on a basic occurrence probability, which is an occurrence probability obtained from past event occurrence data, and a conditional occurrence probability, which is an occurrence probability conditioned on the specific explanatory variable of the event occurrence data;
classifying the targets into clusters by clustering the first variation vectors of the targets;
for each cluster, aggregating the event occurrence data associated with the targets belonging to the cluster, and calculating a second variation vector representing variation according to the specific explanatory variable for the cluster; and
for each target, calculating the event occurrence probability of the target conditioned on the specific explanatory variable, using the second variation vector of the cluster to which the target belongs.
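
The cluster-level aggregation and the integrated calculation recited in the claims can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the second variation vector is assumed to be the ratio of the cluster's conditional occurrence probability to its basic occurrence probability, the integrated probability is assumed to be the target's basic occurrence probability multiplied by the cluster's variation value and capped at 1, and the names `DATA`, `CLUSTER_OF`, and all functions are hypothetical.

```python
from typing import Dict, List, Tuple

Counts = Tuple[List[int], List[int]]  # (event counts, observed slots) per condition value

# Hypothetical event occurrence data and a hypothetical clustering result.
DATA: Dict[str, Counts] = {
    "A": ([10, 30], [100, 100]),
    "B": ([1, 3], [100, 100]),
    "C": ([30, 10], [100, 100]),
    "D": ([2, 0], [100, 100]),
}
CLUSTER_OF: Dict[str, int] = {"A": 0, "B": 0, "C": 1, "D": 1}


def basic_probability(events: List[int], slots: List[int]) -> float:
    """Occurrence probability ignoring the explanatory variable."""
    return sum(events) / sum(slots)


def second_variation_vector(members: List[Counts]) -> List[float]:
    """Aggregate the members' data, then take conditional / basic (assumed definition)."""
    events = [sum(col) for col in zip(*(ev for ev, _ in members))]
    slots = [sum(col) for col in zip(*(sl for _, sl in members))]
    basic = basic_probability(events, slots)
    if basic == 0.0:
        return [1.0] * len(events)  # assumption: no events, no observable variation
    return [(e / s) / basic for e, s in zip(events, slots)]


def integrated_probabilities(
    data: Dict[str, Counts], cluster_of: Dict[str, int]
) -> Dict[str, List[float]]:
    """Target's basic probability scaled by its cluster's variation value (assumed rule)."""
    variation = {
        c: second_variation_vector([data[t] for t in data if cluster_of[t] == c])
        for c in set(cluster_of.values())
    }
    result = {}
    for target, (events, slots) in data.items():
        basic = basic_probability(events, slots)
        # Cap at 1.0 so the result stays a valid probability (an added safeguard).
        result[target] = [min(1.0, basic * v) for v in variation[cluster_of[target]]]
    return result


probabilities = integrated_probabilities(DATA, CLUSTER_OF)
for name, probs in probabilities.items():
    print(name, [round(p, 5) for p in probs])
```

In this toy data, target D has no recorded events under the second condition value, so its own conditional occurrence probability would be zero; borrowing the variation value from its cluster yields a small non-zero estimate instead.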