CN111275227A

CN111275227A - Risk prediction method, risk prediction device, electronic equipment and computer-readable storage medium

Info

Publication number: CN111275227A
Application number: CN201811474114.4A
Authority: CN
Inventors: 刘刚刚; 李奘; 卓呈祥
Original assignee: Beijing Didi Infinity Technology and Development Co Ltd
Current assignee: Beijing Didi Infinity Technology and Development Co Ltd
Priority date: 2018-12-04
Filing date: 2018-12-04
Publication date: 2020-06-12

Abstract

The embodiments of the application provide a risk prediction method, a risk prediction device, an electronic device and a computer-readable storage medium. The method comprises: acquiring historical feature information of a plurality of users to be predicted; and, for each of the plurality of users to be predicted, determining the risk probability of that user according to the user's historical feature information, the degree of correlation between the user and the other users to be predicted, the historical feature information of the other users to be predicted that are correlated with the user, and a pre-trained risk prediction model. Because the risk probability of each user to be predicted is determined in advance, high-risk users can be identified before an incident occurs, which improves the safety of the riding environment.

Description

Risk prediction method, risk prediction device, electronic equipment and computer-readable storage medium

Technical Field

The present application relates to the field of information technology, and in particular, to a risk prediction method, an apparatus, an electronic device, and a computer-readable storage medium.

Background

With the continuous and rapid development of automotive electronics technology, travel modes such as hailing a taxi or booking a private car have developed greatly. They now play an irreplaceable role in people's daily lives and bring great convenience to everyday travel.

As society develops further, traditional taxis can no longer fully meet people's travel needs. To make travel more convenient, online ride-hailing services have appeared on the market, allowing users to book a car that suits their trip through a ride-hailing app.

As the number of taxis and private cars providing services grows, service safety becomes increasingly important. Passengers riding alone in particular may encounter various dangerous situations, so ensuring ride safety to the greatest possible extent is an urgent problem.

Disclosure of Invention

In view of the above, an object of the present application is to provide a risk prediction method, a risk prediction apparatus, an electronic device and a computer-readable storage medium that improve the safety of the riding environment.

In a first aspect, an embodiment of the present application provides a risk prediction method, including:

acquiring historical feature information of a plurality of users to be predicted; and

for each of the plurality of users to be predicted, determining the risk probability of the user according to the historical feature information of the user, the degree of correlation between the user and the other users to be predicted, the historical feature information of the other users to be predicted that are correlated with the user, and a pre-trained risk prediction model.

In some embodiments, determining, for each of the plurality of users to be predicted, the risk probability of the user according to the historical feature information of the user, the degree of correlation between the user and the other users to be predicted, the historical feature information of the correlated users, and the pre-trained risk prediction model includes:

determining an initial risk probability of the user to be predicted according to the user's historical feature information and the risk prediction model;

determining initial risk probabilities of the other users to be predicted that are correlated with the user, according to their historical feature information and the risk prediction model; and

determining an updated risk probability of the user to be predicted according to the user's initial risk probability, the initial risk probabilities of the correlated users, and the degrees of correlation between the user and those users.

In some embodiments, the risk prediction model is trained by the following steps:

constructing a training sample library, the training sample library comprising feature information of a plurality of training users in a preset historical time period and the risk result corresponding to that feature information; and

training the risk prediction model using the feature information as the model input features and the risk result as the model output.
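The text does not name a model family. As a minimal sketch only, assuming logistic regression fitted by batch gradient descent in pure Python, the training step could map numeric feature vectors (input) to 0/1 risk results (output) as below. The functions `sigmoid`, `train_risk_model` and `predict_risk`, and the learning-rate and epoch defaults, are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_risk_model(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    lr: float = 0.1,
    epochs: int = 500,
) -> Tuple[List[float], float]:
    """Fit logistic-regression weights: feature vectors are the model input,
    the 0/1 risk result is the model output."""
    n_features = len(features[0])
    m = len(features)
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            # Error between the predicted probability and the observed risk result.
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for k in range(n_features):
                grad_w[k] += err * x[k]
            grad_b += err
        # One batch gradient-descent step on the mean log-loss.
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias


def predict_risk(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Risk probability for one feature vector."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
```

Logistic regression is chosen here only because it outputs a probability directly; any classifier that yields a calibrated probability could fill the same role.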

In some embodiments, constructing the training sample library comprises:

determining positive samples and negative samples in a sample set according to the risk results; and

selecting, from the sample set, positive and negative samples in a preset positive-to-negative ratio to generate the training sample library.
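A minimal sketch of the ratio-based screening, assuming each sample is a (feature vector, risk result) pair with 1 marking a dangerous user, and assuming the preset ratio is expressed as a whole number of negatives per positive. The function name `build_training_library` and the 1:10 default are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[List[float], int]  # (feature vector, risk result: 1 = dangerous, 0 = not)


def build_training_library(
    samples: Sequence[Sample],
    neg_per_pos: int = 10,  # hypothetical preset ratio: 1 positive to 10 negatives
    seed: int = 0,
) -> List[Sample]:
    """Split samples by risk result, then keep positives and negatives in the preset ratio."""
    positives = [s for s in samples if s[1] == 1]
    negatives = [s for s in samples if s[1] == 0]
    # Largest number of positives that can still be matched with enough negatives.
    n_pos = min(len(positives), len(negatives) // neg_per_pos)
    rng = random.Random(seed)
    library = rng.sample(positives, n_pos) + rng.sample(negatives, n_pos * neg_per_pos)
    rng.shuffle(library)
    return library
```

Because dangerous users are sparse, negatives usually dominate; fixing the ratio keeps the rare positives from being swamped during training.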

In some embodiments, determining, for each of the plurality of users to be predicted, the updated risk probability of the user according to the user's initial risk probability, the initial risk probabilities of the correlated users, and the degrees of correlation between the user and those users includes:

taking the initial risk probability of each user to be predicted as its current risk probability;

for each user to be predicted, updating the user's current risk probability according to the user's current risk probability, the current risk probabilities of the correlated users, and the degrees of correlation between the user and those users; and

taking the updated risk probability of each user to be predicted as the new current risk probability and repeating the updating step until a set condition is met, thereby obtaining the final updated risk probability of each user to be predicted.

In some embodiments, the set condition includes:

a set number of updates has been reached; or, for more than a set number of users to be predicted, the rate of change between the current risk probability and the risk probability before the update is smaller than a set change rate.
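The iterate-until-stable procedure can be sketched as below. The update formula itself is not reproduced in this text, so the sketch assumes a hypothetical rule p(i) = max(p_i, (1/n) * sum over j of w_ij * p_j), with all users updated synchronously from the previous round's values. It also assumes that "change rate" means the relative change |new - old| / old. The names `update_once` and `propagate_risk` and every default value are illustrative only.

```python
from typing import Dict, Optional

Graph = Dict[str, Dict[str, float]]  # w[i][j]: degree of correlation between users i and j


def update_once(current: Dict[str, float], w: Graph) -> Dict[str, float]:
    """One synchronous round of the hypothetical rule p(i) = max(p_i, (1/n) * sum_j w_ij * p_j)."""
    updated = {}
    for i, p_i in current.items():
        neighbours = w.get(i, {})
        n = len(neighbours)
        if n == 0:
            updated[i] = p_i  # no correlated users: keep the current probability
            continue
        neighbour_term = sum(w_ij * current[j] for j, w_ij in neighbours.items()) / n
        updated[i] = max(p_i, neighbour_term)
    return updated


def propagate_risk(
    initial: Dict[str, float],
    w: Graph,
    max_updates: int = 50,         # "set number of updates"
    change_rate: float = 1e-4,     # "set change rate"
    min_users: Optional[int] = None,  # "set number" of users; default: every user
) -> Dict[str, float]:
    current = dict(initial)  # initial risk probabilities become the current ones
    if min_users is None:
        min_users = len(current) - 1  # "more than len - 1 users" means all users
    for _ in range(max_updates):
        updated = update_once(current, w)
        # Users whose change relative to the pre-update probability is below the set rate.
        stable = sum(
            1 for i in current
            if abs(updated[i] - current[i]) / max(current[i], 1e-12) < change_rate
        )
        current = updated
        if stable > min_users:
            break
    return current
```

Under this assumed rule a probability never decreases and never exceeds the largest initial probability, so the iteration cannot drift outside [0, 1].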

In some embodiments, the updated risk probability of the user to be predicted is calculated according to the following formula:

where p(i) is the updated risk probability of user i to be predicted; p_i is the current risk probability of user i; p_j is the current risk probability of the j-th user to be predicted that is correlated with user i; w_ij is the degree of correlation between user i and the j-th user; and n is the number of other users to be predicted that are correlated with user i.
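The formula itself is not reproduced in this text. Purely to show how the defined symbols p(i), p_i, p_j, w_ij and n fit together, the sketch below assumes a hypothetical rule p(i) = max(p_i, (1/n) * sum over j of w_ij * p_j). This is an assumption, not the formula of the application, and the function name `updated_risk` is illustrative.

```python
from typing import Sequence, Tuple


def updated_risk(p_i: float, neighbours: Sequence[Tuple[float, float]]) -> float:
    """Hypothetical update for one user i.

    p_i: current risk probability of user i
    neighbours: (p_j, w_ij) pairs for the n users correlated with user i
    returns: p(i) = max(p_i, (1/n) * sum_j w_ij * p_j)
    """
    n = len(neighbours)
    if n == 0:
        return p_i
    return max(p_i, sum(w_ij * p_j for p_j, w_ij in neighbours) / n)
```

For example, with p_i = 0.3 and two correlated users (p_j = 0.9, w_ij = 0.8) and (p_j = 0.5, w_ij = 0.4), the neighbour term is (0.8 × 0.9 + 0.4 × 0.5) / 2 = 0.46, so p(i) rises from 0.3 to 0.46.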

In some embodiments, the method further comprises:

for any user to be predicted, determining that the user is a dangerous user if the user's initial risk probability is greater than or equal to a set threshold; and

determining that the user is a dangerous user if the user's initial risk probability is smaller than the set threshold but the user's updated risk probability is greater than or equal to the set threshold.

In some embodiments, the historical feature information includes at least one of the following:

user attribute features; service providing features; and credit record features.

In a second aspect, an embodiment of the present application provides a risk prediction apparatus, including:

an information acquisition module, configured to acquire historical feature information of a plurality of users to be predicted; and

a probability determination module, configured to determine, for each of the plurality of users to be predicted, the risk probability of the user according to the user's historical feature information, the degree of correlation between the user and the other users to be predicted, the historical feature information of the correlated users, and a pre-trained risk prediction model.

In some embodiments, the probability determination module is specifically configured to:

In some embodiments, the risk prediction apparatus further comprises a model training module, which trains the risk prediction model by the following steps:

In some embodiments, the model training module is specifically configured to:

In some embodiments, for each user to be predicted in the plurality of users to be predicted, the probability determination module is specifically configured to:

In some embodiments, the set condition includes:

In some embodiments, the probability determination module calculates the updated risk probability of the user to be predicted according to the following formula:

In some embodiments, the apparatus further comprises a dangerous user determination module, configured to:

In a third aspect, an embodiment of the present application provides an electronic device, including a processor, a storage medium and a bus. The storage medium stores machine-readable instructions executable by the processor. When the electronic device runs, the processor communicates with the storage medium through the bus and executes the machine-readable instructions to perform the steps of the risk prediction method according to the first aspect.

In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, performs the steps of the risk prediction method according to the first aspect.

With the risk prediction method, apparatus, electronic device and computer-readable storage medium provided by the embodiments of the application, the risk probability of each user to be predicted is determined according to that user's historical feature information, the degree of correlation between the user and the other users to be predicted, the historical feature information of the correlated users, and a pre-trained risk prediction model. The user's risk is therefore assessed comprehensively, high-risk users can be intervened with in advance, and the safety of the riding environment is improved.

In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.

Drawings

To illustrate the technical solutions of the embodiments more clearly, the drawings used in the embodiments are briefly described below. The following drawings show only some embodiments of the present application and should therefore not be regarded as limiting its scope; those skilled in the art can derive other related drawings from them without inventive effort.

Fig. 1 is a flowchart of a risk prediction method provided in an embodiment of the present application;

Fig. 2 is a flowchart of a method for determining the risk probability of a user to be predicted provided in an embodiment of the present application;

Fig. 3 is a flowchart of a method for determining a dangerous user provided in an embodiment of the present application;

Fig. 4 is a flowchart of a method for training a risk prediction model provided in an embodiment of the present application;

Fig. 5 is a flowchart of a method for constructing a training sample library provided in an embodiment of the present application;

Fig. 6 is a flowchart of a method for updating the risk probability of a user to be predicted provided in an embodiment of the present application;

Fig. 7 is a schematic diagram of the correlations between users to be predicted in an embodiment of the present application;

Fig. 8 is a schematic structural diagram of a risk prediction apparatus provided in an embodiment of the present application;

Fig. 9 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.

Detailed Description

To make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The drawings are for illustration and description only and are not intended to limit the scope of protection of the present application, and the schematic drawings are not necessarily drawn to scale. The flowcharts used in this application illustrate operations implemented according to some embodiments. The operations in a flowchart may be performed out of order, and steps with no logical dependency on one another may be performed in reverse order or simultaneously. Guided by this application, those skilled in the art may add one or more operations to a flowchart or remove one or more operations from it.

In addition, the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.

In order to enable a person skilled in the art to use the present disclosure, the following embodiments are given in conjunction with a specific application scenario, "risk prediction". Those skilled in the art will appreciate that the general principles defined herein may be applied to travel scenarios and to other scenarios requiring security monitoring without departing from the spirit and scope of the present application. Although the application is described primarily in the context of a travel scenario, this is only one exemplary embodiment. The application can be applied to any type of transportation and to other related processing services; for example, to different transportation system environments, including land, sea or air, or any combination thereof. The vehicles of the transportation system may include a taxi, a private car, a hitch (carpool) car, a bus, a train, a bullet train, a high-speed train, a subway, a ship, an airplane, a spacecraft, a hot air balloon, an unmanned vehicle, etc., or any combination thereof. The application may also include any service system that provides a service selection prediction process. Applications of the method and apparatus of the present application may include web pages, browser plug-ins, client terminals, customized systems, internal analysis systems, artificial intelligence robots, etc., or any combination thereof.

The embodiments of the application can serve a travel service platform, which provides services to users according to travel service requests received from clients. The travel service platform may include several ride-hailing systems, such as a taxi system, an express car system, a premier car system and a hitch (carpool) system. A client's travel service request includes departure information and destination information.

The risk prediction method of the embodiments can be applied to a server of the travel service platform or to any other computing device with processing capability. In some embodiments, the server or computing device may include a processor, which may process information and/or data related to the service request to perform one or more of the functions described herein.

At present, safety problems in the riding environment arise at two levels: intentional harm, such as robbery, actively inflicted by a driver or passenger on the other party; and traffic accidents. Avoiding such problems and ensuring the safety of both driver and passenger is key to improving the quality of travel services. The embodiments of the application mainly address the first level.

For convenience of description, two terms are defined. A dangerous user is a user who will commit harmful behavior against others in the future. The risk probability of a user is the probability that the user is a dangerous user; a user whose risk probability is very high can be regarded as a dangerous user.

Taking drivers as an example of such users: dangerous drivers form a small, highly sparse part of the driver population and are hard to find, yet they pose a great threat to passengers' personal and property safety. If dangerous drivers can be found in advance and the likelihood of incidents reduced through some form of intervention, the safety of the riding environment can be improved.

A study of historical data on dangerous drivers shows that they share some common characteristics. By establishing in advance a relationship between driver features and risk probability, risk prediction can be performed for drivers to be predicted by collecting their features, yielding the probability that each of them is a dangerous driver. The embodiments of the application are described in detail below based on this idea.

An embodiment of the application provides a risk prediction method, applied to a vehicle platform server, which includes the following steps S101 to S102, as shown in Fig. 1:

S101, acquiring historical feature information of a plurality of users to be predicted.

A user to be predicted may be a service requester or a service provider; in a travel scenario, the service requester is typically a passenger and the service provider a driver.

Whether a user becomes a dangerous user is associated with various kinds of feature information. Therefore, when predicting the risk of a user to be predicted, the user's historical feature information is acquired, which includes at least one of the following:

User attribute features, including the user's personal attributes such as age, gender, years of service, place of origin and education level.

Historical data shows that a user's risk is related to these attributes; for example, male drivers generally have a higher risk probability than female drivers.

The service providing features mainly include:

(1) Behavior features of the user logging in to and out of the service system, such as the times and frequency of logging in, the times and frequency of logging out, and the regional environment at login.

The regional environment at login may include the population density of the region and whether the region is remote. Here the service system is the driver order-receiving system, and logging in means that the driver logs in to the order-receiving system to start receiving orders dispatched by the vehicle platform server.

The login and logout times and frequencies and the regional environment at login all influence a driver's risk. For example, historical data shows that drivers who log in and out more frequently within a day have a higher risk probability than those who do so less frequently, and that drivers who usually log in in the evening or early morning have a higher risk probability than those who usually log in during the day.

(2) The user's complaint record in the service system, specifically the categories and number of complaints against the user; complaint categories may include offenses against passengers, deviation from the planned route, and the like.

The categories and number of complaints a driver has received influence the driver's risk; for example, historical data shows that drivers who have received more complaints have a higher risk probability.

(3) Order statistics of the user in the service system, such as the number of orders the user has accepted and the areas where those orders were placed.

A driver's order volume and order placement areas influence the driver's risk; for example, historical data shows that the more remote the placement areas of a driver's accepted orders, the higher the probability that the driver is a dangerous driver.

(4) User subsidy features, such as coupons, rewards, and the share of the user's income that they account for.

The share of a driver's total monthly income that comes from coupons and rewards influences the driver's risk; for example, historical data shows that drivers for whom this share is higher are less likely to be dangerous.

(5) Trajectory statistics of the user while providing services, such as the geographic locations the user passes through while driving and the time periods in which they are passed.

Historical data shows that drivers who often pass through remote locations while driving, especially at night, have a high probability of being dangerous drivers.

Credit record features, including features of the user's dishonest conduct, namely the user's recorded violations of law and the user's credit report.

The user's recorded violations and credit report influence the driver's risk; for example, historical data shows that the more recorded violations a driver has, the higher the probability that the driver is a dangerous driver, and the lower the credit report score, the higher that probability.
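One way to hold these three groups of features before they reach the model is a flat record that is converted to a numeric vector. The sketch below is hypothetical: the field names, their encodings and the choice to leave out categorical attributes such as place of origin and education level are assumptions made for illustration.

```python
from dataclasses import astuple, dataclass
from typing import List


@dataclass
class HistoricalFeatures:
    """Hypothetical numeric encoding of one user's historical feature information."""

    # User attribute features
    age: int
    is_male: bool
    years_of_service: float
    # Service providing features
    logins_per_day: float
    night_login_share: float         # share of logins in the evening or early morning
    complaint_count: int
    remote_order_share: float        # share of accepted orders placed in remote areas
    subsidy_income_share: float      # coupons + rewards as a share of monthly income
    night_remote_track_share: float  # share of trajectory points in remote areas at night
    # Credit record features
    violation_count: int
    credit_score: float

    def to_vector(self) -> List[float]:
        """Flatten the fields, in declaration order, into the model input vector."""
        return [float(v) for v in astuple(self)]
```

Keeping a fixed field order matters: the vector positions must match between training and prediction.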

Research on historical incident data shows that the historical feature information (at least one of the user attribute features, service providing features and credit record features) is correlated with whether a driver becomes a dangerous driver. The risk probability of a user to be predicted can therefore be preliminarily determined from the user's historical feature information and a pre-established relationship between feature information and risk probability.

On this basis, at least one of the user attribute features, service providing features and credit record features of the user to be predicted can be acquired in advance, and risk prediction can then be performed with the pre-trained risk prediction model.

Specifically, which historical time period the features of the user to be predicted are drawn from, and which target time period the predicted risk probability refers to, depend on the correspondence used during training between the time period of the model input features and the time period of the model output. For example, if during training the model input features are a training sample's historical feature information for the month before a specific historical time, and the model output is the sample's risk result for the week after that time, then at prediction time the features from the month before the current time yield a risk probability for the following week.
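A minimal sketch of this window alignment, assuming each user's history is a list of timestamped events and assuming a risk result of 1 means that an event labelled "incident" occurs in the label window. The event labels, the function name `split_windows` and the 30-day / 7-day defaults (standing in for "the previous month" and "the next week") are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Event = Tuple[datetime, str]  # (timestamp, hypothetical event type such as "login" or "incident")


def split_windows(
    events: List[Event],
    reference: datetime,
    feature_days: int = 30,  # feature window: roughly the month before the reference time
    label_days: int = 7,     # label window: the week after the reference time
) -> Tuple[List[Event], int]:
    """Return the events usable as input features and the 0/1 risk result for one user."""
    feature_start = reference - timedelta(days=feature_days)
    label_end = reference + timedelta(days=label_days)
    feature_events = [e for e in events if feature_start <= e[0] < reference]
    # Risk result: 1 if a (hypothetical) "incident" event falls inside the label window.
    label = int(any(kind == "incident" and reference <= t < label_end for t, kind in events))
    return feature_events, label
```

At prediction time only the feature window is used, with the reference set to the current time; the label window then describes the period the predicted probability refers to.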

S102, for each of the plurality of users to be predicted, determining the risk probability of the user according to the user's historical feature information, the degree of correlation between the user and the other users to be predicted, the historical feature information of the correlated users, and the pre-trained risk prediction model.

The risk prediction model and the users' historical feature information allow a preliminary prediction of each user's risk probability. To find hidden dangerous users more effectively, the risk probability is further refined based on the degree of correlation between users to be predicted.

The degree of correlation measures how close two users to be predicted are, as determined from their communication records. These records may include information on communication through various channels and instant messaging software, such as phone contact frequency, SMS contact frequency, WeChat contact frequency and e-mail contact frequency.
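The text lists the communication channels but not how they are combined into a degree of correlation. As a sketch under stated assumptions, the example below takes a weighted sum of contact counts per channel, maps it into [0, 1) so that more contact means a closer relationship, and drops pairs below a cut-off. The channel weights, `scale`, `min_degree` and both function names are hypothetical.

```python
import math
from typing import Dict, Tuple

# Hypothetical per-channel weights; the text names the channels but not how they are combined.
CHANNEL_WEIGHTS = {"phone": 1.0, "sms": 0.5, "wechat": 0.8, "email": 0.3}


def correlation_degree(contacts: Dict[str, int], scale: float = 10.0) -> float:
    """Map contact counts per channel to a correlation degree in [0, 1); more contact => closer."""
    score = sum(CHANNEL_WEIGHTS.get(channel, 0.0) * count for channel, count in contacts.items())
    return 1.0 - math.exp(-score / scale)


def build_correlation_graph(
    records: Dict[Tuple[str, str], Dict[str, int]],
    min_degree: float = 0.1,  # hypothetical cut-off for "having correlation"
) -> Dict[str, Dict[str, float]]:
    """Turn pairwise communication records into a symmetric graph w[i][j]."""
    graph: Dict[str, Dict[str, float]] = {}
    for (a, b), contacts in records.items():
        degree = correlation_degree(contacts)
        if degree < min_degree:
            continue  # too weak to count as a correlated pair
        graph.setdefault(a, {})[b] = degree
        graph.setdefault(b, {})[a] = degree
    return graph
```

For example, 20 phone contacts give 1 - e^(-2), about 0.86, while a single e-mail falls below the cut-off and produces no edge.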

Two highly correlated users are likely to influence each other. For example, a driver A with a low initial risk has a high probability of being a dangerous driver if other drivers closely correlated with A are dangerous drivers.

In step S102, the initial risk probability of each user to be predicted may first be determined from the pre-acquired historical feature information and the pre-trained risk prediction model, and the initial risk probability is then updated using the degrees of correlation. If a user's initial risk probability is low but the users highly correlated with that user have high risk probabilities, the user's risk probability rises after the update, i.e. the probability that the user is a dangerous driver increases. Thus part of the dangerous users are identified by the risk prediction model itself, and dangerous users missed by the model can additionally be screened out by the correlation-based update.

In one embodiment, as shown in Fig. 2, determining in step S102 the risk probability of each user to be predicted from the user's historical feature information, the degree of correlation between the user and the other users to be predicted, the historical feature information of the correlated users, and the pre-trained risk prediction model specifically includes the following steps S201 to S203:

S201, determining the initial risk probability of the user to be predicted according to the user's historical feature information and the risk prediction model.

The historical feature information of the user to be predicted is fed into the risk prediction model as model input features to obtain the user's initial risk probability.

S202, determining the initial risk probabilities of the other users to be predicted that are correlated with the user, according to their historical feature information and the risk prediction model.

Similarly, the historical feature information of each correlated user is fed into the risk prediction model as model input features to obtain that user's initial risk probability.

Through steps S201 and S202, an initial risk probability is determined for every user to be predicted.
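Since every user to be predicted, whether a target user or a correlated one, is scored by the same model, S201 and S202 amount to one pass over all feature vectors. The sketch below assumes the pre-trained model is a logistic model given by a weight vector and a bias; that model form and the function names are hypothetical.

```python
import math
from typing import Dict, Sequence


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def initial_risk_probabilities(
    features: Dict[str, Sequence[float]],
    weights: Sequence[float],
    bias: float,
) -> Dict[str, float]:
    """S201 + S202: score every user to be predicted once with the same
    (here assumed logistic) pre-trained model."""
    return {
        user: _sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
        for user, x in features.items()
    }
```

Scoring everyone up front means each user's initial probability is computed once and then reused whenever that user appears as a correlated user of someone else.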

According to the initial risk probability and the set threshold, the risk user in the plurality of users to be predicted can be determined, for example, the user to be predicted, of which the initial risk probability is greater than the set threshold, is the risk user. However, some users to be predicted may be dangerous users and cannot be determined by the initial danger probability, for example, users to be predicted whose initial danger probability is smaller than the set threshold may still be dangerous users, and in the embodiment of the present application, the initial danger probability of each user to be predicted is updated based on the correlation between the users to be predicted, as in step S203.

S203: Determine the updated risk probability of the user to be predicted according to the user's initial risk probability, the initial risk probabilities of the other users to be predicted that are correlated with the user, and the degree of correlation between the user and those other users.

For example, a user whose initial risk probability is below the set threshold is not a dangerous user before the update; but if the user's updated risk probability reaches the set threshold, the user becomes a dangerous user. In this way, dangerous users can be screened more completely, measures can be taken in advance to intervene, and the safety of the riding environment can be ensured to the greatest extent. Whether any given user to be predicted is a dangerous user is determined as shown in fig. 3, through the following steps S301 to S302:

S301: For any user to be predicted, if the user's initial risk probability is greater than or equal to a set threshold, determine that the user is a dangerous user.

S302: If the user's initial risk probability is less than the set threshold but the updated risk probability is greater than or equal to the set threshold, determine that the user is a dangerous user.

For example, the set threshold may be 0.7. The threshold is set in advance by jointly considering the number of users to be predicted, the target safety level, and the user experience. If the threshold is set too low, dangerous incidents may be further reduced, but too many users will be flagged as dangerous and normal service may be affected; if it is set too high, potentially dangerous users with a high likelihood of risk may be missed. In practice, the threshold can be determined and adjusted empirically.
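The decision rule of S301 and S302 can be sketched as a small function. The helper name `is_dangerous` is hypothetical, and the default threshold of 0.7 is only the example value given above.

```python
def is_dangerous(initial_p: float, updated_p: float, threshold: float = 0.7) -> bool:
    """Apply S301/S302 to one user (hypothetical helper)."""
    # S301: the initial probability alone reaches the threshold.
    if initial_p >= threshold:
        return True
    # S302: below the threshold initially, but the correlation-based
    # update lifts the probability to the threshold or above.
    return updated_p >= threshold


print(is_dangerous(0.75, 0.60))  # True  (S301)
print(is_dangerous(0.50, 0.72))  # True  (S302)
print(is_dangerous(0.50, 0.65))  # False
```

Note that once S301 fires, a later drop in the updated probability does not clear the flag, which matches the rule that a user above the threshold initially stays dangerous.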

The risk prediction model described above predicts a user's risk probability over a future time period from the user's historical feature information. As shown in fig. 4, the risk prediction model is trained through the following steps S401 to S402:

S401: Construct a training sample library, which includes the feature information of a plurality of users to be trained within a preset historical time period and the risk result corresponding to that feature information.

The preset historical time period may be a period preceding a given historical date. For example, taking January 1, 2018 as the reference date, the feature information may be that of the users to be trained during the month before January 1, 2018, and the risk result may be whether each user to be trained became a dangerous user, that is, whether the user assaulted a passenger, within a period after January 1, 2018, such as one day or one week. If so, the risk result is a dangerous user; if not, the risk result is a non-dangerous user.
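A hedged sketch of how one reference date splits history into a feature window and a label window. It assumes a simple event log of (user, date, assault flag) records and uses event counts as a toy feature; the 30-day and 7-day windows mirror the "one month before" and "one week after" example, and `build_samples` is a hypothetical name.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple

# (user id, event date, whether the event was an assault on a passenger)
Event = Tuple[str, date, bool]


def build_samples(events: List[Event], reference: date,
                  feature_days: int = 30, label_days: int = 7) -> Dict[str, Tuple[int, int]]:
    """Return {user: (events in feature window, risk label)} for one reference date."""
    feature_start = reference - timedelta(days=feature_days)
    label_end = reference + timedelta(days=label_days)
    samples: Dict[str, Tuple[int, int]] = {}
    for user, day, assault in events:
        count, label = samples.get(user, (0, 0))
        if feature_start <= day < reference:
            count += 1          # toy feature: activity before the reference date
        elif reference <= day < label_end and assault:
            label = 1           # assault after the reference date -> dangerous user
        samples[user] = (count, label)
    return samples


events = [
    ("u1", date(2017, 12, 20), False),
    ("u1", date(2018, 1, 3), True),
    ("u2", date(2017, 12, 15), False),
    ("u2", date(2018, 2, 1), True),
]
samples = build_samples(events, date(2018, 1, 1))
```

For this toy log, `u1` is labeled dangerous because of the assault on January 3, 2018, while the assault by `u2` on February 1, 2018 falls outside the one-week label window.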

The quality of the training sample library directly affects the predictions of the risk prediction model. To improve prediction accuracy, the training sample library is constructed as shown in fig. 5, through the following steps S501 to S502:

S501: Determine positive samples and negative samples in the sample set according to the risk results.

Here, a sample whose risk result is a dangerous user is defined as a positive sample, and a sample whose risk result is a non-dangerous user is defined as a negative sample. Because the ratio of positive to negative samples affects the prediction accuracy of the risk prediction model, step S502 is performed:

S502: According to a preset ratio of positive to negative samples, select positive and negative samples that satisfy the ratio from the sample set to generate the training sample library.

To further improve the accuracy of the risk prediction model, the numbers of selected positive and negative samples should generally be close to each other; for example, the ratio of positive to negative samples is kept between 1:10 and 10:1. In the embodiments of the present application, dangerous users are extremely sparse, so in a sample set obtained from the historical accident records table the negative samples far outnumber the positive samples. The embodiments may therefore use down-sampling and up-sampling to select positive and negative samples in the required proportion from the sample set, as follows:

Down-sampling: if the sample set contains ten thousand times as many negative samples as positive samples, the negative samples are sampled at a rate of 0.001, bringing the ratio of positive to negative samples to within 1:10.
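The down-sampling step might look like this, assuming samples are (feature vector, label) pairs with label 1 for dangerous users. Each negative is kept independently with probability 0.001, so the exact count varies with the random seed; `downsample_negatives` is a hypothetical helper.

```python
import random
from typing import List, Tuple

Sample = Tuple[List[float], int]  # (feature vector, label: 1 dangerous, 0 not)


def downsample_negatives(samples: List[Sample], rate: float, seed: int = 0) -> List[Sample]:
    """Keep every positive sample and each negative with probability `rate`."""
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    return [s for s in samples if s[1] == 1 or rng.random() < rate]


positives = [([1.0], 1)] * 10
negatives = [([0.0], 0)] * 100_000   # 10,000x as many negatives as positives
kept = downsample_negatives(positives + negatives, rate=0.001)
n_pos = sum(1 for _, y in kept if y == 1)
n_neg = len(kept) - n_pos
```

With 10 positives and 100,000 negatives, roughly 100 negatives survive, which brings the positive-to-negative ratio to about 1:10.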

Up-sampling: if the total number of selected positive and negative samples is only in the thousands, which may be too few for model training, the positive and negative samples and their features obtained from several historical time periods may be combined into the final training sample library; for example, the samples selected at the dates 20180110, 20180201, 20180301, and so on are combined to form the final training sample library.
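Combining samples selected at several reference dates can be sketched as a simple merge. Tagging each sample with its date is an assumption added here so that repeated appearances of the same user at different dates remain distinguishable; `combine_snapshots` is a hypothetical name.

```python
from typing import Dict, List, Tuple

Sample = Tuple[List[float], int]


def combine_snapshots(snapshots: Dict[str, List[Sample]]) -> List[Tuple[str, List[float], int]]:
    """Merge samples selected at several reference dates into one training library."""
    library = []
    for snapshot_date in sorted(snapshots):  # e.g. "20180110", "20180201", ...
        for features, label in snapshots[snapshot_date]:
            # Keep the date so one user's samples from different dates stay distinguishable.
            library.append((snapshot_date, features, label))
    return library


library = combine_snapshots({
    "20180301": [([0.2], 0)],
    "20180110": [([0.9], 1), ([0.1], 0)],
    "20180201": [([0.8], 1)],
})
```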

S402: Train with the feature information as the model input features and the risk results as the model output to obtain the risk prediction model.

Specifically, the feature information of each user to be trained is obtained and converted into a numeric vector that a machine can process. Correspondingly, the risk result of each user to be trained is encoded; for example, the model output of a dangerous user is set to 1 and that of a non-dangerous user is set to 0. These are then input into a pre-selected learning model for training to obtain the risk prediction model.

The learning model may be one or a combination of a classification tree model, a logistic regression model, and a neural network model, which are not specifically described herein.
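Of the learning models listed, logistic regression is the simplest to show end to end. The sketch below is an illustrative stand-in rather than the model actually used: it takes each user as a numeric vector with label 1 (dangerous) or 0 (not dangerous) and fits weights by stochastic gradient descent on log-loss using only the Python standard library. The learning rate, epoch count, toy data, and function names are assumptions.

```python
import math
from typing import Callable, List, Tuple

Sample = Tuple[List[float], int]  # (numeric feature vector, 1 = dangerous, 0 = not)


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(samples: List[Sample], epochs: int = 500,
                              lr: float = 0.1) -> Callable[[List[float]], float]:
    """Fit weights and bias by stochastic gradient descent on log-loss."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in samples:
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # derivative of log-loss with respect to the logit
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err

    def predict(x: List[float]) -> float:
        return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

    return predict


train = [([1.0, 0.0], 1), ([0.9, 0.1], 1), ([0.0, 1.0], 0), ([0.1, 0.9], 0)]
risk_model = train_logistic_regression(train)
```

In practice a library implementation such as scikit-learn's `LogisticRegression` would normally replace the hand-written loop; it is spelled out here only to make the input and output encoding explicit.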

Specifically, for each of the plurality of users to be predicted, the updated risk probability of the user is determined from the user's initial risk probability, the initial risk probabilities of the other users to be predicted that are correlated with the user, and the degree of correlation between the user and those other users, as shown in fig. 6, through the following steps S601 to S603:

Before the initial risk probabilities of the users to be predicted are updated, users who have committed harmful behavior in a recent historical period, such as drivers who have assaulted passengers in the last 3 months, can be screened out from all users to be predicted. These users are directly determined to be dangerous users and assigned a risk probability of 1 regardless of their initial risk probability, and their risk probability of 1 is excluded from the subsequent updating process, which simplifies the update.
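Screening out recent offenders before the update can be sketched as a split of the initial probabilities into a fixed part (probability 1) and a part that enters S601 to S603. The set of recent offenders is assumed to come from an external record of the last 3 months; `split_recent_offenders` is a hypothetical helper.

```python
from typing import Dict, Set, Tuple


def split_recent_offenders(initial: Dict[str, float],
                           recent_offenders: Set[str]) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Fix recent offenders at probability 1; return (fixed, to_update)."""
    # Users with recent harmful behaviour are dangerous regardless of the model output.
    fixed = {u: 1.0 for u in initial if u in recent_offenders}
    # Only the remaining users enter the iterative update (S601 to S603).
    to_update = {u: p for u, p in initial.items() if u not in recent_offenders}
    return fixed, to_update


fixed, to_update = split_recent_offenders({"d1": 0.2, "d2": 0.4, "d3": 0.9}, {"d1"})
```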

In addition, a user to be predicted who has not recently assaulted a passenger but whose initial risk probability is greater than or equal to the set threshold is also determined to be a dangerous user; even if the user's risk probability falls below the set threshold during the update, the user remains a dangerous user.

For a user to be predicted who has not recently assaulted a passenger and whose initial risk probability is below the set threshold, the risk prediction model alone does not identify the user as dangerous. However, if the user's final updated risk probability is greater than or equal to the set threshold, the user is determined to be a dangerous user. The update process for the risk probabilities of users who have not recently assaulted a passenger is as follows:

S601: Take the initial risk probability of each user to be predicted as the current risk probability.

S602: For each user to be predicted, update the user's current risk probability according to the user's current risk probability, the current risk probabilities of the other users to be predicted that are correlated with the user, and the degree of correlation between the user and those other users.

For example, suppose the other users to be predicted that are correlated with user A are users B and C, where the degree of correlation between B and A is $w_{AB}$ and the degree of correlation between C and A is $w_{AC}$. To update the current risk probability of A, the update uses the current risk probability of A, the current risk probabilities of B and C, and the correlation degrees $w_{AB}$ and $w_{AC}$. Specifically, the updated risk probability of the user to be predicted is calculated according to the following formula (1):

$$p(i) = \frac{p_i + \sum_{j=1}^{n} w_{ij}\, p_j}{n + 1} \qquad (1)$$

where $p(i)$ is the updated risk probability of user i to be predicted; $p_i$ is the current risk probability of user i; $p_j$ is the current risk probability of the j-th user to be predicted that is correlated with user i; $w_{ij}$ is the degree of correlation between user i and the j-th user; and n is the number of other users to be predicted that are correlated with user i.
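The sketch below applies formula (1) to a single user, taking it in the averaging form p(i) = (p_i + Σ_j w_ij p_j) / (n + 1). This form is an assumption reconstructed from the listed symbols and from the worked example later in this section, in which the inputs 0.5, 0.8, 0.9, 0.9, and 0.8 give about 0.65. The name `update_once` is hypothetical, and correlations are stored as a nested dictionary `weights[i][j] = w_ij`.

```python
from typing import Dict


def update_once(i: str, current: Dict[str, float],
                weights: Dict[str, Dict[str, float]]) -> float:
    """One application of the assumed formula (1) to user i."""
    neighbours = weights.get(i, {})          # {j: w_ij} for users correlated with i
    n = len(neighbours)
    total = current[i] + sum(w_ij * current[j] for j, w_ij in neighbours.items())
    return total / (n + 1)


current = {"A": 0.5, "B": 0.8, "C": 0.9}
weights = {"A": {"B": 0.9, "C": 0.8}}       # w_AB = 0.9, w_AC = 0.8
print(round(update_once("A", current, weights), 2))  # 0.65
```

A user with no correlated users keeps its own probability, since the formula then reduces to p_i / 1.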

If two users to be predicted are correlated with user i, as in fig. 7, which shows the correlation graph between user i and its two correlated users, the updated risk probability of user i is given by the following formula (2):

$$p(i) = \frac{p_i + w_{i1}\, p_1 + w_{i2}\, p_2}{3} \qquad (2)$$

where $p(i)$ is the updated risk probability of user i to be predicted; $p_i$ is the current risk probability of user i; $p_1$ is the current risk probability of the 1st user to be predicted that is correlated with user i; $p_2$ is the current risk probability of the 2nd user to be predicted that is correlated with user i; $w_{i1}$ is the degree of correlation between user i and the 1st user; and $w_{i2}$ is the degree of correlation between user i and the 2nd user.

For example, if the current risk probability $p_i$ of user i is 0.5, $p_1$ and $p_2$ are 0.8 and 0.9, and $w_{i1}$ and $w_{i2}$ are 0.9 and 0.8, then formula (2) gives an updated risk probability of approximately 0.65 for user i; that is, the risk probability of user i is raised by the update.
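Taking formula (2) in the averaging form p(i) = (p_i + w_i1 p_1 + w_i2 p_2)/3, an assumed reconstruction that matches this example, the arithmetic behind the figure of 0.65 is:

```latex
\begin{aligned}
p(i) &= \frac{p_i + w_{i1}\,p_1 + w_{i2}\,p_2}{3}
      = \frac{0.5 + 0.9 \times 0.8 + 0.8 \times 0.9}{3} \\
     &= \frac{0.5 + 0.72 + 0.72}{3}
      = \frac{1.94}{3} \approx 0.647 \approx 0.65
\end{aligned}
```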

S603: Take the updated risk probability of each user to be predicted as the current risk probability and repeat the step of updating each user's current risk probability according to the user's current risk probability, the current risk probabilities of the other users to be predicted that are correlated with the user, and the degree of correlation between the user and those other users, until a set condition is reached, thereby obtaining the final updated risk probability of each user to be predicted.

Updating the current risk probabilities of all users to be predicted who have not recently assaulted a passenger once constitutes one round of updating. After each round, if the set condition has not been reached, the updated risk probability of each user is taken as the current risk probability and another round is performed according to step S602. Updating stops when the set number of rounds is reached, or when, for more than a set number of users to be predicted, the rate of change between the current risk probability and the risk probability before the update is less than a set change rate; the final risk probabilities of all the users to be predicted are thereby obtained.

Then, for the users to be predicted that were not identified as dangerous users by the risk prediction model, whether they are dangerous users is determined according to their final risk probabilities.

The set condition here is: the set number of update rounds is reached, or, for more than a set number of users to be predicted, the rate of change between the current risk probability and the risk probability before the update is less than the set change rate.

For example, with the number of update rounds set to 1000, updating stops once 1000 rounds are reached, yielding the final risk probabilities of the users to be predicted. Alternatively, for 100 users to be predicted, if the rate of change between the current and pre-update risk probabilities is less than the set change rate for more than 80% of them, updating of the current risk probabilities stops.
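A compact sketch of S601 to S603 with both stopping conditions, assuming the averaging form of formula (1), p(i) = (p_i + Σ_j w_ij p_j)/(n + 1). "Rate of change" is interpreted here as the relative change |new - old| / old, and the 1e-3 change rate is a placeholder; the 1000-round cap and the 80% share follow the example above. Function names are hypothetical.

```python
from typing import Dict, Tuple


def _relative_change(old: float, new: float) -> float:
    # Rate of change of a probability relative to its pre-update value.
    if old == 0.0:
        return 0.0 if new == 0.0 else float("inf")
    return abs(new - old) / old


def iterate_updates(initial: Dict[str, float], weights: Dict[str, Dict[str, float]],
                    max_rounds: int = 1000, change_rate: float = 1e-3,
                    stable_fraction: float = 0.8) -> Tuple[Dict[str, float], int]:
    """S601 to S603 under the assumed averaging rule; returns (final probabilities, rounds)."""
    current = dict(initial)                      # S601
    rounds = 0
    while rounds < max_rounds:                   # condition 1: set number of rounds
        updated = {}
        for i, p_i in current.items():           # S602: one round over all users
            nbrs = weights.get(i, {})
            total = p_i + sum(w * current[j] for j, w in nbrs.items())
            updated[i] = total / (len(nbrs) + 1)
        rounds += 1
        stable = sum(1 for u in current
                     if _relative_change(current[u], updated[u]) < change_rate)
        current = updated                        # S603: updated becomes current
        if stable > stable_fraction * len(current):  # condition 2
            break
    return current, rounds


final, rounds = iterate_updates({"a": 0.2, "b": 0.6},
                                {"a": {"b": 1.0}, "b": {"a": 1.0}})
```

In this toy run, two users fully correlated with weight 1.0 meet at 0.4 after the first round, and the second round changes nothing, so the stability condition stops the loop after two rounds. Under this assumed rule, weights below 1 make repeated rounds tend to pull probabilities toward 0, so the round cap and the stopping rule materially influence the final values.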

An embodiment of the present application provides a risk prediction apparatus 800, as shown in fig. 8, including:

An information obtaining module 801, configured to obtain historical feature information of a plurality of users to be predicted.

A probability determination module 802, configured to determine, for each of the plurality of users to be predicted, the risk probability of the user according to the user's historical feature information, the degree of correlation between the user and the other users to be predicted, the historical feature information of the other users to be predicted that are correlated with the user, and a pre-trained risk prediction model.

In an embodiment, the probability determination module 802 may be specifically configured to:

Determine the initial risk probability of the user to be predicted according to the user's historical feature information and the risk prediction model.

Determine the initial risk probabilities of the other users to be predicted that are correlated with the user, according to their historical feature information and the risk prediction model.

In some embodiments, the apparatus further includes a model training module 803, which trains the risk prediction model through the following steps:

Construct a training sample library, which includes the feature information of a plurality of users to be trained within a preset historical time period and the risk result corresponding to that feature information.

Train with the feature information as the model input features and the risk results as the model output to obtain the risk prediction model.

In one embodiment, the model training module 803 may be specifically configured to:

Determine positive samples and negative samples in the sample set according to the risk results.

According to a preset ratio of positive to negative samples, select positive and negative samples that satisfy the ratio from the sample set to generate the training sample library.

In an embodiment, for each of the plurality of users to be predicted, the probability determination module 802 may be specifically configured to:

Take the initial risk probability of each user to be predicted as the current risk probability.

For each user to be predicted, update the user's current risk probability according to the user's current risk probability, the current risk probabilities of the other users to be predicted that are correlated with the user, and the degree of correlation between the user and those other users.

Taking the updated risk probability of each user to be predicted as the current risk probability, repeat the step of updating each user's current risk probability according to the user's current risk probability, the current risk probabilities of the other users to be predicted that are correlated with the user, and the degree of correlation between them, until the set condition is met, obtaining the final updated risk probability of each user to be predicted.

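In one embodiment, the set condition includes: the set number of update rounds is reached; or, for more than a set number of users to be predicted, the rate of change between the current risk probability and the risk probability before the update is less than the set change rate.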

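In one embodiment, the probability determination module 802 calculates the updated risk probability of the user to be predicted according to the following formula:

$$p(i) = \frac{p_i + \sum_{j=1}^{n} w_{ij}\, p_j}{n + 1}$$

where the symbols have the same meanings as in formula (1).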

In one embodiment, the apparatus further includes a dangerous user determination module 804, which may be configured to:

For any user to be predicted, if the user's initial risk probability is greater than or equal to a set threshold, determine that the user is a dangerous user.

If the user's initial risk probability is less than the set threshold but the updated risk probability is greater than or equal to the set threshold, determine that the user is a dangerous user.

In one embodiment, the historical characteristic information includes at least one of the following characteristic information:

The modules may be connected to or communicate with each other via a wired or wireless connection. The wired connection may include a metal cable, an optical cable, a hybrid cable, or the like, or any combination thereof. The wireless connection may include a connection over a LAN, WAN, Bluetooth, ZigBee, NFC, or the like, or any combination thereof. Two or more modules may be combined into a single module, and any one module may be divided into two or more units.
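The module structure of apparatus 800 can be pictured as one class whose methods play the roles of modules 801, 802, and 804. This is an illustrative sketch only: the class and method names are hypothetical, the model is a toy callable, module 803 (training) is omitted, and module 802 performs a single update round with the assumed averaging form p(i) = (p_i + Σ_j w_ij p_j)/(n + 1) instead of iterating to the set condition.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

Features = List[float]


@dataclass
class RiskPredictionApparatus:
    """Hypothetical wiring of modules 801, 802 and 804; names are illustrative."""
    model: Callable[[Features], float]                                   # pre-trained model
    weights: Dict[str, Dict[str, float]] = field(default_factory=dict)  # w_ij
    threshold: float = 0.7

    def obtain_information(self, store: Dict[str, Features]) -> Dict[str, Features]:
        # Module 801: collect historical feature information of the users.
        return dict(store)

    def determine_probabilities(
        self, history: Dict[str, Features]
    ) -> Tuple[Dict[str, float], Dict[str, float]]:
        # Module 802: initial probabilities from the model, then one round of
        # the assumed averaging update over correlated users.
        initial = {u: self.model(x) for u, x in history.items()}
        updated = {}
        for i, p_i in initial.items():
            nbrs = self.weights.get(i, {})
            updated[i] = (p_i + sum(w * initial[j] for j, w in nbrs.items())) / (len(nbrs) + 1)
        return initial, updated

    def dangerous_users(self, initial: Dict[str, float], updated: Dict[str, float]) -> Set[str]:
        # Module 804: S301 (initial >= threshold) or S302 (updated >= threshold).
        return {u for u in initial
                if initial[u] >= self.threshold or updated[u] >= self.threshold}


apparatus = RiskPredictionApparatus(model=lambda x: x[0],
                                    weights={"i": {"1": 0.9, "2": 0.8}},
                                    threshold=0.6)
history = apparatus.obtain_information({"i": [0.5], "1": [0.8], "2": [0.9]})
initial, updated = apparatus.determine_probabilities(history)
```

With a threshold of 0.6, user "i" is flagged only through the update (about 0.647), while users "1" and "2" are flagged by their initial probabilities.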

The embodiments of the present application further provide an electronic device 900, which may be a general-purpose computer or a special-purpose computer; either may be used to implement the risk prediction method of the present application. Although only a single computer is shown for convenience, the functions described herein may be implemented in a distributed manner across multiple similar platforms to balance the processing load.

As shown in fig. 9, the electronic device 900 may include a network port 901 for connecting to a network, one or more processors 902 for executing program instructions, a communication bus 903, and a storage medium 904 in different forms, such as a disk, ROM, or RAM, or any combination thereof. Illustratively, the computer platform may also include program instructions stored in ROM, RAM, or other types of non-transitory storage media, or any combination thereof. The method of the present application may be implemented in accordance with these program instructions. The electronic device 900 also includes an Input/Output (I/O) interface 905 between the computer and other Input/Output devices (e.g., keyboard, display screen).

For ease of illustration, only one processor is depicted in the electronic device 900. However, the electronic device 900 in the present application may also include multiple processors, so steps described in the present application as performed by one processor may also be performed by multiple processors jointly or individually. For example, if the processor of the electronic device 900 performs step A and step B, step A and step B may also be performed jointly by two different processors or separately within one processor; for example, a first processor performs step A and a second processor performs step B, or the first and second processors perform steps A and B together.

Taking a processor as an example, the processor 902 executes the following program instructions stored in the storage medium 904:

Acquire historical feature information of a plurality of users to be predicted.

For each of the plurality of users to be predicted, determine the risk probability of the user according to the user's historical feature information, the degree of correlation between the user and the other users to be predicted, the historical feature information of the other users to be predicted that are correlated with the user, and a pre-trained risk prediction model.

In one embodiment, the program instructions executed by the processor 902 specifically include:

Determine the initial risk probabilities of the other users to be predicted that are correlated with the user, according to their historical feature information and the risk prediction model.

In one embodiment, program instructions executed by processor 902 train a risk prediction model according to the following steps:

In one embodiment, the program instructions executed by the processor 902 include, for each of a plurality of users to be predicted:

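The set condition includes: the set number of update rounds is reached; or, for more than a set number of users to be predicted, the rate of change between the current risk probability and the risk probability before the update is less than the set change rate.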

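In one embodiment, the program instructions executed by the processor 902 calculate the updated risk probability of the user to be predicted according to the following formula:

$$p(i) = \frac{p_i + \sum_{j=1}^{n} w_{ij}\, p_j}{n + 1}$$

where the symbols have the same meanings as in formula (1).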

In one embodiment, the program instructions executed by the processor 902 further include:

The historical characteristic information comprises at least one of the following characteristic information:

Corresponding to the risk prediction methods in fig. 1 to 6, the present application further provides a computer-readable storage medium, on which a computer program is stored, and the computer program is executed by a processor to perform the steps of the risk prediction method.

Specifically, the computer-readable storage medium can be a general-purpose storage medium, such as a removable disk, a hard disk, or the like, and when a computer program on the storage medium is executed, the risk prediction method can be executed, so as to solve the problem of low safety of the current riding environment.

Based on the same technical concept, embodiments of the present application further provide a computer program product, which includes a computer-readable storage medium storing a program code, where instructions included in the program code may be used to execute the steps of the risk prediction method, and specific implementation may refer to the above method embodiments, and will not be described herein again.

It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to corresponding processes in the method embodiments, and are not described in detail in this application. In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical division, and there may be other divisions in actual implementation, and for example, a plurality of modules or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or modules through some communication interfaces, and may be in an electrical, mechanical or other form.

The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present application, or the part of it that contributes to the prior art, or a portion of the technical solution, may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.

The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A risk prediction method, comprising:

2. The method according to claim 1, wherein the determining, for each user to be predicted in the plurality of users to be predicted, a risk probability of the user to be predicted according to historical feature information of the user to be predicted, a correlation degree between the user to be predicted and other users to be predicted, historical feature information of other users to be predicted having a correlation with the user to be predicted, and a pre-trained risk prediction model comprises:

3. The method of claim 1 or 2, wherein the risk prediction model is trained according to the following steps:

4. The method of claim 3, wherein the constructing a training sample library comprises:

5. The method according to claim 2, wherein determining, for each of the plurality of users to be predicted, an updated risk probability of the user to be predicted according to the initial risk probability of the user to be predicted, the initial risk probabilities of other users to be predicted having a correlation with the user to be predicted, and the correlation between the user to be predicted and the other users to be predicted, comprises:

6. The method according to claim 5, wherein the setting conditions include:

7. The method of claim 5, wherein the updated risk probability for the user to be predicted is calculated according to the following formula:

$$p(i) = \frac{p_i + \sum_{j=1}^{n} w_{ij}\, p_j}{n + 1}$$

wherein $p(i)$ is the updated risk probability of user i to be predicted; $p_i$ is the current risk probability of user i; $p_j$ is the current risk probability of the j-th user to be predicted that is correlated with user i; $w_{ij}$ is the degree of correlation between user i and the j-th user; and n is the number of other users to be predicted that are correlated with user i.

8. The method of claim 2, further comprising:

9. The method of claim 1, wherein the historical feature information comprises at least one of the following feature information:

10. A risk prediction device, comprising:

11. The apparatus of claim 10, wherein the probability determination module is specifically configured to:

12. The apparatus of claim 10 or 11, further comprising a model training module that trains the risk prediction model according to the steps of:

13. The apparatus of claim 12, wherein the model training module is specifically configured to:

14. The apparatus according to claim 11, wherein the probability determination module is specifically configured to, for each of the plurality of users to be predicted:

15. The apparatus of claim 14, wherein the setting condition comprises:

16. The apparatus of claim 14, wherein the probability determination module calculates the updated risk probability for the user to be predicted according to the following formula:

$$p(i) = \frac{p_i + \sum_{j=1}^{n} w_{ij}\, p_j}{n + 1}$$

wherein $p(i)$ is the updated risk probability of user i to be predicted; $p_i$ is the current risk probability of user i; $p_j$ is the current risk probability of the j-th user to be predicted that is correlated with user i; $w_{ij}$ is the degree of correlation between user i and the j-th user; and n is the number of other users to be predicted that are correlated with user i.

17. The apparatus of claim 11, further comprising a dangerous user determination module configured to:

18. The apparatus of claim 10, wherein the historical feature information comprises at least one of the following feature information:

19. An electronic device, comprising: a processor, a storage medium and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating via the bus when the electronic device is running, the processor executing the machine-readable instructions to perform the steps of the risk prediction method according to any one of claims 1 to 9.

20. A computer-readable storage medium, having stored thereon a computer program for performing, when executed by a processor, the steps of the risk prediction method according to any one of claims 1 to 9.