CN117455652A

CN117455652A - Anti-fraud detection method, device, system and computer readable storage medium

Info

Publication number: CN117455652A
Application number: CN202311261464.3A
Authority: CN
Inventors: 朱峰; 张昆鹏; 陆俊; 原菁菁; 高舒; 杜依迪; 马淞
Original assignee: Postal Savings Bank of China Ltd
Current assignee: Postal Savings Bank of China Ltd
Priority date: 2023-09-27
Filing date: 2023-09-27
Publication date: 2024-01-26

Abstract

The application discloses an anti-fraud detection method, device, system and computer readable storage medium, wherein the anti-fraud detection method comprises the following steps: acquiring characteristic information of a first user; determining an undetermined risk user according to the characteristic information of the first user; acquiring outbound data of the undetermined risk user; and constructing an anti-fraud detection statistical model according to the outbound data of the undetermined risk user, and performing anti-fraud detection on the undetermined risk user according to the anti-fraud detection statistical model to obtain the fraud risk level of the undetermined risk user. By combining an intelligent outbound technology, the anti-fraud detection method further acquires outbound data of the undetermined risk user in more dimensions, fully considers the behavior characteristics of fraud groups, and builds an anti-fraud detection statistical model to identify user fraud risk, thereby improving the ability to identify fraud.

Description

Anti-fraud detection method, device, system and computer readable storage medium

Technical Field

The present invention relates to the field of anti-fraud detection technologies, and in particular, to an anti-fraud detection method, apparatus, system, and computer readable storage medium.

Background

In the banking system there are many business scenarios, such as credit card lending and small loans. These businesses place high demands on reviewers and depend heavily on their working experience, so they cannot be processed at scale, are inefficient, and are ill-suited to financial application scenarios in the mobile internet era. Meanwhile, as technology develops, fraud has become more specialized and systematic, and increasingly takes the form of organized fraud groups. The user information presented by fraud groups increasingly resembles that of ordinary users. How to identify fraud groups accurately while minimizing the impact on ordinary users is a difficult problem to be solved.

The anti-fraud detection engine adopted by current banking systems mainly comprises two parts: anti-fraud detection rules and an anti-fraud detection model. The rules play the primary role and the model algorithms an auxiliary one; the algorithms involved are mainly unsupervised algorithms, social network algorithms and deep learning algorithms.

However, existing detection schemes based on anti-fraud detection rules are tightly coupled to specific service scenarios, while detection schemes based on anti-fraud detection models often lack sufficient labeled data and do not fully consider the behavior characteristics of fraud groups, so the accuracy of fraud detection needs to be further improved.

Disclosure of Invention

The embodiment of the application provides an anti-fraud detection method, an anti-fraud detection device, an anti-fraud detection system and a computer readable storage medium so as to improve the accuracy of fraud detection.

The embodiment of the application adopts the following technical scheme:

In a first aspect, an embodiment of the present application provides an anti-fraud detection method, including:

acquiring characteristic information of a first user;

determining an undetermined risk user according to the characteristic information of the first user;

acquiring outbound data of the undetermined risk user;

and constructing an anti-fraud detection statistical model according to the outbound data of the undetermined risk user, and performing anti-fraud detection on the undetermined risk user according to the anti-fraud detection statistical model to obtain the fraud risk level of the undetermined risk user.

Optionally, the determining the undetermined risk user according to the characteristic information of the first user includes:

clustering the first user by using a preset clustering algorithm according to the characteristic information of the first user to obtain a plurality of user groups;

and determining the undetermined risk user according to the characteristic information of the first user in each user group.

Optionally, the acquiring outbound data of the undetermined risk user includes:

determining outbound configuration information according to the characteristic information of the undetermined risk user;

sending the outbound configuration information to an intelligent outbound system, so that the intelligent outbound system places an outbound call to the undetermined risk user according to the outbound configuration information;

and receiving outbound data of the undetermined risk user returned by the intelligent outbound system.

Optionally, the outbound configuration information includes questions of a pending risk user, the questions include a first type of questions and a second type of questions, and determining the outbound configuration information according to the feature information of the pending risk user includes:

determining a first type question of the undetermined risk user according to user identity associated information in the feature information of the undetermined risk user;

and determining the characteristic information of the same group of pending risk users with the similarity smaller than a preset similarity threshold value in the user group where the pending risk users are located, and determining a second type question of the pending risk users according to the characteristic information of the same group of pending risk users.

Optionally, the constructing an anti-fraud detection statistical model according to outbound data of the undetermined risk user, and performing anti-fraud detection on the undetermined risk user according to the anti-fraud detection statistical model, so as to obtain a fraud risk level of the undetermined risk user includes:

constructing the anti-fraud detection statistical model according to outbound data of the undetermined risk user;

performing multidimensional statistical analysis according to the anti-fraud detection statistical model and outbound data of the undetermined risk user to obtain a risk score of the undetermined risk user;

and determining the fraud risk level of the undetermined risk user according to the risk score of the undetermined risk user.

Optionally, the outbound data includes at least one of call basic data, call content data and biometric data, and the performing multidimensional statistical analysis according to the anti-fraud detection statistical model and the outbound data of the undetermined risk user to obtain a risk score of the undetermined risk user in each dimension includes:

according to the call basic index dimension in the anti-fraud detection statistical model, carrying out statistical analysis on the call basic data of the undetermined risk user to obtain a call basic risk score of the undetermined risk user; and/or,

according to the call content index dimension in the anti-fraud detection statistical model, carrying out statistical analysis on the call content data of the undetermined risk user to obtain a call content risk score of the undetermined risk user; and/or,

according to the biometric index dimension in the anti-fraud detection statistical model, carrying out statistical analysis on the biometric data of the undetermined risk user to obtain a biometric risk score of the undetermined risk user.

Optionally, the performing statistical analysis on the call content data of the pending risk user according to the call content index dimension to obtain the call content risk score of the pending risk user includes:

determining, according to the call content data of the undetermined risk user, the number of first type questions and the number of second type questions that the undetermined risk user answered correctly;

and determining the call content risk score of the undetermined risk user according to the number of first type questions and the number of second type questions that the undetermined risk user answered correctly.

In a second aspect, embodiments of the present application further provide an anti-fraud detection method, where the anti-fraud detection method includes:

acquiring characteristic information and/or outbound data of a second user;

performing anti-fraud detection by using an anti-fraud detection classification model according to the characteristic information and/or outbound data of the second user to obtain a fraud risk level of the second user;

wherein the anti-fraud detection classification model is obtained through training based on fraud risk levels of undetermined risk users output by the anti-fraud detection statistical model.

Optionally, the anti-fraud detection classification model is trained by:

constructing training sample data of the anti-fraud detection classification model according to the characteristic information and/or outbound data of the undetermined risk user and the fraud risk level of the undetermined risk user;

training the anti-fraud detection classification model using training sample data of the anti-fraud detection classification model.

In a third aspect, embodiments of the present application further provide an anti-fraud detection apparatus, where the anti-fraud detection apparatus includes:

the first acquisition unit is used for acquiring the characteristic information of the first user;

the determining unit is used for determining an undetermined risk user according to the characteristic information of the first user;

the second acquisition unit is used for acquiring outbound data of the undetermined risk user;

the first detection unit is used for constructing an anti-fraud detection statistical model according to the outbound data of the undetermined risk user, and carrying out anti-fraud detection on the undetermined risk user according to the anti-fraud detection statistical model to obtain the fraud risk level of the undetermined risk user.

In a fourth aspect, embodiments of the present application further provide an anti-fraud detection apparatus, where the anti-fraud detection apparatus includes:

the third acquisition unit is used for acquiring the characteristic information and/or outbound data of the second user;

the second detection unit is used for performing anti-fraud detection by using an anti-fraud detection classification model according to the characteristic information and/or outbound data of the second user to obtain a fraud risk level of the second user.

In a fifth aspect, embodiments of the present application further provide an anti-fraud detection system, including:

a processor; and

a memory arranged to store computer executable instructions that, when executed, cause the processor to perform any of the anti-fraud detection methods of the first aspect or any of the anti-fraud detection methods of the second aspect.

In a sixth aspect, embodiments of the present application also provide a computer-readable storage medium storing one or more programs that, when executed by an anti-fraud detection system including a plurality of application programs, cause the anti-fraud detection system to perform any of the anti-fraud detection methods of the first aspect or to perform any of the anti-fraud detection methods of the second aspect.

At least one of the technical schemes adopted in the embodiments of the present application can achieve the following beneficial effects: in the anti-fraud detection method, characteristic information of a first user is acquired; an undetermined risk user is then determined according to the characteristic information of the first user; outbound data of the undetermined risk user is then acquired; and finally an anti-fraud detection statistical model is constructed according to the outbound data of the undetermined risk user, and anti-fraud detection is performed on the undetermined risk user according to the anti-fraud detection statistical model to obtain the fraud risk level of the undetermined risk user. By combining an intelligent outbound technology, the anti-fraud detection method further acquires outbound data of the undetermined risk user in more dimensions, fully considers the behavior characteristics of fraud groups, and builds an anti-fraud detection statistical model to identify user fraud risk, thereby improving the ability to identify fraud.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:

FIG. 1 is a schematic flow chart of an anti-fraud detection method according to an embodiment of the present application;

FIG. 2 is a flow chart of another anti-fraud detection method according to an embodiment of the present application;

FIG. 3 is a schematic diagram of an overall flow of anti-fraud detection in an embodiment of the present application;

FIG. 4 is a schematic structural diagram of an anti-fraud detection apparatus according to an embodiment of the present application;

FIG. 5 is a schematic diagram of another anti-fraud detection apparatus according to an embodiment of the present application;

FIG. 6 is a schematic structural diagram of an anti-fraud detection system according to an embodiment of the present application.

Detailed Description

To make the purposes, technical solutions and advantages of the present application clearer, the technical solutions of the present application will be described clearly and completely below with reference to specific embodiments of the present application and the corresponding drawings. Obviously, the described embodiments are only some, but not all, of the embodiments of the present application. All other embodiments obtained by one of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the scope of protection of the present application.

The following describes in detail the technical solutions provided by the embodiments of the present application with reference to the accompanying drawings.

The anti-fraud detection engine adopted by banking systems mainly comprises anti-fraud detection rules and an anti-fraud detection model, with the rules playing the primary role and the model algorithms an auxiliary one. The reasons for this rule-first, model-second arrangement are as follows:

First, existing statistical models detect fraud poorly. Taking a credit service scenario as an example, fraud refers to black-market operators or individuals who present themselves as users with good credit and then disappear or refuse to repay after obtaining a loan. Fraud groups can reverse-engineer the interpretable models built by risk control personnel, so a statistical model built on the subjective ideas of risk control personnel detects fraud poorly.

Second, the labeled sample data required by existing deep learning models and the like is difficult to obtain and relatively small in volume, so a high-precision supervised learning model is difficult to train. Because fraud accounts for a small proportion of the entire user population, the data has a significant class imbalance problem. In fraud detection tasks, machine learning models therefore tend to mark samples as non-fraud and fail to identify fraud effectively.

Based on this, the embodiment of the present application provides an anti-fraud detection method. FIG. 1 is a schematic flow chart of the anti-fraud detection method in the embodiment of the present application; the method includes at least the following steps S110 to S140:

Step S110, acquiring the characteristic information of the first user.

When anti-fraud detection is performed, the characteristic information of the user needs to be acquired first. The specific characteristic information acquired depends mainly on the behavior characteristics of fraud groups in the actual service scenario. For example, in a credit business scenario, a fraud group generally purchases mobile phone cards in bulk and registers a plurality of accounts, or uses cloud phone emulation; after finding a vulnerability in the anti-fraud system, it uses the plurality of accounts to carry out the same fraudulent operation in a concentrated manner. Analysis of fraud group behavior in this service scenario reveals the following characteristics:

1) The information registered in the banking system has high correlation, such as home address, nickname, occupation, virtual number, etc.;

2) Software and hardware information such as mobile phone number, WiFi, device information (including cloud virtual mobile phones), software version and operating system is extremely similar;

3) A plurality of accounts are operated simultaneously to conduct fraudulent activities.

Based on these behavior characteristics of fraud groups, the characteristic information of the user can be obtained from multiple dimensions, which may include, for example:

1) Registration information of the user: such as home address, nickname, occupation, virtual number, etc.;

2) Software and hardware information: such as mobile phone number, device information, software version, operating system, etc.;

3) Account activity information for the user: such as transaction amount, whether it is an active customer, whether it is a new user, etc.

Of course, the above feature information of several dimensions is merely an exemplary description in a service scenario, and those skilled in the art may flexibly select the feature information of the user according to an actual service scenario to perform subsequent analysis, which is not limited herein.

Step S120, determining an undetermined risk user according to the characteristic information of the first user.

Because the number of users registered in an actual service system is often very large, while users belonging to fraud groups may account for only a small part, after the characteristic information of the users is acquired in the previous step, undetermined risk users need to be screened out from it. That is, users who obviously do not belong to a fraud group are filtered out, and the remaining users are treated as users whose fraud risk level needs to be further identified.

Step S130, acquiring outbound data of the undetermined risk user.

The anti-fraud detection logic adopted by banking systems mostly relies on data cross-validation: a user profile is built from acquired user characteristic information, such as the user's registration information and software and hardware information, and abnormal features are then detected. On the one hand, observing the user profile shows whether there are isolated points deviating from the normal user population; on the other hand, observing how the user profile changes over time can reveal users with higher fraud risk, for example a user whose transaction requests become unusually active within a short period. However, this approach does not adequately consider the behavior characteristics of fraud groups, and its ability to identify fraud remains to be improved.

Considering the continuous improvement of voice dialogue systems in recent years, the embodiment of the application further designs a post-stage anti-fraud detection logic that focuses on collecting user-related data in more dimensions by means of intelligent outbound calls, fully considering the behavior characteristics of fraud groups and thereby improving the ability to identify fraud.

Step S140, constructing an anti-fraud detection statistical model according to outbound data of the undetermined risk user, and carrying out anti-fraud detection on the undetermined risk user according to the anti-fraud detection statistical model to obtain the fraud risk level of the undetermined risk user.

Because the intelligent outbound mode involves interaction with the undetermined risk user, a user who belongs to a fraud group is likely to expose more fraudulent behavior characteristics while interacting with the intelligent outbound system; that is, the outbound data of the undetermined risk user can largely reflect that user's fraud risk. Therefore, the embodiment of the application can construct an anti-fraud detection statistical model based on the outbound data obtained in the preceding steps, and statistically analyze the outbound data of the undetermined risk user using the statistical analysis indexes defined in the model, so as to identify the fraud risk level of the undetermined risk user. Whether to report to the relevant departments for risk early warning can then be determined according to that fraud risk level.

It should be noted that the anti-fraud detection statistical model is particularly suitable for fraud detection when labeled data is insufficient, and the statistical analysis results obtained from outbound data are more objective. Of course, when the data volume is sufficient to meet the training requirements of a deep learning model, the anti-fraud detection statistical model of the embodiment of the application can still be used for fraud detection.

By combining an intelligent outbound technology, the anti-fraud detection method further acquires outbound data of the undetermined risk user in more dimensions, fully considers the behavior characteristics of fraud groups, and builds an anti-fraud detection statistical model to identify user fraud risk, thereby improving the ability to identify fraud.

In some embodiments of the present application, the determining the pending risk user according to the characteristic information of the first user includes: clustering the first user by using a preset clustering algorithm according to the characteristic information of the first user to obtain a plurality of user groups; and determining the undetermined risk user according to the characteristic information of the first user in each user group.

Considering that fraud groups generally exhibit small-scale aggregation and highly similar feature information, when determining undetermined risk users the embodiment of the application may first perform feature cluster analysis with a clustering algorithm on the multidimensional feature information of the users to obtain a plurality of user groups. Users in the same user group have highly similar feature information and a higher risk of belonging to a fraud group. Therefore, based on the behavior characteristics of fraud groups and the feature information of the users in each user group, an anomaly detection algorithm such as XGBoost can be used to detect user groups with unreasonable aggregation.

In the credit business scenario, the anomaly detection algorithm can determine whether the users in each user group are pending risk users from multiple aspects, such as the users' credit data, loan situation, address book, location, loan time interval and third-party information data. For example, users who registered in a certain area within the same period, have similar address information, mobile phone accounts and other user information, and initiate credit requests at the same time can be regarded as pending risk users.

The preset clustering algorithm may be based on clustering algorithms such as K-means and DBSCAN (Density-Based Spatial Clustering of Applications with Noise). How the clustering is specifically performed can be flexibly determined by a person skilled in the art in combination with the prior art, and is not particularly limited herein.
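As an illustration of the clustering step, the following is a minimal pure-Python sketch of DBSCAN, one of the algorithms named above. The two-dimensional numeric feature vectors, the `eps` and `min_pts` values and the function name `dbscan` are assumptions made for the example; the source does not specify how registration or device information is encoded numerically.

```python
from math import dist

NOISE = -1  # label for users that do not join any dense group


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one cluster label per point, NOISE for outliers."""
    labels = [None] * len(points)

    def neighbors(i):
        # All points (including i itself) within distance eps of point i.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is a core point: keep expanding
                queue.extend(reachable)
        cluster += 1
    return labels


# Hypothetical, already-normalized feature vectors for seven users.
features = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),   # tightly aggregated users
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),   # a second tight aggregation
    (20.0, 20.0),                         # an isolated user
]
labels = dbscan(features, eps=0.5, min_pts=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

Users sharing a label form one user group; whether a group is then treated as unreasonably aggregated is a separate decision that this sketch does not make.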

In some embodiments of the present application, the acquiring outbound data of the pending risk user includes: determining outbound configuration information according to the characteristic information of the undetermined risk user; sending the outbound configuration information to an intelligent outbound system, so that the intelligent outbound system places an outbound call to the undetermined risk user according to the outbound configuration information; and receiving outbound data of the undetermined risk user returned by the intelligent outbound system.

Under different service scenarios the outbound data to be acquired may differ, and because the features of different user groups vary considerably, the corresponding outbound configuration may also differ. The embodiment of the application can therefore configure the outbound operation according to the multidimensional feature information of the undetermined risk users. Specific configuration information may include, for example, the call flow to be executed by the intelligent outbound system, randomized questions, the redial interval and the number of redials. Of course, how the outbound configuration information is specifically determined can be flexibly set by those skilled in the art according to actual service scenarios, and is not specifically limited herein.

The determined outbound configuration information is sent to the intelligent outbound system, which can place calls in a concentrated manner to the undetermined risk users in each group according to the outbound configuration information corresponding to each user group, and record the related outbound data.
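The per-group outbound configuration described above might be represented as follows. This is only a sketch: the class name `OutboundConfig`, every field name, the default values and the JSON serialization are hypothetical, since the source lists the configuration dimensions (call flow, questions, redial interval, number of redials) but not their format or the interface of the intelligent outbound system.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class OutboundConfig:
    """Outbound settings for one user group (all names are assumptions)."""
    group_id: int
    call_flow: list                                  # ordered dialogue steps
    questions: list = field(default_factory=list)    # questions to ask the user
    redial_interval_s: int = 600                     # wait before redialing
    max_redials: int = 2                             # number of redials allowed


config = OutboundConfig(
    group_id=3,
    call_flow=["greeting", "identity_check", "closing"],
    questions=["last six digits of ID card number", "home address"],
)

# Serialize the configuration so it could be handed to an outbound system.
payload = json.dumps(asdict(config), ensure_ascii=False)
print(payload)
```

Keeping one configuration object per user group mirrors the statement that calls are placed group by group according to each group's configuration.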

In some embodiments of the present application, the outbound configuration information includes questions of a pending risk user, the questions include a first type of questions and a second type of questions, and determining the outbound configuration information according to the feature information of the pending risk user includes: determining a first type question of the undetermined risk user according to user identity associated information in the feature information of the undetermined risk user; and determining the characteristic information of the same group of pending risk users with the similarity smaller than a preset similarity threshold value in the user group where the pending risk users are located, and determining a second type question of the pending risk users according to the characteristic information of the same group of pending risk users.

The outbound configuration information of the embodiment of the application may include the questions to be put to the undetermined risk users. Given that feature information within the same user group is highly similar, the questions can be set from two aspects. On the one hand, information that ordinary users know well and are unlikely to misremember, such as the last six digits of the ID card number, home address, employer and other information associated with the user's identity, is selected as the first type of questions. On the other hand, the feature information of same-group pending risk users that has the smallest similarity to the feature information of the current pending risk user is selected as the second type of questions.

For example, for the i-th pending risk user, the users in the same group may be divided into two parts, denoted m_i and M_0, where:

[formula not reproduced in the source text]

The feature information of each dimension can be represented by a feature vector (x_i, y_i). A similarity algorithm such as cosine similarity is used to find the feature information farthest from the features of the i-th user, that is, with the smallest similarity, and n_0 items of that feature information are selected as second type questions, namely:

[formula not reproduced in the source text]

Of course, besides the cosine similarity algorithm, those skilled in the art can flexibly adopt other forms of similarity algorithm according to actual requirements, and the similarity algorithm is not particularly limited herein.
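The selection of second type questions can be sketched as follows, under one possible reading of the description: cosine similarity is computed between the current user's feature vector and that of every other user in the same group, and the n_0 least similar users are kept as the source of questions. The function names, the user identifiers and the example vectors are assumptions, and the sketch does not reproduce the formulas that the source omits.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention for zero vectors in this sketch
    return dot / (norm_a * norm_b)


def least_similar_peers(target_id, group, n0):
    """Return the n0 same-group users least similar to the target user."""
    target = group[target_id]
    scored = [
        (cosine_similarity(target, vec), uid)
        for uid, vec in group.items()
        if uid != target_id
    ]
    scored.sort()  # ascending similarity: least similar first
    return [uid for _, uid in scored[:n0]]


# Hypothetical feature vectors (x_i, y_i) for one user group.
group = {
    "u1": (1.0, 0.0),
    "u2": (0.9, 0.1),
    "u3": (0.0, 1.0),
    "u4": (0.5, 0.5),
}
print(least_similar_peers("u1", group, n0=2))  # ['u3', 'u4']
```

The feature information of the returned users would then be turned into questions for the target user; how that is done is not specified in the source.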

For both types of questions, a normal user can answer all or most of them correctly. If the user is a risk user belonging to a fraud group, relatively few questions can be answered correctly, because a fraud group usually registers a plurality of accounts and carries out the same fraudulent operation with them in a concentrated manner. Therefore, call content data of the undetermined risk user can be further collected based on the two types of questions, providing further data support for identifying fraud.

In some embodiments of the present application, the constructing an anti-fraud detection statistical model according to outbound data of the pending risk user, and performing anti-fraud detection on the pending risk user according to the anti-fraud detection statistical model, to obtain a fraud risk level of the pending risk user includes: constructing the anti-fraud detection statistical model according to outbound data of the undetermined risk user; performing multidimensional statistical analysis according to the anti-fraud detection statistical model and outbound data of the undetermined risk user to obtain a risk score of the undetermined risk user; and determining the fraud risk level of the undetermined risk user according to the risk score of the undetermined risk user.

The statistical index dimensions and statistical methods involved in the anti-fraud detection statistical model can be defined in advance. However, across different service scenarios and different user groups, the data dimensions covered by the outbound data actually collected from undetermined risk users may differ. The embodiment of the application can therefore construct the current anti-fraud detection statistical model, chiefly its statistical index dimensions, according to the outbound data actually collected.

Based on the established anti-fraud detection statistical model, multidimensional statistical analysis is carried out on the obtained outbound data of the undetermined risk users, the risk score of each undetermined risk user can be calculated, and further the risk grade of the undetermined risk users can be determined according to the risk scores of the undetermined risk users, so that more accurate and finer identification of fraud is realized in a quantitative mode.

The quantification of the risk score and the correspondence between risk score and risk level can be defined in advance. For example, the risk score can be defined to range from 0 to 1, where a score of 0-0.6 corresponds to a high risk level and a score of 0.6-0.8 corresponds to a low risk level; that is, the higher the risk score, the lower the risk level. Of course, this is merely an exemplary description of how to quantify risk scores and divide risk levels; those skilled in the art can set these flexibly according to actual requirements, and no specific limitation is imposed here.
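The score-to-level mapping described above can be sketched as a small lookup. This is only an illustration: the function name `risk_level`, the `EXAMPLE_BANDS` constant, and the choice of half-open intervals are assumptions, and the sketch returns `None` for scores that fall outside the two example bands rather than guessing a level for them.

```python
from typing import Optional, Sequence, Tuple

# Example bands from the text: [0, 0.6) -> "high", [0.6, 0.8) -> "low".
# Treating the intervals as half-open is an assumption of this sketch.
EXAMPLE_BANDS: Sequence[Tuple[float, str]] = ((0.6, "high"), (0.8, "low"))


def risk_level(score: float,
               bands: Sequence[Tuple[float, str]] = EXAMPLE_BANDS) -> Optional[str]:
    """Return the level of the first band whose upper bound exceeds `score`.

    Returns None when no configured band covers the score.
    """
    if score < 0:
        raise ValueError("risk score must be non-negative")
    for upper_bound, level in bands:
        if score < upper_bound:
            return level
    return None  # no level is configured for this range
```

Because the bands are passed in as data, a deployment could supply its own thresholds and level names without changing the function.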

In some embodiments of the present application, the outbound data includes at least one of call basic data, call content data, and biometric data, and the performing multidimensional statistical analysis according to the anti-fraud detection statistical model and the outbound data of the pending risk user, and obtaining a risk score of the pending risk user in each dimension includes: according to the conversation basic index dimension in the anti-fraud detection statistical model, carrying out statistical analysis on conversation basic data of the undetermined risk user to obtain conversation basic risk scores of the undetermined risk user; and/or, according to the conversation content index dimension in the anti-fraud detection statistical model, carrying out statistical analysis on the conversation content data of the undetermined risk user to obtain a conversation content risk score of the undetermined risk user; and/or, according to the dimension of the biological characteristic index in the anti-fraud detection statistical model, carrying out statistical analysis on the biological characteristic data of the risk user to be determined, and obtaining the biological characteristic risk score of the risk user to be determined.

The outbound data in the embodiment of the present application may include at least one of call basic data, call content data, and biometric data, and the corresponding statistical index dimension set in the anti-fraud detection statistical model includes at least one dimension such as a call basic index dimension, a call content index dimension, and a biometric index dimension.

The call basic data refers to basic attribute data generated during the call, and may include, for example, the ringing duration time_ring and the call duration time_call. For a normal user, the ringing duration is generally short and the call runs to completion, whereas the opposite holds for a fraudulent party. The embodiment of the present application may therefore calculate the call basic risk score Voice_score of an undetermined risk user as follows:

$$\text{Voice\_score} = \exp\!\left(\frac{1}{\text{time\_ring}}\right) + \ln(\text{time\_call} + 1) \tag{4}$$

It can be seen that the longer the ringing duration and the shorter the call duration, the lower the corresponding call basic risk score. Of course, formula (4) is merely an exemplary way of calculating the call basic risk score; those skilled in the art may adjust it flexibly according to actual requirements, and no specific limitation is imposed here.
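Formula (4) translates directly into code. The sketch below is a minimal illustration; the function name `voice_score`, the use of seconds as the unit, and the input validation are assumptions that the text does not specify.

```python
import math


def voice_score(time_ring: float, time_call: float) -> float:
    """Call basic risk score per formula (4): exp(1/time_ring) + ln(time_call + 1).

    Durations are assumed to be in seconds. time_ring must be positive because
    it appears in a denominator; time_call must be non-negative.
    """
    if time_ring <= 0:
        raise ValueError("time_ring must be positive")
    if time_call < 0:
        raise ValueError("time_call must be non-negative")
    # Note: very small time_ring values make exp(1/time_ring) overflow a float.
    return math.exp(1.0 / time_ring) + math.log(time_call + 1.0)
```

A short ring followed by a long call yields a higher score than a long ring followed by a short call, which matches the monotonic behaviour described for the formula.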

The call content data mainly refers to data of the answer condition of the user to the question questions during the call, and may include, for example, answer results of the user to the first-type question questions and the second-type question questions set in the foregoing embodiment, total question number, and the like. And carrying out statistical analysis on answer results of the two types of questions through the conversation content index dimension, and calculating the number of questions with correct answer of the user, thereby calculating the conversation content risk score of the user with undetermined risk.

The biological characteristic data mainly comprise voiceprint information, face image information and the like of the user recorded in the conversation process, the voiceprint information, the face image information and the like of the user are compared and analyzed through biological characteristic index dimensions, whether the user is the user himself or herself is judged, and therefore biological characteristic risk scores of users with undetermined risks can be calculated.

In some embodiments of the present application, the performing statistical analysis on the call content data of the pending risk user according to the call content index dimension, to obtain a call content risk score of the pending risk user includes: determining the correct number of answers of the undetermined risk user to the first type of question questions and the correct number of answers of the undetermined risk user to the second type of question questions according to the call content data of the undetermined risk user; and determining the call content risk score of the undetermined risk user according to the correct answer number of the undetermined risk user to the first type question and the correct answer number of the undetermined risk user to the second type question.

When performing statistical analysis in the call content index dimension, the number of first-type questions the undetermined risk user answers correctly (right_number), the number of second-type questions the user answers correctly (feature_number), and the total number of questions (sum_number) can be counted. A user belonging to a fraudulent party should answer fewer questions correctly, so the embodiment of the present application can calculate the call content risk score Acc as follows:

It can be seen that the smaller the number of questions that are answered correctly, the lower the accuracy, i.e., the lower the call content risk score.
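One reading consistent with the description of Acc as an accuracy is that it equals the fraction of all questions answered correctly. The sketch below assumes that reading; the exact expression, the function name `call_content_score`, and the validation rules are assumptions rather than the formula of the application.

```python
def call_content_score(right_number: int, feature_number: int, sum_number: int) -> float:
    """Assumed form of Acc: (right_number + feature_number) / sum_number.

    right_number   -- correct answers to first-type questions
    feature_number -- correct answers to second-type questions
    sum_number     -- total number of questions asked
    """
    if sum_number <= 0:
        raise ValueError("sum_number must be positive")
    if right_number < 0 or feature_number < 0:
        raise ValueError("correct-answer counts must be non-negative")
    correct = right_number + feature_number
    if correct > sum_number:
        raise ValueError("correct answers cannot exceed the total number of questions")
    return correct / sum_number
```

Under this assumption the score lies in [0, 1] and decreases as fewer questions are answered correctly.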

In some embodiments of the present application, the biometric data includes voiceprint data and/or facial image data, and the performing statistical analysis on the biometric data of the risk-to-be-determined user according to the biometric index dimension in the anti-fraud detection statistical model, to obtain a biometric risk score of the risk-to-be-determined user includes: verifying the user identity of the risk user to be determined according to the voiceprint data and/or the face image data of the risk user to be determined; and determining the biological feature risk score of the undetermined risk user according to the user identity of the undetermined risk user.

The voiceprint information and face image information of the user recorded during the call are compared with the user information retained in the system to judge whether the person on the call is the user himself or herself. Specifically, the biometric risk score may be calculated as follows:

Here, Voiceprint_recognition is the biometric risk score obtained from the user's voiceprint information, and Face_recognition is the biometric risk score obtained from the user's face image information.

Based on the call basic risk score, call content risk score, and biometric risk score mentioned in the foregoing embodiments, the final fraud risk score of the undetermined risk user may be calculated as follows:

$$\text{Score} = \text{Call\_weight}^{\,n} \cdot \left(\omega_1 \cdot \text{Voice\_score} + \omega_2 \cdot \text{Acc}\right) \cdot \text{Voiceprint\_recognition} \cdot \text{Face\_recognition} \tag{8}$$

where ω₁ and ω₂ are user-defined coefficients used to adjust the weights of the call basic risk score and the call content risk score, and n is the number of redials; that is, when a call is not connected, it is dialed again after waiting for a period of time. The maximum number of redials is configurable, and once the redial threshold is exceeded the case can be reported directly for handling. Call_weight takes a value in the interval (0, 1).

It should be noted that, if the user's phone is never connected, the call content data and voiceprint information cannot be obtained, so Acc and Voiceprint_recognition in formula (8) can be omitted accordingly. Since face image information is acquired only in a video call, Face_recognition in formula (8) may likewise be omitted if the user does not enable the video call.

Of course, how to calculate the final score is specifically, those skilled in the art may flexibly set according to actual requirements, and is not specifically limited herein.
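Formula (8), together with the rule that unavailable terms may be omitted, can be sketched as follows. This is an illustration under stated assumptions: the function name `final_score`, the default weights of 0.5, and the interpretation of "omitted" (a missing Acc contributes nothing to the weighted sum, and a missing biometric factor is skipped in the product) are not specified in the text.

```python
from typing import Optional


def final_score(voice_score: float,
                call_weight: float,
                redials: int,
                w1: float = 0.5,   # assumed default for omega_1
                w2: float = 0.5,   # assumed default for omega_2
                acc: Optional[float] = None,
                voiceprint: Optional[float] = None,
                face: Optional[float] = None) -> float:
    """Final fraud risk score per formula (8), skipping terms that are None."""
    if not 0.0 < call_weight < 1.0:
        raise ValueError("call_weight must lie in (0, 1)")
    if redials < 0:
        raise ValueError("redials must be non-negative")

    weighted = w1 * voice_score
    if acc is not None:            # Acc is unavailable when the call never connects
        weighted += w2 * acc

    score = (call_weight ** redials) * weighted
    for factor in (voiceprint, face):
        if factor is not None:     # biometric factors are optional
            score *= factor
    return score
```

Because Call_weight lies in (0, 1), each additional redial multiplies the score by a factor below one, so users who are harder to reach end up with lower scores.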

The embodiment of the application also provides another anti-fraud detection method. FIG. 2 is a flow chart of this other anti-fraud detection method in the embodiment of the application, and the method includes at least the following steps S210 to S220:

step S210, obtaining characteristic information and/or outbound data of a second user;

step S220, performing anti-fraud detection by using an anti-fraud detection classification model according to the characteristic information and/or outbound data of the second user to obtain a fraud risk level of the second user.

As another anti-fraud detection scheme, the embodiment of the application can identify fraud risk through a pre-trained anti-fraud detection classification model. The detection effect of this scheme depends heavily on the amount of labeled data available beforehand. When enough labeled data is available, a high-precision anti-fraud detection classification model can be trained and accurate anti-fraud detection results can be obtained, with higher detection efficiency than the statistical detection scheme.

When the anti-fraud detection classification model is actually applied to identify fraud risk, the characteristic information and/or outbound data of the second user must first be acquired. Which user information is acquired depends mainly on the type of sample data used to train the anti-fraud detection classification model. For example, if the model was trained directly on users' characteristic information and the corresponding fraud risk level labels, the characteristic information of the user can be acquired directly as the model input at the application stage; if the model was trained on users' outbound data and the corresponding fraud risk level labels, the outbound data of the user must be acquired as the model input at the application stage.

It should be noted that, the second user may be any user registered in the service system and needing fraud risk analysis, and of course, may also be a user with a risk to be determined, which is determined by using a clustering algorithm according to the feature information of the user in the foregoing embodiment.

The anti-fraud detection classification model of the embodiment of the application can be trained or optimized on the fraud risk levels of undetermined risk users output by the anti-fraud detection statistical model in any of the foregoing embodiments. The anti-fraud detection statistical model thus enriches the labeled data required for training the classification model, so that the anti-fraud detection classification model reaches sufficient detection precision.

In some embodiments of the present application, the anti-fraud detection classification model is trained by: constructing training sample data of the anti-fraud detection classification model according to the characteristic information and/or outbound data of the undetermined risk user and the fraud risk level of the undetermined risk user; training the anti-fraud detection classification model using training sample data of the anti-fraud detection classification model.

Because the fraud risk level corresponding to the user with the risk to be determined can be obtained based on the anti-fraud detection statistical model in the foregoing embodiment, these data can be used as a basis for training the anti-fraud detection classification model, that is, the corresponding user can be directly marked according to the fraud risk level output by the anti-fraud detection statistical model in the foregoing embodiment, and training sample data of the anti-fraud detection classification model is constructed by combining the feature information and/or outbound data of the user.
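The labeling step described above can be sketched as a join between per-user model inputs and the risk levels output by the statistical model. The names `TrainingSample` and `build_training_samples`, the dictionary-based inputs, and the choice to skip users without input data are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class TrainingSample:
    user_id: str
    features: Tuple[float, ...]  # characteristic information and/or outbound data
    label: str                   # fraud risk level from the statistical model


def build_training_samples(features_by_user: Dict[str, Sequence[float]],
                           level_by_user: Dict[str, str]) -> List[TrainingSample]:
    """Pair each labeled user with that user's features to form training samples."""
    samples: List[TrainingSample] = []
    for user_id, level in level_by_user.items():
        features = features_by_user.get(user_id)
        if features is None:
            continue  # assumed policy: skip labeled users with no input data
        samples.append(TrainingSample(user_id, tuple(features), level))
    return samples
```

For example, `build_training_samples({"u1": [0.1, 0.2]}, {"u1": "high"})` yields a single sample for user `u1` labeled `"high"`.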

Of course, the method can also further combine the existing data in the current service system to perform joint modeling, so that the data volume is further enriched, and the accuracy of classification model detection is improved.

The anti-fraud detection classification model can be obtained by training an existing network such as an LSTM (Long Short-Term Memory) network or a Transformer. The specific network architecture used for training can be flexibly selected by those skilled in the art in combination with the prior art, and is not specifically limited herein.

It should be noted that the anti-fraud detection method in the above embodiments of the present application may be applied to the various business scenarios in a banking system, such as credit card credit and small loans. It may also be flexibly extended to other business scenarios that may involve fraudulent user behavior, such as "wool-pulling" (users abusing promotional offers) that may occur in marketing campaigns. Those skilled in the art may choose the specific business scenario according to actual needs, which is not specifically limited herein.

To facilitate an understanding of embodiments of the present application, an overall flow diagram of anti-fraud detection in embodiments of the present application is provided, as shown in fig. 3. The anti-fraud detection flow in the embodiment of the present application may be divided into two schemes, the first scheme is implemented based on an anti-fraud detection statistical model, and the second scheme is implemented based on an anti-fraud detection classification model.

The detection flow of the first scheme is as follows. First, characteristic information of the first users is obtained; the first users are then clustered with a preset clustering algorithm according to this characteristic information to obtain a plurality of user groups, and undetermined risk users are determined according to the characteristic information of the first users in each user group. Next, outbound data of the undetermined risk users, covering dimensions such as call basic data, call content data, and biometric data, is acquired through an intelligent outbound system. Finally, an anti-fraud detection statistical model is constructed according to the outbound data of the undetermined risk users, and multidimensional statistical analysis is performed on that outbound data according to the model to calculate the fraud risk level of each undetermined risk user.

The detection flow of the second scheme can be divided into a training stage of the anti-fraud detection classification model and an anti-fraud detection stage. In the training stage, training sample data is constructed according to the fraud risk levels of users output by the anti-fraud detection statistical model in the first scheme, which enriches the amount of sample data available for training the classification model. Once the trained anti-fraud detection classification model is obtained, it can be used to identify fraud: the characteristic information and/or outbound data of a user is taken as the input of the classification model, which directly outputs the fraud risk level of the user.
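The two stages of the second scheme can be illustrated with a deliberately simple stand-in classifier. The sketch below uses a nearest-centroid classifier in place of the LSTM or Transformer networks mentioned in the description, purely so that the example is self-contained; the function names `train_centroids` and `predict_level` and the feature vectors are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, risk level label)


def train_centroids(samples: List[Sample]) -> Dict[str, List[float]]:
    """Training stage: compute the mean feature vector for each risk level."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict_level(model: Dict[str, List[float]], features: Sequence[float]) -> str:
    """Detection stage: return the risk level whose centroid is closest."""
    def squared_distance(centroid: List[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(features, centroid))

    return min(model, key=lambda label: squared_distance(model[label]))
```

In this sketch the labels in `samples` would come from the statistical model of the first scheme, and `predict_level` plays the role of the trained classification model that outputs a risk level directly from a user's features.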

In summary, the anti-fraud detection method of the present application at least achieves the following technical effects:

1) The embodiment of the application designs anti-fraud post detection logic, acquires more relevant data of the user by combining an intelligent outbound technology, performs data cross-validation, and fully considers the behavior characteristics of fraudulent parties, thereby improving the accuracy of fraud detection;

2) The tag data volume of the user is enriched through collection and analysis of outbound data, so that training and optimization of the classification model are facilitated, and the identification capacity of the classification model on fraudulent activity is improved.

The embodiment of the application also provides an anti-fraud detection apparatus 400. FIG. 4 is a schematic structural diagram of the anti-fraud detection apparatus 400 in the embodiment of the application, and the apparatus 400 includes:

a first obtaining unit 410, configured to obtain feature information of a first user;

a determining unit 420, configured to determine an undetermined risk user according to the feature information of the first user;

a second obtaining unit 430, configured to obtain outbound data of the undetermined risk user;

the first detection unit 440 is configured to construct an anti-fraud detection statistical model according to outbound data of the undetermined risk user, and perform anti-fraud detection on the undetermined risk user according to the anti-fraud detection statistical model, so as to obtain a fraud risk level of the undetermined risk user.

In some embodiments of the present application, the determining unit 420 is specifically configured to: clustering the first user by using a preset clustering algorithm according to the characteristic information of the first user to obtain a plurality of user groups; and determining the undetermined risk user according to the characteristic information of the first user in each user group.

In some embodiments of the present application, the second obtaining unit 430 is specifically configured to: determining outbound configuration information according to the characteristic information of the undetermined risk user; sending the outbound configuration information to an intelligent outbound system, so that the intelligent outbound system outbound the undetermined risk user according to the outbound configuration information; and receiving outbound data of the undetermined risk user returned by the intelligent outbound system.

In some embodiments of the present application, the outbound configuration information includes questions of a pending risk user, where the questions include a first type of questions and a second type of questions, and the second obtaining unit 430 is specifically configured to: determining a first type question of the undetermined risk user according to user identity associated information in the feature information of the undetermined risk user; and determining the characteristic information of the same group of pending risk users with the similarity smaller than a preset similarity threshold value in the user group where the pending risk users are located, and determining a second type question of the pending risk users according to the characteristic information of the same group of pending risk users.

In some embodiments of the present application, the first detection unit 440 is specifically configured to: constructing the anti-fraud detection statistical model according to outbound data of the undetermined risk user; performing multidimensional statistical analysis according to the anti-fraud detection statistical model and outbound data of the undetermined risk user to obtain a risk score of the undetermined risk user; and determining the fraud risk level of the undetermined risk user according to the risk score of the undetermined risk user.

In some embodiments of the present application, the outbound data includes at least one of call basic data, call content data, and biometric data, and the first detection unit 440 is specifically configured to: according to the conversation basic index dimension in the anti-fraud detection statistical model, carrying out statistical analysis on conversation basic data of the undetermined risk user to obtain conversation basic risk scores of the undetermined risk user; and/or, according to the conversation content index dimension in the anti-fraud detection statistical model, carrying out statistical analysis on the conversation content data of the undetermined risk user to obtain a conversation content risk score of the undetermined risk user; and/or, according to the dimension of the biological characteristic index in the anti-fraud detection statistical model, carrying out statistical analysis on the biological characteristic data of the risk user to be determined, and obtaining the biological characteristic risk score of the risk user to be determined.

In some embodiments of the present application, the first detection unit 440 is specifically configured to: determining the correct number of answers of the undetermined risk user to the first type of question questions and the correct number of answers of the undetermined risk user to the second type of question questions according to the call content data of the undetermined risk user; and determining the call content risk score of the undetermined risk user according to the correct answer number of the undetermined risk user to the first type question and the correct answer number of the undetermined risk user to the second type question.

It can be understood that the above anti-fraud detection apparatus can implement the steps of the anti-fraud detection method provided in the foregoing embodiment, and the explanation about the anti-fraud detection method is applicable to the anti-fraud detection apparatus and will not be repeated herein.

The embodiment of the present application further provides another anti-fraud detection apparatus 500. FIG. 5 is a schematic structural diagram of this other anti-fraud detection apparatus in the embodiment of the present application, and the anti-fraud detection apparatus 500 includes:

a third obtaining unit 510, configured to obtain feature information and/or outbound data of the second user;

the second detecting unit 520 is configured to perform anti-fraud detection by using an anti-fraud detection classification model according to the feature information and/or outbound data of the second user, so as to obtain a fraud risk level of the second user.

FIG. 6 is a schematic structural diagram of an anti-fraud detection system according to an embodiment of the present application. Referring to FIG. 6, at the hardware level, the anti-fraud detection system includes a processor and, optionally, an internal bus, a network interface, and a memory. The memory may include internal memory, such as random-access memory (RAM), and may further include non-volatile memory, such as at least one disk memory. Of course, the anti-fraud detection system may also include hardware required for other services.

The processor, network interface, and memory may be interconnected by an internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. Buses may be classified as address buses, data buses, control buses, and so on. For ease of illustration, only one bidirectional arrow is shown in FIG. 6, but this does not mean that there is only one bus or only one type of bus.

The memory is used for storing programs. Specifically, a program may include program code, and the program code includes computer operating instructions. The memory may include internal memory and non-volatile memory, and provides instructions and data to the processor.

The processor reads the corresponding computer program from the non-volatile memory into internal memory and then runs it, forming the anti-fraud detection apparatus at the logical level. The processor executes the program stored in the memory.

The method performed by the anti-fraud detection apparatus disclosed in the embodiment shown in FIG. 1 or FIG. 2 of the present application may be applied to a processor or implemented by a processor. The processor may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. Such a processor can implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor or any conventional processor. The steps of a method disclosed in connection with the embodiments of the present application may be embodied directly as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium well known in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.

The anti-fraud detection system may also perform the method performed by the anti-fraud detection apparatus in fig. 1 or fig. 2, and implement the functions of the anti-fraud detection apparatus in the embodiment shown in fig. 1 or fig. 2, which are not described herein.

The present embodiments also provide a computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by an anti-fraud detection system comprising a plurality of application programs, enable the anti-fraud detection system to perform the method performed by the anti-fraud detection apparatus of the embodiments shown in fig. 1 or fig. 2.

It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises the element.

It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and changes may be made to the present application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. which are within the spirit and principles of the present application are intended to be included within the scope of the claims of the present application.

Claims

1. A method of anti-fraud detection, the method comprising:

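acquiring characteristic information of a first user;

determining an undetermined risk user according to the characteristic information of the first user;

acquiring outbound data of the undetermined risk user; and

constructing an anti-fraud detection statistical model according to the outbound data of the undetermined risk user, and performing anti-fraud detection on the undetermined risk user according to the anti-fraud detection statistical model to obtain a fraud risk level of the undetermined risk user.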

2. The anti-fraud detection method of claim 1, wherein the determining a pending risk user from the characteristic information of the first user comprises:

3. The anti-fraud detection method of claim 1, wherein the obtaining outbound data of the pending risk user comprises:

4. The anti-fraud detection method of claim 1, wherein the outbound configuration information includes questions of a pending risk user, the questions including a first type of questions and a second type of questions, the determining the outbound configuration information based on characteristic information of the pending risk user comprising:

5. The anti-fraud detection method according to claim 1, wherein the constructing an anti-fraud detection statistical model according to the outbound data of the undetermined risk user, and performing anti-fraud detection on the undetermined risk user according to the anti-fraud detection statistical model, and obtaining a fraud risk level of the undetermined risk user includes:

6. The anti-fraud detection method of claim 5, wherein the outbound data includes at least one of call base data, call content data, and biometric data, and wherein the performing a multi-dimensional statistical analysis based on the anti-fraud detection statistical model and the outbound data of the pending risk user, the obtaining a risk score of the pending risk user in each dimension comprises:

7. The anti-fraud detection method of claim 6, wherein the performing statistical analysis on the call content data of the pending risk user according to the call content indicator dimension to obtain a call content risk score of the pending risk user includes:

8. A method of anti-fraud detection, the method comprising:

acquiring characteristic information and/or outbound data of a second user;

wherein the anti-fraud detection classification model is trained on the fraud risk levels of undetermined risk users output by the anti-fraud detection statistical model according to any one of claims 1 to 7.

9. The anti-fraud detection method of claim 8, wherein the anti-fraud detection classification model is trained by:

10. An anti-fraud detection apparatus, the anti-fraud detection apparatus comprising:

11. An anti-fraud detection apparatus, the anti-fraud detection apparatus comprising:

12. An anti-fraud detection system, comprising:

a processor; and

a memory arranged to store computer-executable instructions which, when executed, cause the processor to perform the anti-fraud detection method of any one of claims 1 to 7 or the anti-fraud detection method of claim 8 or 9.

13. A computer-readable storage medium storing one or more programs which, when executed by an anti-fraud detection system comprising a plurality of application programs, cause the anti-fraud detection system to perform the anti-fraud detection method of any one of claims 1 to 7 or the anti-fraud detection method of claim 8 or 9.
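
For illustration only, and not as part of the claims: claims 5 and 6 refer to obtaining a risk score for the undetermined risk user in each dimension (call base data, call content data, biometric data) and deriving a fraud risk level from those scores. The sketch below shows one way such an aggregation might look. The dimension weights, the score range [0, 1], the thresholds, and the names `WEIGHTS`, `LEVELS`, `aggregate_risk`, and `risk_level` are all assumptions introduced here; the source does not specify them.

```python
# Hypothetical weights per dimension; the source does not give any values.
WEIGHTS = {"call_base": 0.3, "call_content": 0.5, "biometric": 0.2}

# Hypothetical thresholds, checked from highest to lowest.
LEVELS = [(0.7, "high"), (0.4, "medium"), (0.0, "low")]


def aggregate_risk(scores: dict[str, float]) -> float:
    """Weighted mean of the per-dimension risk scores that are present.

    Each score is assumed to lie in [0, 1]. Because the outbound data may
    contain only some of the dimensions ("at least one of"), the weights are
    renormalised over the dimensions actually supplied.
    """
    used = {k: v for k, v in scores.items() if k in WEIGHTS}
    if not used:
        raise ValueError("no recognised dimension scores supplied")
    total_weight = sum(WEIGHTS[k] for k in used)
    return sum(WEIGHTS[k] * v for k, v in used.items()) / total_weight


def risk_level(score: float) -> str:
    """Map an aggregated score to a coarse fraud risk level."""
    for threshold, name in LEVELS:
        if score >= threshold:
            return name
    return "low"


if __name__ == "__main__":
    per_dimension = {"call_base": 0.2, "call_content": 0.9, "biometric": 0.5}
    total = aggregate_risk(per_dimension)
    print(round(total, 2), risk_level(total))
```

A weighted mean is used here only because it is simple to read; any other statistical combination of the per-dimension scores would fit the same outline.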