CN114549018A - Method and device for determining anti-fraud policy and electronic equipment

Info

Publication number: CN114549018A
Application number: CN202210126336.7A
Authority: CN
Inventors: 徐世界; 刘昊骋; 田建; 王天祺; 徐靖宇
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2022-02-10
Filing date: 2022-02-10
Publication date: 2022-05-27

Abstract

The disclosure provides a method and a device for determining an anti-fraud policy, and electronic equipment, and relates to the technical field of artificial intelligence, in particular to deep learning, cloud computing and natural language processing. The scheme is as follows: determining a plurality of object identifiers to be processed, object data corresponding to each object identifier, and a feature vector sequence corresponding to each object identifier; clustering the plurality of object identifiers according to the feature vector sequences to obtain at least one object identifier group; for each object identifier group, acquiring labels corresponding to at least some of the object identifiers in the group, and determining the label of the group according to those labels; and determining an anti-fraud policy for the object identifiers according to the label of the at least one object identifier group and the object data corresponding to the object identifiers in each group. This improves the accuracy of the anti-fraud policy and, in turn, the accuracy and efficiency of anti-fraud detection.

Description

Method and device for determining anti-fraud policy and electronic equipment

Technical Field

The disclosure relates to the technical field of artificial intelligence, in particular to deep learning, cloud computing and natural language processing, and more specifically to a method and a device for determining an anti-fraud policy and an electronic device.

Background

At present, fraud is widespread among the objects participating in service activities organized by service platforms. To identify such fraud, a corresponding anti-fraud policy needs to be established. In the related art, fraud labels are assigned manually based on the object data of a large number of objects on the service platform, and a corresponding anti-fraud policy is then determined by analysis. This approach is costly and time-consuming, the resulting anti-fraud policy has low accuracy, and anti-fraud detection is inefficient.

Disclosure of Invention

The disclosure provides a method and a device for determining an anti-fraud policy and electronic equipment.

According to an aspect of the present disclosure, there is provided a method of determining an anti-fraud policy, including: determining a plurality of object identifiers to be processed, object data corresponding to each object identifier, and a feature vector sequence corresponding to each object identifier; clustering the plurality of object identifiers according to the feature vector sequences corresponding to the object identifiers to obtain at least one object identifier group; for each object identifier group, acquiring labels corresponding to at least some of the object identifiers in the group, and determining the label of the group according to those labels, wherein the label characterizes whether a fraud condition exists; and determining an anti-fraud policy for the object identifiers according to the label of the at least one object identifier group and the object data corresponding to the object identifiers in each group.

According to another aspect of the present disclosure, there is provided an apparatus for determining an anti-fraud policy, including: a first determining module, configured to determine a plurality of object identifiers to be processed, object data corresponding to each object identifier, and a feature vector sequence corresponding to each object identifier; a clustering module, configured to cluster the plurality of object identifiers according to the feature vector sequences corresponding to the object identifiers to obtain at least one object identifier group; a processing module, configured to, for each object identifier group, acquire labels corresponding to at least some of the object identifiers in the group and determine the label of the group according to those labels, wherein the label characterizes whether a fraud condition exists; and a second determining module, configured to determine an anti-fraud policy for the object identifiers according to the label of the at least one object identifier group and the object data corresponding to the object identifiers in each group.

According to still another aspect of the present disclosure, there is provided an electronic device including:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of determining an anti-fraud policy set forth above in this disclosure.

According to yet another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method for determining an anti-fraud policy set forth above in the present disclosure.

According to yet another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method for determining an anti-fraud policy proposed above in the present disclosure.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. In the drawings:

FIG. 1 is a schematic diagram according to a first embodiment of the present disclosure;

FIG. 2 is a schematic diagram according to a second embodiment of the present disclosure;

FIG. 3 is a schematic illustration of a determination flow of an anti-fraud policy;

FIG. 4 is a schematic diagram according to a third embodiment of the present disclosure;

FIG. 5 is a block diagram of an electronic device used to implement an embodiment of the disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

It should be noted that, in the technical solution of the present disclosure, the acquisition, storage, application, and other handling of users' personal information all comply with the relevant laws and regulations and do not violate public order and good customs.

At present, fraud is widespread among the objects participating in service activities organized by service platforms. To identify such fraud, a corresponding anti-fraud policy needs to be established. In the related art, fraud labels are assigned manually based on the object data of a large number of objects on the service platform, and a corresponding anti-fraud policy is then determined by analysis. This approach is costly and time-consuming, the resulting anti-fraud policy has low accuracy, and anti-fraud detection is inefficient.

In order to solve the above problem, the present disclosure provides a method and an apparatus for determining an anti-fraud policy, and an electronic device.

Fig. 1 is a schematic diagram according to a first embodiment of the present disclosure, and it should be noted that the method for determining an anti-fraud policy according to the embodiment of the present disclosure is applicable to an apparatus for determining an anti-fraud policy, and the apparatus may be configured in an electronic device, so that the electronic device may perform a function of determining an anti-fraud policy.

The electronic device may be any device with computing capability, for example, a personal computer (PC), a mobile terminal, or a server. The mobile terminal may be a hardware device with an operating system and a touch screen and/or display screen, such as an in-vehicle device, a mobile phone, a tablet computer, a personal digital assistant, or a wearable device.

As shown in fig. 1, the method for determining the anti-fraud policy may include the following steps:

Step 101, determining a plurality of object identifiers to be processed, object data corresponding to each object identifier, and a feature vector sequence corresponding to each object identifier.

In the embodiment of the present disclosure, the object identifiers to be processed may belong to a service platform. The service platform may be, for example, an Internet platform such as Mall A. Taking Mall A as an example, the service activities organized by Mall A may be promotional activities jointly organized by Mall A and its merchants, such as full-reduction (spend a set amount, get a reduction) activities, discount activities, and the like.

In the embodiment of the present disclosure, the objects corresponding to the plurality of object identifiers to be processed may be users who successfully participated in the service activity. Taking a promotional activity of Mall A as the service activity, the users who successfully participated are those who purchased goods from the merchants organizing the promotional activity and used the coupons provided by it.

In an embodiment of the present disclosure, the object data may include at least one of the following: basic data, business data, operation behavior information, device information, and the like. Taking an object of Mall A as an example, the basic data may be the object's basic personal information, such as age, account number, gender, and the like.

In the embodiment of the present disclosure, the business data may be data generated by handling business in Mall A, such as order-related information, payment-related information, financial-management-related information, and the like. The order-related information may include: the order type, price, account number, IP address, time, and discount (preferential) information. The payment-related information may include: the payment account number, payment amount, IP address at the time of payment, payment time, and the like. The financial-management-related information may include: the financial product type, amount, account number, IP address at the time of the operation, operation time, and the like.

In the embodiment of the present disclosure, the operation behavior information may include multiple pieces of object behavior information, for example, login behavior information, click behavior information, browsing behavior information, stay behavior information, and the like. The login behavior information may include: the login behavior, login time, login account, IP address at login, and the like. The click behavior information may include: the click behavior, click time, information about the clicked page, click position within the page, account number at the time of the click, IP address at the time of the click, and the like. The stay behavior information may include: the account number, IP address, information about the page stayed on, stay time, and the like.

In the embodiment of the present disclosure, the device information may be identification information of the hardware device that the object uses to log in to the service platform, such as a smartphone, a personal computer, or a tablet. Taking Mall A as the service platform, the device information may be the identification information of the smartphone, personal computer, or other hardware device that the object uses to log in to Mall A. The identification information may be a number or similar value that uniquely identifies the hardware device, and can be set according to actual needs.

It should be noted that, in the embodiment of the present disclosure, the IP address is determined by the hardware device used by the object and the network to which that device belongs. If the hardware devices differ, the IP addresses differ; if the networks differ, the IP addresses differ.

It should be noted that the determining device of the anti-fraud policy in the embodiment of the present disclosure may obtain the object data in various open, lawful and compliant ways. For example, the object data may be obtained from a public data set, or from a user after the user's authorization, which is not limited by the present disclosure.

In this embodiment of the present disclosure, for each object identifier, the determining device of the anti-fraud policy may determine, based on the object data corresponding to the object identifier, a feature vector sequence corresponding to the object identifier. The dimensions of the feature vector sequences corresponding to the object identifiers can be the same, so that clustering processing of the object identifiers is facilitated.

Step 102, clustering the plurality of object identifiers according to the feature vector sequences corresponding to the plurality of object identifiers to obtain at least one object identifier group.

In the embodiment of the present disclosure, the determining device of the anti-fraud policy may perform step 102, for example, by determining the similarity between any two of the plurality of feature vector sequences and clustering the plurality of object identifiers based on the similarity to obtain a clustering result. The number of object identifier groups may be preset, in which case the clustering result contains that number of groups. The clustering algorithm may be, for example, a K-Means clustering algorithm, a spatial clustering algorithm, a Gaussian-mixture-model-based clustering algorithm, or the like, and may be set according to actual needs.
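As an illustration of step 102, the following is a minimal pure-Python sketch of K-Means, one of the clustering algorithms named above. It is not the implementation of the disclosure: the function names, the Euclidean distance used as the (dis)similarity measure, the deterministic seeding, and the sample feature vectors are all assumptions made for the example.

```python
import math


def euclidean(a, b):
    """Distance between two equal-length feature vectors (assumed similarity measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(vectors, k, iterations=20):
    """Cluster object identifiers by feature vector.

    `vectors` maps object identifier -> feature vector. Returns
    {group_index: [object_id, ...]}. The first k vectors (in sorted
    identifier order) seed the centroids so the result is deterministic;
    a real system would likely use a better seeding strategy.
    """
    ids = sorted(vectors)
    centroids = [list(vectors[i]) for i in ids[:k]]
    groups = {}
    for _ in range(iterations):
        # Assignment step: put each object in the group of its nearest centroid.
        groups = {g: [] for g in range(k)}
        for obj_id in ids:
            nearest = min(range(k), key=lambda g: euclidean(vectors[obj_id], centroids[g]))
            groups[nearest].append(obj_id)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for g in range(k):
            members = groups[g]
            if not members:
                new_centroids.append(centroids[g])  # keep the centroid of an empty group
                continue
            dim = len(centroids[g])
            new_centroids.append(
                [sum(vectors[m][d] for m in members) / len(members) for d in range(dim)]
            )
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return groups


# Hypothetical feature vectors for four object identifiers.
features = {
    "u1": [0.0, 0.1],
    "u2": [5.0, 5.1],
    "u3": [0.1, 0.0],
    "u4": [5.1, 4.9],
}
groups = kmeans(features, k=2)
```

Here `k` plays the role of the preset number of object identifier groups; the two resulting groups separate the objects near the origin from the objects near (5, 5).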

Step 103, for each object identifier group, acquiring labels corresponding to at least some of the object identifiers in the group, and determining the label of the group according to those labels; wherein the label characterizes whether a fraud condition exists.

In one example of the embodiment of the present disclosure, the determining device of the anti-fraud policy may randomly sample at least some of the object identifiers in each object identifier group and push the object data corresponding to the sampled object identifiers to a reviewer. The reviewer manually labels the object data and returns the labels to the determining device of the anti-fraud policy.

In another example, when determining the plurality of object identifiers to be processed, the determining device of the anti-fraud policy selects object identifiers at least some of which already carry labels, so that at least some of the object identifiers in each object identifier group carry corresponding labels.

In embodiments of the present disclosure, the tags may include a first tag and a second tag. In one example, the first tag characterizes the presence of fraud and the second tag characterizes the absence of fraud. In another example, the first tag characterizes the absence of fraud and the second tag characterizes the presence of fraud. Wherein the first label and the second label can be characterized by different representations. For example, the first tag may be represented by the number 0 and the second tag may be represented by the number 1.

In an embodiment of the present disclosure, the tag includes a first tag and a second tag, wherein the first tag is indicative of the presence of the fraud condition and the second tag is indicative of the absence of the fraud condition. Correspondingly, the determining apparatus of the anti-fraud policy may execute the process of step 103, for example, to obtain, for each object identification group, a tag corresponding to at least part of the object identifications in the object identification group; determining the fraud ratio of the object identification group according to the number of the first labels and the number of the second labels in the labels corresponding to at least part of the object identifications; when the fraud ratio is greater than or equal to a preset ratio threshold, determining that the label of the object identification group is a first label; and when the fraud ratio is smaller than the ratio threshold value, determining that the label of the object identification group is a second label.

In one example of the embodiment of the present disclosure, the determining device of the anti-fraud policy may determine the fraud ratio of the object identifier group by taking the ratio of the number of first tags to the number of second tags as the fraud ratio. In another example, the determining device of the anti-fraud policy may determine the fraud ratio of the object identifier group by determining the total number of first tags and second tags, and taking the ratio of the number of first tags to that total as the fraud ratio.

In the embodiment of the disclosure, a fraud ratio is determined according to the number of the first tags and the second tags in the tags corresponding to at least part of the object identifiers, and the tags of the object identifier group are determined to be the first tags or the second tags according to the relationship between the fraud ratio and a ratio threshold, so as to screen out the object identifier group with a fraud condition, so as to extract an anti-fraud policy.
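The group-labeling rule of step 103 can be sketched as follows. This is an illustrative sketch only: it uses the second variant of the fraud ratio (first tags divided by the total), encodes the first tag as 0 and the second tag as 1 as in the example above, and assumes a threshold of 0.5, which the disclosure does not specify. The function names are hypothetical.

```python
FIRST_TAG = 0   # fraud condition exists (encoding taken from the example above)
SECOND_TAG = 1  # fraud condition does not exist


def fraud_ratio(tags):
    """Ratio of first tags to the total number of sampled tags in one group."""
    if not tags:
        raise ValueError("at least one labeled object identifier is required")
    return sum(1 for t in tags if t == FIRST_TAG) / len(tags)


def group_tag(tags, ratio_threshold=0.5):
    """Tag of the whole group: first tag if the fraud ratio reaches the threshold."""
    return FIRST_TAG if fraud_ratio(tags) >= ratio_threshold else SECOND_TAG


# Tags of the sampled object identifiers in two hypothetical groups.
suspicious_group = [FIRST_TAG, FIRST_TAG, SECOND_TAG, FIRST_TAG]   # ratio 0.75
ordinary_group = [SECOND_TAG, SECOND_TAG, SECOND_TAG, FIRST_TAG]   # ratio 0.25
```

Dividing by the total rather than by the number of second tags avoids a division by zero when every sampled object in a group carries the first tag.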

Step 104, determining an anti-fraud policy for the object identifiers according to the label of the at least one object identifier group and the object data corresponding to the object identifiers in each object identifier group.

In the embodiment of the present disclosure, the determining device of the anti-fraud policy may perform step 104, for example, as follows: determining a decision tree model according to the label of the at least one object identifier group and the object data corresponding to the object identifiers in each object identifier group, where the decision tree model includes a plurality of nodes and a splitting condition for each node, and each node represents either data in the object data or a statistical feature in a specified dimension determined from the object data; and updating the existing anti-fraud policy according to the plurality of nodes in the decision tree model and the splitting condition of each node to obtain the anti-fraud policy for the object identifiers.

In the embodiment of the disclosure, after the decision tree model is determined, a plurality of nodes in the decision tree model and the splitting condition of each node can be fed back to an auditor to determine whether to adopt the decision tree model to update the existing anti-fraud policy; and after receiving the confirmation indication, updating the existing anti-fraud strategy according to the plurality of nodes in the decision tree model and the splitting condition of each node.

In the embodiment of the disclosure, according to the nodes of the decision tree model and the splitting conditions of the nodes, the anti-fraud rule applicable to the object in the latest period of time is obtained, and the existing anti-fraud policy is updated, so that the existing anti-fraud policy can be updated in time, and the accuracy of the anti-fraud policy is further improved.
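To make the idea of reading rules off a decision tree concrete, here is a small self-contained sketch. It is an assumption-laden illustration, not the model of the disclosure: it uses Gini impurity with midpoint thresholds, each object is assumed to inherit the fraud label of its group as a boolean, and the feature names and sample values are invented.

```python
def gini(labels):
    """Gini impurity of a list of boolean fraud labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(rows, labels, feature_names):
    """Return (weighted_gini, feature, threshold) of the best split, or None."""
    best = None
    for f in feature_names:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between neighbouring values
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def extract_rules(rows, labels, feature_names, depth=2, path=()):
    """Grow a small tree and return one (conditions, is_fraud) rule per leaf."""
    split = None
    if depth > 0 and gini(labels) > 0:
        split = best_split(rows, labels, feature_names)
    if split is None:
        return [(list(path), sum(labels) * 2 >= len(labels))]  # majority vote
    _, f, t = split
    rules = []
    for op, keep in (("<=", lambda v: v <= t), (">", lambda v: v > t)):
        subset = [(r, y) for r, y in zip(rows, labels) if keep(r[f])]
        rules += extract_rules(
            [r for r, _ in subset], [y for _, y in subset],
            feature_names, depth - 1, path + ((f, op, t),),
        )
    return rules


# Hypothetical statistical features; each label is inherited from the object's group.
rows = [
    {"logins_per_day": 2, "objects_on_same_ip": 1},
    {"logins_per_day": 3, "objects_on_same_ip": 1},
    {"logins_per_day": 2, "objects_on_same_ip": 2},
    {"logins_per_day": 40, "objects_on_same_ip": 30},
    {"logins_per_day": 35, "objects_on_same_ip": 25},
    {"logins_per_day": 50, "objects_on_same_ip": 28},
]
labels = [False, False, False, True, True, True]
rules = extract_rules(rows, labels, ["logins_per_day", "objects_on_same_ip"])
```

Each rule is the chain of splitting conditions from the root to one leaf. The rules whose second element is `True` are candidates that an auditor could review before they are merged into the existing anti-fraud policy.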

In this embodiment of the present disclosure, after determining the anti-fraud policy for the object identifier, the apparatus for determining the anti-fraud policy may further perform the following steps: determining an object identifier to be identified and object data corresponding to the object identifier to be identified; and determining whether the object corresponding to the object identifier to be identified has a fraud condition according to the anti-fraud strategy and the object data corresponding to the object identifier to be identified.
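A minimal sketch of how such a policy might be applied to a new object follows. The representation of the policy as a list of rules, each a list of `(feature, operator, threshold)` conditions, is an assumption for illustration; the disclosure does not prescribe a data format, and the feature name and threshold are invented.

```python
import operator

# Comparison operators allowed in a condition (assumed rule format).
OPS = {"<=": operator.le, ">": operator.gt}


def matches(conditions, features):
    """True if the object's features satisfy every condition of one rule."""
    return all(OPS[op](features[name], threshold) for name, op, threshold in conditions)


def has_fraud_condition(policy, features):
    """True if any rule in the anti-fraud policy matches the object's features."""
    return any(matches(rule, features) for rule in policy)


# Hypothetical policy with a single rule.
policy = [[("logins_per_day", ">", 19.0)]]

flagged = has_fraud_condition(policy, {"logins_per_day": 42})
cleared = has_fraud_condition(policy, {"logins_per_day": 3})
```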

In the embodiment of the disclosure, because the updated anti-fraud policy is highly accurate, new fraud conditions arising in the most recent period can be identified in time according to the updated policy, which further improves the efficiency of anti-fraud detection and strengthens risk control.

In summary, a plurality of object identifiers to be processed, object data corresponding to each object identifier, and a feature vector sequence corresponding to each object identifier are determined; the plurality of object identifiers are clustered according to the feature vector sequences corresponding to the object identifiers to obtain at least one object identifier group; for each object identifier group, labels corresponding to at least some of the object identifiers in the group are acquired, and the label of the group is determined according to those labels, wherein the label characterizes whether a fraud condition exists; and an anti-fraud policy for the object identifiers is determined according to the label of the at least one object identifier group and the object data corresponding to the object identifiers in each group. This improves the accuracy of the anti-fraud policy and, in turn, the accuracy and efficiency of anti-fraud detection.

In order to further improve the accuracy of the anti-fraud policy, the accuracy of the feature vector sequence may be improved, that is, based on the object data, the statistical features and the behavior feature vectors are extracted and subjected to a stitching process to obtain the feature vector sequence, as shown in fig. 2, where fig. 2 is a schematic diagram according to the second embodiment of the present disclosure. The embodiment shown in fig. 2 may include the following steps:

Step 201, determining a plurality of object identifiers to be processed and object data corresponding to each object identifier.

Step 202, for each object identifier, performing statistical feature extraction processing on object data corresponding to the object identifier in at least one specified dimension to obtain a partial feature vector sequence corresponding to the object identifier.

In this embodiment of the present disclosure, the process of the determining apparatus for anti-fraud policy performing step 202 may be, for example, to perform, for each object identifier, statistical feature extraction processing on object data corresponding to the object identifier in at least one specified dimension to obtain a statistical feature data sequence; and vectorizing and reducing the dimension of the statistical characteristic data sequence to obtain a partial characteristic vector sequence.

In an embodiment of the present disclosure, the statistical features in the specified dimensions may include at least one of the following: the number of logins by the object, the number of IP addresses from which the object logs in, the number of objects using the same IP address as the object in the same time period, the number of objects using the same IP address as the object and participating in the same service activity in the same time period, the number of payments, and the like.

The number of logins by the object may be at least one of the following: the number of logins per hour, the number of logins per day, the number of logins per N days, and the like, where N is a positive integer greater than 1. The number of IP addresses from which the object logs in may be at least one of the following: the number of login IP addresses per hour, the number of login IP addresses per day, the number of login IP addresses per N days, and the like.

The number of objects using the same IP address as the object in the same time period may be at least one of the following: the number of objects using the same IP address as the object in each hour, in each day, in each N days, and the like. For the number of objects using the same IP address as the object and participating in the same service activity in the same time period, the time period may likewise be each hour, each day, each N days, or the like. The number of payments may be at least one of the following: the number of payments per hour, the number of payments per day, the number of payments per N days, and the like.
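A few of these statistics can be sketched in plain Python as shown below. The record layout `(object_id, ip, timestamp)`, the function name, the choice of taking the maximum over days or hours, and counting only other objects (excluding the object itself) are assumptions for the example; the sample login records are invented.

```python
from collections import defaultdict
from datetime import datetime


def login_statistics(events):
    """Compute per-object login statistics from (object_id, ip, timestamp) records."""
    logins_per_day = defaultdict(lambda: defaultdict(int))  # object -> date -> count
    ips = defaultdict(set)                                  # object -> distinct IPs
    objects_by_ip_hour = defaultdict(set)                   # (ip, hour) -> objects
    for obj_id, ip, ts in events:
        logins_per_day[obj_id][ts.date()] += 1
        ips[obj_id].add(ip)
        hour = ts.replace(minute=0, second=0, microsecond=0)
        objects_by_ip_hour[(ip, hour)].add(obj_id)

    stats = {}
    for obj_id in ips:
        # Largest number of *other* objects seen on the same IP in the same hour.
        shared = max(
            len(objs) - 1 for objs in objects_by_ip_hour.values() if obj_id in objs
        )
        stats[obj_id] = {
            "max_logins_per_day": max(logins_per_day[obj_id].values()),
            "distinct_ips": len(ips[obj_id]),
            "max_objects_sharing_ip_per_hour": shared,
        }
    return stats


# Hypothetical login records.
events = [
    ("u1", "10.0.0.1", datetime(2022, 2, 10, 9, 5)),
    ("u1", "10.0.0.1", datetime(2022, 2, 10, 9, 40)),
    ("u1", "10.0.0.2", datetime(2022, 2, 11, 14, 0)),
    ("u2", "10.0.0.1", datetime(2022, 2, 10, 9, 30)),
    ("u3", "10.0.0.9", datetime(2022, 2, 10, 9, 10)),
]
stats = login_statistics(events)
```

The per-object dictionaries can then be flattened in a fixed key order to form the statistical feature data sequence, so that every object yields a sequence of the same length.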

In the embodiment of the disclosure, the statistical characteristic data sequence is obtained by extracting the statistical characteristic in at least one specified dimension, so that the statistical characteristic quantity in the statistical characteristic data sequence can be increased; the statistical characteristic data sequence is processed to obtain a partial characteristic vector sequence, the object characteristics can be accurately represented, and the accuracy of the characteristic vector sequence is further improved, so that the accuracy of the anti-fraud strategy determined and obtained is further improved.

In the embodiment of the present disclosure, the determining device of the anti-fraud policy may perform vectorization processing and dimension reduction processing on the statistical characteristic data sequence, for example, perform vectorization processing on the statistical characteristic data sequence to obtain a statistical characteristic vector sequence; respectively carrying out dimensionality reduction processing on the statistical feature vector sequences according to at least one dimensionality reduction processing strategy to obtain at least one dimensionality-reduced feature vector sequence; and splicing at least one reduced-dimension feature vector sequence to obtain a partial feature vector sequence.

The dimension reduction strategy may include at least one of the following: principal component analysis (PCA), an autoencoder (AE), and nonlinear dimension reduction by t-distributed stochastic neighbor embedding (t-SNE).

In the embodiment of the present disclosure, the statistical feature vector sequence contains a large number of collinear or multicollinear statistical features. If clustering were performed directly without dimension reduction, the clustering result would be biased toward features that are highly collinear but unimportant to the downstream task, and the clustering effect would be poor. Performing dimension reduction on the statistical feature vector sequence filters out the collinear or multicollinear statistical features, so that the features in the reduced feature vector sequence are no longer collinear and clustering is not biased toward features that are unimportant to the downstream task. This improves the clustering effect, the accuracy of the determined anti-fraud policy, and the efficiency of anti-fraud detection. In addition, dimension reduction greatly reduces the amount of computation and increases computation speed.
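The effect of dimension reduction on collinear features can be illustrated with a tiny PCA sketch. To stay dependency-free it finds only the first principal component, using power iteration on the covariance matrix instead of a library PCA; this substitution, the function names, and the sample data (whose second feature is exactly twice the first) are assumptions made for the example.

```python
import math


def center(rows):
    """Subtract the per-feature mean from every row."""
    n, dim = len(rows), len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dim)]
    return [[r[d] - means[d] for d in range(dim)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, dim = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dim)]
        for i in range(dim)
    ]


def first_principal_component(cov, iterations=200):
    """Dominant eigenvector of the covariance matrix via power iteration."""
    dim = len(cov)
    v = [1.0 / math.sqrt(dim)] * dim  # start vector; must not be orthogonal to the answer
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v


def reduce_to_one_dimension(rows):
    """Project each row onto the first principal component."""
    centered = center(rows)
    component = first_principal_component(covariance(centered))
    projected = [sum(x * c for x, c in zip(r, component)) for r in centered]
    return projected, component


# Two perfectly collinear features: the second is always twice the first.
rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
projected, component = reduce_to_one_dimension(rows)
```

Because the two features carry the same information, a single projected value per object preserves all of the variance, and the redundant dimension no longer gets double weight in a distance-based clustering step.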

Step 203, performing vectorization processing on the object behavior data corresponding to the object identifier to obtain a behavior feature vector corresponding to the object identifier.

In this embodiment of the present disclosure, the determining device of the anti-fraud policy may perform step 203, for example, by sequentially vectorizing the multiple pieces of object behavior information in the object behavior data corresponding to the object identifier to obtain a behavior feature vector sequence, and then taking the element-wise maximum or the element-wise average of the multiple behavior feature vectors in that sequence to obtain the behavior feature vector corresponding to the object identifier.

In the embodiment of the present disclosure, the vectorization algorithm may be, for example, a word vector algorithm (Word2Vec), the DeepWalk algorithm, or the like, and may be set according to actual needs.

In the embodiment of the present disclosure, taking the element-wise maximum means taking the maximum of the values at the same position across the multiple behavior feature vectors and using it as the value at that position in the behavior feature vector corresponding to the object identifier. Taking the element-wise average means taking the average of the values at the same position across the multiple behavior feature vectors and using it as the value at that position in the behavior feature vector corresponding to the object identifier.

In the embodiment of the disclosure, the maximum or the average of the values at the same position across the multiple behavior feature vectors is used as the value at that position in the behavior feature vector corresponding to the object identifier. This reduces the number of feature vectors while retaining the important features, and improves the accuracy of feature extraction.
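The pooling of several behavior feature vectors into one can be sketched as follows. The function name `pool`, its `mode` parameter, and the sample vectors are hypothetical; in practice the input vectors would come from a vectorization algorithm such as Word2Vec.

```python
def pool(vectors, mode="max"):
    """Combine equal-length behavior feature vectors position by position."""
    if not vectors:
        raise ValueError("at least one behavior feature vector is required")
    columns = list(zip(*vectors))  # values at the same position across all vectors
    if mode == "max":
        return [max(col) for col in columns]
    if mode == "mean":
        return [sum(col) / len(col) for col in columns]
    raise ValueError(f"unknown pooling mode: {mode}")


# Two hypothetical behavior feature vectors for one object identifier.
behavior_vectors = [
    [1.0, 4.0, -2.0],
    [3.0, 0.0, 2.0],
]
max_pooled = pool(behavior_vectors, "max")    # [3.0, 4.0, 2.0]
mean_pooled = pool(behavior_vectors, "mean")  # [2.0, 2.0, 0.0]
```

Either way, an object with any number of recorded behaviors ends up with a single vector of fixed length, which keeps the final feature vector sequences the same size for every object.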

Step 204, splicing the partial feature vector sequence and the behavior feature vector to obtain the feature vector sequence.

Step 205, according to the feature vector sequences corresponding to the plurality of object identifiers, clustering the plurality of object identifiers to obtain at least one object identifier group.

Step 206, aiming at each object identification group, obtaining the labels corresponding to at least part of the object identifications in the object identification group, and determining the labels of the object identification group according to the labels corresponding to at least part of the object identifications; wherein the tag characterizes whether a fraud condition exists.

Step 207, determining an anti-fraud policy for the object identifier according to the tag of at least one object identifier group and the object data corresponding to the object identifier in each object identifier group.

It should be noted that details of step 205, step 206, and step 207 may refer to step 102, step 103, and step 104 in the embodiment shown in fig. 1, and details thereof are not described here.
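Steps 204 and 205 can be sketched together as follows. The clustering algorithm is not fixed by this passage, so the k-means procedure below is an assumption chosen for brevity, as are the helper names (`splice`, `kmeans`), the deterministic initialization, and the sample data.

```python
from typing import Dict, List, Sequence


def splice(partial: Sequence[float], behavior: Sequence[float]) -> List[float]:
    """Step 204: concatenate the partial feature vector sequence and the behavior feature vector."""
    return list(partial) + list(behavior)


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[List[float]], k: int, iterations: int = 20) -> List[int]:
    """Step 205 (assumed algorithm): plain k-means returning a cluster index per point."""
    centroids = [list(p) for p in points[:k]]  # assumption: first k points as initial centroids
    assignment = [0] * len(points)
    for _ in range(iterations):
        assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


# Hypothetical (partial feature vector sequence, behavior feature vector) per object identifier.
features = {
    "id_a": ([0.1, 0.2], [0.0]),
    "id_b": ([5.0, 5.1], [1.0]),
    "id_c": ([0.2, 0.1], [0.1]),
    "id_d": ([5.2, 4.9], [0.9]),
}
ids = list(features)
vectors = [splice(*features[i]) for i in ids]

groups: Dict[int, List[str]] = {}
for object_id, cluster in zip(ids, kmeans(vectors, k=2)):
    groups.setdefault(cluster, []).append(object_id)
```

With this data the two object identifier groups are `["id_a", "id_c"]` and `["id_b", "id_d"]`.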

In summary, a plurality of object identifiers to be processed and the object data corresponding to each object identifier are determined; for each object identifier, statistical feature extraction processing on at least one specified dimension is performed on the object data corresponding to the object identifier to obtain a partial feature vector sequence corresponding to the object identifier; the object behavior data corresponding to the object identifier is vectorized to obtain a behavior feature vector corresponding to the object identifier; and the partial feature vector sequence and the behavior feature vector are spliced to obtain a feature vector sequence. The plurality of object identifiers are clustered according to the feature vector sequences corresponding to the plurality of object identifiers to obtain at least one object identifier group. For each object identifier group, tags corresponding to at least part of the object identifiers in the group are acquired, and the tag of the object identifier group is determined according to those tags, wherein the tag characterizes whether a fraud condition exists. The anti-fraud policy for the object identifier is then determined according to the tag of the at least one object identifier group and the object data corresponding to the object identifiers in each group, which improves both the efficiency and the accuracy of extracting fraud rules.

For example, fig. 3 is a schematic diagram of a process for determining an anti-fraud policy. As shown in fig. 3, the apparatus for determining an anti-fraud policy extracts statistical features and behavior feature vectors from customer basic data (basic data), customer service data (service data), and customer behavior data (operation behavior information) to obtain an original feature wide table and a behavior feature vector sequence. It performs dimensionality reduction on the original feature wide table using a dimensionality reduction processing strategy to obtain a low-dimensional feature wide table (the partial feature vector sequence), and takes the position-wise maximum (max pooling) or the position-wise average (mean pooling) of the plurality of behavior feature vectors in the behavior feature vector sequence to obtain the behavior feature vector corresponding to an object identifier. The partial feature vector sequence and the behavior feature vector are spliced to obtain a feature vector sequence. The object identifiers are clustered according to their feature vector sequences to obtain a plurality of object identifier groups; each object identifier group is randomly sampled and manually labeled by examiners to obtain the fraud rate of each group; and decision tree binary classification training is performed on the object identifier groups with a high fraud rate to extract an anti-fraud policy.
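The random sampling step of fig. 3 can be sketched as follows. The function name `sample_for_review`, the fixed `seed`, and the `sample_size` parameter are assumptions for illustration; the text does not specify how many object identifiers are drawn from each group.

```python
import random
from typing import Dict, List


def sample_for_review(groups: Dict[int, List[str]], sample_size: int, seed: int = 0) -> Dict[int, List[str]]:
    """Draw up to sample_size object identifiers from each group for manual labeling."""
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    return {
        group_id: rng.sample(ids, min(sample_size, len(ids)))
        for group_id, ids in groups.items()
    }


# Hypothetical object identifier groups produced by clustering.
groups = {0: ["id_a", "id_c", "id_e"], 1: ["id_b", "id_d"]}
to_review = sample_for_review(groups, sample_size=2)
```

The tags assigned by examiners to the sampled identifiers would then be used to estimate the fraud rate of each group.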

The determination device of the anti-fraud policy provided by the present disclosure is explained with reference to fig. 4.

Fig. 4 is a schematic diagram according to a third embodiment of the present disclosure.

As shown in fig. 4, the apparatus 400 for determining an anti-fraud policy provided by the present disclosure includes: a first determination module 410, a clustering module 420, a processing module 430, and a second determination module 440.

The first determining module 410 is configured to determine a plurality of object identifiers to be processed, object data corresponding to each object identifier, and a feature vector sequence corresponding to each object identifier;

a clustering module 420, configured to perform clustering processing on the multiple object identifiers according to the feature vector sequences corresponding to the multiple object identifiers, so as to obtain at least one object identifier group;

a processing module 430, configured to obtain, for each object identifier group, tags corresponding to at least part of object identifiers in the object identifier group, and determine tags of the object identifier group according to the tags corresponding to the at least part of object identifiers; wherein the tag characterizes whether a fraud condition exists;

the second determining module 440 is configured to determine an anti-fraud policy for the object identifier according to the tag of the at least one object identifier group and the object data corresponding to the object identifier in each object identifier group.

It should be noted that the apparatus for determining an anti-fraud policy provided in this embodiment may execute the method for determining an anti-fraud policy of the foregoing embodiments, so as to extract an anti-fraud policy with higher accuracy.

It should be noted that the foregoing description of the embodiment of the method for determining an anti-fraud policy is also applicable to the apparatus for determining an anti-fraud policy provided in the present disclosure, and is not repeated herein.

As a possible implementation manner of the embodiment of the present disclosure, the object data includes object behavior data, and the first determining module 410 is further configured to determine a plurality of object identifiers to be processed, and object data corresponding to each object identifier; for each object identifier, perform statistical feature extraction processing on at least one specified dimension on object data corresponding to the object identifier to obtain a partial feature vector sequence corresponding to the object identifier; vectorize the object behavior data corresponding to the object identifier to obtain a behavior feature vector corresponding to the object identifier; and splice the partial feature vector sequence and the behavior feature vector to obtain the feature vector sequence.

As a possible implementation manner of the embodiment of the present disclosure, the first determining module 410 is further configured to, for each object identifier, perform statistical feature extraction processing on at least one specified dimension on object data corresponding to the object identifier to obtain a statistical feature data sequence; and perform vectorization and dimensionality reduction processing on the statistical feature data sequence to obtain the partial feature vector sequence.

As a possible implementation manner of the embodiment of the present disclosure, the first determining module 410 is further configured to perform vectorization processing on the statistical feature data sequence to obtain a statistical feature vector sequence; respectively carrying out dimensionality reduction processing on the statistical feature vector sequence according to at least one dimensionality reduction processing strategy to obtain at least one dimensionality-reduced feature vector sequence; and splicing the at least one reduced-dimension feature vector sequence to obtain the partial feature vector sequence.

As a possible implementation manner of the embodiment of the present disclosure, the dimension reduction processing policy includes at least one of the following policies: principal component analysis strategy, self-coding strategy and nonlinear dimension reduction strategy.
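The pattern of reducing the same statistical feature vectors with several strategies and splicing the results can be sketched as follows. Only the principal component analysis strategy is implemented here (first component, via power iteration); `highest_variance_column` is a placeholder standing in for a second strategy such as self-coding or nonlinear reduction, which would not fit in a short sketch. All function names and the sample data are assumptions.

```python
import math
from typing import Callable, List

Matrix = List[List[float]]
Reducer = Callable[[Matrix], Matrix]


def pca_first_component(rows: Matrix, iterations: int = 50) -> Matrix:
    """Project each row onto the leading principal component (needs at least two rows)."""
    n, d = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0 / math.sqrt(d)] * d  # simple starting direction for power iteration
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance along the current direction
            break
        v = [x / norm for x in w]
    return [[sum(x * y for x, y in zip(row, v))] for row in centered]


def highest_variance_column(rows: Matrix) -> Matrix:
    """Placeholder reducer (not from the disclosure): keep the column with the largest variance."""
    n = len(rows)

    def variance(col):
        mean = sum(col) / n
        return sum((x - mean) ** 2 for x in col) / n

    columns = list(zip(*rows))
    best = max(range(len(columns)), key=lambda i: variance(columns[i]))
    return [[row[best]] for row in rows]


def reduce_and_splice(rows: Matrix, reducers: List[Reducer]) -> Matrix:
    """Apply every reducer to the same input, then concatenate the outputs row by row."""
    reduced = [reducer(rows) for reducer in reducers]
    return [[x for part in parts for x in part] for parts in zip(*reduced)]


# Hypothetical statistical feature vectors, one row per object identifier.
statistical_vectors = [[1.0, 1.0, 5.0], [2.0, 2.0, 5.0], [3.0, 3.0, 5.0]]
partial_sequence = reduce_and_splice(
    statistical_vectors, [pca_first_component, highest_variance_column]
)
```

Each row of `partial_sequence` holds one value per reducer, so adding a strategy widens the partial feature vector sequence without changing the number of rows.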

As a possible implementation manner of the embodiment of the present disclosure, the first determining module 410 is further configured to sequentially perform vectorization processing on the plurality of pieces of object behavior information in the object behavior data corresponding to the object identifier to obtain a behavior feature vector sequence; and take the position-wise maximum or the position-wise average of the plurality of behavior feature vectors in the behavior feature vector sequence to obtain the behavior feature vector corresponding to the object identifier.

As a possible implementation manner of the embodiment of the present disclosure, the tag includes a first tag and a second tag, where the first tag indicates that a fraud condition exists, and the second tag indicates that no fraud condition exists; the processing module 430 is further configured to, for each object identifier group, obtain the tags corresponding to at least part of the object identifiers in the object identifier group; determine the fraud ratio of the object identifier group according to the number of first tags and the number of second tags among the tags corresponding to the at least part of the object identifiers; when the fraud ratio is greater than or equal to a preset ratio threshold, determine that the tag of the object identifier group is the first tag; and when the fraud ratio is smaller than the ratio threshold, determine that the tag of the object identifier group is the second tag.
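The group tagging rule can be sketched as follows. The text says the fraud ratio is determined from the number of first tags and the number of second tags; computing it as first / (first + second) is an assumption, as are the tag strings and the function name `group_tag`.

```python
from typing import Sequence

FRAUD = "fraud"        # first tag: a fraud condition exists
NOT_FRAUD = "normal"   # second tag: no fraud condition exists


def group_tag(sampled_tags: Sequence[str], ratio_threshold: float) -> str:
    """Tag a whole object identifier group from the tags of its sampled members."""
    first = sum(1 for t in sampled_tags if t == FRAUD)
    second = sum(1 for t in sampled_tags if t == NOT_FRAUD)
    total = first + second
    if total == 0:
        raise ValueError("no tagged object identifiers in this group")
    fraud_ratio = first / total  # assumed definition of the fraud ratio
    return FRAUD if fraud_ratio >= ratio_threshold else NOT_FRAUD


tag_high = group_tag([FRAUD, FRAUD, NOT_FRAUD, NOT_FRAUD], ratio_threshold=0.5)
tag_low = group_tag([FRAUD, NOT_FRAUD, NOT_FRAUD, NOT_FRAUD], ratio_threshold=0.5)
```

Here `tag_high` is the first tag because the ratio equals the threshold, and `tag_low` is the second tag.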

As a possible implementation manner of the embodiment of the present disclosure, the second determining module 440 is further configured to determine a decision tree model according to the tag of the at least one object identifier group and the object data corresponding to the object identifiers in each object identifier group, where the decision tree model includes a plurality of nodes and a splitting condition of each node; the nodes represent data in the object data or statistical features on specified dimensions determined based on the object data; and update the existing anti-fraud policy according to the plurality of nodes in the decision tree model and the splitting condition of each node to obtain the anti-fraud policy for the object identifier.
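One way to turn the nodes and splitting conditions of a trained decision tree into policy rules is to collect the conditions along every path that ends in a fraud leaf. The sketch below assumes a binary tree with numeric thresholds; the `Node` and `Leaf` classes, the feature names, and the rule representation are hypothetical, and training the tree itself is omitted.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Leaf:
    tag: str  # "fraud" or "normal"


@dataclass
class Node:
    feature: str                  # data item or statistical feature the node represents
    threshold: float              # splitting condition of the node
    left: Union["Node", Leaf]     # branch taken when value <= threshold
    right: Union["Node", Leaf]    # branch taken when value > threshold


Condition = Tuple[str, str, float]
Rule = List[Condition]


def fraud_rules(tree: Union[Node, Leaf], path: Tuple[Condition, ...] = ()) -> List[Rule]:
    """Return one rule (a conjunction of conditions) per path that ends in a fraud leaf."""
    if isinstance(tree, Leaf):
        return [list(path)] if tree.tag == "fraud" else []
    return (
        fraud_rules(tree.left, path + ((tree.feature, "<=", tree.threshold),))
        + fraud_rules(tree.right, path + ((tree.feature, ">", tree.threshold),))
    )


def update_policy(existing: List[Rule], new_rules: List[Rule]) -> List[Rule]:
    """Append rules that the existing anti-fraud policy does not already contain."""
    return existing + [r for r in new_rules if r not in existing]


# Hypothetical trained tree.
tree = Node(
    "logins_per_day", 20.0,
    Leaf("normal"),
    Node("account_age_days", 7.0, Leaf("fraud"), Leaf("normal")),
)
rules = fraud_rules(tree)
policy = update_policy([], rules)
```

For this tree the single extracted rule reads: `logins_per_day > 20.0` and `account_age_days <= 7.0`.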

As a possible implementation manner of the embodiment of the present disclosure, the apparatus further includes: a third determination module and a fourth determination module; the third determining module is configured to determine an object identifier to be identified and object data corresponding to the object identifier to be identified; and the fourth determining module is used for determining whether the object corresponding to the object identifier to be identified has a fraud condition according to the anti-fraud policy and the object data corresponding to the object identifier to be identified.
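Applying an anti-fraud policy to an object identifier to be identified can be sketched as follows, assuming the policy is a list of rules and each rule is a conjunction of (feature, operator, threshold) conditions. That representation, the feature names, and the function name `has_fraud_condition` are assumptions for illustration.

```python
import operator
from typing import Dict, List, Tuple

Condition = Tuple[str, str, float]
OPS = {"<=": operator.le, ">": operator.gt}


def has_fraud_condition(policy: List[List[Condition]], object_data: Dict[str, float]) -> bool:
    """An object is flagged when every condition of at least one rule holds."""
    return any(
        all(OPS[op](object_data[feature], threshold) for feature, op, threshold in rule)
        for rule in policy
    )


# Hypothetical policy with a single rule.
policy = [[("logins_per_day", ">", 20.0), ("account_age_days", "<=", 7.0)]]

flagged = has_fraud_condition(policy, {"logins_per_day": 35.0, "account_age_days": 3.0})
cleared = has_fraud_condition(policy, {"logins_per_day": 5.0, "account_age_days": 3.0})
```

In this example `flagged` is `True` and `cleared` is `False`.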

The apparatus for determining an anti-fraud policy of the embodiment of the disclosure determines a plurality of object identifiers to be processed, object data corresponding to each object identifier, and a feature vector sequence corresponding to each object identifier; clusters the object identifiers according to the feature vector sequences corresponding to the object identifiers to obtain at least one object identifier group; for each object identifier group, acquires the tags corresponding to at least part of the object identifiers in the group and determines the tag of the group according to those tags; and determines the anti-fraud policy for the object identifier according to the tag of the at least one object identifier group and the object data corresponding to the object identifiers in each group, which improves both the efficiency and the accuracy of extracting fraud rules.

In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of users' personal information are all carried out with the users' consent, comply with relevant laws and regulations, and do not violate public order and good customs.

The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.

FIG. 5 illustrates a schematic block diagram of an example electronic device 500 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not intended to limit implementations of the disclosure described and/or claimed herein.

As shown in fig. 5, the device 500 comprises a computing unit 501 which may perform various appropriate actions and processes in accordance with a computer program stored in a Read Only Memory (ROM) 502 or a computer program loaded from a storage unit 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data required for the operation of the device 500 can also be stored. The computing unit 501, the ROM 502, and the RAM 503 are connected to each other by a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.

A number of components in the device 500 are connected to the I/O interface 505, including: an input unit 506 such as a keyboard, a mouse, or the like; an output unit 507 such as various types of displays, speakers, and the like; a storage unit 508, such as a magnetic disk, optical disk, or the like; and a communication unit 509 such as a network card, modem, wireless communication transceiver, etc. The communication unit 509 allows the device 500 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.

The computing unit 501 may be a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 501 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 501 performs the respective methods and processes described above, such as the method for determining an anti-fraud policy. For example, in some embodiments, the method for determining an anti-fraud policy may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 500 via the ROM 502 and/or the communication unit 509. When the computer program is loaded into the RAM 503 and executed by the computing unit 501, one or more steps of the method for determining an anti-fraud policy described above may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured to perform the method for determining an anti-fraud policy by any other suitable means (e.g., by means of firmware).

Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.

The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims

1. A method of determining an anti-fraud policy, comprising:

determining a plurality of object identifications to be processed, object data corresponding to each object identification and a feature vector sequence corresponding to each object identification;

clustering the object identifications according to the feature vector sequences corresponding to the object identifications to obtain at least one object identification group;

for each object identification group, acquiring labels corresponding to at least part of object identifications in the object identification group, and determining the labels of the object identification group according to the labels corresponding to at least part of object identifications; wherein the tag characterizes whether a fraud condition exists;

and determining an anti-fraud policy aiming at the object identification according to the label of the at least one object identification group and the object data corresponding to the object identification in each object identification group.

2. The method of claim 1, wherein the object data comprises object behavior data, and the determining a plurality of object identifications to be processed, object data corresponding to each object identification, and a feature vector sequence corresponding to each object identification comprises:

determining a plurality of object identifications to be processed and object data corresponding to each object identification;

for each object identifier, performing statistical feature extraction processing on at least one specified dimension on object data corresponding to the object identifier to obtain a partial feature vector sequence corresponding to the object identifier;

vectorizing the object behavior data corresponding to the object identifier to obtain a behavior feature vector corresponding to the object identifier;

and splicing the partial feature vector sequence and the behavior feature vector to obtain the feature vector sequence.

3. The method according to claim 2, wherein the performing, for each object identifier, statistical feature extraction processing on at least one specified dimension on object data corresponding to the object identifier to obtain a partial feature vector sequence corresponding to the object identifier includes:

for each object identifier, performing statistical feature extraction processing on at least one specified dimension on object data corresponding to the object identifier to obtain a statistical feature data sequence;

and performing vectorization and dimensionality reduction processing on the statistical feature data sequence to obtain the partial feature vector sequence.

4. The method according to claim 3, wherein the performing vectorization and dimensionality reduction processing on the statistical feature data sequence to obtain the partial feature vector sequence includes:

vectorizing the statistical feature data sequence to obtain a statistical feature vector sequence;

respectively carrying out dimensionality reduction processing on the statistical feature vector sequence according to at least one dimensionality reduction processing strategy to obtain at least one dimensionality-reduced feature vector sequence;

and splicing the at least one dimensionality-reduced feature vector sequence to obtain the partial feature vector sequence.

5. The method of claim 4, wherein the dimension reduction processing policy comprises at least one of: principal component analysis strategy, self-coding strategy and nonlinear dimension reduction strategy.

6. The method according to claim 2, wherein the vectorizing the object behavior data corresponding to the object identifier to obtain the behavior feature vector corresponding to the object identifier includes:

sequentially carrying out vectorization processing on the plurality of pieces of object behavior information in the object behavior data corresponding to the object identification to obtain a behavior feature vector sequence;

and taking the position-wise maximum or the position-wise average of the plurality of behavior feature vectors in the behavior feature vector sequence to obtain the behavior feature vector corresponding to the object identifier.

7. The method of claim 1, wherein the tag comprises a first tag and a second tag, wherein the first tag is indicative of the presence of fraud and the second tag is indicative of the absence of fraud;

the obtaining, for each object identification group, tags corresponding to at least part of object identifications in the object identification group, and determining the tags of the object identification group according to the tags corresponding to at least part of object identifications, includes:

aiming at each object identification group, acquiring labels corresponding to at least part of object identifications in the object identification group;

determining the fraud ratio of the object identification group according to the number of first tags and the number of second tags in the tags corresponding to at least part of the object identifications;

when the fraud ratio is greater than or equal to a preset ratio threshold, determining that the label of the object identification group is a first label;

and when the fraud ratio is smaller than the ratio threshold value, determining that the label of the object identification group is a second label.

8. The method of claim 1, wherein the determining an anti-fraud policy for the object identifier according to the tag of the at least one object identifier group and the object data corresponding to the object identifier in each object identifier group comprises:

determining a decision tree model according to the label of the at least one object identification group and object data corresponding to the object identification in each object identification group, wherein the decision tree model comprises a plurality of nodes and the splitting condition of each node; the nodes represent data in the object data or statistical features on specified dimensions determined based on the object data;

and updating the existing anti-fraud strategy according to the plurality of nodes in the decision tree model and the splitting condition of each node to obtain the anti-fraud strategy aiming at the object identification.

9. The method of claim 1, wherein after determining an anti-fraud policy for the object identification, the method further comprises:

determining an object identifier to be identified and object data corresponding to the object identifier to be identified;

and determining whether the object corresponding to the object identifier to be identified has a fraud condition according to the anti-fraud strategy and the object data corresponding to the object identifier to be identified.

10. An anti-fraud policy determination apparatus, comprising:

a first determining module, configured to determine a plurality of object identifications to be processed, object data corresponding to each object identification, and a feature vector sequence corresponding to each object identification;

the clustering module is used for clustering the object identifiers according to the characteristic vector sequences corresponding to the object identifiers to obtain at least one object identifier group;

the processing module is used for acquiring labels corresponding to at least part of object identifications in each object identification group, and determining the labels of the object identification groups according to the labels corresponding to the at least part of object identifications; wherein the tag characterizes whether a fraud condition exists;

and the second determining module is used for determining an anti-fraud policy aiming at the object identification according to the label of the at least one object identification group and the object data corresponding to the object identification in each object identification group.

11. The apparatus of claim 10, wherein the object data comprises object behavior data, and the first determining module is further configured to,

12. The apparatus of claim 11, wherein the first determining module is further configured to,

13. The apparatus of claim 12, wherein the first determining module is further configured to,

and splicing the at least one reduced-dimension feature vector sequence to obtain the partial feature vector sequence.

14. The apparatus of claim 13, wherein the dimension reduction processing policy comprises at least one of: principal component analysis strategy, self-coding strategy and nonlinear dimension reduction strategy.

15. The apparatus of claim 11, wherein the first determining module is further configured to,

16. The apparatus of claim 10, wherein the tag comprises a first tag and a second tag, wherein the first tag is indicative of the presence of fraud and the second tag is indicative of the absence of fraud;

the processing module is further configured to,

17. The apparatus of claim 10, wherein the second determining module is further configured to,

18. The apparatus of claim 10, wherein the apparatus further comprises: a third determination module and a fourth determination module;

the third determining module is configured to determine an object identifier to be identified and object data corresponding to the object identifier to be identified;

and the fourth determining module is used for determining whether the object corresponding to the object identifier to be identified has a fraud condition according to the anti-fraud policy and the object data corresponding to the object identifier to be identified.

19. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-9.

20. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-9.

21. A computer program product comprising a computer program which, when executed by a processor, carries out the steps of the method according to any one of claims 1-9.