CN110322254B

CN110322254B - Online fraud identification method, device, medium and electronic equipment

Info

Publication number: CN110322254B
Application number: CN201910599798.9A
Authority: CN
Inventors: 王明英
Original assignee: Tongdun Holdings Co Ltd
Current assignee: Tongdun Holdings Co Ltd
Priority date: 2019-07-04
Filing date: 2019-07-04
Publication date: 2022-12-16
Anticipated expiration: 2039-07-04
Also published as: CN110322254A

Abstract

The embodiments of the disclosure provide an online fraud identification method, an online fraud identification device, a computer-readable storage medium and electronic equipment, which relate to the technical field of computers. The method comprises: receiving a fraud identification request containing current data to be identified; acquiring the current attribute and the current attribute value of the current data to be identified; and comparing the current attribute and the current attribute value with a cached fraud attribute set to obtain an online identification result of whether the current data to be identified is fraud data. In the technical scheme of the embodiments of the disclosure, the current attribute and the current attribute value of the current data to be identified are acquired and then compared with the cached fraud attribute set, so that fraud identification of the current data to be identified is realized, and online fraud identification can be carried out simply and conveniently without changing a graph relation network.

Description

Online fraud identification method, device, medium and electronic equipment

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to an online fraud identification method and apparatus, a computer-readable storage medium, and an electronic device.

Background

With the continuous development of the internet, the black industry chain, namely "black products", has also continued to develop. Specifically, data leaks have become rampant, black product technology keeps advancing, and the number of black product practitioners is huge.

When black products are identified based on a rule strategy, rules must be added and modified manually whenever a risk occurs. However, fraud and black product group fraud are often carried out in the early morning or during rest periods and are triggered on a schedule by machine scripts, so it is difficult for the rule strategy to take effect in time.

When a supervised machine learning model is built to identify fraudulent behavior, the number of fraudulent (black) samples is small. In actual business, a clear reason is often required for behavior intercepted by the model, yet the interpretability of supervised models is poor. In addition, because fraud and anti-fraud are in constant confrontation, the supervised model learns the characteristics of past fraudulent behavior, and once a fraud group changes its methods, the interception capability of the model drops sharply. Black and white samples must then be collected again and the model retrained, tested and released. This process is tedious and time-consuming, and it is difficult to meet the requirement of countering black products in real time.

How to simply and conveniently perform online fraud identification is a technical problem which needs to be solved urgently at present.

It is noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure and therefore may include information that does not constitute prior art that is already known to a person of ordinary skill in the art.

Disclosure of Invention

An object of the embodiments of the present disclosure is to provide an online fraud identification method, apparatus, computer-readable storage medium and electronic device, so as to overcome, at least to a certain extent, the problem that online fraud identification cannot be performed simply and conveniently.

Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.

According to a first aspect of the embodiments of the present disclosure, there is provided an online fraud identification method, including: receiving a fraud identification request containing current data to be identified; acquiring the current attribute and the current attribute value of the current data to be identified; and comparing the current attribute and the current attribute value with a cached fraud attribute set to obtain an online identification result of whether the current data to be identified is fraud data or not.

In some embodiments, the method further comprises: obtaining historical fraud data; acquiring historical attributes and historical attribute values of the historical fraud data by adopting an unsupervised or semi-supervised model; generating the fraud attribute set according to the historical attribute of the historical fraud data and the historical attribute value of the historical fraud data; and storing the fraud attribute set in a cache database.

In some embodiments, the comparing the current attribute and the current attribute value thereof with the cached fraud attribute set to obtain an online identification result of whether the current data to be identified is fraud data includes: if the current attribute is matched with the historical attribute in the fraud attribute set, judging whether the current attribute value is matched with the corresponding historical attribute value; and if the current attribute value is matched with the corresponding historical attribute value, the online identification result is that the current data to be identified is fraud data.

In some embodiments, determining whether the current attribute value matches the corresponding historical attribute value comprises: if the historical attribute value is a conditional expression and the current attribute value of the current attribute satisfies the conditional expression, judging that the current attribute value matches the historical attribute value.

In some embodiments, determining whether the current attribute value matches the corresponding historical attribute value comprises: if the historical attribute value is a first set numerical value and the current attribute value is equal to the historical attribute value, judging that the current attribute value matches the historical attribute value.

In some embodiments, if the historical attribute value is a vector or set of values and the distance between the current attribute value and the historical attribute value is less than a second set value, it is determined that the current attribute value matches the historical attribute value.

In some embodiments, the fraud attribute set comprises fraud scores, each of the historical attribute values corresponding to a fraud score; wherein the method further comprises: if the online identification result is that the current data to be identified is fraud data, obtaining the fraud score of the fraud data according to the fraud score corresponding to the historical attribute value matched by the fraud data.

In some embodiments, the method further comprises: acquiring a reference identification result obtained after the data to be identified is subjected to fraud identification in a reference identification mode; obtaining an evaluation identifier of the online recognition result according to the online recognition result and the reference recognition result; determining the quality score of the historical attribute value according to the evaluation identifier of the online identification result corresponding to the historical attribute value; and when the quality score is smaller than a third set numerical value, deleting the historical attribute value corresponding to the quality score from the cache.

In some embodiments, the historical attribute values in the fraud attribute set have a set cache validity period; the method further comprises: adjusting the cache validity period of the historical attribute value according to the quality score corresponding to the historical attribute value.

In some embodiments, the method further comprises: receiving fraud attribute change data; and updating the fraud attribute set in the cache according to the fraud attribute change data.

According to a second aspect of the embodiments of the present disclosure, there is provided an online fraud identification apparatus including: a receiving unit, used for receiving a fraud identification request containing current data to be identified; an acquiring unit, used for acquiring the current attribute of the current data to be identified and the current attribute value thereof; and a judging unit, used for comparing the current attribute and the current attribute value thereof with the cached fraud attribute set to obtain an online identification result of whether the current data to be identified is fraud data.

According to a third aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium, on which a computer program is stored, which when executed by a processor, implements the online fraud identification method as described in the first aspect of the embodiments above.

According to a fourth aspect of an embodiment of the present disclosure, there is provided an electronic apparatus including: one or more processors; storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the online fraud identification method as described in the first aspect of the embodiments above.

The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:

In the technical solutions provided by some embodiments of the present disclosure, after the current attribute of the current data to be identified and the current attribute value thereof are obtained, they are compared with the cached fraud attribute set, so that fraud identification of the current data to be identified is realized without changing a graph relation network, and online fraud identification can be performed simply and conveniently.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure. It should be apparent that the drawings in the following description are merely examples of the disclosure and that other drawings may be derived by those of ordinary skill in the art without inventive effort. In the drawings:

FIG. 1 schematically illustrates a flow diagram of an online fraud identification method according to an embodiment of the present disclosure;

FIG. 2 schematically illustrates a flow diagram for saving a fraud attribute set according to an embodiment of the present disclosure;

FIG. 3 schematically illustrates a flow diagram of an online fraud identification method according to another embodiment of the present disclosure;

FIG. 4 schematically illustrates a block diagram of an online fraud identification apparatus according to an embodiment of the present disclosure;

FIG. 5 schematically illustrates a block diagram of an online fraud identification apparatus according to another embodiment of the disclosure;

FIG. 6 schematically illustrates a block diagram of a computer system suitable for use with an electronic device that implements an embodiment of the disclosure.

Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.

Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the embodiments of the disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and the like. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the disclosure.

The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.

The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.

In the related art, real-time fraud detection can also be performed based on association graph partitioning queries or based on association graph vector embedding. When real-time fraud detection is performed based on association graph partitioning queries, a heterogeneous relationship network is constructed from the associations among user attributes, and the relationship network is then partitioned into individual communities by a community discovery algorithm. A small number of nodes in the relationship network may be labeled as fraudulent based on historical black samples. The relationship network data is inserted into a graph database so that real-time fraud detection can be performed. During real-time fraud detection, the community in which the current node is located is queried, and the fraud probability of the current query is returned according to the concentration and distribution of fraud nodes in that community.

When real-time fraud detection is performed based on association graph vector embedding, a homogeneous or heterogeneous relationship network is first constructed from user attributes, and the nodes are then converted into numerical vectors through a random walk, node2vec (Scalable Feature Learning for Networks) or k-means method. The feature vectors are clustered to form clusters, and the clusters are labeled using historical black sample data to generate a fraud probability for each cluster. During real-time fraud detection, the query results are weighted in a k-nearest-neighbor manner to obtain the fraud probability of the current sample.

Neither real-time fraud detection based on association graph partitioning queries nor real-time fraud detection based on association graph vector embedding can give a reasonable business meaning for the query result, and the complexity of updating the graph relation network is high. Real-time fraud detection based on association graph vector embedding also suffers from hyper-parameters that are difficult to determine, which makes real-time fraud detection difficult.

In order to solve the above problems, embodiments of the present disclosure provide an online fraud identification method, so as to improve the interpretability of online fraud identification, reduce the difficulty of rule extraction, and improve the convenience of real-time fraud detection.

FIG. 1 schematically illustrates an online fraud identification method of an exemplary embodiment of the present disclosure. The method provided by the embodiment of the disclosure can be executed by any electronic device with computer processing capability, such as a terminal device and/or a server. Referring to fig. 1, an online fraud identification method provided by an embodiment of the present disclosure may include the following steps:

Step S102, receiving a fraud identification request containing the current data to be identified.

The current data to be identified comes from log data transmitted by the client, and the current data to be identified can come from various scenes such as registration, login, transaction, payment, comment, posting and the like. The client initiates a fraud identification request after receiving the user operation.

Step S104, acquiring the current attribute of the current data to be identified and the current attribute value of the current data to be identified.

Step S106, comparing the current attribute and the current attribute value with the cached fraud attribute set to obtain the online identification result of whether the current data to be identified is fraud data.

According to the technical scheme of the embodiment of the disclosure, online fraud identification is carried out by comparing the current attribute and the current attribute value with the cached fraud attribute set. Compared with fraud identification that uses a rule strategy or a supervised model, rules do not need to be added or modified manually when risks occur, and model training does not need to be carried out again, so online fraud identification can be carried out more simply and conveniently.
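Steps S102 to S106 can be sketched as follows. This is a minimal illustration only, assuming the cached fraud attribute set is modeled as an in-memory dictionary from attribute name to a predicate on the attribute value; the names `identify`, `cache`, `device_id` and `logins_per_hour` are hypothetical and do not come from the disclosure.

```python
from typing import Any, Callable, Dict

# Hypothetical stand-in for the cache database:
# attribute name -> predicate that decides whether a value matches.
FraudAttributeSet = Dict[str, Callable[[Any], bool]]


def identify(request: Dict[str, Any], fraud_attributes: FraudAttributeSet) -> bool:
    """Return True if any current attribute/value pair matches the cached set."""
    # Step S104: obtain the current attributes and their values from the request.
    current = request["data"]
    # Step S106: compare each attribute and value with the cached fraud attribute set.
    for attribute, value in current.items():
        matcher = fraud_attributes.get(attribute)
        if matcher is not None and matcher(value):
            return True
    return False


# Illustrative cache content (assumed values).
cache: FraudAttributeSet = {
    "device_id": lambda v: v == "dev-123",
    "logins_per_hour": lambda v: v > 50,
}
```

A request such as `{"data": {"device_id": "dev-123"}}` (step S102) would be reported as fraud data here, while a request whose attributes are absent from the cache, or whose values do not match, would not.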

In an embodiment of the invention, shown in FIG. 2, a fraud attribute set is stored in a cache database according to the following steps:

Step S202, acquiring historical fraud data.

Step S204, acquiring historical attributes of the historical fraud data and historical attribute values of the historical fraud data by adopting an unsupervised or semi-supervised model.

Step S206, generating a fraud attribute set according to the historical attributes of the historical fraud data and the historical attribute values thereof.

Step S208, storing the fraud attribute set into a cache database.

The unsupervised or semi-supervised model has obvious advantages in interpretability and model iteration. By exploring multi-dimensional fraud characteristics, the unsupervised or semi-supervised model mines fraud risk data from massive behavior data, moving from the judgment of single records to the mining of fraud groups.

Common unsupervised models include clustering, outlier detection, PCA (principal component analysis), autoencoders (self-encoding networks), community discovery, and the like. Common semi-supervised models include simple self-training and co-training.

In the exemplary embodiment of the present disclosure, the historical fraud data is derived from log data generated by user operations, and supports all scenarios and all input data, which may include various scenarios such as registration, login, transaction, payment, comment, posting, and the like.

The offline mining needs batch data calculation, and the requirement of real-time risk interception is difficult to achieve. In the technical scheme of the embodiment of the disclosure, historical fraud data is mined offline to obtain a group with a high fraud probability, historical attributes and historical attribute values thereof are obtained according to the group with the high fraud probability, a fraud attribute set is further formed and stored in a cache database, comparison data is provided for real-time fraud identification, and the advantage of offline mining of an unsupervised or semi-supervised model is fully utilized.

The historical fraud data obtained in step S111 may include field details and fraud scores, where the field details may be user and service information and fraud group identification codes.

In step S112, the suspected fraud group can be obtained by batch calculation according to unsupervised and semi-supervised algorithms. The fraud group usually has correlation and similarity in multiple dimensions, and in order to be able to apply the characteristics of the fraud group to the online service, the correlation and similar characteristics inside the group need to be extracted.

Extracting historical attributes requires that the extracted rules have a definite meaning based on common business fields, such as matching a fraud group or having a high similarity to a fraud group.

Since the offline result is output regularly, the historical fraud data in the latest time window needs to be subjected to de-duplication and combination, and an initial fraud score is given. Here, the fraud score may be a number between 0 and 1, with a higher fraud score indicating a higher likelihood of fraud.

In step S112, after dirty data processing, structuring and normalization are performed on the above fields, k-v data consistent with the format of the cache data may be generated. There may be multiple pieces of k-v data, and the resulting k-v list is finally imported into the cache database in k-v form. Here, the k-v form is a key-value form, where k may be a historical attribute and v may be a historical attribute value. v can take various forms, such as a textual expression or numerical data.

When v is a conditional expression, a logical judgment may be used: for example, the preprocessed feature fields are substituted into the expression, and it is determined whether the result is true. An example of such an expression is "a > 10 && b < 8 || c = 'xx'".

In particular, the conditional expression may be derived from the fraud group center or from association rules.

When the conditional expression is obtained from the fraud group center, because members of the same group are correlated and similar, the most frequent attribute value is taken for correlated fields and the average value is taken for similar fields, which yields the attribute values of the fraud group center.

When the conditional expression is obtained from association rules, the continuous numerical variables are first discretized, and the association rules that characterize the fraud group are then obtained through an association rule algorithm. Common association rule algorithms include the Apriori algorithm, the Frequent Pattern growth (FP-growth) algorithm, the Eclat algorithm, and the like.
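The generation of the k-v list can be illustrated with a short sketch. It assumes that each mined record is a dictionary with hypothetical `attribute` and `value` fields, that "dirty" rows are those missing either field, and that normalization means trimming whitespace and lower-casing the key; none of these details are specified in the disclosure.

```python
from typing import Dict, List, Tuple


def build_kv_list(records: List[Dict[str, object]]) -> List[Tuple[str, str]]:
    """Normalize raw fields into (k, v) pairs, dropping dirty rows and duplicates."""
    seen = set()
    kv_list: List[Tuple[str, str]] = []
    for record in records:
        # Normalization (assumed): trim whitespace, lower-case the attribute name.
        key = str(record.get("attribute", "")).strip().lower()
        value = str(record.get("value", "")).strip()
        # Dirty data (assumed definition): missing attribute or missing value.
        if not key or not value:
            continue
        # De-duplicate identical pairs produced by repeated offline runs.
        if (key, value) in seen:
            continue
        seen.add((key, value))
        kv_list.append((key, value))
    return kv_list
```

The resulting list is what would then be imported into the cache database in key-value form.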
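One way to substitute feature fields into such an expression is sketched below. It assumes the expression uses `&&`, `||` and `==` (a double equals sign for equality, which is an assumption about the syntax) together with simple comparisons, and it evaluates the expression through Python's `ast` module instead of `eval`, so that only comparisons and boolean operators are accepted. The function name `matches_expression` is hypothetical.

```python
import ast
import operator

# Comparison operators accepted in an expression.
_COMPARE = {
    ast.Gt: operator.gt, ast.GtE: operator.ge,
    ast.Lt: operator.lt, ast.LtE: operator.le,
    ast.Eq: operator.eq, ast.NotEq: operator.ne,
}


def _evaluate(node, fields):
    """Recursively evaluate a restricted expression tree against the fields."""
    if isinstance(node, ast.BoolOp):
        results = [_evaluate(value, fields) for value in node.values]
        return all(results) if isinstance(node.op, ast.And) else any(results)
    if isinstance(node, ast.Compare) and len(node.ops) == 1:
        left = _evaluate(node.left, fields)
        right = _evaluate(node.comparators[0], fields)
        return _COMPARE[type(node.ops[0])](left, right)
    if isinstance(node, ast.Name):
        return fields[node.id]          # substitute the feature field
    if isinstance(node, ast.Constant):
        return node.value               # literal number or string
    raise ValueError(f"unsupported expression element: {ast.dump(node)}")


def matches_expression(expression: str, fields: dict) -> bool:
    """Return True if the feature fields satisfy the conditional expression."""
    # Translate the C-style boolean operators into Python keywords.
    python_expr = expression.replace("&&", " and ").replace("||", " or ").strip()
    tree = ast.parse(python_expr, mode="eval")
    return bool(_evaluate(tree.body, fields))
```

As in most languages, `&&` binds more tightly than `||` here, so `a > 10 && b < 8 || c == 'xx'` is read as `(a > 10 && b < 8) || c == 'xx'`.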
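The center computation described above can be sketched as follows, assuming each group member is a dictionary of field values and that the caller states which fields are "correlated" (categorical) and which are "similar" (numerical). The field names in the usage note are illustrative only.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List, Sequence


def group_center(
    members: List[Dict[str, object]],
    correlated_fields: Sequence[str],
    similar_fields: Sequence[str],
) -> Dict[str, object]:
    """Compute the center of one fraud group."""
    center: Dict[str, object] = {}
    # Correlated fields: take the most frequent attribute value.
    for field in correlated_fields:
        counts = Counter(member[field] for member in members)
        center[field] = counts.most_common(1)[0][0]
    # Similar fields: take the average value.
    for field in similar_fields:
        center[field] = mean(member[field] for member in members)
    return center
```

For a group whose members mostly share the hypothetical field `ip_prefix = "10.0.1"` and have amounts of 100, 120 and 110, the center would be `ip_prefix = "10.0.1"` with an average amount of 110.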
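The two preparatory ideas in this paragraph, turning continuous variables into discrete bins and finding item combinations that occur frequently within a fraud group, can be illustrated with the sketch below. It is a brute-force frequent-itemset count, not an implementation of Apriori, FP-growth or Eclat, and the bin edges, item labels and support threshold are assumptions chosen for illustration.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def discretize(value: float, edges: Sequence[float]) -> int:
    """Return the index of the bin that a continuous value falls into."""
    for index, edge in enumerate(edges):
        if value < edge:
            return index
    return len(edges)


def frequent_itemsets(
    transactions: List[List[str]], min_support: float, max_size: int = 2
) -> Dict[Tuple[str, ...], float]:
    """Return itemsets (up to max_size items) whose support is at least min_support."""
    counts: Counter = Counter()
    for items in transactions:
        unique = sorted(set(items))
        for size in range(1, max_size + 1):
            for combo in combinations(unique, size):
                counts[combo] += 1
    total = len(transactions)
    return {
        combo: count / total
        for combo, count in counts.items()
        if count / total >= min_support
    }
```

An itemset such as `("amount=high", "night")` with high support inside a fraud group is a candidate for a conditional expression; a dedicated association rule algorithm would additionally prune the search and compute rule confidence.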

When v is numerical data, v may be in a numerical vector or in a set form, wherein the distance calculation formula between the current attribute value and v may be a euclidean distance, a cosine distance, or a Jaccard (Jaccard) distance, and is not limited thereto.
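The three distances named above can be written in a few lines of standard-library Python. The sketch assumes vectors are equal-length sequences of numbers and sets are Python sets; treating a zero vector as maximally distant under the cosine distance is an assumption made here to avoid a division by zero.

```python
import math
from typing import Iterable, Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two numerical vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """One minus the cosine similarity of two numerical vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # assumption: a zero vector is treated as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def jaccard_distance(a: Iterable, b: Iterable) -> float:
    """One minus the ratio of shared elements to all elements of two sets."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 0.0  # two empty sets are considered identical
    return 1.0 - len(set_a & set_b) / len(set_a | set_b)


def matches_by_distance(current, historical, threshold, distance=euclidean_distance) -> bool:
    """A value matches when its distance to the cached value is below the threshold."""
    return distance(current, historical) < threshold
```

Which distance is appropriate depends on the attribute: Euclidean or cosine distance for numerical vectors, Jaccard distance for sets such as collections of device identifiers.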

In order to ensure the accuracy of the result, an initial caching validity period needs to be set, for example, the initial caching validity period may be set to be one week. In step S114, the cache validity period of the data inserted into the cache may vary according to the fraud score.

In the exemplary embodiment of the present disclosure, the commonly used cache database may be a Redis (Remote Dictionary Server) database or an Aerospike database, and is not limited thereto. Before the historical attributes and historical attribute values are imported into the cache, entries with the same k need to be merged. The merge may either overwrite the old data with the new data or combine the new and old data with a logical OR relation.
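The two merge modes can be sketched as follows, assuming the values are conditional expressions stored as strings and that the OR combination is written with `||`. The function name `merge_entries` and the `mode` parameter are hypothetical.

```python
from typing import Dict


def merge_entries(old: Dict[str, str], new: Dict[str, str], mode: str = "overwrite") -> Dict[str, str]:
    """Merge new k-v data into old k-v data before importing it into the cache."""
    merged = dict(old)
    for key, value in new.items():
        if key in merged and mode == "or" and merged[key] != value:
            # Combine old and new expressions so that either one can match.
            merged[key] = f"({merged[key]}) || ({value})"
        else:
            # Default: the new data overwrites the old data (or adds a new key).
            merged[key] = value
    return merged
```

Overwriting suits rules that replace an outdated pattern, whereas the OR combination keeps the older pattern active alongside the newer one.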

The rule extraction is only needed to be carried out on the groups with higher fraud probability, so the calculation complexity of the rule extraction is lower. If the rules need to be temporarily added or modified, only the modified part needs to be imported, and the reconstruction of the whole data is not needed. Specifically, it is necessary to receive fraud attribute modification data for updating a fraud attribute set, and update the fraud attribute set in the cache according to the fraud attribute modification data.

For false positives and false negatives reported by the client, an operator can add and modify historical attribute values manually and synchronize them into the cache, so that errors are eliminated in time and the accuracy of fraud identification is ensured. In addition, when a new risk case appears, the historical attributes and their corresponding historical attribute values can be added into the cache.

The cache updating mechanism uses a full-data offline unsupervised model to extract rules and stores the extracted rules in a cache, which is helpful for identifying real-time fraud.

In step S106, the current attribute and the current attribute value thereof are compared with the cached fraud attribute set, that is, the fraud attribute set is queried according to the current attribute and the current attribute value thereof, and because the fraud attribute set is stored in the cache, the time consumption for querying is very small, and the query result can be obtained within millisecond time.

In step S106, when comparing the current attribute and the current attribute value thereof with the cached fraud attribute set, if the current attribute matches the historical attribute in the fraud attribute set, determining whether the current attribute value matches the corresponding historical attribute value; and if the current attribute value is matched with the corresponding historical attribute value, the online identification result is that the current data to be identified is fraud data.

In embodiments of the present disclosure, the historical attribute values may be conditional expressions, fixed numerical values, or vectors or sets of numerical values.

And if the historical attribute value is the conditional expression and the current attribute value of the current attribute meets the conditional expression, judging that the current attribute value is matched with the historical attribute value. Specifically, if the result of logic judgment of substituting the current attribute value into the conditional expression is true, the current attribute value is matched with the historical attribute value.

And if the historical attribute value is a first set numerical value and the current attribute value is equal to the historical attribute value, judging that the current attribute value is matched with the historical attribute value.

If the historical attribute value is a numerical vector or set and the distance between the current attribute value and the historical attribute value is less than a second set numerical value, it is determined that the current attribute value matches the historical attribute value.
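The three matching cases can be combined into a single dispatch function, sketched below. It assumes that a conditional expression has already been turned into a Python callable, that a numerical vector is stored as a list or tuple, and that the Euclidean distance (via `math.dist`, available from Python 3.8) is the chosen distance; the default threshold of 1.0 stands in for the "second set numerical value" and is arbitrary.

```python
import math


def matches(current_value, historical_value, distance_threshold: float = 1.0) -> bool:
    """Decide whether a current attribute value matches a cached historical value."""
    # Case 1: conditional expression, represented here as a predicate.
    if callable(historical_value):
        return bool(historical_value(current_value))
    # Case 2: numerical vector, matched when the distance is below the threshold.
    if isinstance(historical_value, (list, tuple)):
        return math.dist(current_value, historical_value) < distance_threshold
    # Case 3: fixed value, matched on equality.
    return current_value == historical_value
```

Because the type of the cached value selects the comparison, the same lookup code can serve every attribute in the fraud attribute set.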

Specifically, when comparing the current attribute and the current attribute value thereof with the cached fraudulent attribute set, the real-time transformed k-v list needs to be traversed, the cache is queried one by one, and if k exists, the real-time transformed v and the corresponding v in the cache are compared. Here, the k-v list is a list of current attributes and current attribute values, and v in the cache is a historical attribute value.

The fraud attribute set comprises fraud scores, and each historical attribute value corresponds to a fraud score. If the online identification result is that the current data to be identified is fraud data, the fraud score of the fraud data is obtained according to the fraud score corresponding to the historical attribute value matched by the fraud data.

In the exemplary embodiment of the disclosure, historical attributes and historical attribute values are formed in an offline stage from the similarity and relevance of a plurality of feature dimensions, and the fraud score of each historical attribute value is determined according to the statistical distribution of the historical attribute values, so that an identification reason with business meaning can be obtained and the interpretability of online identification results is improved.

When the data to be identified is determined to be fraud data, the fraud score and the online identification result are returned together as the response to the fraud identification request.
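The traversal of the real-time k-v list and the returned response can be sketched as follows. The sketch assumes each cache entry stores a fixed historical value and its fraud score under the hypothetical keys `value` and `fraud_score`, and that when several attributes match, the highest score is reported; the disclosure does not state how multiple matches are combined.

```python
from typing import Dict, List, Optional, Tuple


def identify_with_score(
    current_kv: List[Tuple[str, object]],
    cache: Dict[str, Dict[str, object]],
) -> Dict[str, Optional[object]]:
    """Query the cache for each current attribute and build the response."""
    matched_scores = []
    matched_keys = []
    for k, v in current_kv:
        entry = cache.get(k)          # does k exist in the cache?
        if entry is None:
            continue
        if entry["value"] == v:       # compare real-time v with cached v
            matched_scores.append(entry["fraud_score"])
            matched_keys.append(k)
    if not matched_scores:
        return {"is_fraud": False, "fraud_score": None, "matched": []}
    # Assumption: report the highest fraud score among the matched values.
    return {"is_fraud": True, "fraud_score": max(matched_scores), "matched": matched_keys}
```

Returning the matched attribute names alongside the score gives the caller a reason for the decision in business terms.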

In the embodiment of the present disclosure, the historical attribute value in the fraud attribute set has a set cache validity period, and the cache validity period of the historical attribute value can be adjusted according to the quality score corresponding to the historical attribute value.

Specifically, the fraud identification capability of the historical attribute value can be judged by quality scoring the historical attribute value, the cache validity period of the historical attribute value can be prolonged when the quality score of the historical attribute value is high, and the cache validity period of the historical attribute value can be shortened when the quality score of the historical attribute value is low. Historical attribute values that exceed the period of cache validity are removed from the cache.
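A possible adjustment rule is sketched below. The thresholds `high` and `low` and the doubling and halving factors are assumptions chosen for illustration; the disclosure only states that the validity period is prolonged for high quality scores and shortened for low ones.

```python
ONE_WEEK_SECONDS = 7 * 24 * 60 * 60  # example initial cache validity period


def adjust_validity(base_seconds: int, quality_score: float, high: float = 0.8, low: float = 0.3) -> int:
    """Return a new cache validity period based on the quality score."""
    if quality_score >= high:
        return base_seconds * 2      # prolong for reliable attribute values
    if quality_score <= low:
        return base_seconds // 2     # shorten for unreliable attribute values
    return base_seconds              # otherwise keep the current period
```

With a cache database that supports per-key expiry, the returned number of seconds could be applied as the key's time to live (for example with the Redis `EXPIRE` command), so that expired historical attribute values are removed automatically.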

In the exemplary embodiment of the present disclosure, the online recognition result may be compared with a reference recognition result of the data to be recognized, an evaluation identifier of the online recognition result is obtained, and the quality score of the historical attribute value is adjusted according to the evaluation identifier.

Here, the online identification results may be compared periodically, in an offline manner, with the reference identification results of other risk control mechanisms, such as offline batch unsupervised models and rule strategies. The evaluation flag of each online identification result is initialized to 0, and if the online identification result is also hit by the reference identification result of any of the other risk control mechanisms or by manual labeling, the evaluation flag is set to 1.

Specifically, a reference identification result obtained by performing fraud identification on the data to be identified in a reference identification mode can be obtained; an evaluation flag of the online identification result is obtained according to the online identification result and the reference identification result; and the quality score of the historical attribute value is determined according to the evaluation flags of the online identification results corresponding to the historical attribute value. The quality score helps to correct the historical attribute value.

Specifically, the online identification results are associated with each of the historical attribute values, and the percentage of online identification results whose evaluation flag is 1 is calculated. When the percentage is less than a fourth set numerical value, the quality score of the historical attribute value is attenuated according to a set rule.

The historical attribute value whose quality score has been updated, together with the updated quality score, is re-synchronized into the cache, and when the quality score is smaller than a third set numerical value, the historical attribute value corresponding to the quality score is deleted from the cache.
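The evaluation and update loop described above can be sketched as follows. The parameter values (`min_hit_rate` for the fourth set numerical value, `delete_below` for the third set numerical value, and a multiplicative `decay`) are assumptions, as is the cache layout in which each entry carries a `quality` field.

```python
from typing import Dict, Iterable, List, Tuple


def evaluation_flag(online_is_fraud: bool, reference_results: Iterable[bool]) -> int:
    """Flag is 1 when an online fraud result is confirmed by any reference result."""
    flag = 0
    if online_is_fraud and any(reference_results):
        flag = 1
    return flag


def update_quality(
    cache: Dict[str, Dict[str, float]],
    results: List[Tuple[str, int]],
    min_hit_rate: float = 0.5,
    decay: float = 0.8,
    delete_below: float = 0.2,
) -> Dict[str, Dict[str, float]]:
    """Attenuate quality scores of poorly confirmed values and delete the worst ones."""
    # Group the evaluation flags by the historical attribute value (cache key) that fired.
    flags_by_key: Dict[str, List[int]] = {}
    for key, flag in results:
        flags_by_key.setdefault(key, []).append(flag)
    for key, flags in flags_by_key.items():
        if key not in cache:
            continue
        hit_rate = sum(flags) / len(flags)       # share of results with flag == 1
        if hit_rate < min_hit_rate:
            cache[key]["quality"] *= decay       # attenuate the quality score
        if cache[key]["quality"] < delete_below:
            del cache[key]                       # remove the value from the cache
    return cache
```

Running such a job periodically lets reference mechanisms cross-check the cached attribute values without any manual rule maintenance.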

The attribute rules in the cache can be automatically updated according to the quality scores, cross check is carried out on the unsupervised or semi-supervised models through the supervised models or rule strategies, the weight of the rules can be adaptively adjusted, and the accuracy of online fraud identification is improved.

As shown in fig. 3, in one embodiment of the present disclosure, performing online fraud identification includes the following steps:

Step S301, fraud groups are mined offline. That is, suspicious fraud groups can be obtained through batch computation on historical fraud data.

Step S302, fraud attributes are extracted. The historical attributes of the historical fraud data and their historical attribute values are acquired according to the suspicious fraud groups.

Step S303, the fraud attributes are saved to the cache. In addition, modified historical attribute values can be added and saved to the cache through manual operation.

Step S304, real-time service data is acquired.

Step S305, whether fraud exists is judged according to the real-time service data and the cached data. That is, the current attribute and the current attribute value are compared with the cached fraud attribute set to obtain an online identification result.

Step S306, risk processing is performed on the fraud result. That is, fraud scoring and other rules are applied to the online identification result, and a weighting coefficient and a fraud score threshold are set so as to further process the online identification result.

Step S307, the suspicious result is evaluated. That is, the online identification result is compared with the reference identification results of other risk control mechanisms, such as offline batch unsupervised models and rule policies, to obtain an evaluation result.

Step S308, the cache is updated according to the evaluation result. That is, after the quality score of the historical attribute value is adjusted according to the evaluation result, the historical attribute value whose quality score has been updated and the updated quality score are re-synchronized into the cache.
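Steps S302 to S305 can be sketched roughly as below. The extraction heuristic (keeping attribute/value pairs shared by at least `min_count` members of a suspicious group) is a stand-in assumed purely for illustration; the disclosure obtains these attributes through an unsupervised or semi-supervised model and does not specify this rule. The names `extract_fraud_attributes` and `identify`, the sample records, and the use of a Python set in place of a cache database are likewise hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Record = Dict[str, str]


def extract_fraud_attributes(group: List[Record], min_count: int = 2) -> Set[Tuple[str, str]]:
    """S302 (assumed heuristic): keep (attribute, value) pairs shared by >= min_count members."""
    counts = Counter((attr, value) for record in group for attr, value in record.items())
    return {pair for pair, n in counts.items() if n >= min_count}


def identify(record: Record, cache: Set[Tuple[str, str]]) -> bool:
    """S305: flag a record when any of its (attribute, value) pairs is in the cache."""
    return any((attr, value) in cache for attr, value in record.items())


# S301 is assumed to have produced this suspicious group offline.
group = [
    {"device_id": "D1", "ip": "10.0.0.1"},
    {"device_id": "D1", "ip": "10.0.0.2"},
]
cache = extract_fraud_attributes(group)  # S302 and S303 (a set stands in for the cache)
is_fraud = identify({"device_id": "D1", "ip": "10.0.0.9"}, cache)  # S304 and S305
```

Because the online check only reads the cached pairs, the batch mining in S301 can be rerun independently of the real-time path.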

According to the online fraud identification method provided by the embodiment of the disclosure, after the current attribute and the current attribute value of the current data to be identified are obtained, the current attribute and the current attribute value are compared with the cached fraud attribute set, fraud identification of the current data to be identified is realized, and online fraud identification can be simply and conveniently carried out without changing a graph network relation.

Embodiments of the disclosed apparatus are described below, which may be used to perform the above-described online fraud identification method of the present disclosure. As shown in fig. 4, an online fraud recognition apparatus 400 provided according to an embodiment of the present disclosure may include:

The receiving unit 402 may be configured to receive a fraud identification request including current data to be identified.

The obtaining unit 404 may be configured to obtain a current attribute of the current data to be identified and a current attribute value thereof.

The determining unit 406 may be configured to compare the current attribute and the current attribute value thereof with the cached fraud attribute set, and obtain an online identification result of whether the current data to be identified is fraud data.

The determining unit 406 may be further configured to, if the current attribute matches the historical attribute in the fraud attribute set, determine whether the current attribute value matches the corresponding historical attribute value; and if the current attribute value is matched with the corresponding historical attribute value, the online identification result is that the current data to be identified is fraud data.

The determining unit 406 may be further configured to determine that the current attribute value matches the historical attribute value if the historical attribute value is a conditional expression and the current attribute value of the current attribute satisfies the conditional expression; to determine that the current attribute value matches the historical attribute value if the historical attribute value is a first set numerical value and the current attribute value is equal to the historical attribute value; and to determine that the current attribute value matches the historical attribute value if the historical attribute value is a numerical value vector or set and the distance between the current attribute value and the historical attribute value is less than a second set numerical value.
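The three matching cases handled by the determining unit can be sketched as follows. This is an illustrative sketch rather than the disclosed implementation: representing a conditional expression as a Python callable, using Euclidean distance via `math.dist` (Python 3.8+), and the default `max_distance` are all assumptions, and the names `value_matches` and `is_fraud` are hypothetical.

```python
import math
from typing import Dict, List


def value_matches(current, historical, max_distance: float = 1.0) -> bool:
    """Decide whether a current attribute value matches a cached historical value."""
    if callable(historical):
        # Case 1: the historical value is a conditional expression (modelled as a predicate).
        return bool(historical(current))
    if isinstance(historical, (list, tuple)):
        # Case 3: the historical value is a numeric vector; compare by distance.
        # max_distance stands in for the second set numerical value.
        if not isinstance(current, (list, tuple)) or len(current) != len(historical):
            return False
        return math.dist(current, historical) < max_distance
    # Case 2: the historical value is a single set value; require equality.
    return current == historical


def is_fraud(record: Dict[str, object], fraud_attributes: Dict[str, List[object]]) -> bool:
    """Flag the record if any current attribute matches a cached attribute and value."""
    for attr, current in record.items():
        for historical in fraud_attributes.get(attr, []):
            if value_matches(current, historical):
                return True
    return False
```

For instance, with `fraud_attributes = {"login_count": [lambda v: v > 20]}`, a record `{"login_count": 25}` is flagged, while `{"login_count": 3}` is not.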

As shown in fig. 5, compared with the online fraud recognition apparatus 400, another online fraud recognition apparatus 500 provided in the embodiments of the present disclosure includes not only the receiving unit 402, the obtaining unit 404, and the determining unit 406, but also a caching unit 502, a scoring unit 504, a deleting unit 506, an adjusting unit 508, and an updating unit 510.

The caching unit 502 may be configured to obtain historical fraud data; acquiring historical attributes and historical attribute values of historical fraud data by adopting an unsupervised or semi-supervised model; generating a fraud attribute set according to the historical attribute of the historical fraud data and the historical attribute value of the historical fraud data; the fraud attribute set is stored in a cache database.

The online fraud recognition apparatus 500 may further include a scoring unit 504, configured to, if the online recognition result is that the current data to be recognized is fraud data, obtain a fraud score of the fraud data according to the fraud score corresponding to the history attribute value matching the fraud data.
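One possible reading of the scoring unit, combined with the weighting coefficient and fraud score threshold mentioned for step S306, is sketched below. The weighted-sum aggregation, the default weight of 1.0, and the threshold value are assumptions for illustration; the disclosure does not fix how the per-value fraud scores are combined.

```python
from typing import Dict


def fraud_score(matched_scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine the fraud scores of the matched historical attribute values.

    matched_scores maps each matched attribute to the fraud score cached for the
    matching historical value; weights holds assumed per-attribute coefficients.
    """
    return sum(weights.get(attr, 1.0) * score for attr, score in matched_scores.items())


def needs_risk_handling(score: float, threshold: float = 0.8) -> bool:
    """Compare the combined score against an assumed fraud score threshold."""
    return score >= threshold
```

With matched scores `{"device_id": 0.6, "ip": 0.5}` and weights `{"ip": 0.5}`, the combined score is approximately 0.85, which exceeds the assumed threshold of 0.8.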

The online fraud recognition apparatus 500 may further include a deleting unit 506, configured to perform fraud recognition on the data to be recognized by using a reference recognition method, so as to obtain a reference recognition result; generating an evaluation identifier of the online recognition result according to the online recognition result and the reference recognition result; determining the quality score of the historical attribute value according to the evaluation identifier of the online identification result corresponding to the historical attribute value; and when the quality score is smaller than a third set numerical value, deleting the historical attribute value corresponding to the quality score from the cache.

The online fraud identification apparatus 500 may further include an adjusting unit 508 for adjusting the cache validity period of the historical attribute values according to the quality scores corresponding to the historical attribute values.
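As a hedged illustration of the adjusting unit, the cache validity period could be scaled with the quality score. The linear scaling, the one-day base period, and the one-hour floor below are assumptions chosen for the sketch; the disclosure only states that the validity period is adjusted according to the quality score.

```python
def adjusted_ttl_seconds(quality_score: float,
                         base_ttl: int = 86400,  # assumed base validity period: one day
                         min_ttl: int = 3600) -> int:  # assumed floor: one hour
    """Scale the cache validity period linearly with a quality score clamped to [0, 1]."""
    score = min(max(quality_score, 0.0), 1.0)
    return max(min_ttl, int(base_ttl * score))
```

Under these assumptions, a historical attribute value with quality score 0.5 would stay cached for 43200 seconds, so poorly performing values expire sooner than reliable ones.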

The online fraud identification apparatus 500 may further comprise an updating unit 510 for receiving fraud attribute modification data; and updating the fraud attribute set in the cache according to the fraud attribute modification data.
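The updating unit can be pictured with the following sketch. The shape of the fraud attribute modification data (a list of dictionaries with `op`, `attribute`, `value`, and `score` keys) is a hypothetical format introduced here for illustration, as is the dictionary standing in for the cache.

```python
from typing import Dict, List, Tuple

Cache = Dict[Tuple[str, str], float]  # (attribute, value) -> fraud score


def apply_changes(cache: Cache, changes: List[dict]) -> None:
    """Apply fraud attribute modification data to the cached fraud attribute set in place."""
    for change in changes:
        key = (change["attribute"], change["value"])
        if change["op"] == "delete":
            cache.pop(key, None)  # deleting a missing entry is a no-op
        else:  # any other op is treated as add-or-update in this sketch
            cache[key] = change["score"]
```

This mirrors the manual maintenance path: an operator can add a newly confirmed fraud attribute value, or remove a stale one, without rerunning the offline mining.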

For details that are not disclosed in the embodiments of the apparatus of the present disclosure, please refer to the embodiments of the online fraud recognition method of the present disclosure.

The online fraud recognition device of the embodiment of the disclosure realizes fraud recognition of the current data to be recognized by obtaining the current attribute of the current data to be recognized and the current attribute value thereof and comparing the current attribute with the cached fraud attribute set, and online fraud recognition can be simply and conveniently performed without changing the graph network relationship.

Referring now to FIG. 6, shown is a block diagram of a computer system 600 suitable for use in implementing the electronic devices of embodiments of the present disclosure. The computer system 600 of the electronic device shown in fig. 6 is only one example, and should not bring any limitations to the function and scope of use of the embodiments of the present disclosure.

As shown in fig. 6, the computer system 600 includes a Central Processing Unit (CPU) 601 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for system operation are also stored. The CPU 601, ROM 602, and RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.

The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 610 as necessary, so that the computer program read out therefrom is installed in the storage section 608 as necessary.

In particular, the processes described above with reference to the flow diagrams may be implemented as computer software programs, according to embodiments of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable storage medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609 and/or installed from the removable medium 611. The above-described functions defined in the system of the present application are executed when the computer program is executed by the Central Processing Unit (CPU) 601.

It should be noted that the computer readable medium shown in the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units described in the embodiments of the present disclosure may be implemented by software or by hardware, and the described units may also be disposed in a processor. The names of the units do not, in some cases, constitute a limitation on the units themselves.

As another aspect, the present application also provides a computer-readable storage medium, which may be included in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer readable storage medium carries one or more programs which, when executed by an electronic device, cause the electronic device to implement the online fraud identification method as described in the above embodiments.

For example, the electronic device may implement the following steps as shown in fig. 1: step S102, receiving a fraud identification request containing current data to be identified; step S104, acquiring the current attribute of the current data to be identified and the current attribute value thereof; step S106, comparing the current attribute and the current attribute value thereof with the cached fraud attribute set to obtain an online identification result of whether the current data to be identified is fraud data.

It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, such a division is not mandatory. Indeed, the features and functions of two or more modules or units described above may be embodied in one module or unit, according to embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by a plurality of modules or units.

Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB disk, a removable hard disk, etc.) or on a network, and which includes several instructions to enable a computing device (which may be a personal computer, a server, a touch terminal, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

It will be understood that the present disclosure is not limited to the precise arrangements that have been described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims

1. An online fraud identification method, characterized in that the method comprises:

receiving a fraud identification request containing current data to be identified;

acquiring the current attribute and the current attribute value of the current data to be identified;

comparing the current attribute and the current attribute value thereof with a cached fraud attribute set to obtain an online identification result of whether the current data to be identified is fraud data or not, wherein the fraud attribute set comprises historical attributes and historical attribute values thereof;

the step of comparing the current attribute and the current attribute value thereof with the cached fraud attribute set to obtain the online identification result of whether the current data to be identified is fraud data comprises the following steps:

if the current attribute is matched with the historical attribute in the fraud attribute set, judging whether the current attribute value is matched with the corresponding historical attribute value;

if the current attribute value is matched with the corresponding historical attribute value, the online identification result is that the current data to be identified is fraud data, wherein the historical attribute value comprises one or more of the following items: conditional expressions, numerical data;

the method further comprises the following steps:

performing fraud identification on the data to be identified by adopting a reference identification mode to obtain a reference identification result;

generating an evaluation identifier of the online recognition result according to the online recognition result and the reference recognition result;

determining the quality score of the historical attribute value according to the evaluation identifier of the online identification result corresponding to the historical attribute value;

when the quality score is smaller than a third set numerical value, deleting a historical attribute value corresponding to the quality score from a cache;

the evaluation identifier of the online identification result comprises a target evaluation identifier; the determining the quality score of the historical attribute value according to the evaluation identifier of the online identification result corresponding to the historical attribute value comprises the following steps:

determining a hit ratio of an evaluation identifier belonging to a target evaluation identifier in an online recognition result corresponding to a target historical attribute value;

and if the hit ratio is smaller than a fourth set numerical value, attenuating the quality score of the target historical attribute value.

2. The method of claim 1, further comprising:

obtaining historical fraud data;

acquiring historical attributes and historical attribute values of the historical fraud data by adopting an unsupervised model or a semi-supervised model;

generating the fraud attribute set according to the historical attribute of the historical fraud data and the historical attribute value of the historical fraud data;

and storing the fraud attribute set in a cache database.

3. The method of claim 1, wherein determining whether the current attribute value matches the corresponding historical attribute value comprises:

and if the historical attribute value is a conditional expression and the current attribute value of the current attribute meets the conditional expression, judging that the current attribute value is matched with the historical attribute value.

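4. The method of claim 1, wherein determining whether the current attribute value matches the corresponding historical attribute value comprises:

and if the historical attribute value is a first set numerical value and the current attribute value is equal to the historical attribute value, judging that the current attribute value is matched with the historical attribute value.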

5. The method of claim 1, wherein determining whether the current attribute value matches the corresponding historical attribute value comprises:

and if the historical attribute value is a numerical value vector or set and the distance between the current attribute value and the historical attribute value is less than a second set numerical value, judging that the current attribute value is matched with the historical attribute value.

6. The method of claim 2, wherein the set of fraud attributes includes fraud scores, each of the historical attribute values corresponding to a fraud score; wherein the method further comprises:

and if the online identification result is that the current data to be identified is fraud data, obtaining fraud scores of the fraud data according to fraud scores corresponding to historical attribute values matched with the fraud data.

7. The method of claim 1, wherein the historical attribute values in the fraud attribute set have a set caching period; the method further comprises the following steps:

and adjusting the cache validity period of the historical attribute value according to the quality score corresponding to the historical attribute value.

8. The method of claim 1, further comprising:

receiving fraud attribute change data;

and updating the fraud attribute set in the cache according to the fraud attribute change data.

9. An online fraud identification apparatus, characterized in that the apparatus comprises:

the receiving unit is used for receiving a fraud identification request containing current data to be identified;

the acquiring unit is used for acquiring the current attribute of the current data to be identified and the current attribute value of the current data to be identified;

the judging unit is used for comparing the current attribute and the current attribute value thereof with a cached fraud attribute set to obtain an online identification result of whether the current data to be identified is fraud data or not, wherein the fraud attribute set comprises historical attributes and historical attribute values thereof;

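the step of comparing the current attribute and the current attribute value with the cached fraud attribute set to obtain the online identification result of whether the current data to be identified is fraud data comprises the following steps:

if the current attribute is matched with the historical attribute in the fraud attribute set, judging whether the current attribute value is matched with the corresponding historical attribute value;

if the current attribute value is matched with the corresponding historical attribute value, the online identification result is that the current data to be identified is fraud data, wherein the historical attribute value comprises one or more of the following items: conditional expressions, numerical data;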

the device further comprises:

the deleting unit is used for carrying out fraud identification on the data to be identified by adopting a reference identification mode to obtain a reference identification result; generating an evaluation identifier of the online recognition result according to the online recognition result and the reference recognition result; determining the quality score of the historical attribute value according to the evaluation identifier of the online identification result corresponding to the historical attribute value; when the quality score is smaller than a third set numerical value, deleting the historical attribute value corresponding to the quality score from the cache;

wherein the evaluation identifier of the online recognition result comprises a target evaluation identifier; the determining the quality score of the historical attribute value according to the evaluation identifier of the online identification result corresponding to the historical attribute value comprises the following steps:

determining the hit ratio of the evaluation identifier belonging to the target evaluation identifier in the online identification result corresponding to the historical attribute value;

and if the hit ratio is smaller than a fourth set numerical value, attenuating the quality score of the historical attribute value.

10. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, carries out the online fraud identification method according to any one of claims 1 to 8.

11. An electronic device, comprising:

one or more processors;

storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the online fraud identification method of any of claims 1 to 8.