CN117474678A - Method, device, equipment and medium for group partner mining based on heterogeneous knowledge graph - Google Patents

Info

Publication number
CN117474678A
Authority
CN
China
Prior art keywords
data
node
nodes
knowledge graph
account
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311480333.4A
Other languages
Chinese (zh)
Inventor
廖小瑶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202311480333.4A
Publication of CN117474678A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/022 Knowledge engineering; Knowledge acquisition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/027 Frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/02 Banking, e.g. interest calculation or account maintenance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The disclosure provides a method, an apparatus, a device and a medium for group partner mining based on a heterogeneous knowledge graph, and relates to the technical field of artificial intelligence. The method comprises: acquiring real-time transaction data of a user in multiple service scenarios, and screening a plurality of artificial features from the real-time transaction data, wherein the artificial features characterize basic features of a plurality of login accounts and a plurality of login devices of the user; constructing a knowledge graph based on the plurality of artificial features, wherein the knowledge graph comprises a plurality of nodes and edges connecting different nodes, each node represents a login account or a login device, and each edge represents a relationship between a login account and a login device; generating a potential feature vector for each node in the knowledge graph based on a meta-path random walk strategy for heterogeneous graphs; and clustering the plurality of potential feature vectors to obtain a plurality of clustering results, and taking the nodes corresponding to each clustering result as one piece of group partner data, thereby forming a plurality of pieces of group partner data.

Description

Method, device, equipment and medium for group partner mining based on heterogeneous knowledge graph
Technical Field
The present disclosure relates to the field of artificial intelligence, in particular to the fields of anti-fraud and knowledge graph mining, and more particularly to a method, an apparatus, an electronic device, a medium, and a program product for group partner mining based on a heterogeneous knowledge graph.
Background
With the deep integration of technology and finance, the scale of electronic banking has grown rapidly. Electronic banking combines low cost with high efficiency, is popular with users, and has become a product line that every major bank competes to develop. However, because electronic banking does not require customers to visit an offline branch in person, the cost of committing crime is low, and fraudsters often carry out online fraud for illegal gain.
The current common solutions for electronic banking fraud scenarios mainly include the following five categories:
(1) Clustering-based anti-fraud algorithms: behavior data of normal users and fraudsters is used as the data source and clustered, so that users with similar behavior fall into the same cluster. Because normal users are usually far more numerous, clusters containing few users generally correspond to fraudsters. However, this approach has poor interpretability.
(2) Classification-based anti-fraud algorithms: data is labeled as normal or fraudulent, a model is trained on the preprocessed labeled data, and unknown data is then fed into the trained model for identification. This approach suffers from low recall caused by imbalanced data.
(3) Anti-fraud algorithms based on behavior patterns: these involve two steps. The first takes normal user behavior as the standard normal behavior pattern; the second compares an unknown behavior pattern with the standard one, and a large difference is taken to indicate fraud. However, financial business scenarios are varied, and as the number of services grows, the feature dimension of the standard normal behavior pattern library can become excessively high.
(4) Graph-based anti-fraud algorithms: a graph is built from the relationships between entities, and the interactions between entities are then used to find fraudulent behavior and fraudulent entities. At present, this approach only supports graph construction within a single type of scenario and cannot analyze entity behavior across different scenarios.
(5) Deep-learning-based anti-fraud algorithms: labeled data is used to train a deep learning model such as a convolutional neural network. These algorithms rely on the hidden layers of deep neural networks, and the hidden-layer results are usually not interpretable.
Disclosure of Invention
In view of the foregoing, the present disclosure provides a method, an apparatus, an electronic device, a medium, and a program product for group partner mining based on a heterogeneous knowledge graph.
According to a first aspect of the present disclosure, there is provided a group partner mining method based on a heterogeneous knowledge graph, including: acquiring real-time transaction data of a user in multiple service scenarios, and screening a plurality of artificial features from the real-time transaction data, wherein the artificial features characterize basic features of a plurality of login accounts and a plurality of login devices of the user; constructing a knowledge graph based on the plurality of artificial features, wherein the knowledge graph comprises a plurality of nodes and edges connecting different nodes, each node represents a login account or a login device, and each edge represents a relationship between a login account and a login device; generating a potential feature vector for each node in the knowledge graph based on a meta-path random walk strategy for heterogeneous graphs; and clustering the plurality of potential feature vectors to obtain a plurality of clustering results, and taking the nodes corresponding to each clustering result as one piece of group partner data, thereby forming a plurality of pieces of group partner data.
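The graph construction step described above can be illustrated with a minimal sketch. The input shape (a list of account and device identifier pairs), the node tuples, and the function name `build_graph` are assumptions made for illustration and do not come from the disclosure.

```python
from collections import defaultdict


def build_graph(login_records):
    """Build an undirected account-device graph as an adjacency dict.

    login_records: iterable of (account_id, device_id) pairs
    (a hypothetical input shape, assumed for this sketch).
    Each node is a (type, id) tuple so that account and device
    nodes stay distinguishable in the heterogeneous graph.
    """
    adjacency = defaultdict(set)
    for account_id, device_id in login_records:
        account = ("account", account_id)
        device = ("device", device_id)
        # One edge per observed login relationship, stored in both directions.
        adjacency[account].add(device)
        adjacency[device].add(account)
    return dict(adjacency)


# Illustrative records only; these are not data from the disclosure.
records = [("a1", "d1"), ("a2", "d1"), ("a2", "d2"), ("a3", "d3")]
graph = build_graph(records)
```

Here device `d1` is shared by accounts `a1` and `a2`, which is the kind of association the later walk and clustering steps can exploit.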
According to an embodiment of the present disclosure, acquiring real-time transaction data of a user in multiple service scenarios includes performing the following operations for each of the service scenarios: establishing a risk rule for the service scenario, wherein the risk rule characterizes a risk that may occur in the service scenario; collecting, by means of a BDSP batch processing program, the real-time transaction data that hits the risk rule in the service scenario; and generating formatted data from the collected real-time transaction data.
According to an embodiment of the present disclosure, before generating the potential feature vector for each node in the knowledge graph based on the meta-path random walk strategy for heterogeneous graphs, the method further includes: pruning the parts of the knowledge graph in which the association ratio between login account and login device is 1:1.
According to an embodiment of the present disclosure, generating a potential feature vector for each node in the knowledge graph based on the meta-path random walk strategy for heterogeneous graphs includes: constructing a heterogeneous knowledge graph G = (V, E) according to the relationships between login accounts and login devices, wherein V represents the node set, which comprises account nodes and device nodes, and E represents the set of edges, each connecting two adjacent nodes; defining a meta-path walk rule in advance, under which every two adjacent nodes in a walk path are nodes of different types and the meta-path satisfies the symmetry principle; walking over the nodes in the heterogeneous knowledge graph according to the meta-path walk rule to select paths, and calculating the probability that any current node transitions to any candidate node among a plurality of next candidate nodes; and determining the next target node of the current node from the plurality of next candidate nodes according to an optimization objective that maximizes the local probability together with a negative sampling strategy, and determining the potential feature vector of the current node according to the next target node.
According to an embodiment of the present disclosure, the probability that any current node transitions to any candidate node among the plurality of next candidate nodes is calculated according to the following principles: where the candidate node does not conform to the meta-path walk rule, or the current node is not adjacent to the candidate node, the transition probability from the current node to the candidate node is determined to be 0; and where the candidate node conforms to the meta-path walk rule and the current node is adjacent to the candidate node, the transition probability from the current node to the candidate node is determined to be the reciprocal of the total number of next candidate nodes satisfying these conditions.
According to an embodiment of the present disclosure, the method further comprises: extracting complaint keywords from each piece of group partner data, and establishing a mapping relationship between each piece of group partner data and the corresponding complaint keywords; calculating, according to the potential feature vectors, the similarity between associated nodes in the knowledge graph, and taking the similarity as the credibility score of the corresponding edge in the knowledge graph; and evaluating the risk level of each piece of group partner data by combining the type of at least one account node of that group partner data, the mapping relationship, and the credibility scores.
According to an embodiment of the present disclosure, extracting complaint keywords from each piece of group partner data includes: screening, from the plurality of pieces of group partner data, at least one piece of group partner data that has complaint information; for each of the at least one piece of group partner data, segmenting the complaint information of the group partner data into words by using a business basic information corpus to obtain a plurality of candidate keywords; calculating the weight of each candidate keyword in the complaint information of the group partner data by using the TF-IDF algorithm; and sorting the weights in descending order and selecting the n candidate keywords corresponding to the n top-ranked weights as the complaint keywords of the group partner data, wherein n is a positive integer.
According to an embodiment of the present disclosure, the method further comprises: adding the n candidate keywords to the business basic information corpus for use in the next round of complaint keyword extraction.
According to an embodiment of the present disclosure, the types of account nodes include normal accounts and abnormal accounts, and abnormal accounts include complaint accounts, case-related accounts, and frozen accounts; the risk levels are, in ascending order, normal transaction scenario, low risk, medium risk, and high risk.
According to an embodiment of the present disclosure, evaluating the risk level of each piece of group partner data by combining the type of at least one account node, the mapping relationship, and the credibility scores includes performing the following operations for any piece of group partner data among the plurality of pieces of group partner data: where all account nodes in the group partner data are normal accounts and all credibility scores of the group partner data are greater than a preset threshold, determining that the group partner data belongs to a normal transaction scenario; where all account nodes in the group partner data are normal accounts and the number of credibility scores of the group partner data that are smaller than the preset threshold exceeds a preset proportion, determining that the group partner data is low risk; where at least one abnormal account exists in the group partner data and the number of credibility scores that are smaller than the preset threshold exceeds the preset proportion, determining that the group partner data is high risk; and where at least one abnormal account exists in the group partner data and the number of credibility scores that are smaller than the preset threshold does not exceed the preset proportion, determining that the group partner data is medium risk.
According to an embodiment of the present disclosure, the method further comprises: sorting the risk levels from high to low, and selecting the m pieces of group partner data corresponding to the m top-ranked risk levels, wherein m is a positive integer; displaying the m pieces of group partner data and the business keywords related to each of them; and exporting, from the knowledge graph, the device and account information related to the m pieces of group partner data.
A second aspect of the present disclosure provides a group partner mining apparatus based on a heterogeneous knowledge graph, including: an artificial feature acquisition module configured to acquire real-time transaction data of a user in multiple service scenarios and to screen a plurality of artificial features from the real-time transaction data, wherein the artificial features characterize basic features of a plurality of login accounts and a plurality of login devices of the user; a knowledge graph construction module configured to construct a knowledge graph based on the plurality of artificial features, wherein the knowledge graph comprises a plurality of nodes and edges connecting different nodes, each node represents a login account or a login device, and each edge represents a relationship between a login account and a login device; a potential feature vector generation module configured to generate a potential feature vector for each node in the knowledge graph based on a meta-path random walk strategy for heterogeneous graphs; and a group partner mining module configured to cluster the plurality of potential feature vectors to obtain a plurality of clustering results, and to take the nodes corresponding to each clustering result as one piece of group partner data, thereby forming a plurality of pieces of group partner data.
According to an embodiment of the present disclosure, the apparatus further comprises: a mapping relationship establishing module configured to extract complaint keywords from each piece of group partner data and to establish a mapping relationship between each piece of group partner data and the corresponding complaint keywords; a node similarity calculation module configured to calculate, according to the potential feature vectors, the similarity between associated nodes in the knowledge graph, and to take the similarity as the credibility score of the corresponding edge in the knowledge graph; and a group partner risk rating module configured to evaluate the risk level of each piece of group partner data by combining the type of at least one account node of that group partner data, the mapping relationship, and the credibility scores.
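One possible reading of this pruning step is sketched below: an account and a device that are linked only to each other form a 1:1 association and are dropped before the walk. That interpretation, the adjacency-dict representation, and the name `prune_one_to_one` are assumptions for illustration.

```python
def prune_one_to_one(adjacency):
    """Remove account-device pairs that are linked only to each other.

    adjacency: dict mapping a (type, id) node to the set of its neighbours
    (an assumed representation, not one specified in the disclosure).
    """
    isolated = set()
    for node, neighbours in adjacency.items():
        if len(neighbours) == 1:
            (other,) = neighbours
            # The pair is 1:1 only if the neighbour also has no other link.
            if adjacency[other] == {node}:
                isolated.add(node)
    return {
        node: set(neighbours)
        for node, neighbours in adjacency.items()
        if node not in isolated
    }


# Illustrative graph: a3 and d3 are linked only to each other.
example = {
    ("account", "a1"): {("device", "d1")},
    ("account", "a2"): {("device", "d1"), ("device", "d2")},
    ("account", "a3"): {("device", "d3")},
    ("device", "d1"): {("account", "a1"), ("account", "a2")},
    ("device", "d2"): {("account", "a2")},
    ("device", "d3"): {("account", "a3")},
}
pruned = prune_one_to_one(example)
```

Account `a1` has a single device but stays in the graph, because that device is shared with `a2`.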
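The disclosure does not write out the optimization objective. As a point of reference, and assuming the embodiment follows the usual Metapath2Vec formulation, the skip-gram objective and its negative-sampling form are commonly stated as follows, where X_v is the embedding of node v, N_t(v) is the set of type-t context nodes of v, T_V is the set of node types, M is the number of negative samples, and P(u) is the noise distribution (all symbols other than V are assumptions introduced here):

```latex
% Skip-gram objective over a heterogeneous graph G = (V, E)
\arg\max_{\theta} \sum_{v \in V} \sum_{t \in T_V} \sum_{c_t \in N_t(v)}
    \log p\left(c_t \mid v; \theta\right)

% Negative-sampling approximation for one (node, context) pair
\log \sigma\left(X_{c_t} \cdot X_v\right)
  + \sum_{m=1}^{M} \mathbb{E}_{u^{m} \sim P(u)}
    \left[ \log \sigma\left(-X_{u^{m}} \cdot X_v\right) \right],
\qquad \sigma(x) = \frac{1}{1 + e^{-x}}
```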
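The transition rule above can be sketched as follows. The sketch assumes the (type, id) node tuples and adjacency dict used for illustration, and the function names are hypothetical; candidates that do not appear in the returned dict have probability 0.

```python
import random


def transition_probabilities(adjacency, current, next_type):
    """Uniform probability over adjacent nodes of the type the meta-path requires."""
    candidates = sorted(
        n for n in adjacency.get(current, ()) if n[0] == next_type
    )
    if not candidates:
        return {}
    # Reciprocal of the number of candidates that satisfy both conditions.
    p = 1.0 / len(candidates)
    return {n: p for n in candidates}


def meta_path_walk(adjacency, start, walk_length, rng):
    """Walk that alternates account and device nodes (a symmetric meta-path)."""
    walk = [start]
    while len(walk) < walk_length:
        current = walk[-1]
        next_type = "device" if current[0] == "account" else "account"
        probs = transition_probabilities(adjacency, current, next_type)
        if not probs:
            break
        nodes = list(probs)
        weights = [probs[n] for n in nodes]
        walk.append(rng.choices(nodes, weights=weights, k=1)[0])
    return walk


# Illustrative graph only.
example = {
    ("account", "a1"): {("device", "d1")},
    ("account", "a2"): {("device", "d1"), ("device", "d2")},
    ("device", "d1"): {("account", "a1"), ("account", "a2")},
    ("device", "d2"): {("account", "a2")},
}
probs_from_d1 = transition_probabilities(example, ("device", "d1"), "account")
walk = meta_path_walk(example, ("account", "a1"), 5, random.Random(0))
```

A walk of odd length that starts at an account also ends at an account, which matches the symmetry requirement on the meta-path.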
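The disclosure does not name the similarity measure. The sketch below assumes cosine similarity between the two endpoint vectors of an edge; the function names and the sample vectors are illustrative only.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def edge_credibility_scores(edges, embeddings):
    """Score each (account, device) edge by the similarity of its endpoints."""
    return {
        (a, b): cosine_similarity(embeddings[a], embeddings[b])
        for a, b in edges
    }


# Made-up vectors standing in for learned potential feature vectors.
embeddings = {"a1": [1.0, 2.0], "d1": [2.0, 4.0], "d2": [-2.0, 1.0]}
scores = edge_credibility_scores([("a1", "d1"), ("a1", "d2")], embeddings)
```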
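The keyword-ranking step can be sketched as below. The sketch assumes the complaint texts have already been segmented into token lists, uses one common TF-IDF variant (term frequency times log(N / document frequency)), and uses hypothetical names and sample tokens; the disclosure does not fix any of these details.

```python
import math
from collections import Counter


def top_keywords(documents, doc_index, n):
    """Return the n highest-weighted tokens of one document by TF-IDF.

    documents: list of token lists, one per group's complaint text
    (segmentation is assumed to have been done beforehand).
    """
    doc = documents[doc_index]
    counts = Counter(doc)
    num_docs = len(documents)
    weights = {}
    for term, count in counts.items():
        tf = count / len(doc)
        df = sum(1 for d in documents if term in d)
        weights[term] = tf * math.log(num_docs / df)
    # Sort by weight descending, then alphabetically to break ties.
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n]]


# Illustrative tokens only.
complaints = [
    ["loan", "fee", "loan", "refund"],
    ["card", "fee", "refund"],
    ["card", "transfer", "fee"],
]
keywords = top_keywords(complaints, 0, 2)
```

In this sample, `fee` appears in every document, so its weight is zero and it is not selected, while `loan` is frequent in the first document and absent elsewhere.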
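The four rules above can be expressed as a short decision function. The function name, the string labels, and the parameter names are assumptions; the text leaves one case undefined (all accounts normal, some scores below the threshold, but not more than the preset proportion), and the sketch returns `None` there rather than guessing.

```python
def rate_group(account_types, credibility_scores, threshold, max_low_ratio):
    """Map one group's account types and edge scores to a risk level.

    account_types: list of labels, "normal" or any abnormal type.
    credibility_scores: list of edge scores for the group.
    threshold, max_low_ratio: the preset threshold and preset proportion.
    """
    has_abnormal = any(t != "normal" for t in account_types)
    low_count = sum(1 for s in credibility_scores if s < threshold)
    low_ratio = low_count / len(credibility_scores) if credibility_scores else 0.0
    too_many_low = low_ratio > max_low_ratio

    if has_abnormal:
        return "high" if too_many_low else "medium"
    if too_many_low:
        return "low"
    if all(s > threshold for s in credibility_scores):
        return "normal"
    return None  # case not covered by the rules in the text


# Illustrative calls with made-up scores.
level_a = rate_group(["normal", "normal"], [0.9, 0.8], 0.5, 0.3)
level_b = rate_group(["normal", "complaint"], [0.1, 0.2, 0.9], 0.5, 0.3)
```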
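Selecting the m highest-risk groups is a plain sort. In the sketch below, the numeric ordering of the labels follows the order of risk levels given in the text, while the label strings, group identifiers, and function name are assumptions.

```python
# Ordering assumed from the text: normal < low < medium < high.
RISK_ORDER = {"normal": 0, "low": 1, "medium": 2, "high": 3}


def top_m_groups(group_levels, m):
    """Return the ids of the m groups with the highest risk level."""
    ranked = sorted(
        group_levels.items(),
        key=lambda kv: (-RISK_ORDER[kv[1]], kv[0]),  # ties broken by id
    )
    return [group_id for group_id, _ in ranked[:m]]


# Hypothetical group ids and levels.
levels = {"g1": "low", "g2": "high", "g3": "medium", "g4": "normal"}
selected = top_m_groups(levels, 2)
```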
A third aspect of the present disclosure provides an electronic device, comprising: one or more processors; and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the heterogeneous knowledge graph-based group mining method described above.
A fourth aspect of the present disclosure also provides a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform the above-described heterogeneous knowledge graph based group mining method.
A fifth aspect of the present disclosure also provides a computer program product comprising a computer program which, when executed by a processor, implements the above-described heterogeneous knowledge graph based group mining method.
The group partner mining method, apparatus, electronic device, medium and program product based on a heterogeneous knowledge graph provided by embodiments of the present disclosure offer a solution to problems of traditional anti-fraud algorithms such as unstable relationships between users, unreasonable sample feature selection, imbalanced data, and poor interpretability. First, a relationship graph between login accounts and login devices is constructed based on real-time transaction data of a user in multiple business scenarios. Next, Metapath2Vec is used to generate a potential feature vector for each node in the constructed knowledge graph. The potential feature vectors are then clustered, dividing the nodes of the constructed knowledge graph into a plurality of pieces of group partner data, each containing at least one node. In this way, a relatively stable relationship graph between devices and accounts is established, and Metapath2Vec adaptively mines the potential feature vector of each individual node, so that the mined features are as objective as possible and the impact of data imbalance is mitigated. With this method, in the electronic banking context, the fraud group to which an abnormal account or abnormal device belongs can be mined promptly and as completely as possible from the real-time transaction data of the abnormal account's social network.
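The clustering step is not tied to a particular algorithm in the text. The sketch below uses a small k-means as a stand-in, with the first k points as initial centroids so that the result is deterministic; the algorithm choice, function names, and sample vectors are all assumptions.

```python
from collections import defaultdict


def kmeans(points, k, iterations=20):
    """Plain k-means; returns one cluster label per point."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def group_nodes(node_ids, labels):
    """Collect the nodes of each cluster into one piece of group data."""
    groups = defaultdict(list)
    for node_id, label in zip(node_ids, labels):
        groups[label].append(node_id)
    return dict(groups)


# Made-up 2-D vectors standing in for potential feature vectors.
nodes = ["a1", "d1", "a2", "d2", "a3"]
vectors = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.0, 0.1]]
labels = kmeans(vectors, k=2)
groups = group_nodes(nodes, labels)
```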
Drawings
The foregoing and other objects, features and advantages of the disclosure will be more apparent from the following description of embodiments of the disclosure with reference to the accompanying drawings, in which:
FIG. 1 schematically illustrates a system architecture of a heterogeneous knowledge graph based method and apparatus for partner mining in accordance with an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow diagram of a heterogeneous knowledge-graph based group mining method, in accordance with an embodiment of the present disclosure;
FIGS. 3A-3E schematically illustrate a framework diagram of a heterogeneous knowledge graph-based group mining method, in accordance with an embodiment of the present disclosure;
FIG. 4 schematically illustrates a flow chart of acquiring real-time transaction data of a user according to an embodiment of the present disclosure;
FIG. 5 schematically illustrates a flow chart of generating potential feature vectors according to an embodiment of the disclosure;
FIG. 6 schematically illustrates a flow chart of a partner risk rating after partner mining in accordance with an embodiment of the present disclosure;
FIG. 7 schematically illustrates a flow chart of extracting complaint keywords in partner data according to an embodiment of the present disclosure;
FIG. 8 schematically illustrates a flow chart of risk level assessment of partner data in accordance with an embodiment of the present disclosure;
FIG. 9 schematically illustrates a flow chart after risk level assessment according to an embodiment of the present disclosure;
FIG. 10 schematically illustrates a block diagram of a heterogeneous knowledge graph based group mining apparatus, in accordance with an embodiment of the present disclosure;
FIG. 11 schematically illustrates a block diagram of a heterogeneous knowledge graph based group mining apparatus, in accordance with another embodiment of the present disclosure;
FIG. 12 schematically illustrates a block diagram of an electronic device adapted to implement a heterogeneous knowledge-graph based group mining method, in accordance with an embodiment of the disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is only exemplary and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the present disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of the present specification and should not be construed in an idealized or overly formal manner.
Where an expression like "at least one of A, B and C, etc." is used, it should generally be interpreted in accordance with the meaning commonly understood by those skilled in the art (e.g., "a system having at least one of A, B and C" shall include, but not be limited to, systems having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.).
Some of the block diagrams and/or flowchart illustrations are shown in the figures. It will be understood that some blocks of the block diagrams and/or flowchart illustrations, or combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the instructions, when executed by the processor, create means for implementing the functions/acts specified in the block diagrams and/or flowchart. The techniques of this disclosure may be implemented in hardware and/or software (including firmware, microcode, etc.). Additionally, the techniques of this disclosure may take the form of a computer program product on a computer-readable storage medium having instructions stored thereon, the computer program product being for use by or in connection with an instruction execution system.
In the technical solution of the present disclosure, the user information involved (including but not limited to user personal information, user image information, and user device information such as location information) and the data involved (including but not limited to data for analysis, stored data, and displayed data) are information and data authorized by the user or fully authorized by all parties. The collection, storage, use, processing, transmission, provision, disclosure, and application of such data comply with the relevant laws, regulations, and standards of the relevant countries and regions; necessary security measures are taken; public order and good morals are not violated; and a corresponding operation entry is provided for the user to grant or refuse authorization.
In carrying out the inventive concepts of the present disclosure, the applicant found that: the traditional anti-fraud algorithm mainly has the following problems in two stages of user behavior pattern exploration and fraud group mining:
First, existing methods often perform group partner mining based on customer relationships. However, relationships between customers are relatively variable and unstable over time, it is difficult to measure how closely two users are connected, and the graph must be rebuilt frequently at considerable computational cost.
Second, sample features are selected unreasonably and the data is imbalanced. Electronic banking has numerous business scenarios, each with different important features; if features are selected manually, the result depends excessively on expert experience, which affects the subsequent clustering. Moreover, in electronic banking scenarios most users are normal and only a few accounts are abnormal; if the data imbalance is not handled, a falsely high accuracy can result.
Third, anti-fraud algorithms have poor interpretability. Unlike other fields, incidents in the anti-fraud field often involve violations of laws and regulations, so once fraud is identified, the fraud scheme and the amounts involved must be fully substantiated. If the final result of the anti-fraud algorithm cannot be explained, subsequent fraud cases are difficult to pursue for lack of evidence.
Based on this, embodiments of the present disclosure provide a group partner mining method based on a heterogeneous knowledge graph. To address the instability of customer relationships, a knowledge graph relationship between customers and devices is established: real-time transaction data of a user in multiple service scenarios is acquired, and a plurality of artificial features are screened from the real-time transaction data, the artificial features characterizing basic features of a plurality of login accounts and a plurality of login devices of the user. To address unreasonable sample feature selection and data imbalance, a knowledge graph is constructed based on the plurality of artificial features, the knowledge graph comprising a plurality of nodes and edges connecting different nodes, where each node represents a login account or a login device and each edge represents a relationship between a login account and a login device; a potential feature vector is then generated for each node in the knowledge graph based on a meta-path random walk strategy for heterogeneous graphs. To address the poor interpretability of anti-fraud algorithms, the plurality of potential feature vectors are clustered to obtain a plurality of clustering results, the nodes corresponding to each clustering result are taken as one piece of group partner data so as to form a plurality of pieces of group partner data, and a business basic information corpus is used to compute the corresponding complaint keywords, so that the fraud domain and fraud methods of each group can be described intuitively.
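The "falsely high accuracy" effect can be seen with simple arithmetic. The counts below are made up for illustration: a classifier that labels every account as normal still scores very high accuracy while catching no fraud at all.

```python
# Hypothetical population: 10,000 accounts, 50 of them fraudulent.
total_accounts = 10_000
fraud_accounts = 50

# A trivial classifier that predicts "normal" for every account.
true_positives = 0                                # no fraud account is flagged
true_negatives = total_accounts - fraud_accounts  # every normal account is "correct"

accuracy = (true_positives + true_negatives) / total_accounts
recall = true_positives / fraud_accounts

print(f"accuracy={accuracy:.3f}, recall={recall:.3f}")
```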
Fig. 1 schematically illustrates a system architecture of a heterogeneous knowledge graph based group mining method and apparatus, in accordance with an embodiment of the present disclosure. It should be noted that fig. 1 is only an example of a system architecture to which embodiments of the present disclosure may be applied to assist those skilled in the art in understanding the technical content of the present disclosure, but does not mean that embodiments of the present disclosure may not be used in other devices, systems, environments, or scenarios.
As shown in fig. 1, an application scenario 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications, such as shopping class applications, web browser applications, search class applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only) may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be a variety of electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (by way of example only) providing support for websites browsed by users using the terminal devices 101, 102, 103. The background management server may analyze and process the received data such as the user request, and feed back the processing result (e.g., the web page, information, or data obtained or generated according to the user request) to the terminal device.
It should be noted that, the partner mining method based on the heterogeneous knowledge graph provided in the embodiments of the present disclosure may be generally performed by the server 105. Accordingly, the partner mining device based on the heterogeneous knowledge graph provided by the embodiments of the present disclosure may be generally provided in the server 105. The heterogeneous knowledge graph based group mining method provided by the embodiments of the present disclosure may also be performed by a server or server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the partner mining apparatus based on the heterogeneous knowledge graph provided by the embodiments of the present disclosure may also be provided in a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
The partner mining method based on the heterogeneous knowledge graph according to the embodiment of the present disclosure will be described in detail below with reference to fig. 2 to 9 based on the system architecture described in fig. 1.
Fig. 2 schematically illustrates a flow chart of a heterogeneous knowledge graph based group mining method, in accordance with an embodiment of the present disclosure. Fig. 3A to 3E schematically illustrate a framework schematic diagram of a heterogeneous knowledge graph-based group mining method in accordance with an embodiment of the present disclosure. Fig. 3A is a general frame diagram, and fig. 3B to 3E are schematic diagrams of steps in fig. 3A in sequence.
As shown in fig. 2, the heterogeneous knowledge graph based group mining method of the embodiment may include operations S210 to S240, which may be performed by the server 105.
In operation S210, real-time transaction data of the user in various service scenarios is acquired, and a plurality of artificial features are screened from the real-time transaction data, wherein the artificial features characterize basic features of a plurality of login accounts and a plurality of login devices of the user.
In embodiments of the present disclosure, the user's consent or authorization may be obtained prior to obtaining the user's real-time transaction data. For example, before operation S210, a request to acquire user real-time transaction data may be issued to the user. Operation S210 is performed in case the user agrees or authorizes that the user real-time transaction data can be acquired.
This operation corresponds to Step1 in fig. 3B, where in the data preparation stage, real-time transaction data of the user in various business scenarios, such as various common scenarios where fraud may exist in the electronic bank, are collected. Then, from massive real-time transaction data, artificial features are selected according to expert experience for constructing a knowledge graph.
In operation S220, a knowledge graph is constructed based on the plurality of artificial features, the knowledge graph including a plurality of nodes and edges connecting different nodes, each node representing a login account or login device, each edge representing a relationship between a login account and a login device.
This operation corresponds to Step2 in fig. 3B. When the knowledge graph is constructed, a relationship graph between login accounts and login devices is built. A relationship graph between accounts changes considerably over time, which makes it difficult to measure whether two users are close, whereas the relationship between an account and a device does not change abruptly in a short time. Knowledge graph construction in the present disclosure is therefore based on various business scenarios and focuses not on business transactions between users but on a relationship graph between devices and users.
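The account-to-device graph described above can be sketched in memory as follows. This is a minimal illustration, not the disclosed implementation: the record fields `account`, `device`, and `relation`, the sample values, and the function name `build_graph` are all assumptions introduced here.

```python
from collections import defaultdict

def build_graph(records):
    """Build a bipartite account-device graph from formatted login records.

    Nodes are tagged tuples such as ("user", "A1") or ("device", "D1").
    Returns (adjacency, edge_attrs): adjacency maps each node to the set of
    its neighbours; edge_attrs maps (account, device) to its relation types.
    """
    adjacency = defaultdict(set)
    edge_attrs = defaultdict(set)
    for rec in records:
        user = ("user", rec["account"])
        device = ("device", rec["device"])
        adjacency[user].add(device)
        adjacency[device].add(user)
        edge_attrs[(user, device)].add(rec["relation"])
    return dict(adjacency), dict(edge_attrs)

# Hypothetical formatted records (field names are illustrative only).
records = [
    {"account": "A1", "device": "D1", "relation": "login"},
    {"account": "A2", "device": "D1", "relation": "login"},
    {"account": "A2", "device": "D2", "relation": "bind"},
    {"account": "A3", "device": "D3", "relation": "login"},
]
graph, edges = build_graph(records)
```

Only account-device edges are created, so the graph contains no account-to-account transaction edges, in line with the design choice described in the text.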
In operation S230, potential feature vectors are generated for each node in the knowledge graph based on the meta-path random walk strategy for heterogeneous graphs (Metapath2Vec).
This operation corresponds to Step3 to Step5 in fig. 3C, in which potential feature vectors are generated for each node in the constructed knowledge graph using Metapath2Vec.
In operation S240, clustering computation is performed on the plurality of potential feature vectors to obtain a plurality of clustering results, and a plurality of nodes corresponding to each clustering result are used as one group data to form a plurality of group data.
The operation corresponds to Step6 to Step7 in fig. 3C, and finally, a plurality of nodes in the constructed knowledge graph are divided into a plurality of pieces of group data, so that each piece of group data contains at least one node.
The embodiments of the present disclosure provide a solution to the problems of traditional anti-fraud algorithms, such as unstable relationships among users, unreasonable sample feature selection, unbalanced data, and poor interpretability. First, a relationship graph between login accounts and login devices is constructed based on real-time transaction data of the user in various business scenarios. Next, potential feature vectors are generated for each node in the constructed knowledge graph using Metapath2Vec. Then, clustering calculation is performed on the plurality of potential feature vectors, and the nodes in the constructed knowledge graph are divided into a plurality of pieces of group data, each containing at least one node. In this way, a relatively stable relationship graph between devices and accounts is established, the nodes and relationships involved in calculation are screened, and the amount of later calculation is reduced. Metapath2Vec adaptively mines the potential feature vector of each node, so that the mined features are as objective as possible, and its negative sampling strategy alleviates the influence of data imbalance. In the context of an electronic bank, the fraud group to which an abnormal account or abnormal device belongs can thus be mined in a timely manner and to the greatest extent from the real-time transaction data of the abnormal account's social network.
Fig. 4 schematically illustrates a flowchart for acquiring real-time transaction data of a user according to an embodiment of the present disclosure.
As shown in fig. 3B and 4, in the embodiment of the present disclosure, the above-described operation S210 acquires real-time transaction data of a user in a plurality of service scenarios, including performing the following operations S411 to S413 for each of the plurality of service scenarios.
In operation S411, a risk rule is formulated according to the service scenario, where the risk rule characterizes a risk that may occur in the service scenario.
For example, a series of risk rules for fraud scenarios that may occur in the electronic bank are formulated according to expert rules, and these risk rules are configured online.
In operation S412, real-time transaction data that hits the risk rules in the business scenario is collected using a BDSP batch program.
Referring to Step1 in fig. 3B, the BDSP batch program is used to collect real-time transaction data during the data preparation phase prior to knowledge graph construction. The real-time transaction data may include, for example, user basic information, device information, active account information, and scenario information such as the proportions of complaint users, case-related fraudulent users, and frozen users under the same device node.
For example, the real-time transaction data is large and complex: it contains both gambling, fraud, and frozen account information that has already been maintained, and accounts hit by expert rules for specific business scenarios as well as complaint accounts that still need to be processed. Multiple BDSP batch programs may be written based on the expert rules for specific business scenarios, each of which may be used to detect real-time transaction data that hits a certain risk rule.
In operation S413, the collected real-time transaction data is generated into formatted data. The data is processed into structured data, so that the data statistics and management are facilitated, and the subsequent establishment of a knowledge graph is facilitated.
Then, from the massive real-time transaction data, artificial features are selected, for example according to expert experience, to construct a knowledge graph between users and devices, and the business relationships are used to judge fraudulent activity. The present disclosure no longer focuses knowledge graph construction on business transactions between accounts, but instead constructs a relationship graph between devices and accounts.
Specifically, a top-down knowledge graph construction approach can be adopted, and the ontology of the knowledge graph is built based on the business scenarios of the electronic bank. As shown in Table 1 below, the entity classes, the relationships among classes, the attributes of classes, and the attributes of relationships in the knowledge graph can be defined by sorting through the fraud scenarios common to electronic banks, commonly used data lake table fields, and expert experience. The ontology of the knowledge graph mainly involves concepts, attributes, and relationships. The "concept" mainly comprises two types, accounts and devices, where accounts can be divided into normal accounts and abnormal accounts, and abnormal accounts include complaint accounts, frozen accounts, case-related accounts, and the like. The "attribute" mainly contains basic information of the account, dynamic account information (active account information), and the like, where a complaint account also involves the first complaint time, the last complaint event, and the like. The "relationship" mainly describes the association between an account and a device, which changes with different scenarios. The "relationship" may also have its own attributes.
TABLE 1 description of variables of knowledge-graph
Among these, entity relationships are mainly usage relationships between accounts and devices, such as binding, logging in, transferring accounts. The device may be represented by a device fingerprint, recording the binding login relationship and time between the device and the account.
For example, the established knowledge graph may be stored in a JanusGraph graph database, where entities are represented by nodes, relationships are represented by edges connecting the entities, and the properties of entities and relationships are represented by key-value pairs.
In some embodiments, after the knowledge-graph is constructed based on the plurality of artificial features in operation S220 described above, other data than the artificial features in the real-time transaction data may be stored in the relational database.
In some embodiments, as shown in Step2 to Step3 in fig. 3B, before potential feature vectors are generated for each node in the knowledge graph based on the meta-path random walk strategy for heterogeneous graphs in operation S230, the method further includes: pruning the parts of the knowledge graph in which the association ratio of login account to login device is 1:1. Because financial fraud often appears in the form of groups, "island" data in which a login account and a login device have a one-to-one relationship is removed from the constructed knowledge graph, which reduces the number of nodes and relationships involved in subsequent calculation. In addition, because the relationship between login accounts and login devices is more stable than the relationship between accounts, the graph needs to be updated less frequently and the resource requirements are lower.
Because the node information and edge information in the knowledge graph cannot be fed directly into a model as features, this information needs to be converted into feature vectors.
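The pruning of 1:1 "islands" can be sketched as below. This is a minimal illustration under the assumption that the graph is held as an adjacency dictionary of tagged node tuples; the function name `prune_islands` and the sample graph are hypothetical.

```python
def prune_islands(adjacency):
    """Drop account-device pairs that are connected only to each other."""
    pruned = {}
    for node, neighbours in adjacency.items():
        if len(neighbours) == 1:
            (other,) = neighbours
            if adjacency[other] == {node}:
                continue  # 1:1 island: neither endpoint has another link
        pruned[node] = set(neighbours)
    return pruned

# Hypothetical graph: A3 and D3 form an isolated one-to-one pair.
adjacency = {
    ("user", "A1"): {("device", "D1")},
    ("user", "A2"): {("device", "D1"), ("device", "D2")},
    ("device", "D1"): {("user", "A1"), ("user", "A2")},
    ("device", "D2"): {("user", "A2")},
    ("user", "A3"): {("device", "D3")},
    ("device", "D3"): {("user", "A3")},
}
pruned = prune_islands(adjacency)
```

Note that account A1 is kept even though it uses a single device, because that device is shared with A2; only pairs in which both endpoints have exactly one neighbour are removed.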
Fig. 5 schematically illustrates a flow chart of generating potential feature vectors according to an embodiment of the disclosure.
As shown in fig. 3C and 5, in the embodiment of the present disclosure, the operation S230 generates potential feature vectors for each node in the knowledge-graph based on the meta-path random walk strategy of the iso-graph, which may specifically include operations S531 to S534.
In operation S531, a heterogeneous knowledge graph G = (V, E) is constructed according to the relationship between login accounts and login devices, where V represents the node set, whose nodes include account nodes (user) and device nodes (device), and E represents the set of edges between adjacent nodes.
In operation S532, the meta-path walk rule is specified in advance: every two adjacent nodes in the walk path are nodes of different types, and the meta-path satisfies the symmetry principle.
Specifically, this operation corresponds to Step4 in fig. 3C, where a meta-path is a sequence of nodes of different types connected by edges of different types. First, the meta-path walk rules are fixed as follows: (1) user -> device -> user, i.e. account node -> device node -> account node; (2) device -> user -> device, i.e. device node -> account node -> device node. The predefined meta-path takes the form $V_1 \xrightarrow{E_1} V_2 \xrightarrow{E_2} \cdots \xrightarrow{E_t} V_{t+1} \cdots \xrightarrow{E_{l-1}} V_l$, where any two adjacent nodes $V_i$ in the path are nodes of different types (the next node after an account node is always a device node, and vice versa), $E_t$ is the edge between nodes, and the meta-path must satisfy the symmetry principle.
In operation S533, according to the meta-path walk rule, a walk path is selected over the nodes in the heterogeneous knowledge graph, and the probability that any current node transfers to any one of its plurality of next candidate nodes is calculated.
In an embodiment of the present disclosure, the probability in operation S533 that the current node transfers to a given candidate node is calculated according to the following principles:
if the candidate node does not conform to the meta-path walk rule, or the current node is not adjacent to the candidate node, the transition probability from the current node to the candidate node is 0;
if the candidate node conforms to the meta-path walk rule and the current node is adjacent to the candidate node, the transition probability from the current node to the candidate node is the reciprocal of the total number of next candidate nodes that satisfy these conditions.
That is, a meta-path walk rule is defined in advance. If a transition follows the rule, the transition probability is the reciprocal of the number of neighbors of the specified node type; otherwise the transition probability is 0.
Specifically, based on the meta-path random walk strategy for heterogeneous graphs, a walk path is selected over the nodes in the knowledge graph. There are multiple choices when moving from the current node to the next node, and, depending on how the nodes are distributed, the probability of moving to the next node is:

$$P\left(v_{i+1} \mid v_i^{\mathrm{device}}, \rho\right) = \begin{cases} \dfrac{1}{N}, & (v_i^{\mathrm{device}}, v_{i+1}) \in E \text{ and } v_{i+1} \text{ conforms to } \rho \\ 0, & \text{otherwise} \end{cases}$$

where $\rho$ represents the predetermined meta-path walk rule, $v_i^{\mathrm{device}}$ represents the current node at step $i$, which is of type device, $v_{i+1}$ represents the next candidate node, and $P(v_{i+1} \mid v_i^{\mathrm{device}}, \rho)$ represents the probability that the current node $v_i^{\mathrm{device}}$ transfers to the candidate node $v_{i+1}$. If $N$ is the number of next candidate nodes that are associated with the current node and conform to the meta-path rule, the probability of transferring to each such candidate node is $1/N$. If the next candidate node does not conform to the meta-path, or there is no association between the two nodes, the probability is 0.
And, based on symmetry principle, the first node and the last node should be the same type of node.
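The walk rule and the 1/N transition probability can be sketched as follows. This is an illustrative sketch only: the adjacency-dictionary layout, the function names, and the sample graph are assumptions, and an odd walk length is chosen so that the first and last nodes share a type.

```python
import random

def next_type(node):
    """Meta-path rule: a user is followed by a device and vice versa."""
    return "device" if node[0] == "user" else "user"

def transition_probability(adjacency, current, candidate):
    """1/N for adjacent candidates of the required type, otherwise 0."""
    valid = [n for n in adjacency.get(current, ()) if n[0] == next_type(current)]
    if candidate not in valid:
        return 0.0
    return 1.0 / len(valid)

def metapath_walk(adjacency, start, length, rng):
    """Random walk that alternates user and device nodes."""
    walk = [start]
    current = start
    for _ in range(length - 1):
        wanted = next_type(current)
        candidates = sorted(n for n in adjacency.get(current, ()) if n[0] == wanted)
        if not candidates:
            break  # dead end: no neighbour of the required type
        current = rng.choice(candidates)  # uniform choice, i.e. 1/N each
        walk.append(current)
    return walk

# Hypothetical graph after pruning.
adjacency = {
    ("user", "A1"): {("device", "D1")},
    ("user", "A2"): {("device", "D1"), ("device", "D2")},
    ("device", "D1"): {("user", "A1"), ("user", "A2")},
    ("device", "D2"): {("user", "A2")},
}
walk = metapath_walk(adjacency, ("user", "A1"), 5, random.Random(0))
```

In a strictly bipartite account-device graph every neighbor already has the required type, so the type filter is redundant here; it is kept explicit to mirror the rule and would matter if further node types were added.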
In operation S534, a next target node of the current node is determined from among the plurality of next candidate nodes according to the optimization target and the negative sampling policy for the local probability maximization, and potential feature vectors of the current node are determined according to the next target node.
The optimization objective of the random walk is to maximize the local structural probability. To avoid performing a softmax operation over all nodes at every step, a negative sampling strategy is adopted.
According to the embodiments of the present disclosure, the potential feature vectors of all nodes in the knowledge graph are mined using Metapath2Vec, which helps ensure that the mined feature vectors are objective and effective and can represent the potential features of all business scenarios, while the negative sampling strategy of Metapath2Vec alleviates the influence of data imbalance on the calculation.
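The disclosure does not write out the objective. For orientation, the form commonly used for skip-gram models with negative sampling is shown below as an assumption; the symbols $X_v$ (embedding of the current node $v$), $c$ (a context node from the walk), $M$ (number of negative samples), and $P(u)$ (the negative sampling distribution) are introduced here and do not appear in the source.

```latex
\mathcal{O}(v, c) = \log \sigma\left(X_c \cdot X_v\right)
  + \sum_{m=1}^{M} \mathbb{E}_{u^{m} \sim P(u)}
    \left[ \log \sigma\left(-X_{u^{m}} \cdot X_v\right) \right],
\qquad \sigma(x) = \frac{1}{1 + e^{-x}}
```

Maximizing the first term pulls the embeddings of nodes that co-occur on a walk together, while the second term pushes $M$ sampled non-context nodes away, replacing the full softmax normalization over all nodes.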
Finally, as shown in Step6 to Step7 in fig. 3C, clustering calculation is performed on the mined potential feature vectors, and according to the clustering result, the nodes which are clustered into one class are regarded as a group partner, so that group partner data are obtained.
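The source does not name a clustering algorithm. The sketch below swaps in plain k-means purely as an assumption, with hypothetical two-dimensional vectors standing in for the learned potential feature vectors; the function names and the deterministic seeding are illustrative choices.

```python
import math

def kmeans(vectors, k, iterations=20):
    """Plain k-means; the first k vectors seed the centroids (deterministic)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def groups_from_labels(nodes, labels):
    """Treat all nodes that share a cluster label as one piece of group data."""
    groups = {}
    for node, label in zip(nodes, labels):
        groups.setdefault(label, []).append(node)
    return list(groups.values())

# Hypothetical node names and embeddings.
nodes = ["A1", "A4", "D1", "A2", "D4", "A5"]
vectors = [(0.9, 0.1), (0.0, 1.0), (1.0, 0.0), (0.8, 0.2), (0.1, 0.9), (0.2, 0.8)]
groups = groups_from_labels(nodes, kmeans(vectors, k=2))
```

k-means requires the number of groups in advance; a density-based method may suit group mining better when that number is unknown, but that choice is outside what the source specifies.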
By the embodiment of the disclosure, a relatively stable relation map between equipment and an account is established, nodes and relations participating in calculation are screened, and later calculation is reduced. The potential feature vectors of the single node are adaptively mined by using the Metapath2Vec, so that the mined features are objective as much as possible, and the influence caused by data imbalance is relieved by a negative sampling strategy.
FIG. 6 schematically illustrates a flow chart of a partner risk rating after partner mining according to an embodiment of the present disclosure.
As shown in fig. 6, in some embodiments, after the above-described operation S240, the heterogeneous knowledge graph based group mining method may further include operations S610 to S630.
In operation S610, complaint keywords in each piece of partner data are extracted, and a mapping relationship between each piece of partner data and the corresponding complaint keywords is established.
This operation corresponds to Step8 to Step9 in fig. 3D, extracts complaint keywords from each piece of partner data, and establishes a mapping relationship between each complaint keyword and the corresponding piece of partner data.
In operation S620, according to the potential feature vectors, the similarity between the associated nodes in the knowledge-graph is calculated, and the similarity is used as a confidence score of the corresponding edge in the knowledge-graph.
This operation corresponds to Step10 to Step11 in fig. 3E. To address the poor interpretability of the anti-fraud algorithm, the similarity between an account node and a device node is calculated using the potential feature vectors obtained in Step5, and the calculated similarity is taken as the trust score of the corresponding edge in the knowledge graph. The corresponding edge is the edge between the two adjacent nodes (associated nodes) involved in the similarity calculation.
It will be appreciated that if the similarity is high, the device is likely a commonly used device for the account; otherwise, the device is a newly used device. When the latter occurs frequently within the same group data, account theft may be involved.
Thus, based on the credibility score determination process of node similarity, whether the partner has risk is judged from the equipment dimension.
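The edge scoring step can be sketched as follows. The source does not state which similarity measure is used; cosine similarity is assumed here, and the embeddings, node names, and function names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def edge_trust_scores(edges, embeddings):
    """Score every (account, device) edge by the similarity of its endpoints."""
    return {
        (account, device): cosine_similarity(embeddings[account], embeddings[device])
        for account, device in edges
    }

# Hypothetical embeddings: D1 looks like A1's usual device, D2 does not.
embeddings = {"A1": (1.0, 0.0), "D1": (1.0, 0.0), "D2": (0.0, 1.0)}
scores = edge_trust_scores([("A1", "D1"), ("A1", "D2")], embeddings)
```

Under this assumption a score near 1 suggests a commonly used device and a score near 0 suggests a newly used one.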
In operation S630, the type, mapping relation, and trust score of at least one account node of each group data are integrated, and the risk level of each group data is evaluated.
This operation corresponds to Step9 and Step11 to Step12 in fig. 3E, in which the risk level of each piece of mined group data is comprehensively evaluated by combining the obtained group data, the mapping relationship, and the trust scores. Combining the complaint keywords in the mapping relationship with the trust scores obtained by similarity calculation provides two pieces of evidence from different perspectives for subsequent accountability, gives an objective evaluation of the group from the two angles of user feedback and trusted devices, and improves the interpretability of group mining.
By the embodiment of the disclosure, complaint information and equipment scores in the partner data are comprehensively considered, the behavior of the partner is explained from multiple angles, and the problem of poor interpretability of an anti-fraud algorithm is solved to a certain extent. By the method, under the background of an electronic bank, real-time transaction data of a social network of an abnormal account can be timely combined with individual fraud event complaint information, so that the abnormal account or fraud group partner where abnormal equipment is located can be mined out to the maximum extent, the group partner risk is rated, and therefore the high-risk account and equipment are timely sealed and controlled.
Fig. 7 schematically illustrates a flowchart of extracting complaint keywords in partner data according to an embodiment of the present disclosure.
As shown in fig. 3D and 7, in some embodiments, extracting complaint keywords in each group data in operation S610 may include operations S711 to S712.
At least one piece of partner data in which complaint information exists is screened out from the plurality of pieces of partner data in operation S711.
The complaint information may be a complaint of the user, such as a complaint unit, consumer information, business name, a complaint event, etc.
In operation S712, the following operations S712A to S712C are performed for each of the at least one partner data.
In operation S712A, complaint information of the group data is segmented using the business base information corpus to obtain a plurality of candidate keywords.
In operation S712B, a weight of each candidate keyword in complaint information of the group data is calculated using TF-IDF algorithm.
In operation S712C, the plurality of weights are ranked from large to small, n candidate keywords corresponding to the n weights ranked first are selected as complaint keywords of the group data, and n is a positive integer.
For example, operations S712A to S712C correspond to Step8 in fig. 3D. First, the complaint information of the user is segmented using the business basic information corpus to obtain a plurality of candidate keywords $\{t_1, t_2, \dots, t_n\}$, where $t_i$ represents the $i$-th candidate keyword. The business basic information corpus contains the words commonly used by all businesses of the company. The weight $\mathrm{weight}(t_i, S_j)$ of each candidate keyword is then calculated using the TF-IDF algorithm; this weight represents the importance of the candidate keyword $t_i$ to the complaint information $S_j$. Finally, the candidate keywords are sorted in descending order of weight, the top n candidate keywords are selected as the complaint keywords of the group data, and a mapping relationship between each piece of group data and the corresponding complaint keywords is established, as shown in Step9 in fig. 3D. Using the complaint information to extract keywords that describe the behavior of the group provides a degree of interpretability for the clustering result.
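The keyword extraction can be sketched as below. Several points are assumptions rather than the disclosed method: segmentation is approximated by whitespace splitting filtered against a hypothetical vocabulary, the TF-IDF variant is tf × ln(N/df), and each group's complaint information is treated as one document.

```python
import math
from collections import Counter

def segment(text, vocabulary):
    """Keep only whitespace-separated tokens found in the business vocabulary."""
    return [tok for tok in text.lower().split() if tok in vocabulary]

def top_keywords(documents, index, n):
    """Return the n highest TF-IDF terms of documents[index]."""
    doc = documents[index]
    counts = Counter(doc)
    total_docs = len(documents)
    weights = {}
    for term, count in counts.items():
        tf = count / len(doc)                           # term frequency in this document
        df = sum(1 for d in documents if term in d)     # documents containing the term
        weights[term] = tf * math.log(total_docs / df)  # one common TF-IDF variant
    # Sort by descending weight; ties are broken alphabetically.
    ranked = sorted(weights, key=lambda t: (-weights[t], t))
    return ranked[:n]

# Hypothetical vocabulary and complaint texts, one per group.
vocabulary = {"transfer", "refund", "loan", "card", "fraud"}
complaints = [
    "fraud loan loan transfer",
    "fraud refund card",
    "fraud card transfer",
]
documents = [segment(text, vocabulary) for text in complaints]
keywords = top_keywords(documents, index=0, n=2)
```

In this example "fraud" appears in every document and therefore receives zero weight, which is the usual TF-IDF behavior for terms that do not distinguish one group from another.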
In some embodiments, after the above operation S712, the heterogeneous knowledge graph based group mining method may further include: and updating the n candidate keywords to a business basic information corpus so as to be used for extracting the next complaint keywords.
In some embodiments, in operation S630 above, the types of account nodes include normal accounts and abnormal accounts, where the abnormal accounts include complaint accounts, case-related accounts, and frozen accounts as shown in Table 1 above; the risk levels include, in ascending order, a normal transaction scenario, low risk, medium risk, and high risk.
Based on this, FIG. 8 schematically illustrates a flow chart of risk level assessment of partner data in accordance with an embodiment of the present disclosure.
As shown in fig. 3E and 8, in some embodiments, integrating the type, mapping relationship, and trust score of at least one account node of each of the group data in operation S630 described above, evaluates the risk level of each group data, including performing the following operations S831-S834 for any of the plurality of group data.
In operation S831, in the case that all account nodes in the group data are normal accounts and all trust scores of the group data are greater than a preset threshold x, it is determined that the group data belongs to a normal transaction scenario.
In operation S832, in the case that all account nodes in the group data are normal accounts and the proportion of trust scores of the group data that are less than the preset threshold x exceeds a preset proportion P, it is determined that the group data is low risk.
In operation S833, in the case that there is at least one abnormal account in the group data and the proportion of trust scores of the group data that are less than the preset threshold x exceeds the preset proportion P, it is determined that the group data is high risk.
In operation S834, in the case that there is at least one abnormal account in the group data and the proportion of trust scores of the group data that are less than the preset threshold x does not exceed the preset proportion P, it is determined that the group data is medium risk.
Each piece of group data includes a plurality of nodes and a plurality of edges, where each node is an account node or a device node and each edge has a trust score. The risk level of each piece of group data can therefore be evaluated by integrating the type of at least one account node of the group data, the mapping relationship, and the result of comparing the trust scores against the preset threshold x and the preset proportion P.
It should be noted that, the four operations S831 to S834 are four kinds of determination operations of different risk levels, which are parallel to each other, and there is no strict sequence.
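The four determinations can be sketched as a single function. This sketch reads "exceeds the preset proportion P" as the share of edge trust scores below x, which is an interpretation; the type labels, the function name, and the `"undetermined"` return value for the case the four operations do not cover are assumptions.

```python
def rate_group(account_types, trust_scores, x, p):
    """Map one piece of group data to a risk level following S831 to S834.

    account_types: type label of each account node, "normal" or an abnormal type.
    trust_scores:  trust score of each edge in the group.
    x, p:          preset score threshold and preset proportion.
    """
    has_abnormal = any(t != "normal" for t in account_types)
    low = sum(1 for s in trust_scores if s < x)
    low_ratio = low / len(trust_scores) if trust_scores else 0.0
    if has_abnormal:
        # S833 / S834: at least one abnormal account.
        return "high" if low_ratio > p else "medium"
    if low_ratio > p:
        return "low"        # S832
    if all(s > x for s in trust_scores):
        return "normal"     # S831
    # All accounts normal, some scores not above x, share not above p:
    # not covered by S831 to S834, so it is reported separately here.
    return "undetermined"

level = rate_group(["normal", "frozen"], [0.2, 0.8], x=0.5, p=0.3)
```

Because the conditions are mutually exclusive, the order of the checks does not change the result, consistent with the statement that the four operations have no strict sequence.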
Fig. 9 schematically illustrates a flow chart after risk level assessment according to an embodiment of the present disclosure.
As shown in fig. 9, in some embodiments, after the above operation S630, the heterogeneous knowledge graph based group mining method may further include operations S901 to S903.
In operation S901, a plurality of risk levels are ranked from large to small, and m pieces of partner data corresponding to m risk levels ranked at the top are selected, where m is a positive integer.
For example, the top m fraud groups by size and risk level are screened out from the plurality of pieces of group data.
In operation S902, m pieces of partner data are displayed together with a business keyword related to each of the m pieces of partner data.
In operation S903, device and account information related to m pieces of group data are derived from the knowledge-graph.
The operations S902 and S903 are two operations of post-processing m pieces of group data, which are juxtaposed with each other, and there is no strict order of precedence. Operations S901 to S903 may be provided to a manager to facilitate timely management and control.
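The selection in operation S901 can be sketched as follows, assuming each piece of group data is a dictionary with hypothetical `risk` and `nodes` fields and that ties in risk level are broken by group size; the numeric ordering of the levels is likewise an illustrative choice.

```python
# Assumed numeric ordering of the four risk levels (higher is riskier).
RISK_ORDER = {"normal": 0, "low": 1, "medium": 2, "high": 3}

def top_m_groups(groups, m):
    """Rank groups by risk level, then by size, and keep the first m."""
    ranked = sorted(
        groups,
        key=lambda g: (RISK_ORDER[g["risk"]], len(g["nodes"])),
        reverse=True,
    )
    return ranked[:m]

# Hypothetical group data.
groups = [
    {"id": "g1", "risk": "low", "nodes": ["A1", "D1"]},
    {"id": "g2", "risk": "high", "nodes": ["A2", "A3", "D2"]},
    {"id": "g3", "risk": "high", "nodes": ["A4", "A5", "A6", "D3"]},
    {"id": "g4", "risk": "medium", "nodes": ["A7", "D4"]},
]
selected = top_m_groups(groups, m=2)
```

The selected groups can then be displayed with their keywords (S902) or used to export the related devices and accounts (S903).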
Based on the above-mentioned heterogeneous knowledge graph-based group partner excavating method, the present disclosure also provides a heterogeneous knowledge graph-based group partner excavating device. The device will be described in detail with reference to fig. 10 to 11.
Fig. 10 schematically illustrates a block diagram of a heterogeneous knowledge-graph-based group mining apparatus, in accordance with an embodiment of the present disclosure.
As shown in fig. 10, the heterogeneous knowledge graph-based group mining apparatus 1000 of this embodiment includes an artificial feature acquisition module 1010, a knowledge graph construction module 1020, a potential feature vector generation module 1030, and a group mining module 1040.
The artificial feature obtaining module 1010 is configured to obtain real-time transaction data of a user in a plurality of service scenarios, and screen a plurality of artificial features from the real-time transaction data, where the artificial features characterize basic features of a plurality of login accounts and a plurality of login devices of the user. In an embodiment, the artificial feature obtaining module 1010 may be configured to perform the operation S210 described above, which is not described herein.
The knowledge graph construction module 1020 is configured to construct a knowledge graph based on the plurality of artificial features, where the knowledge graph includes a plurality of nodes and edges connecting different nodes, each node represents a login account or login device, and each edge represents a relationship between a login account and a login device. In an embodiment, the knowledge graph construction module 1020 may be used to perform the operation S220 described above, which is not described herein.
The potential feature vector generating module 1030 is configured to generate a potential feature vector for each node in the knowledge-graph based on the meta-path random walk strategy of the heterogram. In an embodiment, the potential feature vector generating module 1030 may be configured to perform the operation S230 described above, which is not described herein.
The group mining module 1040 is configured to perform cluster computation on the plurality of potential feature vectors to obtain a plurality of clustering results, and to take the plurality of nodes corresponding to each clustering result as one piece of group data, thereby forming a plurality of pieces of group data. In an embodiment, the group mining module 1040 may be configured to perform operation S240 described above, which is not repeated here.
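The grouping step can be illustrated with a small sketch. This passage does not name a clustering algorithm, so plain k-means is used here purely as an assumption; the function names, the seeding by the first k vectors, and the two-dimensional sample vectors are likewise hypothetical.

```python
import math


def kmeans(vectors, k, iterations=20):
    """Plain k-means; the first k vectors seed the centroids (deterministic)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign every vector to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def groups_from_labels(nodes, labels):
    """Collect the nodes of each cluster into one piece of group data."""
    groups = {}
    for node, label in zip(nodes, labels):
        groups.setdefault(label, []).append(node)
    return list(groups.values())


# Illustrative potential feature vectors for five nodes.
nodes = ["A1", "A3", "D1", "A2", "D3"]
vectors = [[0.0, 0.1], [5.0, 5.1], [0.1, 0.0], [0.0, 0.0], [5.1, 5.0]]
labels = kmeans(vectors, 2)
groups = groups_from_labels(nodes, labels)
```

Each inner list of `groups` corresponds to one piece of group data, mixing account and device nodes that ended up in the same cluster.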
Fig. 11 schematically illustrates a block diagram of a heterogeneous knowledge graph based group mining apparatus, in accordance with another embodiment of the present disclosure.
In some embodiments, the heterogeneous knowledge graph-based group mining apparatus 1100 not only includes the artificial feature acquisition module 1010, the knowledge graph construction module 1020, the potential feature vector generation module 1030, and the group mining module 1040 of the above embodiments, but also includes the mapping relationship establishment module 1110, the node similarity calculation module 1120, and the group risk rating module 1130.
The mapping relationship establishment module 1110 is configured to extract complaint keywords from each piece of group data and to establish a mapping relationship between each piece of group data and its corresponding complaint keywords. In an embodiment, the mapping relationship establishment module 1110 may be configured to perform operation S610 described above, which is not repeated here.
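A minimal sketch of the keyword extraction follows, using the TF-IDF weighting and top-n selection recited in claim 7. It assumes that the complaint text of each group has already been segmented into tokens, and it uses the common variant tf = count / length and idf = ln(N / df); the disclosure does not state which TF-IDF variant is used. The function name and the sample tokens are hypothetical.

```python
import math
from collections import Counter


def top_keywords(group_tokens, all_groups_tokens, n):
    """Return the n tokens with the highest TF-IDF weight for one group.

    `group_tokens` must be one of the token lists in `all_groups_tokens`,
    so that every term has a document frequency of at least 1.
    """
    counts = Counter(group_tokens)
    total = len(group_tokens)
    num_docs = len(all_groups_tokens)
    scores = {}
    for term, count in counts.items():
        doc_freq = sum(1 for doc in all_groups_tokens if term in doc)
        scores[term] = (count / total) * math.log(num_docs / doc_freq)
    # Sort by weight in descending order; ties are broken alphabetically.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:n]


# Illustrative, pre-segmented complaint tokens per group.
complaints = {
    "group_1": ["refund", "fraud", "fraud", "transfer"],
    "group_2": ["refund", "delay", "transfer"],
    "group_3": ["refund", "fraud", "delay"],
}
docs = list(complaints.values())
# Mapping relationship: group identifier -> complaint keywords.
keyword_map = {gid: top_keywords(tokens, docs, 2) for gid, tokens in complaints.items()}
```

A token such as "refund" that appears in every group receives weight 0 and therefore never ranks among the keywords, which is the usual effect of the IDF term.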
The node similarity calculation module 1120 is configured to calculate, according to the potential feature vectors, the similarity between each pair of associated nodes in the knowledge graph, and to use the similarity as the credibility score of the corresponding edge in the knowledge graph. In an embodiment, the node similarity calculation module 1120 may be configured to perform operation S620 described above, which is not repeated here.
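The edge scoring can be sketched as below. The passage does not specify the similarity measure, so cosine similarity is assumed here; the function names and the sample vectors are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def score_edges(edges, embeddings):
    """Attach a credibility score to every (account, device) edge."""
    return {
        (account, device): cosine_similarity(embeddings[account], embeddings[device])
        for account, device in edges
    }


# Illustrative potential feature vectors and edges.
embeddings = {"A1": [1.0, 0.0], "A2": [0.0, 1.0], "D1": [1.0, 0.0]}
edges = [("A1", "D1"), ("A2", "D1")]
scores = score_edges(edges, embeddings)
```

Under this assumption, an edge whose endpoints have nearly parallel vectors scores close to 1, and an edge whose endpoints have orthogonal vectors scores close to 0.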
The group risk rating module 1130 is configured to evaluate the risk level of each piece of group data by integrating the type of at least one account node of the group data, the mapping relationship, and the credibility score. In an embodiment, the group risk rating module 1130 may be configured to perform operation S630 described above, which is not repeated here.
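The rating rules recited in claim 10 can be sketched as a small decision function. The function name, the label strings, and the parameter names are hypothetical, and the threshold values in the usage lines are illustrative only. The sketch assumes a non-empty list of credibility scores and reads the "preset proportion" as a bound on the share of scores below the threshold. Claim 10 does not use the complaint-keyword mapping in its rules and does not cover the case in which all accounts are normal but only a small share of scores falls below the threshold; the sketch returns `None` for that case rather than guessing.

```python
def rate_group(account_types, edge_scores, threshold, max_low_ratio):
    """Rate one piece of group data following the four cases of claim 10.

    account_types: type of each account node, "normal" or an abnormal type.
    edge_scores:   non-empty list of credibility scores of the group's edges.
    """
    has_abnormal = any(t != "normal" for t in account_types)
    low_ratio = sum(s < threshold for s in edge_scores) / len(edge_scores)
    if has_abnormal:
        return "high" if low_ratio > max_low_ratio else "medium"
    if all(s > threshold for s in edge_scores):
        return "normal"
    if low_ratio > max_low_ratio:
        return "low"
    return None  # combination not covered by the rules in claim 10


# Illustrative calls with a threshold of 0.5 and a proportion of 0.3.
normal_case = rate_group(["normal", "normal"], [0.9, 0.8], 0.5, 0.3)
low_case = rate_group(["normal", "normal"], [0.2, 0.9], 0.5, 0.3)
high_case = rate_group(["normal", "frozen"], [0.2, 0.9], 0.5, 0.3)
medium_case = rate_group(["normal", "complaint"], [0.9, 0.8], 0.5, 0.3)
```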
According to an embodiment of the present disclosure, any plurality of the artificial feature acquisition module 1010, the knowledge graph construction module 1020, the potential feature vector generation module 1030, the group mining module 1040, the mapping relationship establishment module 1110, the node similarity calculation module 1120, and the group risk rating module 1130 may be combined and implemented in one module, or any one of these modules may be split into a plurality of modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of other modules and implemented in one module. According to embodiments of the present disclosure, at least one of these modules may be implemented at least in part as a hardware circuit, such as a field programmable gate array (FPGA), a programmable logic array (PLA), a system on a chip, a system on a substrate, a system on a package, or an application specific integrated circuit (ASIC), or may be implemented in hardware or firmware in any other reasonable manner of integrating or packaging circuits, or in any one of software, hardware, and firmware or a suitable combination of the three. Alternatively, at least one of these modules may be implemented at least in part as a computer program module which, when executed, performs the corresponding functions.
Fig. 12 schematically illustrates a block diagram of an electronic device adapted to implement a heterogeneous knowledge-graph based group mining method, in accordance with an embodiment of the disclosure.
As shown in fig. 12, an electronic device 1200 according to an embodiment of the present disclosure includes a processor 1201, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 1202 or a program loaded from a storage section 1208 into a Random Access Memory (RAM) 1203. The processor 1201 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or an associated chipset and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. Processor 1201 may also include on-board memory for caching purposes. The processor 1201 may include a single processing unit or multiple processing units for performing the different actions of the method flows according to embodiments of the disclosure.
In the RAM 1203, various programs and data required for the operation of the electronic apparatus 1200 are stored. The processor 1201, the ROM 1202, and the RAM 1203 are connected to each other through a bus 1204. The processor 1201 performs various operations of the method flow according to the embodiments of the present disclosure by executing programs in the ROM 1202 and/or RAM 1203. Note that the program may be stored in one or more memories other than the ROM 1202 and the RAM 1203. The processor 1201 may also perform various operations of the method flow according to embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the disclosure, the electronic device 1200 may also include an input/output (I/O) interface 1205, which is also connected to the bus 1204. The electronic device 1200 may also include one or more of the following components connected to the I/O interface 1205: an input section 1206 including a keyboard, a mouse, and the like; an output section 1207 including a display such as a cathode ray tube (CRT) or a liquid crystal display (LCD), a speaker, and the like; a storage section 1208 including a hard disk or the like; and a communication section 1209 including a network interface card such as a LAN card, a modem, or the like. The communication section 1209 performs communication processing via a network such as the Internet. A drive 1210 is also connected to the I/O interface 1205 as needed. A removable medium 1211, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1210 as needed, so that a computer program read from it can be installed into the storage section 1208 as needed.
The present disclosure also provides a computer-readable storage medium that may be embodied in the apparatus/device/system described in the above embodiments; or may exist alone without being assembled into the apparatus/device/system. The computer-readable storage medium carries one or more programs that, when executed, implement a heterogeneous knowledge graph-based group mining method in accordance with an embodiment of the present disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example, but is not limited to: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, the computer-readable storage medium may include the ROM 1202 and/or the RAM 1203 described above and/or one or more memories other than the ROM 1202 and the RAM 1203.
Embodiments of the present disclosure also include a computer program product comprising a computer program that contains program code for performing the methods shown in the flowcharts. When the computer program product runs on a computer system, the program code causes the computer system to implement the heterogeneous knowledge graph-based group mining method provided by embodiments of the present disclosure.
The above-described functions defined in the system/apparatus of the embodiments of the present disclosure are performed when the computer program is executed by the processor 1201. The systems, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the disclosure.
In one embodiment, the computer program may be carried on a tangible storage medium such as an optical storage device or a magnetic storage device. In another embodiment, the computer program may also be transmitted and distributed in the form of a signal over a network medium, and downloaded and installed via the communication section 1209 and/or installed from the removable medium 1211. The program code contained in the computer program may be transmitted using any appropriate network medium, including but not limited to wireless media, wired media, or any suitable combination of the foregoing.
In such an embodiment, the computer program can be downloaded and installed from a network via the communication section 1209 and/or installed from the removable medium 1211.
According to embodiments of the present disclosure, program code for executing the computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages; in particular, such computer programs may be implemented in high-level procedural and/or object-oriented programming languages and/or in assembly/machine languages. Such programming languages include, but are not limited to, Java, C++, Python, C, or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's computing device, partly on a remote computing device, or entirely on a remote computing device or server. Where a remote computing device is involved, it may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computing device (e.g., via the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments of the disclosure and/or in the claims may be combined and/or integrated in a variety of ways, even if such combinations or integrations are not explicitly recited in the disclosure. In particular, the features recited in the various embodiments of the present disclosure and/or in the claims may be combined and/or integrated in various ways without departing from the spirit and teachings of the present disclosure. All such combinations and/or integrations fall within the scope of the present disclosure.
The embodiments of the present disclosure are described above. However, these embodiments are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used advantageously in combination. The scope of the disclosure is defined by the appended claims and their equivalents. Those skilled in the art can make various alternatives and modifications without departing from the scope of the disclosure, and such alternatives and modifications are intended to fall within the scope of the disclosure.

Claims (16)

1. A group mining method based on a heterogeneous knowledge graph, comprising:
acquiring real-time transaction data of a user in a plurality of service scenarios, and screening a plurality of artificial features from the real-time transaction data, wherein the artificial features characterize basic features of a plurality of login accounts and a plurality of login devices of the user;
constructing a knowledge graph based on the plurality of artificial features, wherein the knowledge graph comprises a plurality of nodes and edges connecting different nodes, each node represents a login account or a login device, and each edge represents a relationship between a login account and a login device;
generating a potential feature vector for each node in the knowledge graph based on a meta-path random walk strategy for heterogeneous graphs;
and clustering the plurality of potential feature vectors to obtain a plurality of clustering results, and taking the plurality of nodes corresponding to each clustering result as one piece of group data, to form a plurality of pieces of group data.
2. The method of claim 1, wherein acquiring the real-time transaction data of the user in the plurality of service scenarios comprises, for each of the plurality of service scenarios, performing the following operations:
establishing a risk rule according to the service scenario, wherein the risk rule characterizes a risk that may occur in the service scenario;
collecting real-time transaction data that hits the risk rule in the service scenario by using a BDSP batch processing program;
and converting the collected real-time transaction data into formatted data.
3. The method of claim 1, wherein, before generating the potential feature vector for each node in the knowledge graph based on the meta-path random walk strategy for heterogeneous graphs, the method further comprises:
pruning the parts of the knowledge graph in which the association ratio of login account to login device is 1:1.
4. The method of claim 1, wherein generating the potential feature vector for each node in the knowledge graph based on the meta-path random walk strategy for heterogeneous graphs comprises:
constructing a heterogeneous knowledge graph G = (V, E) according to the relationships between login accounts and login devices, wherein V denotes a node set comprising account nodes and device nodes, and E denotes the set of edges between adjacent nodes;
predefining a meta-path walk rule: every two adjacent nodes on a walk path are nodes of different types, and the meta-path satisfies the symmetry principle;
walking over the nodes in the heterogeneous knowledge graph according to the meta-path walk rule to select paths, and calculating the probability that any current node transitions to any candidate node among a plurality of next candidate nodes;
and determining a next target node of the current node from the plurality of next candidate nodes according to an optimization objective that maximizes the local probability and a negative sampling strategy, and determining the potential feature vector of the current node according to the next target node.
5. The method of claim 4, wherein the probability that any current node transitions to any candidate node among the plurality of next candidate nodes is calculated according to the following principles:
determining that the transition probability from the current node to the candidate node is 0 when the candidate node does not conform to the meta-path walk rule or the current node and the candidate node are not adjacent;
and determining that the transition probability from the current node to the candidate node is the reciprocal of the total number of next candidate nodes satisfying the conditions when the candidate node conforms to the meta-path walk rule and the current node is adjacent to the candidate node.
6. The method of claim 1, wherein the method further comprises:
extracting complaint keywords from each piece of group data, and establishing a mapping relationship between each piece of group data and its corresponding complaint keywords;
calculating, according to the potential feature vectors, the similarity between each pair of associated nodes in the knowledge graph, and taking the similarity as the credibility score of the corresponding edge in the knowledge graph;
and evaluating the risk level of each piece of group data by integrating the type of at least one account node of the group data, the mapping relationship, and the credibility score.
7. The method of claim 6, wherein extracting the complaint keywords from each piece of group data comprises:
screening, from the plurality of pieces of group data, at least one piece of group data that has complaint information;
for each of the at least one piece of group data,
segmenting the complaint information of the group data into words by using a business basic information corpus to obtain a plurality of candidate keywords;
calculating the weight of each candidate keyword in the complaint information of the group data by using a TF-IDF algorithm;
and sorting the weights in descending order and selecting the n candidate keywords corresponding to the top n weights as the complaint keywords of the group data, wherein n is a positive integer.
8. The method of claim 7, wherein the method further comprises:
and updating the business basic information corpus with the n candidate keywords for use in the next extraction of complaint keywords.
9. The method of claim 6, wherein the types of account node include a normal account and an abnormal account, the abnormal account including a complaint account, a case-related account, and a frozen account;
the risk levels comprise, in ascending order, a normal transaction scenario, low risk, medium risk, and high risk.
10. The method of claim 9, wherein evaluating the risk level of each piece of group data by integrating the type of at least one account node of the group data, the mapping relationship, and the credibility score comprises, for any piece of group data among the plurality of pieces of group data:
determining that the group data belongs to a normal transaction scenario when all account nodes in the group data are normal accounts and all credibility scores of the group data are greater than a preset threshold;
determining that the group data belongs to low risk when all account nodes in the group data are normal accounts and the proportion of credibility scores of the group data that are smaller than the preset threshold exceeds a preset proportion;
determining that the group data belongs to high risk when at least one abnormal account exists in the group data and the proportion of credibility scores of the group data that are smaller than the preset threshold exceeds the preset proportion;
and determining that the group data belongs to medium risk when at least one abnormal account exists in the group data and the proportion of credibility scores of the group data that are smaller than the preset threshold does not exceed the preset proportion.
11. The method of claim 6, wherein the method further comprises:
sorting the risk levels from high to low, and selecting the m pieces of group data corresponding to the top m risk levels, wherein m is a positive integer;
displaying the m pieces of group data and the business keywords related to each of the m pieces of group data;
and exporting, from the knowledge graph, the device and account information related to the m pieces of group data.
12. A group mining apparatus based on a heterogeneous knowledge graph, comprising:
an artificial feature acquisition module, configured to acquire real-time transaction data of a user in a plurality of service scenarios and to screen a plurality of artificial features from the real-time transaction data, wherein the artificial features characterize basic features of a plurality of login accounts and a plurality of login devices of the user;
a knowledge graph construction module, configured to construct a knowledge graph based on the plurality of artificial features, wherein the knowledge graph comprises a plurality of nodes and edges connecting different nodes, each node represents a login account or a login device, and each edge represents a relationship between a login account and a login device;
a potential feature vector generation module, configured to generate a potential feature vector for each node in the knowledge graph based on a meta-path random walk strategy for heterogeneous graphs;
and a group mining module, configured to perform cluster computation on the plurality of potential feature vectors to obtain a plurality of clustering results, and to take the plurality of nodes corresponding to each clustering result as one piece of group data, to form a plurality of pieces of group data.
13. The apparatus of claim 12, wherein the apparatus further comprises:
a mapping relationship establishment module, configured to extract complaint keywords from each piece of group data and to establish a mapping relationship between each piece of group data and its corresponding complaint keywords;
a node similarity calculation module, configured to calculate, according to the potential feature vectors, the similarity between each pair of associated nodes in the knowledge graph, and to take the similarity as the credibility score of the corresponding edge in the knowledge graph;
and a group risk rating module, configured to evaluate the risk level of each piece of group data by integrating the type of at least one account node of the group data, the mapping relationship, and the credibility score.
14. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-11.
15. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method according to any of claims 1-11.
16. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 11.
CN202311480333.4A 2023-11-08 2023-11-08 Method, device, equipment and medium for group partner mining based on heterogeneous knowledge graph Pending CN117474678A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311480333.4A CN117474678A (en) 2023-11-08 2023-11-08 Method, device, equipment and medium for group partner mining based on heterogeneous knowledge graph

Publications (1)

Publication Number Publication Date
CN117474678A true CN117474678A (en) 2024-01-30

Family

ID=89639319

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311480333.4A Pending CN117474678A (en) 2023-11-08 2023-11-08 Method, device, equipment and medium for group partner mining based on heterogeneous knowledge graph

Country Status (1)

Country Link
CN (1) CN117474678A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination