CN114880522A

CN114880522A - Method and device for realizing ID Mapping based on graph database

Info

Publication number: CN114880522A
Application number: CN202210303694.0A
Authority: CN
Inventors: 朱贺贺; 李锐佳; 周建宏; 朱文俊; 文朝
Original assignee: Bangdao Technology Co ltd
Current assignee: Bangdao Technology Co ltd
Priority date: 2022-03-24
Filing date: 2022-03-24
Publication date: 2022-08-09
Anticipated expiration: 2042-03-24
Also published as: CN114880522B

Abstract

The present invention provides a method and device for implementing ID Mapping based on a graph database. The method includes: obtaining, from source ID data records, the ID nodes appearing on day T and the ID node relationships appearing on day T; performing identifier linking on the ID nodes appearing on day T, the ID node relationships appearing on day T, and the ID relationship network corresponding to day T-1, to obtain a first ID relationship network corresponding to day T; and cleaning the first ID relationship network according to the activity of the ID nodes and the activity of the ID node relationships in it, to obtain a second ID relationship network corresponding to day T. The invention clears expired IDs by removing ID nodes whose activity is below a threshold, and clears weak associations between ID nodes by disconnecting ID node relationships whose activity is below a threshold, thereby improving the reliability, accuracy and stability of the user ID relationship network.

Description

Method and device for implementing ID Mapping based on graph database

Technical Field

The present invention relates to the technical field of big data, and in particular to a method and device for implementing ID Mapping based on a graph database.

Background

In real life, users can obtain the services that an enterprise provides through a wide variety of devices and entry points. Enterprises, in turn, can develop multiple business lines, build multiple products, and serve users through different channels. As a result, the data of a single user comes from different data sources, is of many kinds, and is scattered across many locations.

Big data platforms physically solve the "data silo" problem of data being scattered across locations. Logically, however, it remains difficult to associate data from different sources, and the data stays fragmented. Consequently, only a one-sided portrait of a user can be built from one or a few data sources, which is like "blind men touching an elephant", and it is difficult to provide complete information about a user.

What data from different sources has in common is that it comes from the same user, and the data usually records information that identifies the user, collectively referred to here as the "user identifier (identity document, ID)". A user ID is a serial string that represents a user entity, such as an identity card number, mobile phone number, email address, WeChat ID, device number, Cookie ID, or Media Access Control (MAC) address. Establishing relationships between user IDs therefore establishes links between the data, and the process of building relationships between user IDs is the main process of ID Mapping.

In plain terms, ID Mapping uses various technical means to recognize the user IDs found in data from multiple sources as belonging to the same subject and to generate a unified identity identifier (one-ID) that uniquely identifies the user. According to the data structure used during processing, ID Mapping implementations fall roughly into three categories: dictionary-based, table-based, and graph-based. The current concrete methods in all three categories focus mainly on building ID relationships and pay little attention to ID expiration and reuse or to complex relationships between IDs, yet these two problems seriously affect the reliability, accuracy and stability of the resulting user ID relationship network. ID expiration and reuse arise because user IDs differ in life cycle and precision: a person usually has only one identity card number for life, whereas mobile phone numbers, email addresses and device numbers change fairly often. Complex relationships between IDs arise from complex real-world scenarios, such as multiple accounts on one device, one account on multiple devices, multiple accounts, multiple data sources, and abnormal data.

Summary of the Invention

The present invention provides a method and device for implementing ID Mapping based on a graph database, to overcome the low reliability, accuracy and stability of user ID relationship networks in the prior art, to handle ID expiration, ID reuse and complex ID relationships effectively, and to improve the reliability, accuracy and stability of the user ID relationship network.

The present invention provides a method for implementing ID Mapping based on a graph database, comprising:

obtaining, from source ID data records, the ID nodes appearing on day T and the ID node relationships appearing on day T;

performing identifier linking on the ID nodes appearing on day T, the ID node relationships appearing on day T, and the ID relationship network corresponding to day T-1, to obtain a first ID relationship network corresponding to day T;

cleaning the first ID relationship network according to the activity of the ID nodes and the activity of the ID node relationships in the first ID relationship network, to obtain a second ID relationship network corresponding to day T.

Optionally, before cleaning the first ID relationship network according to the activity of the ID nodes and the activity of the ID node relationships in the first ID relationship network, the method further comprises:

updating the attributes of the ID nodes and the attributes of the ID node relationships respectively;

performing feature extraction on the updated attributes of the ID nodes and the updated attributes of the ID node relationships respectively, to obtain feature values of the ID nodes and feature values of the ID node relationships;

obtaining the activity of each ID node according to the feature values of the ID node and the weights corresponding to those feature values;

obtaining the activity of each ID node relationship according to the feature values of the ID node relationship and the weights corresponding to those feature values.
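The text states only that activity is obtained from feature values and their corresponding weights. The sketch below assumes the combination is a plain weighted sum; the function name `activity` and the feature names `recency` and `frequency` are hypothetical and chosen only for illustration.

```python
from typing import Mapping


def activity(features: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Combine feature values into one activity score (assumed: weighted sum)."""
    return sum(weights[name] * value for name, value in features.items())


# Hypothetical feature values and weights for one ID node.
node_features = {"recency": 0.5, "frequency": 0.25}
node_weights = {"recency": 2.0, "frequency": 4.0}
node_activity = activity(node_features, node_weights)
```

The same function could be applied to the feature values and weights of an ID node relationship, since the text describes both computations in the same terms.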

Optionally, cleaning the first ID relationship network according to the activity of the ID nodes and the activity of the ID node relationships in the first ID relationship network, to obtain the second ID relationship network corresponding to day T, comprises:

when the activity of an ID node in the first ID relationship network is less than the node activity threshold, removing the ID node from the first ID relationship network;

when the activity of an ID node relationship in the first ID relationship network is less than the relationship activity threshold, removing the ID node relationship from the first ID relationship network;

obtaining the second ID relationship network corresponding to day T from the cleaned first ID relationship network.

Optionally, obtaining the second ID relationship network corresponding to day T from the cleaned first ID relationship network comprises:

when the removal of an ID node or an ID node relationship does not split a relationship subnetwork in the first ID relationship network, the unified identity identifier of the relationship subnetwork in the second ID relationship network is the unified identity identifier of that relationship subnetwork in the first ID relationship network;

when the removal of an ID node or an ID node relationship splits a relationship subnetwork in the first ID relationship network into multiple relationship subnetworks, the unified identity identifier of one of those relationship subnetworks in the second ID relationship network is the unified identity identifier of the relationship subnetwork in the first ID relationship network, and the unified identity identifiers of the other relationship subnetworks are newly generated.
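The cleaning and relabeling rules above can be sketched on an in-memory graph as follows. This is an illustration under assumptions, not the patented implementation: the function name, the dictionary inputs, and the choice that the largest surviving subnetwork keeps the old one-ID are all hypothetical (the text only says that one of the subnetworks keeps it).

```python
from collections import defaultdict
from itertools import count


def clean_and_relabel(nodes, edges, node_thr, edge_thr, old_one_id, new_ids):
    """nodes: {id: activity}; edges: {(a, b): activity}, all from one subnetwork.

    Returns {one_id: set_of_ids} after removing low-activity nodes and edges.
    """
    kept = {n for n, act in nodes.items() if act >= node_thr}
    adj = defaultdict(set)
    for (a, b), act in edges.items():
        if act >= edge_thr and a in kept and b in kept:
            adj[a].add(b)
            adj[b].add(a)

    # Depth-first search collects the connected components that remain.
    seen, components = set(), []
    for start in sorted(kept):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            n = stack.pop()
            if n not in comp:
                comp.add(n)
                stack.extend(adj[n] - comp)
        seen |= comp
        components.append(comp)

    # Assumption: the largest component keeps the old one-ID.
    components.sort(key=len, reverse=True)
    return {old_one_id if i == 0 else next(new_ids): comp
            for i, comp in enumerate(components)}


new_ids = (f"one-ID{n:02d}" for n in count(2))  # hypothetical ID generator
nodes = {"idcard": 0.9, "phone": 0.8, "acct": 0.7, "old_dev": 0.1}
edges = {("idcard", "phone"): 0.9, ("phone", "acct"): 0.2, ("acct", "old_dev"): 0.9}
result = clean_and_relabel(nodes, edges, 0.5, 0.5, "one-ID01", new_ids)
```

In this example the expired node `old_dev` is removed and the weak `phone`/`acct` link is cut, so the subnetwork splits: `idcard` and `phone` keep `one-ID01`, while `acct` receives a newly generated one-ID.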

Optionally, after cleaning the first ID relationship network according to the activity of the ID nodes and the activity of the ID node relationships in the first ID relationship network to obtain the second ID relationship network corresponding to day T, the method further comprises:

obtaining, from the source ID data records, the ID nodes and ID node relationships that have been inactive within a preset time;

updating the activity of the inactive ID nodes and the activity of the inactive ID node relationships;

cleaning the second ID relationship network according to the updated activity of the inactive ID nodes and the updated activity of the inactive ID node relationships, to obtain a third ID relationship network corresponding to day T.

Optionally, before performing identifier linking on the ID nodes appearing on day T, the ID node relationships appearing on day T, and the ID relationship network corresponding to day T-1 to obtain the first ID relationship network corresponding to day T, the method further comprises:

obtaining the unified identity identifiers that existed on day T-1 according to the ID nodes appearing on day T and the identifier mapping dictionary corresponding to day T-1;

obtaining the ID relationship network corresponding to day T-1 according to the unified identity identifiers that existed on day T-1.

Optionally, after performing identifier linking on the ID nodes appearing on day T, the ID node relationships appearing on day T, and the ID relationship network corresponding to day T-1 to obtain the first ID relationship network corresponding to day T, the method further comprises:

when a relationship subnetwork in the first ID relationship network contains exactly one unified identity identifier, the unified identity identifier of the relationship subnetwork is that existing identifier;

when a relationship subnetwork in the first ID relationship network contains more than one unified identity identifier, the unified identity identifier of the relationship subnetwork is the one with the earliest creation time and the largest number of merges or splits;

when a relationship subnetwork in the first ID relationship network contains no unified identity identifier, the unified identity identifier of the relationship subnetwork is a newly generated one.

The present invention also provides a device for implementing ID Mapping based on a graph database, comprising:

a first acquisition module, configured to obtain, from source ID data records, the ID nodes appearing on day T and the ID node relationships appearing on day T;

a second acquisition module, configured to perform identifier linking on the ID nodes appearing on day T, the ID node relationships appearing on day T, and the ID relationship network corresponding to day T-1, to obtain a first ID relationship network corresponding to day T;

a third acquisition module, configured to clean the first ID relationship network according to the activity of the ID nodes and the activity of the ID node relationships in the first ID relationship network, to obtain a second ID relationship network corresponding to day T.

The present invention also provides an electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method for implementing ID Mapping based on a graph database according to any one of the above.

The present invention also provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the method for implementing ID Mapping based on a graph database according to any one of the above.

The present invention also provides a computer program product comprising a computer program which, when executed by a processor, implements the method for implementing ID Mapping based on a graph database according to any one of the above.

The method and device provided by the present invention clear expired IDs by removing ID nodes whose activity is below a threshold, which solves the problem of ID expiration, and clear weak associations between ID nodes by disconnecting ID node relationships whose activity is below a threshold, which solves the problems of ID reuse and complex ID relationships, thereby improving the reliability, accuracy and stability of the user ID relationship network.

Description of Drawings

To illustrate the technical solutions of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is the first schematic flowchart of the method for implementing ID Mapping based on a graph database provided by the present invention;

FIG. 2 is the second schematic flowchart of the method for implementing ID Mapping based on a graph database provided by the present invention;

FIG. 3 is a schematic structural diagram of the device for implementing ID Mapping based on a graph database provided by the present invention;

FIG. 4 is a schematic structural diagram of the electronic device provided by the present invention.

Detailed Description

Of the three categories, the dictionary-based approach to ID Mapping is the simplest. A key represents an ID in the source data, and a value represents the generated one-ID. The main flow is to check whether an extracted ID is already among the keys: if it is, the existing one-ID is used; if it is not, a new one-ID is created. Because links between IDs may emerge as more IDs are added, the dictionary must also be merged, that is, IDs that are already related but not yet linked to one another are merged.
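A minimal sketch of the dictionary-based flow described above, using a Python dict as the key/value store. The function name, the record format (a list of IDs that appear together), and the rule that the lexicographically smallest one-ID survives a merge are assumptions made for illustration.

```python
from itertools import count


def assign_one_ids(records, mapping, new_ids):
    """mapping: {source_id: one_id}, updated in place for each record."""
    for record in records:
        known = sorted({mapping[i] for i in record if i in mapping})
        # Reuse an existing one-ID if any ID in the record is already a key.
        one_id = known[0] if known else next(new_ids)
        # Merge step: fold every other one-ID seen in this record into one_id.
        for other in known[1:]:
            for key in list(mapping):
                if mapping[key] == other:
                    mapping[key] = one_id
        for i in record:
            mapping[i] = one_id
    return mapping


new_ids = (f"one-ID{n:02d}" for n in count(1))  # hypothetical ID generator
records = [["phone_a", "dev_1"], ["mail_x"], ["dev_1", "mail_x"]]
mapping = assign_one_ids(records, {}, new_ids)
```

The third record links `dev_1` and `mail_x`, which already carry different one-IDs, so the two dictionary entries are merged. The sketch also shows the weakness the text goes on to describe: any shared ID, such as a shared device, merges its users unconditionally.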

The dictionary-based approach is simple in principle, easy to implement, and efficient. Its drawback is that it cannot solve ID expiration and reuse or complex ID relationships; for example, when several users share one device, they are merged into a single user.

The most common way of organizing IDs is as table records: a record contains several IDs that appear together and therefore also contains ID relationships. Consequently, the table-based category has the largest number of proposed methods. In brief, records are the unit of processing: records that share an ID are merged to form a unified one-ID, and the key point is to reduce the error rate of merging records.

The table-based approach is relatively simple in principle and implementation and can reduce the error rate when merging. Its drawbacks are that processing is slow for large data volumes, that handling individual IDs and ID relationships independently is difficult, and that ID expiration and reuse and complex ID relationships remain unsolved, for example the problems caused by the transfer of a mobile phone number.

ID relationships are in essence a network structure, so the graph-based approach is the most direct of the three, and as graph databases have matured, graph-based schemes have come into wider use. The approach builds a graph from the IDs in the records, usually applying an edge threshold during construction to filter out weak associations between IDs. A maximal connected subgraph algorithm then finds all connected subgraphs, and a unique one-ID is generated for each subgraph, where one subgraph represents one user ID relationship network. Complex ID relationships are handled by setting user behavior rules (for example, a threshold on the number of IDs of a given type that one user may hold within a preset time) and ID priorities (for example, giving the identity card number the highest priority).
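The graph-based flow (edge threshold, connected subgraphs, one one-ID per subgraph) can be sketched as follows. The edge weight is assumed here to be a co-occurrence count, and the function name and one-ID format are hypothetical; a production system would run the equivalent query in the graph database.

```python
from collections import defaultdict


def connected_subgraphs(edge_counts, min_count):
    """edge_counts: {(id_a, id_b): co-occurrence count}. Returns a list of ID sets."""
    adj = defaultdict(set)
    for (a, b), n in edge_counts.items():
        if n >= min_count:  # the edge threshold filters out weak associations
            adj[a].add(b)
            adj[b].add(a)
    seen, subgraphs = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node not in comp:
                comp.add(node)
                stack.extend(adj[node] - comp)
        seen |= comp
        subgraphs.append(comp)
    return subgraphs


edge_counts = {
    ("dev_001", "phone_150"): 5,
    ("phone_150", "acct_aaa"): 3,
    ("dev_001", "acct_zzz"): 1,   # weak link, dropped by the threshold
    ("id_430", "acct_ccc"): 4,
}
subgraphs = connected_subgraphs(edge_counts, min_count=2)
one_ids = {f"one-ID{i:02d}": comp for i, comp in enumerate(subgraphs, 1)}
```

Note that in this sketch an ID that only appears on filtered-out edges (`acct_zzz`) ends up in no subgraph at all; whether such IDs should become singleton subgraphs is a design decision the text does not address.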

The graph-based approach represents ID relationships in a graph structure that is intuitive and easy to understand, and it handles individual IDs and ID relationships easily. Its drawbacks are that it lacks an effective way to handle ID expiration and reuse, and that the rules it proposes for complex ID relationships are too simple.

To handle ID expiration and reuse and complex ID relationships effectively, and to improve the reliability, accuracy and stability of user IDs and the ID relationship network, the present invention clears expired IDs by removing ID nodes whose activity is below a threshold, which solves the problem of ID expiration, and clears weak associations between ID nodes by disconnecting ID node relationships whose activity is below a threshold, which solves the problems of ID reuse and complex ID relationships.

To make the objectives, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the present invention.

FIG. 1 is the first schematic flowchart of the method for implementing ID Mapping based on a graph database provided by the present invention. As shown in FIG. 1, the present invention provides a method for implementing ID Mapping based on a graph database, comprising:

Step 101: obtain, from source ID data records, the ID nodes appearing on day T and the ID node relationships appearing on day T.

Specifically, FIG. 2 is the second schematic flowchart of the method for implementing ID Mapping based on a graph database provided by the present invention. The source data contains various ID data records. The IDs in each ID data record and the relationships between them are written to the graph database, and the attributes of the ID nodes and of the ID node relationships are initialized. Table 1 lists the attributes of ID nodes, and Table 2 lists the attributes of ID node relationships.

Table 1. Attributes of ID nodes

As Table 1 shows, the attributes of an ID node include the node value, the node type, the date the node was first recorded, the date the node was most recently active, the number of days the node has been active, the list of offsets in days between each appearance of the node and its first record, the node degree, and the node activity. The node priority and the node activity threshold are tied closely to the node type: two ID nodes of the same node type have the same node priority and the same node activity threshold.

Table 2. Attributes of ID node relationships

As Table 2 shows, the attributes of an ID node relationship include the relationship description, the relationship type, the date the relationship was first recorded, the date the relationship was most recently active, the number of days the relationship has been active, the list of offsets in days between each appearance of the relationship and its first record, the relationship activity, the relationship priority, and the relationship activity threshold. The relationship priority and the relationship activity threshold are tied closely to the relationship type: two ID node relationships of the same relationship type have the same relationship priority and the same relationship activity threshold.

At initialization, all attributes of ID nodes and ID node relationships other than the node priority, the relationship priority, and the node degree are set to 1.

After the IDs contained in the source data and the relationships between them have been written to the graph database, the ID nodes appearing on day T and the ID node relationships appearing on day T are obtained from the graph database.

For example, the ID nodes appearing on day T are device number dev_001, mobile phone number 150***, identity card number 430***, account ccc, and identity card number 560***. The ID node relationships appearing on day T are the relationship between device number dev_001 and mobile phone number 150***, and the relationship between identity card number 430*** and account ccc.

Step 102: perform identifier linking on the ID nodes appearing on day T, the ID node relationships appearing on day T, and the ID relationship network corresponding to day T-1, to obtain the first ID relationship network corresponding to day T.

Specifically, before identifier linking is performed to obtain the first ID relationship network corresponding to day T, the ID relationship network corresponding to day T-1 must first be obtained.

Optionally, before performing identifier linking on the ID nodes appearing on day T, the ID node relationships appearing on day T, and the ID relationship network corresponding to day T-1 to obtain the first ID relationship network corresponding to day T, the method further comprises:

obtaining the unified identity identifiers that existed on day T-1 according to the ID nodes appearing on day T and the identifier mapping dictionary corresponding to day T-1;

obtaining the ID relationship network corresponding to day T-1 according to the unified identity identifiers that existed on day T-1.

Specifically, "unified identity identifier" and one-ID denote the same thing; the former is the Chinese rendering of the latter. The identifier mapping dictionary contains the one-IDs and, for each one-ID, the corresponding ID nodes and ID node relationships.

The ID nodes appearing on day T are compared with the ID nodes contained in the identifier mapping dictionary corresponding to day T-1 to find the ID nodes that both appear on day T and are contained in that dictionary. From the correspondence recorded in the dictionary, the one-IDs of the ID nodes found are obtained; these are the one-IDs that existed on day T-1. The graph database is then searched for the ID nodes and ID node relationships corresponding to those one-IDs, which yields the ID relationship network corresponding to day T-1.

For example, mobile phone number 150***, which appears on day T, also exists in the identifier mapping dictionary corresponding to day T-1, where its one-ID is one-ID01. Searching the graph database for the ID nodes and ID node relationships associated with one-ID01 finds identity card number 320***, account aaa, and mobile phone number 150***, together with the relationship between identity card number 320*** and mobile phone number 150*** and the relationship between identity card number 320*** and account aaa.

By first using the ID nodes appearing on day T and the identifier mapping dictionary corresponding to day T-1 to obtain the one-IDs that existed on day T-1, and then searching the graph database for the ID nodes and ID node relationships corresponding to those one-IDs, the ID relationship network corresponding to day T-1 is obtained. This lays the foundation for the subsequent identifier linking that uses the day T-1 network to obtain the first ID relationship network corresponding to day T.
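The lookup just described can be sketched with plain dictionaries standing in for the identifier mapping dictionary and for the graph-database queries. All function and variable names are hypothetical, and the `phone:`/`idcard:`/`acct:` prefixes are an illustrative way of encoding the node type; the values mirror the example in the text.

```python
def previous_network(day_t_ids, id_to_one_id, one_id_members, one_id_edges):
    """Return (one_ids, nodes, edges) of the day T-1 network touched by day-T IDs.

    id_to_one_id:   {id: one_id}            identifier mapping dictionary, day T-1
    one_id_members: {one_id: set of ids}    stand-in for a graph-database query
    one_id_edges:   {one_id: set of (a, b)} stand-in for a graph-database query
    """
    hit_one_ids = {id_to_one_id[i] for i in day_t_ids if i in id_to_one_id}
    nodes, edges = set(), set()
    for one_id in hit_one_ids:
        nodes |= one_id_members[one_id]
        edges |= one_id_edges[one_id]
    return hit_one_ids, nodes, edges


id_to_one_id = {"phone:150***": "one-ID01", "idcard:320***": "one-ID01",
                "acct:aaa": "one-ID01"}
one_id_members = {"one-ID01": {"phone:150***", "idcard:320***", "acct:aaa"}}
one_id_edges = {"one-ID01": {("idcard:320***", "phone:150***"),
                             ("idcard:320***", "acct:aaa")}}
day_t_ids = ["dev_001", "phone:150***", "idcard:430***", "acct:ccc", "idcard:560***"]

one_ids, prev_nodes, prev_edges = previous_network(
    day_t_ids, id_to_one_id, one_id_members, one_id_edges)
```

Only the part of the day T-1 network that shares an ID with day T is retrieved, which keeps the later linking step from having to touch the whole graph.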

After the ID relationship network corresponding to day T-1 is obtained, the maximal connected subgraph algorithm is used to perform identifier linking on the ID nodes appearing on day T, the ID node relationships appearing on day T, and the ID relationship network corresponding to day T-1. That is, the ID nodes and ID node relationships appearing on day T are connected to the ID relationship network corresponding to day T-1, which yields the first ID relationship network corresponding to day T.
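As a rough illustration of this linking step, the sketch below joins the day-T nodes and relationships onto the day T-1 network and groups the result into connected subnetworks. It uses a union-find structure, which is a substitute chosen here for brevity and not necessarily the algorithm the patent uses; the function name and the type-prefixed ID strings are hypothetical.

```python
def link(prev_nodes, prev_edges, day_nodes, day_edges):
    """Union the day T-1 network with day-T nodes and edges; return connected groups."""
    parent = {n: n for n in set(prev_nodes) | set(day_nodes)}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in list(prev_edges) + list(day_edges):
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        parent[find(a)] = find(b)

    groups = {}
    for n in parent:
        groups.setdefault(find(n), set()).add(n)
    return sorted(groups.values(), key=sorted)


prev_nodes = {"idcard:320***", "acct:aaa", "phone:150***"}
prev_edges = {("idcard:320***", "phone:150***"), ("idcard:320***", "acct:aaa")}
day_nodes = ["dev_001", "phone:150***", "idcard:430***", "acct:ccc", "idcard:560***"]
day_edges = [("dev_001", "phone:150***"), ("idcard:430***", "acct:ccc")]

groups = link(prev_nodes, prev_edges, day_nodes, day_edges)
```

With the example values from the text, `dev_001` joins the existing subnetwork through the shared phone number, identity card number 430*** and account ccc form a new subnetwork, and identity card number 560*** remains on its own.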

After the first ID relationship network is obtained, the one-ID of each relationship subnet in the first ID relationship network needs to be determined.

Optionally, after performing identifier connection on the ID nodes that appear on day T, the ID node relationships that appear on day T, and the ID relationship network corresponding to day T-1 to obtain the first ID relationship network corresponding to day T, the method further includes:

when a relationship subnet in the first ID relationship network contains exactly one unified identity identifier, using that existing identifier as the unified identity identifier of the relationship subnet;

when a relationship subnet in the first ID relationship network contains more than one unified identity identifier, using the identifier with the earliest creation time and the largest number of merges or splits as the unified identity identifier of the relationship subnet;

when a relationship subnet in the first ID relationship network contains no unified identity identifier, using a newly generated identifier as the unified identity identifier of the relationship subnet.

Specifically, Table 3 lists the attributes of a one-ID: node value, node generation time, merge or split time, and number of merges or splits. The relationship between an ID node and a one-ID has a single attribute, the relationship establishment time, which is the time at which the ID node and the one-ID became associated.

Table 3. Attributes of the one-ID

No. | Attribute | Description
1 | Node value | Unique identifier value of the relationship subnet
2 | Node generation time | Time at which the one-ID node was created
3 | Merge or split time | Time field updated when relationship subnets are merged or split
4 | Number of merges or splits | Number of merges and splits

When a relationship subnet in the first ID relationship network contains exactly one one-ID, that existing one-ID is the one-ID of the relationship subnet.

When a relationship subnet in the first ID relationship network contains multiple one-IDs, the one-ID with the earliest creation time and the largest number of merges or splits is selected from them as the one-ID of the relationship subnet.

If the multiple one-IDs all have the same creation time and the same number of merges or splits, one of them is selected at random as the one-ID of the relationship subnet.

When a relationship subnet in the first ID relationship network contains no one-ID, a new one-ID is generated from the ID nodes and ID node relationships in the relationship subnet and is used as the one-ID of the relationship subnet.
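The three cases above can be sketched as a single selection function. The source does not state how "earliest creation time" and "largest number of merges or splits" are combined, so this sketch assumes creation time is compared first and the merge or split count breaks ties. The `OneId` class, `choose_one_id`, and the `new_id_factory` callback are hypothetical names introduced for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class OneId:
    value: str
    created_at: str          # ISO date, so string order matches time order
    merge_split_count: int


def choose_one_id(candidates, new_id_factory, rng=random):
    """Return the one-ID value for a relationship subnet."""
    if not candidates:
        return new_id_factory()          # no existing one-ID: create one
    if len(candidates) == 1:
        return candidates[0].value       # exactly one: keep it

    def rank(c):
        # Assumed ordering: earliest creation first, then most merges/splits.
        return (c.created_at, -c.merge_split_count)

    best = min(rank(c) for c in candidates)
    tied = [c for c in candidates if rank(c) == best]
    return rng.choice(tied).value        # random pick only among full ties


candidates = [
    OneId("one-ID01", "2022-01-01", 3),
    OneId("one-ID02", "2022-01-01", 5),
    OneId("one-ID03", "2022-02-01", 9),
]
chosen = choose_one_id(candidates, lambda: "one-ID-new")
```

Passing the random source as a parameter keeps the tie-break reproducible in tests, for example by supplying `random.Random(0)`.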

This specifies how the one-ID of each relationship subnet in the first ID relationship network is determined. The one-ID links the IDs together, so once it is determined, all ID nodes and all ID node relationships associated with it can be retrieved by one-ID.

Step 103: clean up the first ID relationship network according to the activity of the ID nodes and the activity of the ID node relationships in the first ID relationship network, and obtain the second ID relationship network corresponding to day T.

Specifically, ID nodes whose node activity is below the node activity threshold are removed from the first ID relationship network, and ID node relationships whose relationship activity is below the relationship activity threshold are removed from the first ID relationship network. The second ID relationship network corresponding to day T is obtained from the cleaned first ID relationship network.

Optionally, before cleaning up the first ID relationship network according to the activity of the ID nodes and the activity of the ID node relationships in the first ID relationship network, the method further includes:

updating the attributes of the ID nodes and the attributes of the ID node relationships;

performing feature extraction on the updated attributes of the ID nodes and the updated attributes of the ID node relationships to obtain the feature values of the ID nodes and the feature values of the ID node relationships;

obtaining the activity of each ID node from its feature values and the weights corresponding to those feature values;

obtaining the activity of each ID node relationship from its feature values and the weights corresponding to those feature values.

Specifically, the ID nodes and ID node relationships in the first ID relationship network fall into two classes: those that did not appear on day T-1 but appear on day T, and those that appeared on day T-1 but do not appear on day T.

For ID nodes and ID node relationships that did not appear on day T-1 but appear on day T, all ID node attributes in Table 1 and all ID node relationship attributes in Table 2 are updated.

For ID nodes and ID node relationships that appeared on day T-1 but do not appear on day T, only the node activity attribute of the ID node and the relationship activity attribute of the ID node relationship are updated; the other attributes are left unchanged.

The features used to evaluate the node activity of ID nodes and the relationship activity of ID node relationships are extracted from the updated ID node attributes and the updated ID node relationship attributes. Table 4 lists the feature extraction processing for ID nodes and ID node relationships; in Table 4, s denotes a feature value before normalization.

Table 4. Feature extraction processing for ID nodes and ID node relationships

As Table 4 shows, the features used to evaluate the node activity of an ID node are: node priority, the ratio of active days to the number of days between the first record and the current processing date, the average number of days between active days, the number of days between the most recent active date and the current processing date, the standard deviation of the active intervals in days, the regularity of the intervals, and the node degree.

The features used to evaluate the relationship activity of an ID node relationship are: the ratio of active days to the number of days between the first record and the current processing date, the average number of days between active days, the number of days between the most recent active date and the current processing date, the difference in days between the most recent active dates of the relationship's two nodes, the standard deviation of the active intervals in days, the regularity of the intervals, and the relationship priority.

The extracted features are normalized to obtain the feature values of the ID nodes and the feature values of the ID node relationships.

The activity of an ID node is obtained from its feature values and the weights corresponding to those feature values.

The activity of an ID node is expressed as follows:

In the formula, X denotes the activity of the ID node, x denotes a feature value after normalization, and the subscript of x corresponds to the number in Table 4. α denotes the weight of each node feature value: α₁ is the weight of x₂, α₂ is the weight of x₃, α₃ is the weight of x₄, α₄ is the weight of x₅, α₅ is the weight of x₇, and α₆ is the weight of x₈. The weights are obtained from expert scoring using the analytic hierarchy process.

The activity of an ID node relationship is expressed as follows:

In the formula, Y denotes the activity of the ID node relationship, x denotes a feature value after normalization, and the subscript of x corresponds to the number in Table 4. β denotes the weight of each relationship feature value: β₁ is the weight of x₂, β₂ is the weight of x₃, β₃ is the weight of x₄ and x₅ combined, β₄ is the weight of x₆, and β₅ is the weight of x₇. The weights are obtained from expert scoring using the analytic hierarchy process.
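The formula images are not reproduced in this text, so the exact expressions for X and Y are not available here. Purely as an illustration of combining normalized feature values with their weights, the sketch below assumes a plain weighted sum. That form, the function name `weighted_activity`, and every numeric value are assumptions; the actual expressions may differ, for example in how the priority feature or the combined x₄ and x₅ term enters.

```python
def weighted_activity(features, weights):
    """Weighted sum of normalized feature values (assumed form, see lead-in)."""
    missing = set(weights) - set(features)
    if missing:
        raise KeyError(f"missing features: {sorted(missing)}")
    return sum(weights[name] * features[name] for name in weights)


# Hypothetical node weights for x2, x3, x4, x5, x7, x8 (alpha_1 .. alpha_6).
node_weights = {"x2": 0.3, "x3": 0.2, "x4": 0.2, "x5": 0.1, "x7": 0.1, "x8": 0.1}
# Hypothetical normalized feature values for one ID node.
node_features = {"x2": 0.5, "x3": 0.4, "x4": 0.9, "x5": 0.2, "x7": 0.6, "x8": 0.3}

node_activity = weighted_activity(node_features, node_weights)
```

Keeping the weights in a dictionary keyed by feature name makes it straightforward to swap in the values produced by the expert scoring step.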

The attributes of the ID nodes and ID node relationships are updated, features are extracted from the updated attributes and normalized, and the activity is then obtained from the feature values and their corresponding weights. This defines how activity is calculated and supports the subsequent activity-based cleanup of ID nodes and ID node relationships.

Optionally, cleaning up the first ID relationship network according to the activity of the ID nodes and the activity of the ID node relationships in the first ID relationship network, and obtaining the second ID relationship network corresponding to day T, includes:

when the activity of an ID node in the first ID relationship network is less than the node activity threshold, removing the ID node from the first ID relationship network;

when the activity of an ID node relationship in the first ID relationship network is less than the relationship activity threshold, removing the ID node relationship from the first ID relationship network;

obtaining the second ID relationship network corresponding to day T from the cleaned first ID relationship network.

Specifically, after the activity of each ID node is obtained, it is compared with the node activity threshold, and ID nodes whose activity is less than the node activity threshold are removed.

After the activity of each ID node relationship is obtained, it is compared with the relationship activity threshold, and ID node relationships whose activity is less than the relationship activity threshold are removed.

The second ID relationship network corresponding to day T is obtained from the cleaned first ID relationship network.
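The threshold-based cleanup can be sketched as a pair of filters. The function name `clean_network`, the threshold values, and the activity scores are hypothetical. The sketch also drops any relationship whose endpoint node was removed; the source does not spell this out, so it is an assumption made to keep the resulting graph consistent.

```python
def clean_network(node_activity, edge_activity, node_threshold, edge_threshold):
    """Keep nodes and relationships whose activity is not below the thresholds."""
    kept_nodes = {n for n, act in node_activity.items() if act >= node_threshold}
    kept_edges = {
        (a, b)
        for (a, b), act in edge_activity.items()
        # Assumption: a relationship cannot outlive either of its nodes.
        if act >= edge_threshold and a in kept_nodes and b in kept_nodes
    }
    return kept_nodes, kept_edges


node_activity = {
    "idcard:320***": 0.9,
    "phone:150***": 0.8,
    "account:aaa": 0.1,   # below the node threshold, treated as expired
    "device:d1": 0.7,
}
edge_activity = {
    ("idcard:320***", "phone:150***"): 0.6,
    ("idcard:320***", "account:aaa"): 0.9,   # endpoint removed
    ("phone:150***", "device:d1"): 0.2,      # weak association
}
kept_nodes, kept_edges = clean_network(node_activity, edge_activity, 0.3, 0.3)
```

After this cleanup the device node survives but is no longer connected to the phone number, which is exactly the situation in which a relationship subnet splits.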

Removing ID nodes whose activity is below the threshold cleans up expired IDs, and disconnecting ID node relationships whose activity is below the threshold cleans up weak associations between ID nodes. Together these improve the reliability, accuracy, and stability of the user IDs and the ID relationship network.

Optionally, obtaining the second ID relationship network corresponding to day T from the cleaned first ID relationship network includes:

when the removal of ID nodes or ID node relationships does not split a relationship subnet in the first ID relationship network, using the unified identity identifier of that relationship subnet in the first ID relationship network as the unified identity identifier of the relationship subnet in the second ID relationship network;

when the removal of ID nodes or ID node relationships splits a relationship subnet in the first ID relationship network into multiple relationship subnets, using the unified identity identifier of the relationship subnet in the first ID relationship network as the unified identity identifier of one of the multiple relationship subnets in the second ID relationship network, and using newly generated unified identity identifiers for the other relationship subnets.

Specifically, when ID nodes and ID node relationships are removed, it is necessary to consider whether new relationship subnets are produced, that is, whether a relationship subnet splits as a result of the cleanup, and, if it does, how the one-IDs of the resulting relationship subnets are determined.

When the removal of ID nodes or ID node relationships does not split a relationship subnet in the first ID relationship network, the one-ID of the relationship subnet remains unchanged: the one-ID of the relationship subnet in the second ID relationship network is the one-ID of the relationship subnet in the first ID relationship network.

For example, in the first ID relationship network, relationship subnet 1 has the one-ID one-ID1. Some ID nodes and ID node relationships are removed from relationship subnet 1, and the subnet does not split. The cleaned relationship subnet 1 becomes relationship subnet 2 in the second ID relationship network, and the one-ID of relationship subnet 2 is still one-ID1.

When the removal of ID nodes or ID node relationships splits a relationship subnet in the first ID relationship network into multiple relationship subnets, one relationship subnet that satisfies a preset rule is selected from them to inherit the one-ID of the original relationship subnet. The one-ID of each other relationship subnet is a new one-ID generated from its own ID nodes and ID node relationships.

The preset rule may be to select one relationship subnet from the multiple relationship subnets by applying the following criteria in order: highest ID node activity, largest number of ID node active days, largest number of ID nodes, and ID node most recent active date.

When the multiple relationship subnets are identical on all four criteria, one relationship subnet is selected from them at random.

For example, in the first ID relationship network, relationship subnet A has the one-ID one-IDA. Some ID nodes and ID node relationships are removed from relationship subnet A, and the subnet splits. After the cleanup, relationship subnet A becomes relationship subnet B, relationship subnet C, and relationship subnet D in the second ID relationship network.

The highest ID node activity of relationship subnets B, C, and D is compared. If the highest ID node activity of relationship subnet B is the largest of the three, the one-ID of relationship subnet B is one-IDA, the one-ID of relationship subnet A, and relationship subnets C and D each receive a new one-ID generated from their own ID nodes and ID node relationships.

If the highest ID node activity is the same for relationship subnets B, C, and D, the largest number of ID node active days is compared next. If the largest number of active days is 15 for relationship subnet B, 25 for relationship subnet C, and 12 for relationship subnet D, the one-ID of relationship subnet C is one-IDA, and relationship subnets B and D each receive a new one-ID generated from their own ID nodes and ID node relationships.

If relationship subnets B, C, and D are identical on all four criteria (highest ID node activity, largest number of ID node active days, number of ID nodes, and ID node most recent active date), one of them is selected at random to inherit one-IDA, and the other two each receive a new one-ID generated from their own ID nodes and ID node relationships.
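The ordered comparison in this example maps naturally onto tuple comparison. The sketch below reproduces the B, C, D case in which activity ties and active days decide. The field names, the function `pick_inheriting_subnet`, the sample values other than the active-day counts 15, 25, and 12, and the reading of the fourth criterion as "latest date wins" are assumptions for illustration.

```python
import random


def pick_inheriting_subnet(subnets, rng=random):
    """Pick the subnet that inherits the original one-ID after a split."""

    def rank(nodes):
        # Criteria are compared left to right, matching the preset order.
        return (
            max(n["activity"] for n in nodes),
            max(n["active_days"] for n in nodes),
            len(nodes),
            max(n["last_active"] for n in nodes),  # ISO dates sort as strings
        )

    best = max(rank(nodes) for nodes in subnets.values())
    tied = [name for name, nodes in subnets.items() if rank(nodes) == best]
    return rng.choice(tied)  # random only when all four criteria tie


subnets = {
    "B": [{"activity": 0.8, "active_days": 15, "last_active": "2022-03-20"}],
    "C": [{"activity": 0.8, "active_days": 25, "last_active": "2022-03-18"}],
    "D": [{"activity": 0.8, "active_days": 12, "last_active": "2022-03-22"}],
}
heir = pick_inheriting_subnet(subnets)
# The heir keeps one-IDA; the "one-ID-new-..." values are placeholder new IDs.
one_ids = {
    name: "one-IDA" if name == heir else f"one-ID-new-{name}" for name in subnets
}
```

Because later criteria are consulted only when earlier ones tie, subnet D's more recent activity date does not override subnet C's larger number of active days.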

The second ID relationship network is written back to the graph database. For relationship subnets that already have a one-ID, the subnet with its updated attributes and activity values is written to the graph database.

The one-ID of each ID relationship subnet is re-determined according to whether the removal of ID nodes and ID node relationships split the subnet, which improves the reliability, accuracy, and stability of the ID relationship network.

Optionally, after cleaning up the first ID relationship network according to the activity of the ID nodes and the activity of the ID node relationships in the first ID relationship network and obtaining the second ID relationship network corresponding to day T, the method further includes:

updating the activity of inactive ID nodes and the activity of inactive ID node relationships;

cleaning up the second ID relationship network according to the updated activity of the inactive ID nodes and the updated activity of the inactive ID node relationships, and obtaining the third ID relationship network corresponding to day T.

Specifically, the preceding steps screened only the ID nodes of day T, the ID node relationships of day T, and the ID relationship network of day T-1. ID nodes and ID node relationships that have been inactive within a preset time were not considered.

The ID nodes and ID node relationships that have been inactive within the preset time are retrieved from the graph database built from the source ID data records. The preset time can be set as appropriate for the type of ID node and the type of ID node relationship.
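A type-dependent preset time could be applied as in the sketch below. The source gives no concrete values or comparison rule, so the day counts, the strict "more than N days" test, the function name `inactive_items`, and the sample data are all assumptions for illustration.

```python
from datetime import date


def inactive_items(last_active, item_type, preset_days_by_type, today):
    """Return items whose last activity is older than their type's preset time."""
    return {
        item
        for item, seen in last_active.items()
        if (today - seen).days > preset_days_by_type[item_type[item]]
    }


# Hypothetical preset times: phone numbers change rarely, devices more often.
preset_days_by_type = {"phone": 90, "device": 30}
item_type = {"phone:150***": "phone", "device:d1": "device"}
last_active = {
    "phone:150***": date(2022, 1, 1),   # 82 days before the processing date
    "device:d1": date(2022, 2, 1),      # 51 days before the processing date
}
stale = inactive_items(last_active, item_type, preset_days_by_type, date(2022, 3, 24))
```

The same function can be applied to relationships by keying the dictionaries on node pairs and relationship types instead of single IDs.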

The activity of each inactive ID node is updated from the attributes of that node. The activity of an ID node is calculated in the same way as described above and is not repeated here.

The activity of each inactive ID node relationship is updated from the attributes of that relationship. The activity of an ID node relationship is calculated in the same way as described above and is not repeated here.

The updated activity of each inactive ID node is compared with the node activity threshold, and inactive ID nodes whose activity is less than the node activity threshold are removed.

The updated activity of each inactive ID node relationship is compared with the relationship activity threshold, and inactive ID node relationships whose activity is less than the relationship activity threshold are removed.

The third ID relationship network corresponding to day T is obtained from the second ID relationship network after the inactive ID nodes and inactive ID node relationships have been removed.

After the inactive ID nodes and inactive ID node relationships are removed, it is again necessary to consider whether new relationship subnets are produced and how the one-IDs of the new relationship subnets are determined.

When the removal of inactive ID nodes or inactive ID node relationships does not split a relationship subnet in the second ID relationship network, the one-ID of the relationship subnet remains unchanged: the one-ID of the relationship subnet in the third ID relationship network is the one-ID of the relationship subnet in the second ID relationship network.

When the removal of inactive ID nodes or inactive ID node relationships splits a relationship subnet in the second ID relationship network into multiple relationship subnets, one relationship subnet that satisfies the preset rule is selected from them to inherit the one-ID of the relationship subnet in the second ID relationship network, and the one-ID of each other relationship subnet is a new one-ID generated from its own ID nodes and ID node relationships. The third ID relationship network is written back to the graph database, and any ID relationship subnet whose ID nodes have all been deleted is deleted from the graph database.

The identity mapping dictionary corresponding to day T is generated from the updated graph database.
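Generating the day T dictionary amounts to inverting the subnet membership so that each ID points to its one-ID. In the sketch below, an in-memory dictionary stands in for the updated graph database, and the function name and sample one-ID `one-ID02` are hypothetical.

```python
def build_mapping_dictionary(subnets_by_one_id):
    """Invert one-ID -> member IDs into the ID -> one-ID mapping dictionary."""
    mapping = {}
    for one_id, nodes in subnets_by_one_id.items():
        for node in nodes:
            if node in mapping:
                # After connection and cleanup each ID belongs to one subnet.
                raise ValueError(f"{node} appears in more than one subnet")
            mapping[node] = one_id
    return mapping


subnets_by_one_id = {
    "one-ID01": {"idcard:320***", "phone:150***"},
    "one-ID02": {"device:d1"},
}
day_t_dictionary = build_mapping_dictionary(subnets_by_one_id)
```

This dictionary is what the next day's run consults first, before any graph query is issued.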

Updating the activity of the ID nodes and ID node relationships that have been inactive within the preset time, and then cleaning up the ID relationship network according to the updated activity, improves the reliability, accuracy, and stability of the ID relationship network.

In the method for implementing ID Mapping based on a graph database provided by the present invention, removing ID nodes whose activity is below the threshold cleans up expired IDs and solves the ID expiration problem, and disconnecting ID node relationships whose activity is below the threshold cleans up weak associations between ID nodes and solves the problems of ID reuse and complex ID relationships. This improves the reliability, accuracy, and stability of the user ID relationship network.

The apparatus for implementing ID Mapping based on a graph database provided by the present invention is described below. The apparatus described below and the method described above may be read with reference to each other.

FIG. 3 is a schematic structural diagram of the apparatus for implementing ID Mapping based on a graph database provided by the present invention. As shown in FIG. 3, the present invention further provides an apparatus for implementing ID Mapping based on a graph database, including a first acquisition module 301, a second acquisition module 302, and a third acquisition module 303, where:

the first acquisition module 301 is configured to obtain, from source ID data records, the ID nodes that appear on day T and the ID node relationships that appear on day T;

the second acquisition module 302 is configured to perform identifier connection on the ID nodes that appear on day T, the ID node relationships that appear on day T, and the ID relationship network corresponding to day T-1, and to obtain the first ID relationship network corresponding to day T;

the third acquisition module 303 is configured to clean up the first ID relationship network according to the activity of the ID nodes and the activity of the ID node relationships in the first ID relationship network, and to obtain the second ID relationship network corresponding to day T.

Specifically, the apparatus for implementing ID Mapping based on a graph database provided in this embodiment of the present application can implement all the method steps of the method embodiments above and achieves the same technical effects. The parts and beneficial effects of this embodiment that are the same as in the method embodiments are not described again here.

图4是本发明提供的电子设备的结构示意图，如图4所示，该电子设备可以包括：处理器(processor)410、通信接口(Communications Interface)420、存储器(memory)430和通信总线440，其中，处理器410，通信接口420，存储器430通过通信总线440完成相互间的通信。处理器410可以调用存储器430中的逻辑指令，以执行基于图数据库实现ID Mapping的方法，该方法包括：从源ID数据记录中获取第T日出现的ID节点和所述第T日出现的ID节点关系；对所述第T日出现的ID节点、所述第T日出现的ID节点关系以及第T-1日对应的ID关系网进行标识连通，获取第T日对应的第一ID关系网；根据所述第一ID关系网中ID节点的活跃度和ID节点关系的活跃度对所述第一ID关系网进行清理，获取第T日对应的第二ID关系网。FIG. 4 is a schematic structural diagram of an electronic device provided by the present invention. As shown in FIG. 4 , the electronic device may include: a processor (processor) 410, a communication interface (Communications Interface) 420, a memory (memory) 430 and a communication bus 440, The processor 410 , the communication interface 420 , and the memory 430 communicate with each other through the communication bus 440 . The processor 410 may invoke logic instructions in the memory 430 to execute a method for implementing ID Mapping based on a graph database, the method comprising: obtaining an ID node that occurs on the T-th day and an ID that occurs on the T-th day from a source ID data record Node relationship; identify and connect the ID node that occurs on the T day, the ID node relationship that occurs on the T day, and the ID relationship network corresponding to the T-1 day, and obtain the first ID relationship network corresponding to the T day. ; Clean up the first ID relationship network according to the activity degree of the ID node in the first ID relationship network and the activity degree of the ID node relationship, and obtain the second ID relationship network corresponding to the Tth day.

此外，上述的存储器430中的逻辑指令可以通过软件功能单元的形式实现并作为独立的产品销售或使用时，可以存储在一个计算机可读取存储介质中。基于这样的理解，本发明的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来，该计算机软件产品存储在一个存储介质中，包括若干指令用以使得一台计算机设备(可以是个人计算机，服务器，或者网络设备等)执行本发明各个实施例所述方法的全部或部分步骤。而前述的存储介质包括：U盘、移动硬盘、只读存储器(ROM，Read-Only Memory)、随机存取存储器(RAM，Random Access Memory)、磁碟或者光盘等各种可以存储程序代码的介质。In addition, the above-mentioned logic instructions in the memory 430 can be implemented in the form of software functional units and can be stored in a computer-readable storage medium when sold or used as an independent product. Based on this understanding, the technical solution of the present invention can be embodied in the form of a software product in essence, or the part that contributes to the prior art or the part of the technical solution. The computer software product is stored in a storage medium, including Several instructions are used to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes: U disk, mobile hard disk, Read-Only Memory (ROM, Read-Only Memory), Random Access Memory (RAM, Random Access Memory), magnetic disk or optical disk and other media that can store program codes .

In another aspect, the present invention further provides a computer program product comprising a computer program that may be stored on a non-transitory computer-readable storage medium. When the computer program is executed by a processor, the computer is able to execute the method for implementing ID Mapping based on a graph database provided by the methods above, the method comprising: obtaining, from source ID data records, the ID nodes appearing on the T-th day and the ID node relationships appearing on the T-th day; performing identifier connection on the ID nodes appearing on the T-th day, the ID node relationships appearing on the T-th day, and the ID relationship network corresponding to the (T-1)-th day, to obtain a first ID relationship network corresponding to the T-th day; and cleaning up the first ID relationship network according to the activity of the ID nodes and the activity of the ID node relationships in the first ID relationship network, to obtain a second ID relationship network corresponding to the T-th day.

In yet another aspect, the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements the method for implementing ID Mapping based on a graph database provided by the methods above, the method comprising: obtaining, from source ID data records, the ID nodes appearing on the T-th day and the ID node relationships appearing on the T-th day; performing identifier connection on the ID nodes appearing on the T-th day, the ID node relationships appearing on the T-th day, and the ID relationship network corresponding to the (T-1)-th day, to obtain a first ID relationship network corresponding to the T-th day; and cleaning up the first ID relationship network according to the activity of the ID nodes and the activity of the ID node relationships in the first ID relationship network, to obtain a second ID relationship network corresponding to the T-th day.

The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the embodiments without creative effort.

From the description of the above embodiments, those skilled in the art can clearly understand that each embodiment may be implemented by means of software plus a necessary general-purpose hardware platform, or alternatively by hardware. Based on this understanding, the above technical solutions in essence, or the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments or in certain parts of the embodiments.

In the embodiments of the present application, the terms "first", "second", and the like are used to distinguish similar objects and are not used to describe a specific order or sequence. It should be understood that terms used in this way are interchangeable under appropriate circumstances, so that the embodiments of the present application can be implemented in orders other than those illustrated or described herein. Objects distinguished by "first" and "second" are usually of one type, and the number of objects is not limited; for example, there may be one first object or multiple first objects.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method for implementing ID Mapping based on a graph database, characterized by comprising:

obtaining, from source ID data records, the ID nodes appearing on the T-th day and the ID node relationships appearing on the T-th day;

performing identifier connection on the ID nodes appearing on the T-th day, the ID node relationships appearing on the T-th day, and the ID relationship network corresponding to the (T-1)-th day, to obtain a first ID relationship network corresponding to the T-th day; and

cleaning up the first ID relationship network according to the activity of the ID nodes and the activity of the ID node relationships in the first ID relationship network, to obtain a second ID relationship network corresponding to the T-th day.

2. The method for implementing ID Mapping based on a graph database according to claim 1, characterized in that, before the cleaning up of the first ID relationship network according to the activity of the ID nodes and the activity of the ID node relationships in the first ID relationship network, the method further comprises:

updating the attributes of the ID nodes and the attributes of the ID node relationships, respectively;

performing feature extraction on the updated attributes of the ID nodes and the updated attributes of the ID node relationships, respectively, to obtain feature values of the ID nodes and feature values of the ID node relationships;

obtaining the activity of the ID nodes according to the feature values of the ID nodes and the weights corresponding to the feature values of the ID nodes; and

obtaining the activity of the ID node relationships according to the feature values of the ID node relationships and the weights corresponding to the feature values of the ID node relationships.
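By way of non-limiting illustration, one way to combine feature values and their corresponding weights is a weighted sum. The claim does not fix the combining function, so the weighted sum, the function name `activity`, and the feature names `recency` and `frequency` below are assumptions made only for this sketch.

```python
def activity(features, weights):
    """Return an activity score as a weighted sum of feature values.

    Assumption: every feature has a corresponding weight, and the score
    is the sum of weight * value over all features.
    """
    return sum(weights[name] * value for name, value in features.items())


# Hypothetical feature values extracted from a node's updated attributes.
node_features = {"recency": 0.5, "frequency": 1.0}
node_weights = {"recency": 0.6, "frequency": 0.4}
node_activity = activity(node_features, node_weights)
```

The same function can be applied to the feature values of an ID node relationship with its own weight table.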

3. The method for implementing ID Mapping based on a graph database according to claim 1, characterized in that the cleaning up of the first ID relationship network according to the activity of the ID nodes and the activity of the ID node relationships in the first ID relationship network, to obtain the second ID relationship network corresponding to the T-th day, comprises:

in the case that the activity of an ID node in the first ID relationship network is less than a node activity threshold, clearing the ID node out of the first ID relationship network;

in the case that the activity of an ID node relationship in the first ID relationship network is less than a relationship activity threshold, clearing the ID node relationship out of the first ID relationship network; and

obtaining the second ID relationship network corresponding to the T-th day according to the cleaned first ID relationship network.
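By way of non-limiting illustration, the threshold-based cleanup may be sketched as two filters over in-memory dictionaries. The function name `clean` and the data layout are hypothetical. The sketch additionally drops any relationship whose endpoint node has been cleared, which is an assumption made here so that the result remains a valid graph.

```python
def clean(node_activity, edge_activity, node_threshold, edge_threshold):
    """Clear nodes and relationships whose activity is below its threshold.

    node_activity: dict mapping node -> activity score
    edge_activity: dict mapping (node_u, node_v) -> activity score
    Returns the kept nodes and kept relationships as two sets.
    """
    kept_nodes = {
        node for node, score in node_activity.items()
        if score >= node_threshold
    }
    kept_edges = {
        (u, v) for (u, v), score in edge_activity.items()
        # Assumption: a relationship cannot outlive either endpoint.
        if score >= edge_threshold and u in kept_nodes and v in kept_nodes
    }
    return kept_nodes, kept_edges


nodes, edges = clean(
    node_activity={"a": 0.9, "b": 0.8, "c": 0.1},
    edge_activity={("a", "b"): 0.7, ("b", "c"): 0.9, ("a", "c"): 0.2},
    node_threshold=0.5,
    edge_threshold=0.5,
)
```

Here node `c` is cleared as expired, so both relationships touching `c` are removed regardless of their own activity.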

4. The method for implementing ID Mapping based on a graph database according to claim 3, characterized in that the obtaining of the second ID relationship network corresponding to the T-th day according to the cleaned first ID relationship network comprises:

in the case that the clearing of the ID node or the ID node relationship does not cause a relationship subnet in the first ID relationship network to split, the unified identity of the relationship subnet in the second ID relationship network is the unified identity of the relationship subnet in the first ID relationship network; and

in the case that the clearing of the ID node or the ID node relationship causes a relationship subnet in the first ID relationship network to split into multiple relationship subnets, the unified identity of one relationship subnet among the multiple relationship subnets in the second ID relationship network is the unified identity of the relationship subnet in the first ID relationship network, and the unified identities of the other relationship subnets among the multiple relationship subnets are newly generated unified identities.
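By way of non-limiting illustration, the identity assignment after a possible split may be sketched as follows. The claim does not state which of the resulting subnets keeps the original unified identity; the sketch assumes the largest one does, with ties broken by the smallest node label. The function names and the `new_identity` callback are hypothetical.

```python
def components(nodes, edges):
    """Return the connected components of an undirected graph as sets."""
    adjacency = {node: set() for node in nodes}
    for u, v in edges:
        if u in adjacency and v in adjacency:
            adjacency[u].add(v)
            adjacency[v].add(u)
    seen, result = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        result.append(component)
    return result


def assign_after_split(old_identity, nodes, edges, new_identity):
    """Map each remaining node of a former subnet to a unified identity.

    Assumption: the largest resulting subnet keeps old_identity; every
    other subnet receives a fresh identity from new_identity().
    """
    parts = components(nodes, edges)
    parts.sort(key=lambda part: (-len(part), min(part)))
    mapping = {}
    for index, part in enumerate(parts):
        identity = old_identity if index == 0 else new_identity()
        for node in part:
            mapping[node] = identity
    return mapping


fresh = iter(["uid-new-1", "uid-new-2"])
mapping = assign_after_split("uid-1", ["a", "b", "c"], [("a", "b")],
                             lambda: next(fresh))
```

When the cleaned subnet is still connected, `components` returns a single part and every node keeps `old_identity`, which corresponds to the no-split case.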

5. The method for implementing ID Mapping based on a graph database according to claim 1, characterized in that, after the cleaning up of the first ID relationship network according to the activity of the ID nodes and the activity of the ID node relationships in the first ID relationship network to obtain the second ID relationship network corresponding to the T-th day, the method further comprises:

obtaining, from the source ID data records, the ID nodes that are inactive within a preset time and the ID node relationships that are inactive within the preset time;

updating the activity of the inactive ID nodes and the activity of the inactive ID node relationships; and

cleaning up the second ID relationship network according to the updated activity of the inactive ID nodes and the updated activity of the inactive ID node relationships, to obtain a third ID relationship network corresponding to the T-th day.

6. The method for implementing ID Mapping based on a graph database according to claim 1, characterized in that, before the performing of identifier connection on the ID nodes appearing on the T-th day, the ID node relationships appearing on the T-th day, and the ID relationship network corresponding to the (T-1)-th day to obtain the first ID relationship network corresponding to the T-th day, the method further comprises:

obtaining the unified identities existing on the (T-1)-th day according to the ID nodes appearing on the T-th day and the identity mapping dictionary corresponding to the (T-1)-th day; and

obtaining the ID relationship network corresponding to the (T-1)-th day according to the unified identities existing on the (T-1)-th day.

7. The method for implementing ID Mapping based on a graph database according to claim 1, characterized in that, after the performing of identifier connection on the ID nodes appearing on the T-th day, the ID node relationships appearing on the T-th day, and the ID relationship network corresponding to the (T-1)-th day to obtain the first ID relationship network corresponding to the T-th day, the method further comprises:

in the case that a relationship subnet in the first ID relationship network has one unified identity, the unified identity of the relationship subnet is the existing unified identity;

in the case that a relationship subnet in the first ID relationship network has more than one unified identity, the unified identity of the relationship subnet is the unified identity with the earliest creation time and the largest number of merges or splits; and

in the case that a relationship subnet in the first ID relationship network has no unified identity, the unified identity of the relationship subnet is a newly generated unified identity.
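By way of non-limiting illustration, the three cases above may be sketched as a single selection function. How the two criteria (earliest creation time, largest number of merges or splits) are combined is not specified; the sketch assumes that creation time is compared first and the merge or split count breaks ties. The class `UnifiedIdentity`, its field names, and the `new_identity` callback are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnifiedIdentity:
    value: str
    created_day: int   # day on which the identity was first generated
    change_count: int  # number of merges or splits it has gone through


def resolve(candidates, new_identity):
    """Choose the unified identity for one relationship subnet.

    No candidate: generate a new identity.
    One candidate: keep it.
    Several candidates: earliest created_day wins; ties go to the
    highest change_count (an assumed ordering of the two criteria).
    """
    if not candidates:
        return new_identity()
    return min(candidates, key=lambda c: (c.created_day, -c.change_count))


old = UnifiedIdentity("uid-1", created_day=3, change_count=1)
older = UnifiedIdentity("uid-2", created_day=1, change_count=0)
older_busy = UnifiedIdentity("uid-3", created_day=1, change_count=4)
chosen = resolve([old, older, older_busy], lambda: None)
```

In this example `uid-2` and `uid-3` share the earliest creation day, and `uid-3` is chosen because it has gone through more merges or splits.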

8. A device for implementing ID Mapping based on a graph database, characterized by comprising:

a first acquisition module, configured to obtain, from source ID data records, the ID nodes appearing on the T-th day and the ID node relationships appearing on the T-th day;

a second acquisition module, configured to perform identifier connection on the ID nodes appearing on the T-th day, the ID node relationships appearing on the T-th day, and the ID relationship network corresponding to the (T-1)-th day, to obtain a first ID relationship network corresponding to the T-th day; and

a third acquisition module, configured to clean up the first ID relationship network according to the activity of the ID nodes and the activity of the ID node relationships in the first ID relationship network, to obtain a second ID relationship network corresponding to the T-th day.

9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the method for implementing ID Mapping based on a graph database according to any one of claims 1 to 7.

10. A non-transitory computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the method for implementing ID Mapping based on a graph database according to any one of claims 1 to 7.