WO2023065691A1 - User information fusion method and system under multilayer association, and terminal and storage medium - Google Patents

User information fusion method and system under multilayer association, and terminal and storage medium

Info

Publication number
WO2023065691A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
entity
graph
unique
user information
Prior art date
Application number
PCT/CN2022/098808
Other languages
French (fr)
Chinese (zh)
Inventor
胡嘉宏
徐亚波
李旭日
古嘉宏
苏淦
Original Assignee
广州数说故事信息科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 广州数说故事信息科技有限公司
Publication of WO2023065691A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G06F16/24534 Query rewriting; Transformation
    • G06F16/2454 Optimisation of common expressions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data

Definitions

  • the present invention relates to the technical field of user information fusion, and more specifically, to a user information fusion method, terminal, storage medium and system under multi-layer association.
  • the Customer Data Platform includes relevant data about audiences, users or customers that are generated and accumulated during the operation of the enterprise.
  • in a CDP, member (user entity) data is spread across multiple enterprise information systems and data sources, such as the customer relationship management (CRM) system, POS transactions, and collected behavior logs.
  • the same natural user member will have different unique identifiers in different information systems or channels.
  • for example, JD.com purchase records contain the JD.com user ID and mobile phone number, while the enterprise's own CRM holds a corresponding user ID.
  • the user IDs of the same natural user member in JD.com and in the CRM differ, which complicates subsequent data processing, labeling, and marketing campaign pushes; moreover, a natural user may hold multiple accounts, which the system marks as different users.
  • a unique identifier is therefore needed to identify a natural user: user data from different systems and channels is connected, the user data that actually belongs to the same natural user is identified, and it is given an entity unique ID (the unique identifier of the entity after fusion), which is used for subsequent data processing, user label system construction, user operation statistical analysis, marketing campaign pushes and other operations.
  • on March 14, 2015, a Chinese invention patent (publication number: CN104394118A) disclosed a user identification method and system. Starting from the basic information formed through user registration (user ID, user name, Email, telephone, computer IP, etc.), it extracts website user behavior data, which involves the user ID, user name, Email, phone number, cookie, computer IP and other information, establishes the association between the two sets of user information, and assigns a unique identity.
  • that method can uniformly identify the users of current B2B websites, establish identity feature relationships, distinguish new users from old ones, and effectively track user behavior, so that a series of applications can be built to improve user experience; however, it cannot handle user information fusion under multi-layer association.
  • for example, a user buys a coupon with mobile phone number A (corresponding to user ID1) and then uses the coupon to make a purchase with a spare mobile phone number B (corresponding to user ID2).
  • mobile phone number A and mobile phone number B were originally recorded as two different user IDs, but this purchase shows that the two user IDs belong to the same natural user; because that method treats the user ID as the user's unique identifier, it cannot make the association in this scenario and cannot fuse the two users' information.
  • the present invention proposes a user information fusion method, terminal, storage medium and system under multi-layer association, which associates and identifies the different accounts of a natural user, then fuses them and determines a unique identifier, facilitating the push of subsequent marketing activities.
  • a user information fusion method under multi-layer association, the method at least including:
  • each data source corresponds to a material table; determine the fields in the material table that can identify the user and their corresponding channel types to obtain the channel type information;
  • the material table includes: the enterprise member information table, the mall user information table, and the coupon usage record table
  • the channel types in the material table include: user mobile phone number, member ID, Email, WeChat unionID, and WeChat openID. Records with the same channel type value in different material tables correspond to the same natural user; a channel type field is the combination of a channel type and the material table in which that channel type appears.
  • channel type + channel type field value is used as the vertex attribute, and the vertex attribute is hashed to obtain the vertex ID.
  • the vertex ID records the attribute of a natural user on a certain channel type
  • edges are connections between vertices, recording the association of a natural user on different channel types
  • "channel type + channel type field value" is used as the vertex attribute, and there is no need to record which material table it comes from, which is simple and easy to operate. Expanding the linear material table relationship into a two-dimensional plane graph relationship conveniently supports multi-layer complex associations.
  • preferably, if each user entity unique ID appears in only one user connectivity graph, the natural users produced by the newly added user information are also independent natural users in the historical material table, and user information is fused on demand for each user connectivity graph.
  • the entity unique ID connectivity graph is constructed with each entity unique ID as a vertex and the relationship between the multiple entity unique IDs appearing in the same user connectivity graph as an edge, which greatly reduces the amount of data involved in the calculation and thus effectively improves computational efficiency.
  • the method used in step S5 to determine which entity unique IDs are connected in the entity unique ID connectivity graph is a connected component algorithm; after the user connectivity graphs corresponding to the connected entity unique IDs are associated, each user connectivity graph corresponds to one natural user.
  • the deduplication of the entity unique IDs described in step S6 includes:
  • the user connectivity graph has no entity unique ID: the current natural user is a new user, an entity unique ID is generated based on the user connectivity graph, and UUID is used to ensure its uniqueness;
  • the user connectivity graph only has one entity unique ID: the old and new user information data of the current natural user belong to the same natural user, and the old entity unique ID will be used;
  • the present invention proposes a terminal, including a processor, a memory, and a computer program stored in the memory, and the processor executes the computer program stored in the memory to realize the steps of the user information fusion method under multi-layer association.
  • the present invention proposes a computer storage medium, where computer program instructions are stored on the computer-readable storage medium, and when the instructions are executed by a processor, the steps of the user information fusion method under multi-layer association are realized.
  • the present invention also proposes a user information fusion system under multi-layer association, the system is used to realize the user information fusion method under multi-layer association, including:
  • the channel type information acquisition module selects the data source of the user information to be integrated, each data source corresponds to a material table, determines the field that can identify the user in the material table and its corresponding channel type, and obtains the channel type information;
  • the user connectivity graph construction module determines the vertices and edges based on the channel type information, constructs the user information graph based on the vertices and edges, and then splits the user information graph into independent user connectivity graphs;
  • the historical data association module uses the channel type association table to query the historical entity unique ID corresponding to each vertex of the user connectivity graph and obtains all entity unique IDs corresponding to the user connectivity graph; it then takes each entity unique ID as a vertex and the relationship between the multiple entity unique IDs within a user connectivity graph as an edge to construct the entity unique ID connectivity graph, so that historical data is used to associate the user connectivity graphs;
  • the judging module judges whether each user entity unique ID appears in only one user connectivity graph; if so, user information is fused on demand for each user connectivity graph; otherwise, the entity unique ID connectivity graph is constructed;
  • the user connected graph association module determines the entity unique IDs connected together in the entity unique ID connected graph, and associates the user connected graphs corresponding to the connected entity unique IDs;
  • the deduplication processing module reads the entity unique IDs corresponding to all vertices of the user connectivity graph, and performs deduplication processing on the entity unique IDs;
  • the update module uses the unique ID of the entity after deduplication to update the channel type association table.
  • the present invention proposes a user information fusion method, terminal, storage medium and system under multi-layer association.
  • first, the data source of the user information to be integrated is selected; the data source corresponds to a material table, and channel type information is obtained from the association relationships in the material table.
  • a user information graph is built from the channel type information and then split into independent user connectivity graphs. Expanding the linear material table relationship into a two-dimensional plane graph relationship conveniently supports multi-layer complex associations and finds, within the complex associations of a large amount of data, the records that belong to the same natural user, which can be used for subsequent data processing, user label system construction, user operation statistical analysis, marketing activity pushes and other operations.
  • on the basis of the user connectivity graphs, the entity unique ID connectivity graph is further constructed in combination with the historical material table. With the new associations provided by the newly added user information data, two user entities that were independent in the historical material table data are fused together, and the amount of data involved in the calculation is greatly reduced, which effectively improves computational efficiency.
  • FIG. 1 shows a schematic flow diagram of the user information fusion method under multi-layer association proposed in Embodiment 1 of the present invention
  • FIG. 2 shows the independent user connectivity graphs into which the user information graph is split, as proposed in Embodiment 1 of the present invention
  • FIG. 3 shows the user connectivity graph corresponding to the historical data in the historical material table, as proposed in Embodiment 1 of the present invention
  • FIG. 4 shows, for each vertex of the current user connectivity graph, the corresponding user entity unique ID in the channel type association table, as proposed in Embodiment 1 of the present invention
  • FIG. 5 shows a schematic diagram of associating the user connectivity graphs corresponding to the entity unique IDs, as proposed in Embodiment 1 of the present invention
  • FIG. 6 shows the final user connectivity graph proposed in Embodiment 1 of the present invention
  • FIG. 7 shows a schematic structural diagram of the user information fusion system under multi-layer association proposed in Embodiment 3 of the present invention.
  • each data source corresponds to a material table, determine the field in the material table that can identify the user and its corresponding channel type, and obtain the channel type information;
  • the material table includes: the enterprise member information table, the mall user information table, and the coupon usage record table; the channel types in the material table include: user mobile phone number, member ID, Email, WeChat unionID and WeChat openID. Records with the same channel type value in different material tables correspond to the same natural user; a channel type field is the combination of a channel type and the material table in which that channel type appears.
  • such a field can take multiple values for one user. With coupon ID, for example, a user can purchase and use multiple coupons, but a coupon can be purchased and used by only one user. A field of this kind, such as coupon ID, Email or mobile phone number, is a channel type field. A channel type indicates a scenario in which the user entity has a relationship with the outside world; the channel type fields that can be associated across material tables are determined, and each such set of fields is recorded as a "channel type".
  • for example, the "mobile phone number" field in the coupon usage record table, the "mobile phone number" field in the user information table, and the fields in other material tables that represent the consumer's mobile phone number together constitute one channel type.
  • similarly, there are channel types such as ID card and Email.
  • a user information graph is also composed of vertices and connections between vertices.
  • according to the definition of channel type in step S1, records with the same channel type value in different material tables should correspond to the same natural user; for example, the record with mobile phone number 13344445555 in the coupon usage record table and the record with mobile phone number 13344445555 in the mall user information table belong to the same natural user. Therefore "channel type + channel type field value" is used as the vertex attribute, and the vertex attribute is hashed to obtain the vertex ID, which records the attribute of a natural user on a certain channel type. The hash is computed with the publicly available MurmurHash algorithm, a conventional technique in the field that is not described further here.
  • edges are connections between vertices and record the association of a natural user across different channel types; in addition, the weight of an edge can be the update time of the record, which can later be used to resolve conflicts in the user connectivity graph.
  • an independent user connectivity graph corresponds to a unique natural user, and there are no connected edges with other user connectivity graphs.
  • the user information graph is split with a connected component algorithm, for which mature implementations that can be applied directly already exist, such as Spark GraphX.
  • the direct result of the calculation is, for each vertex, the minimum vertex ID of the connected graph in which the vertex is located.
  • grouping and aggregating by this minimum vertex ID, which can be done with a GROUP function, yields all the vertices of each independent user connectivity graph, that is, the information (vertices and corresponding edges) of each independent user connectivity graph.
  • This step is the process of correlating historical data. If the method proposed by the present invention is executed for the first time, there is no need to correlate history, and this step can be skipped.
  • if the data source of the user information to be integrated belongs to a new computing task, it can be assumed that the mapping already exists.
  • the user connectivity graph formed from the current data alone does not represent comprehensive user channel type information, so historical data must be associated before accurate user information fusion can be calculated;
  • S4. Determine whether each user entity unique ID appears in only one user connectivity graph; if so, fuse user information on demand for each user connectivity graph; otherwise, construct the entity unique ID connectivity graph and execute step S5. If each user entity unique ID appears in only one user connectivity graph, the natural users produced by the newly added user information are also independent natural users in the historical material table, so user information only needs to be fused on demand for each user connectivity graph. If several user connectivity graphs share the same user entity unique ID, the natural users produced by the new user information data actually belong to the same natural user in the old data and can be merged; the user connectivity graphs that share entity unique IDs are joined together, which is done by constructing the entity unique ID connectivity graph.
  • the entity unique ID connectivity graph is constructed with each entity unique ID as a vertex and the relationship between the multiple entity unique IDs in the same user connectivity graph as an edge, which greatly reduces the amount of data involved in the calculation and thus effectively improves computational efficiency.
  • S6. Read the entity unique IDs corresponding to all vertices of the user connectivity graph and deduplicate them. The method used in step S5 to determine which entity unique IDs are connected in the entity unique ID connectivity graph is also a connected component algorithm, which is a kind of graph algorithm. Connectivity means that, in an undirected graph, if there is a path from vertex v to vertex w, then v and w are said to be connected; if any two vertices of graph G are connected, G is called a connected graph, otherwise it is called a disconnected graph. After the user connectivity graphs corresponding to the connected entity unique IDs are associated, each user connectivity graph corresponds to one natural user;
  • the user connectivity graph has no entity unique ID: the current natural user is a new user, an entity unique ID is generated based on the user connectivity graph, and a UUID is used to ensure its uniqueness; UUID stands for Universally Unique Identifier, which is a conventional technique.
  • the user connectivity graph only has one entity unique ID: the old and new user information data of the current natural user belong to the same natural user, and the old entity unique ID will be used;
  • the attributes of the vertices contain information about the channel type and its value, and store its relationship with the unique ID of the entity in the channel type association table.
  • the table name of the association table is channel_entity_relation_<channel type>, for example channel_entity_relation_mobile phone number; the structure is:
  • the method proposed in this embodiment will be further described below in combination with specific implementation scenarios.
  • the scenario is simplified as follows: the user receives coupons in the membership system, and then uses the coupons to shop in JD.com.
  • assume that the data source contains only 3 material tables: the coupon usage record table, the JD user information table and the internal member information table.
  • the JD user information table has the records shown in Table 3.
  • the internal member information table has the records shown in Table 4.
  • a user information graph is constructed and split into independent user connectivity graphs.
  • the vertices of the user information graph are calculated according to the information in the material table, as shown in Table 5.
  • in Table 5, relatively simple values stand in for the actual hash values of the vertex attributes.
  • Vertex ID    Vertex attributes
    1111         [mobile phone, 13344445555]
    2222         [mobile phone, 13455556666]
    3333         [mobile phone, 13566667777]
    4444         [Member ID, C001]
    5555         [Member ID, C002]
    6666         [JD user ID, JD001]
    7777         [JD user ID, JD002]
    8888         [JD user ID, JD003]
  • Vertex ID    Minimum vertex ID of the connected graph    Vertex attributes
    1111         1111                                        [mobile phone, 13344445555]
    2222         1111                                        [mobile phone, 13455556666]
    3333         3333                                        [mobile phone, 13566667777]
    4444         1111                                        [Member ID, C001]
    5555         3333                                        [Member ID, C002]
    6666         1111                                        [JD user ID, JD001]
    7777         1111                                        [JD user ID, JD002]
    8888         3333                                        [JD user ID, JD003]
  • the user connectivity graph is shown in Figure 2. According to the "minimum vertex ID of the connected graph" column, the entire user information graph is split into two independent user connectivity graphs, user connectivity graph 1 and user connectivity graph 2, which correspond to two natural users.
  • the historical data in the historical material table has been calculated, and the channel type association table is generated.
  • the historical data is the coupon use record table with the records shown in Table 7:
  • two users that were independent in the old user information are now associated and found to be the same user, so user information fusion has occurred.
  • the updated channel type association table includes the "mobile phone number” channel type association table shown in Table 10, the "member ID” channel type association table shown in Table 11, and the "JD user ID” channel type association table shown in Table 12.
  • CDP data is based on the unique ID of the entity.
  • for example, a user portrait is computed per entity unique ID, giving the portrait of the user (entity) behind each entity unique ID; to run a marketing activity in the membership system, the crowd can be filtered by user portrait to obtain the series of entity unique IDs of the matching group, and the JD user ID channel association table then gives the corresponding JD user IDs, so that marketing campaign pushes and other outreach operations can be performed.
  • the present invention proposes a terminal, including a processor, a memory, and a computer program stored on the memory, and the processor executes the computer program stored on the memory to realize the steps of the user information fusion method under multi-layer association described in Embodiment 1
  • the memory can be a magnetic disk, flash memory or any other non-volatile storage medium
  • the processor is connected to the memory and can be implemented as one or more integrated circuits, specifically a microprocessor or a microcontroller; when it executes the computer program stored on the memory, the fusion of user information under multi-layer association is realized.
  • the present invention proposes a computer storage medium.
  • the computer readable storage medium stores computer program instructions. When the instructions are executed by a processor, the steps of the user information fusion method under multi-layer association described in Embodiment 1 are realized.
  • the present invention also proposes a user information fusion system under multi-layer association, the system is used to implement the user information fusion method under multi-layer association described in Embodiment 1, including:
  • the channel type information acquisition module selects the data source of the user information to be integrated, each data source corresponds to a material table, determines the channel type and the channel type field identifying the association relationship in the material table, and obtains the channel type information;
  • the user connectivity graph construction module determines the vertices and edges based on the channel type information, constructs the user information graph based on the vertices and edges, and then splits the user information graph into independent user connectivity graphs;
  • the historical data association module collects the historical material table, determines the mapping between each channel type in the historical material table and the user entity unique ID to form an initial channel type association table, and then looks up, for each vertex of the user connectivity graph, the user entity unique ID in the corresponding channel type association table;
  • the judging module judges whether each user entity unique ID appears in only one user connectivity graph; if so, user information is fused on demand for each user connectivity graph; otherwise, the entity unique ID connectivity graph is constructed;
  • the user connected graph association module determines the entity unique IDs connected together in the entity unique ID connected graph, and associates the user connected graphs corresponding to the connected entity unique IDs;
  • the deduplication processing module reads the entity unique IDs corresponding to all vertices of the user connectivity graph, and performs deduplication processing on the entity unique IDs;
  • the update module uses the unique ID of the entity after deduplication to update the channel type association table.
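
The vertex construction and graph split described in the definitions above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the applicant's implementation: the text names MurmurHash and Spark GraphX, whereas this sketch substitutes the standard-library BLAKE2b hash and a small union-find; the function names `vertex_id` and `split_user_graphs`, the channel type labels and the sample rows are all hypothetical.

```python
import hashlib
from collections import defaultdict


def vertex_id(channel_type, value):
    """Hash "channel type + field value" into a 64-bit vertex ID.

    Stand-in only: the text names MurmurHash; BLAKE2b is used here because
    it ships with the Python standard library.
    """
    key = f"{channel_type}\x1f{value}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(key, digest_size=8).digest(), "big")


def split_user_graphs(rows):
    """Split the user information graph into independent user connectivity graphs.

    rows: each row lists the (channel_type, value) tuples found in one
    material-table record; all tuples in a row belong to one natural user.
    Returns {minimum vertex ID of the component: set of vertex attributes}.
    """
    parent, attrs = {}, {}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for row in rows:
        ids = []
        for attr in row:
            vid = vertex_id(*attr)
            attrs[vid] = attr
            parent.setdefault(vid, vid)
            ids.append(vid)
        for other in ids[1:]:  # connect every vertex in the row to the first one
            a, b = find(ids[0]), find(other)
            if a != b:
                # The smaller ID stays root, so each root is its component's minimum.
                parent[max(a, b)] = min(a, b)

    graphs = defaultdict(set)
    for vid in parent:
        graphs[find(vid)].add(attrs[vid])
    return dict(graphs)


# Hypothetical records (not the patent's tables) describing two natural users.
rows = [
    [("mobile", "13344445555"), ("member_id", "C001")],
    [("mobile", "13344445555"), ("jd_user_id", "JD001")],
    [("mobile", "13455556666"), ("jd_user_id", "JD002")],
    [("member_id", "C001"), ("jd_user_id", "JD002")],
    [("mobile", "13566667777"), ("member_id", "C002")],
    [("mobile", "13566667777"), ("jd_user_id", "JD003")],
]
for min_id, members in split_user_graphs(rows).items():
    print(min_id, sorted(members))
```

Keying each component by its minimum vertex ID mirrors the "minimum vertex ID of the connected graph" column in the worked example, so the final grouping step reduces to a plain group-by on that key.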

Abstract

A user information fusion method and system under multilayer association, and a terminal and a storage medium. The method comprises: firstly, a data source of user information to be integrated is selected, the data source corresponds to a material table, channel type information is further obtained according to an association relation in the material table, a user information graph is constructed according to the channel type information, the user information graph is split into independent user connected graphs, a linear material table relation is expanded into a two-dimensional plane graph relation, multilayer complex association is conveniently supported, and in the complex association of a large amount of data, records belonging to a same natural user are found; on the basis of the user connected graphs, an entity unique ID connected graph is further constructed in combination with a historical material table, and two independent user entities in data of the historical material table are fused together with the help of new association provided by newly added user information data. The problem of user information fusion under multilayer association is solved.
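
The history-association part of the abstract can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not the method as claimed: `fuse_with_history`, its parameters and the sample IDs are hypothetical, the channel type association table is modeled as a plain dictionary, and the `pick` policy for a fused graph that carries several historical entity unique IDs is a placeholder, because this section does not state that rule.

```python
import uuid
from collections import defaultdict


def fuse_with_history(user_graphs, channel_entity, pick=min):
    """Associate user connectivity graphs through historical entity unique IDs.

    user_graphs: list of sets of vertex attributes, e.g. {("mobile", "133...")}.
    channel_entity: vertex attribute -> historical entity unique ID (a stand-in
        for the channel type association table).
    pick: placeholder policy (an assumption) for choosing the surviving ID when
        a fused graph carries several historical IDs.
    Returns a list of (entity unique ID, fused vertex set) pairs.
    """
    parent = {}

    def find(e):
        while parent[e] != e:
            parent[e] = parent[parent[e]]  # path halving
            e = parent[e]
        return e

    # Entity unique IDs are vertices; IDs found in the same user graph get an edge.
    ids_per_graph = []
    for graph in user_graphs:
        ids = {channel_entity[v] for v in graph if v in channel_entity}
        ids_per_graph.append(ids)
        for e in ids:
            parent.setdefault(e, e)
        ordered = sorted(ids)
        for other in ordered[1:]:
            parent[find(other)] = find(ordered[0])

    # User graphs whose entity IDs fall in one connected component are merged.
    groups = defaultdict(lambda: (set(), set()))
    for index, (graph, ids) in enumerate(zip(user_graphs, ids_per_graph)):
        key = ("entity", find(next(iter(ids)))) if ids else ("new", index)
        vertices, known = groups[key]
        vertices |= graph
        known |= ids

    fused = []
    for vertices, known in groups.values():
        if not known:            # new natural user: generate a UUID
            entity_id = uuid.uuid4().hex
        elif len(known) == 1:    # same natural user as before: reuse the old ID
            entity_id = next(iter(known))
        else:                    # several old IDs meet: placeholder policy
            entity_id = pick(known)
        fused.append((entity_id, vertices))
    return fused


# Hypothetical history: both phone numbers already map to entity "E1".
history = {("mobile", "13344445555"): "E1", ("mobile", "13455556666"): "E1"}
graphs = [
    {("mobile", "13344445555"), ("member_id", "C001")},
    {("mobile", "13455556666"), ("jd_user_id", "JD002")},
    {("mobile", "13566667777"), ("member_id", "C002")},
]
for entity_id, vertices in fuse_with_history(graphs, history):
    print(entity_id, sorted(vertices))
```

Running the union-find over entity unique IDs rather than over every vertex reflects the efficiency argument in the abstract: the second graph has only as many vertices as there are historical entity IDs touched by the new data.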

Description

User information fusion method, terminal, storage medium and system under multi-layer association
Technical field
The present invention relates to the technical field of user information fusion, and more specifically to a user information fusion method, terminal, storage medium and system under multi-layer association.
Background
A Customer Data Platform (CDP) contains the data about audiences, users or customers that an enterprise generates and accumulates during its operation. In a CDP, member (user entity) data is spread across multiple enterprise information systems and data sources, such as the customer relationship management (CRM) system, POS transactions, and collected behavior logs. The same natural user member has different unique identifiers in different information systems or channels. For example, JD.com purchase records contain the JD.com user ID and mobile phone number, while the enterprise's own CRM holds a corresponding user ID; the user IDs of the same natural user member in JD.com and in the CRM differ, which complicates subsequent data processing, labeling, and marketing campaign pushes. Moreover, a natural user may hold multiple accounts, which the system marks as different users. A unique identifier is therefore needed to identify a natural user: user data from different systems and channels is connected, the user data that actually belongs to the same natural user is identified, and it is given an entity unique ID (the unique identifier of the entity after fusion), which is used for subsequent data processing, user label system construction, user operation statistical analysis, marketing campaign pushes and other operations.
On March 14, 2015, a Chinese invention patent (publication number: CN104394118A) disclosed a user identification method and system. Starting from the basic information formed through user registration (user ID, user name, Email, telephone, computer IP, etc.), it extracts website user behavior data, which involves the user ID, user name, Email, phone number, cookie, computer IP and other information, establishes the association between the two sets of user information, and assigns a unique identity. It can uniformly identify the users of current B2B websites, establish identity feature relationships, distinguish new users from old ones, and effectively track user behavior, so that a series of applications can be built to improve user experience. However, that patent cannot handle user information fusion under multi-layer association. For example, a user buys a coupon with mobile phone number A (corresponding to user ID1) and then uses the coupon to make a purchase with a spare mobile phone number B (corresponding to user ID2). Mobile phone number A and mobile phone number B were originally recorded as two different user IDs, but this purchase shows that the two user IDs belong to the same natural user. Because that method treats the user ID as the user's unique identifier, it cannot make the association in this scenario and cannot fuse the two users' information.
Summary of the invention
To solve the problem that current user information fusion methods cannot fuse user information under multi-layer association, the present invention proposes a user information fusion method, terminal, storage medium, and system under multi-layer association. Different accounts of a natural user are associated and identified, then fused under a single unique identifier, which facilitates the pushing of subsequent marketing activities.
To achieve the above technical effect, the technical solution of the present invention is as follows:
A user information fusion method under multi-layer association, the method comprising at least:
S1. Select the data sources whose user information is to be integrated, each data source corresponding to one material table; determine the fields in each material table that can identify a user and the channel type of each such field, to obtain the channel type information;
S2. Determine the vertices and edges according to the channel type information, build a user information graph from the vertices and edges, and then split the user information graph into independent user connectivity graphs;
S3. Using the channel type association table, query the historical entity unique ID corresponding to each vertex of each user connectivity graph, to obtain all entity unique IDs corresponding to that user connectivity graph; then, taking the entity unique IDs as vertices and the relationship between multiple entity unique IDs found in the same user connectivity graph as edges, construct an entity unique ID connectivity graph, thereby associating the user connectivity graphs through historical data;
S4. Determine whether each user entity unique ID appears in only one user connectivity graph; if so, fuse user information as needed for each user connectivity graph; otherwise, construct the entity unique ID connectivity graph and perform step S5;
S5. Determine which entity unique IDs are connected in the entity unique ID connectivity graph, and associate the user connectivity graphs corresponding to the connected entity unique IDs;
S6. Read the entity unique IDs corresponding to all vertices of each user connectivity graph, and deduplicate the entity unique IDs;
S7. Update the channel type association table with the deduplicated entity unique IDs.
In this technical solution, building the user information graph and splitting it into user connectivity graphs supports multi-layer, complex associations rather than fixed, simple association rules. Users only need to attend to pairwise associations between material tables, not to how all the material tables relate to one another, and runs over newly added user information are supported. On top of the user connectivity graphs, the channel type association table (a table that reflects, for historical data, the mapping between channel type values and entity unique IDs) is used to build the entity unique ID connectivity graph. With the new associations supplied by incremental data, two user entities that the historical data treated as independent can then be fused. The method is not confined to one specific scenario or piece of business logic: channel type relationships can be arranged to suit the actual business and adapted to different scenarios, extending from user information fusion to information fusion for commodities and places.
Preferably, in the enterprise's customer data platform (CDP), user information is spread across multiple data sources of the enterprise, each corresponding to one material table. The material tables include an enterprise member information table, a mall user information table, and a coupon usage record table. The channel types in the material tables include the user's mobile phone number, member ID, email, WeChat unionID, and WeChat openID; records in different material tables that have the same value for a channel type correspond to the same natural user. A channel type field is the combination of a channel type and a material table in which that channel type appears.
Preferably, "channel type + channel type field value" is used as the vertex attribute, and the vertex attribute is hashed to obtain the vertex ID; the vertex ID records an attribute of a natural user on one channel type;
The multiple channel type field values that appear in a single record of a material table form edges; an edge is a connection between vertices and records the association of one natural user across different channel types;
The vertices and edges are connected to build the user information graph. A connected component algorithm then splits the user information graph: it yields, for each vertex, the minimum vertex ID of the connected component containing that vertex, and grouping and aggregating by this minimum vertex ID gives all the vertices of each independent user connectivity graph, that is, each independent user connectivity graph.
Here, "channel type + channel type field value" is used as the vertex attribute, so there is no need to record which material table the value came from. This is simple to operate, and it extends the linear relationships between material tables into graph relationships on a two-dimensional plane, which conveniently supports multi-layer complex associations.
Preferably, if each user entity unique ID appears in only one user connectivity graph, the natural users produced by the newly added user information are also independent natural users in the historical material tables, and user information is fused as needed for each user connectivity graph.
Preferably, the entity unique ID connectivity graph is constructed with each entity unique ID as a vertex and with the relationship between multiple entity unique IDs appearing in the same user connectivity graph as an edge, which greatly reduces the amount of data involved in the calculation and thereby improves computational efficiency.
Preferably, the method used in step S5 to determine which entity unique IDs are connected in the entity unique ID connectivity graph is a connected component algorithm; after the user connectivity graphs corresponding to connected entity unique IDs are associated, each resulting user connectivity graph corresponds to one natural user.
Preferably, deduplicating the entity unique IDs in step S6 covers the following cases:
a. The user connectivity graph has no entity unique ID: the current natural user is a new user, so an entity unique ID is generated for the user connectivity graph, using a UUID to guarantee uniqueness;
b. The user connectivity graph has exactly one entity unique ID: the new and old user information of the current natural user belong to the same natural user, so the old entity unique ID is kept;
c. The user connectivity graph has multiple entity unique IDs: the newly added user information has caused user information to be fused, and only one of the entity unique IDs, chosen arbitrarily, is retained.
The present invention proposes a terminal comprising a processor, a memory, and a computer program stored in the memory, wherein the processor executes the computer program stored in the memory to implement the steps of the user information fusion method under multi-layer association.
The present invention proposes a computer-readable storage medium on which computer program instructions are stored; when executed by a processor, the instructions implement the steps of the user information fusion method under multi-layer association.
The present invention further proposes a user information fusion system under multi-layer association, the system being used to implement the user information fusion method under multi-layer association and comprising:
A channel type information acquisition module, which selects the data sources whose user information is to be integrated, each data source corresponding to one material table, determines the fields in each material table that can identify a user and the channel type of each such field, and obtains the channel type information;
A user connectivity graph construction module, which determines the vertices and edges according to the channel type information, builds a user information graph from the vertices and edges, and then splits the user information graph into independent user connectivity graphs;
A historical data association module, which uses the channel type association table to query the historical entity unique ID corresponding to each vertex of each user connectivity graph, obtains all entity unique IDs corresponding to that user connectivity graph, and then, taking the entity unique IDs as vertices and the relationship between multiple entity unique IDs found in the same user connectivity graph as edges, constructs an entity unique ID connectivity graph, thereby associating the user connectivity graphs through historical data;
A judgment module, which determines whether each user entity unique ID appears in only one user connectivity graph; if so, user information is fused as needed for each user connectivity graph; otherwise, the entity unique ID connectivity graph is constructed;
A user connectivity graph association module, which determines which entity unique IDs are connected in the entity unique ID connectivity graph and associates the user connectivity graphs corresponding to the connected entity unique IDs;
A deduplication module, which reads the entity unique IDs corresponding to all vertices of each user connectivity graph and deduplicates the entity unique IDs;
An update module, which updates the channel type association table with the deduplicated entity unique IDs.
Compared with the prior art, the beneficial effects of the technical solution of the present invention are as follows:
The present invention proposes a user information fusion method, terminal, storage medium, and system under multi-layer association. The data sources whose user information is to be integrated are selected first, each corresponding to a material table, and channel type information is obtained from the associations in the material tables. A user information graph is built from the channel type information and then split into independent user connectivity graphs. This extends the linear relationships between material tables into graph relationships on a two-dimensional plane, conveniently supports multi-layer complex associations, and finds the records that belong to the same natural user among the complex associations of a large volume of data; the result can be used for subsequent data processing, construction of the user tag system, statistical analysis of user operations, marketing campaign pushes, and similar operations. On top of the user connectivity graphs, the historical material tables are used to build the entity unique ID connectivity graph. With the new associations supplied by newly added user information, two user entities that were independent in the historical material table data are fused, while the amount of data involved in the calculation is greatly reduced, which improves computational efficiency.
Description of drawings
Figure 1 is a schematic flowchart of the user information fusion method under multi-layer association proposed in Embodiment 1 of the present invention;
Figure 2 shows the independent user connectivity graphs into which the user information graph is split in Embodiment 1 of the present invention;
Figure 3 shows the user connectivity graphs corresponding to the historical data in the historical material table in Embodiment 1 of the present invention;
Figure 4 is a schematic diagram of looking up, for each vertex of the current user connectivity graphs, the user entity unique ID in the channel type association table in Embodiment 1 of the present invention;
Figure 5 is a schematic diagram of associating the user connectivity graphs corresponding to connected entity unique IDs in Embodiment 1 of the present invention;
Figure 6 shows the final user connectivity graph in Embodiment 1 of the present invention;
Figure 7 is a schematic structural diagram of the user information fusion system under multi-layer association proposed in Embodiment 3 of the present invention.
Detailed description
The drawings are for illustrative purposes only and shall not be construed as limiting this patent;
Those skilled in the art will understand that descriptions of certain well-known content in the drawings may be omitted.
The technical solution of the present invention is further described below with reference to the drawings and embodiments.
The positional relationships described in the drawings are for illustrative purposes only and shall not be construed as limiting this patent;
Embodiment 1
As shown in Figure 1, this embodiment first proposes a user information fusion method under multi-layer association, the method comprising:
S1. Select the data sources whose user information is to be integrated, each data source corresponding to one material table; determine the fields in each material table that can identify a user and the channel type of each such field, to obtain the channel type information;
Taking an enterprise's customer data platform (CDP) as the setting, user information is spread across multiple data sources of the enterprise, each corresponding to one material table. The material tables include an enterprise member information table, a mall user information table, and a coupon usage record table. The channel types in the material tables include the user's mobile phone number, member ID, email, WeChat unionID, and WeChat openID; records in different material tables that have the same value for a channel type correspond to the same natural user. A channel type field is the combination of a channel type and a material table in which that channel type appears. That is, first identify which data sources in the system hold information to be integrated, analyze the material tables corresponding to those data sources, and find the fields that can identify a user. A given value of such a field corresponds to one user, although one user may have several values in that field of the table. Take the coupon ID: one user can buy and use several coupons, but a coupon is bought and used by only one user. Such a field is a channel type field; examples are coupon ID, email, and mobile phone number. A channel type represents a scenario in which the user entity interacts with the outside world. The channel type fields through which material tables can be associated are determined, and each such set of fields is recorded as one "channel type". For example, the "mobile phone number" field of the coupon usage record table, the "mobile phone number" field of the user information table, and the fields in other material tables that hold the consumer's mobile phone number together form one channel type. Similarly, there may be channel types such as ID card number and email.
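The channel type information produced by step S1 can be pictured as a mapping from (material table, column) pairs to channel types. The following is a minimal sketch under that assumption; the table names, column names, and both helper functions are hypothetical and do not come from the source.

```python
# Hypothetical table and column names, for illustration only.
CHANNEL_FIELDS = {
    ("coupon_usage", "phone"): "mobile phone number",
    ("coupon_usage", "member_id"): "member ID",
    ("mall_user", "phone"): "mobile phone number",
    ("member_info", "member_id"): "member ID",
}


def channel_types(channel_fields):
    """Group (table, column) pairs by the channel type they belong to."""
    grouped = {}
    for (table, column), channel_type in channel_fields.items():
        grouped.setdefault(channel_type, []).append((table, column))
    return grouped


def channel_values(table, record, channel_fields):
    """Return the (channel type, value) pairs found in one record of `table`."""
    return [
        (channel_fields[(table, column)], value)
        for column, value in record.items()
        if (table, column) in channel_fields and value is not None
    ]
```

Columns that are not registered as channel type fields (a coupon number column in this sketch) are simply ignored when a record is read.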
S2. Determine the vertices and edges according to the channel type information, build a user information graph from the vertices and edges, and then split the user information graph into independent user connectivity graphs;
Like a graph in the data-structure sense, the user information graph consists of vertices and the connections between them. By the definition of channel type in step S1, records in different material tables that have the same value for a channel type should correspond to the same natural user. For example, the record with mobile phone number 13344445555 in the coupon usage record table and the record with mobile phone number 13344445555 in the mall user information table belong to the same natural user. Therefore "channel type + channel type field value" is used as the vertex attribute, and the vertex attribute is hashed to obtain the vertex ID; the vertex ID records an attribute of a natural user on one channel type. The hash is computed with the publicly available MurmurHash algorithm, which is conventional in the art and is not described further here.
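As a rough illustration of deriving a vertex ID from the vertex attribute, the sketch below hashes "channel type + field value" to a 64-bit integer. The source names MurmurHash; this sketch substitutes `hashlib.blake2b` purely because it is available in the Python standard library, and the function name and separator byte are assumptions.

```python
import hashlib


def vertex_id(channel_type: str, value: str) -> int:
    """Derive a 64-bit vertex ID from "channel type + channel type field value".

    blake2b stands in for MurmurHash here (an assumption of this sketch).
    The material table name is deliberately not part of the key, so the same
    value seen in different tables maps to the same vertex.
    """
    key = f"{channel_type}\x1f{value}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(key, digest_size=8).digest(), "big")
```

Because the table name is left out of the key, the mobile phone number 13344445555 yields one vertex whether it is read from the coupon usage record table or from the mall user information table.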
The multiple channel type field values that appear in a single record of a material table form edges; an edge is a connection between vertices and records the association of one natural user across different channel types. In addition, the weight of an edge can be the update time of the record, which can later be used to resolve conflicts within a user connectivity graph.
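One way to turn a record into edges is sketched below. It assumes that every pair of channel values appearing in the same record is linked and that the record's update time is stored as the edge weight; the source does not say whether all pairs or only a chain of values are linked, and the function name is hypothetical.

```python
from itertools import combinations


def record_edges(channel_values, updated_at):
    """Return weighted edges linking every pair of channel values in one record.

    channel_values: list of (channel type, value) pairs found in the record.
    updated_at: the record's update time, kept as the edge weight.
    """
    vertices = sorted(set(channel_values))
    return [(a, b, updated_at) for a, b in combinations(vertices, 2)]
```

A record with a single channel value produces no edge; it still contributes a vertex to the user information graph.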
The vertices and edges are connected to build the user information graph. Ideally, one independent user connectivity graph corresponds to exactly one natural user and shares no edge with any other user connectivity graph. A connected component algorithm splits the user information graph; mature implementations that can be used directly already exist, such as Spark GraphX. The direct result of the calculation is, for each vertex, the minimum vertex ID of the connected component containing that vertex. Grouping and aggregating by this minimum vertex ID, which can be done with a GROUP BY operation, gives all the vertices of each independent user connectivity graph, and hence the information (vertices and corresponding edges) of each independent user connectivity graph.
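The splitting step can be sketched in plain Python as repeated propagation of the smallest vertex ID to neighbours, which produces the same kind of result as the connected component computation described above (each vertex labelled with the minimum vertex ID of its component). This is an in-memory illustration only, not the distributed implementation; both function names are hypothetical, and every edge endpoint is assumed to be listed in `vertices`.

```python
from collections import defaultdict


def min_vertex_labels(vertices, edges):
    """Label each vertex with the smallest vertex ID in its connected component."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    label = {v: v for v in vertices}
    changed = True
    while changed:  # repeat until no label can be lowered any further
        changed = False
        for v in vertices:
            best = min([label[v]] + [label[n] for n in neighbours[v]])
            if best < label[v]:
                label[v] = best
                changed = True
    return label


def group_by_label(label):
    """Group vertices by label; each group is one user connectivity graph."""
    groups = defaultdict(list)
    for v, root in sorted(label.items()):
        groups[root].append(v)
    return dict(groups)
```

`group_by_label` plays the role of the grouping and aggregation on the minimum vertex ID.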
S3. Using the channel type association table, query the historical entity unique ID corresponding to each vertex of each user connectivity graph, to obtain all entity unique IDs corresponding to that user connectivity graph; then, taking the entity unique IDs as vertices and the relationship between multiple entity unique IDs found in the same user connectivity graph as edges, construct an entity unique ID connectivity graph, thereby associating the user connectivity graphs through historical data;
This step associates historical data. If the method proposed by the present invention is being executed for the first time, there is no history to associate and this step can be skipped. When the data sources to be integrated form a newly added (incremental) computing task, the mapping can be assumed to exist already; the user connectivity graphs built from the new data alone do not represent complete user channel type information, so historical data must be associated before user information can be fused accurately.
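The lookup in step S3 can be sketched as follows, assuming the channel type association tables are available as nested dictionaries keyed first by channel type and then by field value. That in-memory layout and the function name are assumptions made for illustration.

```python
def entity_ids_per_graph(user_graphs, association_tables):
    """Collect the historical entity unique IDs found for each user connectivity graph.

    user_graphs: {graph key: list of (channel type, value) vertices}.
    association_tables: {channel type: {value: entity unique ID}} from earlier runs.
    """
    found = {}
    for graph_key, vertices in user_graphs.items():
        ids = set()
        for channel_type, value in vertices:
            entity_id = association_tables.get(channel_type, {}).get(value)
            if entity_id is not None:
                ids.add(entity_id)
        found[graph_key] = ids
    return found
```

On a first run the association tables are empty, every graph maps to an empty set, and each graph is later treated as a new user.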
S4. Determine whether each user entity unique ID appears in only one user connectivity graph; if so, fuse user information as needed for each user connectivity graph; otherwise, construct the entity unique ID connectivity graph and perform step S5. If each user entity unique ID appears in only one user connectivity graph, the natural users produced by the newly added user information are also independent natural users in the historical material tables, so user information only needs to be fused as needed for each user connectivity graph. If several user connectivity graphs share the same user entity unique ID, the natural users produced by the newly added data actually belong to the same natural user in the old data and can be fused. The user connectivity graphs that share an entity unique ID need to be connected, which can be done by constructing a connectivity graph of entity unique IDs.
The entity unique ID connectivity graph is constructed with each entity unique ID as a vertex and with the relationship between multiple entity unique IDs appearing in the same user connectivity graph as an edge, which greatly reduces the amount of data involved in the calculation and thereby improves computational efficiency.
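The check in step S4 and the association of user connectivity graphs through shared entity unique IDs can be sketched as below. A small union-find structure stands in for the connected component algorithm; the function names and the input layout (graph key mapped to a set of historical entity unique IDs) are assumptions made for illustration.

```python
def every_id_in_one_graph(graph_entity_ids):
    """Return True if no entity unique ID appears in more than one user graph."""
    seen = set()
    for ids in graph_entity_ids.values():
        if seen & ids:
            return False
        seen |= ids
    return True


def merge_graphs_by_entity_id(graph_entity_ids):
    """Group user connectivity graphs that are linked through entity unique IDs.

    Returns a list of sorted graph-key lists; each list is one natural user.
    """
    parent = {}

    def find(x):
        while parent.setdefault(x, x) != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Entity unique IDs are the vertices; IDs seen in the same user graph are joined.
    for ids in graph_entity_ids.values():
        ordered = sorted(ids)
        for other in ordered[1:]:
            parent[find(other)] = find(ordered[0])

    merged = {}
    for key, ids in graph_entity_ids.items():
        # A graph with no historical ID stays on its own.
        root = find(min(ids)) if ids else ("new", key)
        merged.setdefault(root, []).append(key)
    return [sorted(keys) for keys in merged.values()]
```

Only entity unique IDs take part in this second graph computation, not the original channel value vertices, which is why the amount of data involved is small.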
S5. Determine which entity unique IDs are connected in the entity unique ID connectivity graph, and associate the user connectivity graphs corresponding to the connected entity unique IDs;
S6. Read the entity unique IDs corresponding to all vertices of each user connectivity graph, and deduplicate the entity unique IDs. The method used in step S5 to determine which entity unique IDs are connected in the entity unique ID connectivity graph is also the connected component algorithm, which is a graph algorithm. Connectivity means that, in an undirected graph, vertices v and w are said to be connected if a path exists from v to w. If every two vertices of a graph G are connected, G is called a connected graph; otherwise it is called a disconnected graph. After the user connectivity graphs corresponding to connected entity unique IDs are associated, each resulting user connectivity graph corresponds to one natural user.
Deduplicating the entity unique IDs covers the following cases:
a. The user connectivity graph has no entity unique ID: the current natural user is a new user, so an entity unique ID is generated for the user connectivity graph, using a UUID to guarantee uniqueness. UUID stands for universally unique identifier and is a conventional existing technique.
b. The user connectivity graph has exactly one entity unique ID: the new and old user information of the current natural user belong to the same natural user, so the old entity unique ID is kept;
c. The user connectivity graph has multiple entity unique IDs: the newly added user information has caused user information to be fused, and only one of the entity unique IDs is retained. Because the choice of which ID to retain has little effect on the calculation result, the first one is generally retained.
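Cases a to c can be sketched as a single function. The use of a random (version 4) UUID rendered as 32 hexadecimal characters is an assumption of this sketch, since the source only says that a UUID is used; the function name is hypothetical.

```python
import uuid


def resolve_entity_id(entity_ids):
    """Pick the entity unique ID for one (merged) user connectivity graph.

    entity_ids: entity unique IDs found in history, in the order encountered.
    """
    unique = list(dict.fromkeys(entity_ids))  # de-duplicate, keep first-seen order
    if not unique:
        return uuid.uuid4().hex  # case a: new user, generate a fresh ID
    return unique[0]  # case b: the only ID; case c: keep the first one
```

Keeping the first ID in case c follows the text above; any deterministic choice would serve equally well.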
S7. Update the channel type association table with the deduplicated entity unique IDs;
Specifically, all vertices are traversed. The attribute of each vertex contains the channel type and its value, and the relationship between that value and the entity unique ID is stored in the channel type association table. The association table is named channel_entity_relation_<channel type>, for example channel_entity_relation_手机号 (手机号 meaning "mobile phone number"), and has the following structure:
Channel type field value    Entity unique ID
13344445555                 33f5ec7daf694f9ca09c66767272d415
...                         ...
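The update in step S7 can be sketched with one dictionary per channel type standing in for the association tables. The in-memory layout and the function name are assumptions; the table naming follows the channel_entity_relation_<channel type> pattern described above.

```python
def update_association_tables(tables, vertices, entity_id):
    """Record entity_id for every vertex of one user connectivity graph.

    tables: {table name: {field value: entity unique ID}}, updated in place.
    vertices: list of (channel type, value) pairs of the graph.
    """
    for channel_type, value in vertices:
        table_name = f"channel_entity_relation_{channel_type}"
        tables.setdefault(table_name, {})[value] = entity_id
    return tables
```

Writing the retained ID over any earlier mapping is what makes a fusion visible to the next run.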
In summary, building the user information graph and splitting it into user connectivity graphs supports multi-layer, complex associations rather than fixed, simple association rules. Users only need to attend to pairwise associations between material tables, not to how all the material tables relate to one another, and runs over newly added user information are supported. On top of the user connectivity graphs, the historical material tables (historical data) are used to build the entity unique ID connectivity graph. With the new associations supplied by incremental data, two user entities that the historical data treated as independent can then be fused. The method is not confined to one specific scenario or piece of business logic: channel type relationships can be arranged to suit the actual business and adapted to different scenarios, extending from user information fusion to information fusion for commodities and places.
The method proposed in this embodiment is further described below with a concrete scenario. Taking a CDP system as an example, the scenario is simplified as follows: a user receives a coupon in the membership system and later uses the coupon to shop on the JD.com mall.
Suppose the data sources contain only three material tables: the coupon usage record table, the JD.com user information table, and the internal member information table.
The associations among these material tables are analyzed and expressed as a table of channel types and channel type fields, as shown in Table 1.
Table 1
[Table 1 appears as an image in the original publication: Figure PCTCN2022098808-appb-000001]
Suppose the coupon usage record table contains the records shown in Table 2.
Table 2
Mobile phone number    Coupon number    Member ID    ...
13344445555            T00003           C001
13455556666            T00004           C001
13344445555            T00005           C001
13566667777            T00006           C002
The JD.com user information table contains the records shown in Table 3.
Table 3
JD.com user ID    Mobile phone number    ...
JD001             13344445555
JD002             13455556666
JD003             13566667777
The internal member information table contains the records shown in Table 4.
Table 4
Member ID    Email            ...
C001         c001@test.com
C002         c002@test.com
All the channel types are "mobile phone number", "member ID", and "JD.com user ID".
From the collated channel type information, the user information graph is built and split into independent user connectivity graphs. The vertices of the user information graph are calculated from the material table information, as shown in Table 5. To keep the example easy to follow, simple values are used in place of the actual hash values of the vertex attributes.
Table 5
Vertex ID | Vertex attribute
1111 | [Mobile phone, 13344445555]
2222 | [Mobile phone, 13455556666]
3333 | [Mobile phone, 13566667777]
4444 | [Member ID, C001]
5555 | [Member ID, C002]
6666 | [JD user ID, JD001]
7777 | [JD user ID, JD002]
8888 | [JD user ID, JD003]
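The construction of vertices and edges from the material tables can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the records are those of Tables 2 to 4 reduced to the three channel types of the example, while the helper names `vertex_id` and `build_graph`, the use of MD5 as the hash, and the "channel:value" key format are all assumptions made for the sketch (the text only states that the vertex attribute is hashed).

```python
import hashlib
from itertools import combinations

# Records from Tables 2-4, reduced to the three channel types of the example.
coupon_records = [
    {"Mobile phone": "13344445555", "Member ID": "C001"},
    {"Mobile phone": "13455556666", "Member ID": "C001"},
    {"Mobile phone": "13344445555", "Member ID": "C001"},
    {"Mobile phone": "13566667777", "Member ID": "C002"},
]
jd_records = [
    {"JD user ID": "JD001", "Mobile phone": "13344445555"},
    {"JD user ID": "JD002", "Mobile phone": "13455556666"},
    {"JD user ID": "JD003", "Mobile phone": "13566667777"},
]
member_records = [{"Member ID": "C001"}, {"Member ID": "C002"}]


def vertex_id(channel_type: str, value: str) -> str:
    """Hash 'channel type + value' into a vertex ID (MD5 is an assumed choice)."""
    return hashlib.md5(f"{channel_type}:{value}".encode("utf-8")).hexdigest()


def build_graph(tables):
    """Return (vertices, edges) for the user information graph."""
    vertices = {}  # vertex ID -> (channel type, value)
    edges = set()  # unordered pairs of vertex IDs, stored in sorted order
    for table in tables:
        for record in table:
            ids = []
            for channel_type, value in record.items():
                vid = vertex_id(channel_type, value)
                vertices[vid] = (channel_type, value)
                ids.append(vid)
            # Identifiers that occur in the same record belong to one natural user.
            for a, b in combinations(sorted(ids), 2):
                edges.add((a, b))
    return vertices, edges


vertices, edges = build_graph([coupon_records, jd_records, member_records])
print(len(vertices), "vertices,", len(edges), "edges")
```

For the sample data this yields the eight vertices listed in Table 5; repeated records (such as the two coupon rows for 13344445555 and C001) collapse into a single edge because edges are kept in a set.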
The connected components algorithm is used to compute, for each vertex, the minimum vertex ID of the connected graph that contains it. The results are shown in Table 6:
Table 6
Vertex ID | Minimum vertex ID of the connected graph | Vertex attribute
1111 | 1111 | [Mobile phone, 13344445555]
2222 | 1111 | [Mobile phone, 13455556666]
3333 | 3333 | [Mobile phone, 13566667777]
4444 | 1111 | [Member ID, C001]
5555 | 3333 | [Member ID, C002]
6666 | 1111 | [JD user ID, JD001]
7777 | 1111 | [JD user ID, JD002]
8888 | 3333 | [JD user ID, JD003]
From these results, the user connectivity graphs are as shown in Figure 2. As Figure 2 shows, grouping by the "minimum vertex ID of the connected graph" column splits the whole user information graph into two independent user connectivity graphs, user connectivity graph 1 and user connectivity graph 2, each corresponding to one natural user.
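The split into user connectivity graphs can be reproduced with a small union-find, shown here as a stand-in for whichever connected components implementation is actually used. The short integer vertex IDs are simplified stand-ins for the hashed IDs of Table 5, the edge list is derived from the records of Tables 2 and 3, and the function name `min_vertex_labels` is an assumption of the sketch.

```python
from collections import defaultdict

# Simplified vertex IDs standing in for the hashed IDs of Table 5.
vertices = {
    1111: ("Mobile phone", "13344445555"),
    2222: ("Mobile phone", "13455556666"),
    3333: ("Mobile phone", "13566667777"),
    4444: ("Member ID", "C001"),
    5555: ("Member ID", "C002"),
    6666: ("JD user ID", "JD001"),
    7777: ("JD user ID", "JD002"),
    8888: ("JD user ID", "JD003"),
}
# One edge per pair of identifiers that share a record in Table 2 or Table 3.
edges = [(1111, 4444), (2222, 4444), (3333, 5555),
         (1111, 6666), (2222, 7777), (3333, 8888)]


def min_vertex_labels(vertex_ids, edges):
    """Label every vertex with the smallest vertex ID in its connected component."""
    parent = {v: v for v in vertex_ids}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller ID as root, so each root is its component's minimum.
            parent[max(ra, rb)] = min(ra, rb)
    return {v: find(v) for v in vertex_ids}


labels = min_vertex_labels(vertices, edges)

# Group vertices by label to obtain the independent user connectivity graphs.
user_graphs = defaultdict(list)
for v, root in sorted(labels.items()):
    user_graphs[root].append(v)
print(dict(user_graphs))
# {1111: [1111, 2222, 4444, 6666, 7777], 3333: [3333, 5555, 8888]}
```

Grouping by the label is the same "group and aggregate by minimum vertex ID" step that the text describes, and the two groups match the assignment in Table 6.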
Suppose this is not the first calculation: the historical data in the historical material tables has already been processed, and the channel type association tables have been generated. The historical data is the coupon usage record table with the records shown in Table 7:
Table 7
Phone number | Coupon number | Member ID | ...
13566667777 | T00001 | C001
13344445555 | T00002 | C002
The result of the previous calculation is shown in Figure 3. It yields the entity unique IDs of the two natural users corresponding to the records in Table 7, so the "mobile phone number" channel type association table has the contents shown in Table 8.
Table 8
Phone number | Entity unique ID
13566667777 | 5109b4d4e0e7411c992da7d5ea112538
13344445555 | 73c823904daa4c20a4c2035132d0e7c2
The contents of the "member ID" channel type association table are shown in Table 9.
Table 9
Member ID | Entity unique ID
C001 | 5109b4d4e0e7411c992da7d5ea112538
C002 | 73c823904daa4c20a4c2035132d0e7c2
Next, every vertex attribute of each user connectivity graph is looked up in the association table of its channel type to find the entity unique ID, and the IDs found are recorded as attributes of the user connectivity graph, as shown in Figure 4. An entity unique ID connectivity graph is then constructed from the entity unique IDs; this links the corresponding user connectivity graphs and yields the new user connectivity graph shown in Figure 5.
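The lookup of historical entity unique IDs can be sketched as follows. The association tables are those of Tables 8 and 9 and the vertex attributes are those of Table 6; the dictionary layout, the empty "JD user ID" table (no history is given for that channel type), and the function name `entity_ids_of` are assumptions of the sketch.

```python
ID_A = "5109b4d4e0e7411c992da7d5ea112538"
ID_B = "73c823904daa4c20a4c2035132d0e7c2"

# Channel type association tables from the previous run (Tables 8 and 9).
association = {
    "Mobile phone": {"13566667777": ID_A, "13344445555": ID_B},
    "Member ID": {"C001": ID_A, "C002": ID_B},
    "JD user ID": {},  # assumed empty: no history is given for this channel type
}

# Vertex attributes of the two user connectivity graphs (Table 6),
# keyed by the minimum vertex ID of each graph.
user_graphs = {
    1111: [("Mobile phone", "13344445555"), ("Mobile phone", "13455556666"),
           ("Member ID", "C001"), ("JD user ID", "JD001"), ("JD user ID", "JD002")],
    3333: [("Mobile phone", "13566667777"), ("Member ID", "C002"),
           ("JD user ID", "JD003")],
}


def entity_ids_of(graph_vertices, association):
    """Collect the distinct historical entity unique IDs found for a graph's vertices."""
    found = set()
    for channel_type, value in graph_vertices:
        entity_id = association.get(channel_type, {}).get(value)
        if entity_id is not None:
            found.add(entity_id)
    return found


graph_entities = {gid: entity_ids_of(vs, association)
                  for gid, vs in user_graphs.items()}
for gid, ids in graph_entities.items():
    print(gid, sorted(ids))
```

In this example both user connectivity graphs resolve to the same pair of entity unique IDs (graph 1111 through phone 13344445555 and member C001, graph 3333 through phone 13566667777 and member C002), which is why they become linked in the next step.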
After the historical data in the historical material tables is associated, the two new user connectivity graphs can be linked together, because some of their vertices belonged to the same user connectivity graph in the historical data. As shown in Figure 5, all of the data now forms a single natural user, which holds two entity unique IDs after deduplication:
5109b4d4e0e7411c992da7d5ea112538
73c823904daa4c20a4c2035132d0e7c2
It follows that, in this calculation, the new coupon usage records have linked two users that were independent in the old user information; they are the same user, so user information fusion has occurred.
Only one unique user ID is retained at this point, for example 5109b4d4e0e7411c992da7d5ea112538, as shown in Figure 6.
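The linking of user connectivity graphs through shared entity unique IDs, followed by deduplication, can be sketched as below. The three resolution cases (no ID: generate a new one with a UUID; one ID: reuse it; several IDs: keep any one) follow the deduplication rules stated in the claims. The function names are assumptions, and picking the lexicographically smallest ID is only one assumed way of keeping "any one" deterministically; it happens to select the same ID as the example.

```python
import uuid

ID_A = "5109b4d4e0e7411c992da7d5ea112538"
ID_B = "73c823904daa4c20a4c2035132d0e7c2"

# Entity unique IDs found for each user connectivity graph in the worked example.
graph_entities = {1111: {ID_A, ID_B}, 3333: {ID_A, ID_B}}


def merge_user_graphs(graph_entities):
    """Group user connectivity graphs that share an entity unique ID, transitively."""
    groups = []  # list of (set of graph IDs, set of entity IDs), pairwise disjoint
    for gid, ids in graph_entities.items():
        merged_graphs, merged_ids = {gid}, set(ids)
        rest = []
        for g_set, id_set in groups:
            if id_set & merged_ids:
                merged_graphs |= g_set
                merged_ids |= id_set
            else:
                rest.append((g_set, id_set))
        groups = rest + [(merged_graphs, merged_ids)]
    return groups


def resolve_entity_id(entity_ids):
    """Apply the three deduplication cases to one merged user."""
    if not entity_ids:
        return uuid.uuid4().hex  # new user: generate a fresh unique ID
    # One ID: reuse it. Several IDs: fusion occurred, keep one (smallest is assumed).
    return min(entity_ids)


groups = merge_user_graphs(graph_entities)
for graph_ids, entity_ids in groups:
    print(sorted(graph_ids), "->", resolve_entity_id(entity_ids))
```

Graphs with no historical ID never merge with anything in this sketch, so each of them ends up as its own group and receives a freshly generated ID.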
Finally, the channel type association tables are updated to support subsequent calculations. The updated tables comprise the "mobile phone number" channel type association table shown in Table 10, the "member ID" channel type association table shown in Table 11, and the "JD user ID" channel type association table shown in Table 12.
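The update step can be sketched as writing the retained entity unique ID back for every identifier of the merged user. The starting tables are those of Tables 8 and 9 and the identifiers are those of the worked example; the function name `update_association` and the dictionary layout are assumptions, and the printed mappings are simply what this sketch produces rather than a reproduction of the tables referenced in the text.

```python
ID_A = "5109b4d4e0e7411c992da7d5ea112538"
ID_B = "73c823904daa4c20a4c2035132d0e7c2"

# Association tables before the update (Tables 8 and 9).
association = {
    "Mobile phone": {"13566667777": ID_A, "13344445555": ID_B},
    "Member ID": {"C001": ID_A, "C002": ID_B},
}

# All identifiers of the single merged natural user in the worked example.
merged_user_vertices = [
    ("Mobile phone", "13344445555"), ("Mobile phone", "13455556666"),
    ("Mobile phone", "13566667777"),
    ("Member ID", "C001"), ("Member ID", "C002"),
    ("JD user ID", "JD001"), ("JD user ID", "JD002"), ("JD user ID", "JD003"),
]


def update_association(association, vertices, entity_id):
    """Point every identifier of one natural user at the retained entity unique ID."""
    for channel_type, value in vertices:
        # setdefault creates the table for a channel type seen for the first time.
        association.setdefault(channel_type, {})[value] = entity_id
    return association


update_association(association, merged_user_vertices, ID_A)
for channel_type, mapping in association.items():
    print(channel_type, mapping)
```

After the update no identifier maps to the discarded ID any more, so a later run that encounters any of these identifiers resolves directly to the retained one.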
From this point on, all use of the CDP data is based on the entity unique ID. For example, user portraits are computed per entity unique ID, giving one user (entity) portrait for each entity unique ID. If a marketing campaign is later run in the membership system, an audience can be selected by user portrait, which yields the set of entity unique IDs belonging to that audience; the corresponding JD user IDs can then be looked up in the association table of the JD user ID channel, so that outreach operations such as campaign pushes can be carried out.
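The audience lookup described above amounts to a filter over portraits followed by a reverse lookup in a channel association table. In the sketch below the portrait field `coupons_used`, its value (the six coupon rows of Tables 2 and 7 counted for the merged user), the threshold, and both function names are hypothetical; the JD association table assumes that all three JD user IDs of the example map to the retained entity unique ID.

```python
ID_A = "5109b4d4e0e7411c992da7d5ea112538"

# Assumed state of the "JD user ID" association table after the update step.
jd_association = {"JD001": ID_A, "JD002": ID_A, "JD003": ID_A}

# Hypothetical user portraits keyed by entity unique ID.
portraits = {ID_A: {"coupons_used": 6}}


def select_audience(portraits, predicate):
    """Return the entity unique IDs whose portrait satisfies the predicate."""
    return {eid for eid, portrait in portraits.items() if predicate(portrait)}


def channel_ids_for(entity_ids, channel_association):
    """Reverse lookup: channel identifiers that map to any of the entity IDs."""
    return sorted(cid for cid, eid in channel_association.items()
                  if eid in entity_ids)


audience = select_audience(portraits, lambda p: p["coupons_used"] >= 3)
print(channel_ids_for(audience, jd_association))
# ['JD001', 'JD002', 'JD003']
```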
Embodiment 2
The present invention provides a terminal comprising a processor, a memory, and a computer program stored in the memory. The processor executes the computer program stored in the memory to carry out the steps of the user information fusion method under multi-layer association described in Embodiment 1. The memory may be a magnetic disk, flash memory, or any other non-volatile storage medium. The processor is connected to the memory and may be implemented as one or more integrated circuits, specifically a microprocessor or a microcontroller; when it executes the computer program stored in the memory, user information fusion under multi-layer association is achieved.
The present invention provides a computer storage medium. Computer program instructions are stored on the computer-readable storage medium, and when the instructions are executed by a processor, the steps of the user information fusion method under multi-layer association described in Embodiment 1 are carried out.
Embodiment 2
Referring to Figure 7, the present invention further provides a user information fusion system under multi-layer association. The system is used to implement the user information fusion method under multi-layer association described in Embodiment 1 and comprises:
a channel type information acquisition module, which selects the data sources of the user information to be integrated, each data source corresponding to one material table, and determines the channel types and channel type fields that identify associations in the material tables, to obtain channel type information;
a user connectivity graph construction module, which determines vertices and edges according to the channel type information, constructs a user information graph from the vertices and edges, and then splits the user information graph into independent user connectivity graphs;
a historical data association module, which collects the historical material tables, determines the mapping from each channel type in the historical material tables to the user entity unique ID to form the initial channel type association tables, and then looks up, for each vertex of a user connectivity graph, the user entity unique ID in the corresponding channel type association table;
a judging module, which judges whether each user entity unique ID appears in only one user connectivity graph; if so, user information is fused as needed for each user connectivity graph; otherwise, an entity unique ID connectivity graph is constructed;
a user connectivity graph association module, which determines the entity unique IDs that are connected together in the entity unique ID connectivity graph and associates the user connectivity graphs corresponding to those connected entity unique IDs;
a deduplication module, which reads the entity unique IDs corresponding to all vertices of a user connectivity graph and deduplicates the entity unique IDs;
an update module, which updates the channel type association tables using the deduplicated entity unique IDs.
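The check performed by the judging module can be sketched in a few lines. The input shape (a mapping from each user connectivity graph to the set of entity unique IDs found for it) and the function name `needs_entity_id_graph` are assumptions of the sketch; the ID strings in the second example are placeholders.

```python
from collections import Counter

ID_A = "5109b4d4e0e7411c992da7d5ea112538"
ID_B = "73c823904daa4c20a4c2035132d0e7c2"


def needs_entity_id_graph(graph_entities):
    """True if some entity unique ID appears in more than one user connectivity graph."""
    counts = Counter(eid for ids in graph_entities.values() for eid in set(ids))
    return any(n > 1 for n in counts.values())


# Worked example: both graphs carry both IDs, so the entity ID graph must be built.
print(needs_entity_id_graph({1111: {ID_A, ID_B}, 3333: {ID_A, ID_B}}))  # True

# Every ID confined to a single graph: each graph can be fused on its own.
print(needs_entity_id_graph({1: {"id-x"}, 2: {"id-y"}, 3: set()}))  # False
```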
Obviously, the above embodiments of the present invention are merely examples given to illustrate the invention clearly and do not limit its implementation. Persons of ordinary skill in the art can make other changes or variations in different forms on the basis of the above description. It is neither necessary nor possible to list all implementations exhaustively here. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the claims of the present invention.

Claims (10)

  1. A user information fusion method under multi-layer association, characterized in that the method comprises at least:
    S1. selecting the data sources of the user information to be integrated, each data source corresponding to one material table, and determining the fields in the material tables that can identify a user and their corresponding channel types, to obtain channel type information;
    S2. determining vertices and edges according to the channel type information, constructing a user information graph from the vertices and edges, and then splitting the user information graph into independent user connectivity graphs;
    S3. using the channel type association tables, querying the historical data of the entity unique ID corresponding to each vertex of a user connectivity graph to obtain all entity unique IDs corresponding to the user connectivity graph; then, taking the entity unique IDs as vertices and the relationships among multiple entity unique IDs within a user connectivity graph as edges, constructing an entity unique ID connectivity graph, thereby using the historical data to associate user connectivity graphs;
    S4. judging whether each user entity unique ID appears in only one user connectivity graph; if so, fusing user information as needed for each user connectivity graph; otherwise, constructing the entity unique ID connectivity graph and performing step S5;
    S5. determining the entity unique IDs that are connected together in the entity unique ID connectivity graph, and associating the user connectivity graphs corresponding to those connected entity unique IDs;
    S6. reading the entity unique IDs corresponding to all vertices of a user connectivity graph, and deduplicating the entity unique IDs;
    S7. updating the channel type association tables using the deduplicated entity unique IDs.
  2. The user information fusion method under multi-layer association according to claim 1, characterized in that, in an enterprise's customer data platform (CDP), user information is spread across multiple data sources of the enterprise, each data source corresponding to one material table; the material tables include an enterprise member information table, a mall user information table, and a coupon usage record table; the channel types in the material tables include user mobile phone number, member ID, Email, WeChat unionID, and WeChat openID; records that have the same value for a channel type in different material tables correspond to the same natural user; and a channel type field is the combination of a channel type and the material table corresponding to that channel type.
  3. The user information fusion method under multi-layer association according to claim 2, characterized in that, in step S2, "channel type + channel type field value" is taken as the vertex attribute, and the vertex attribute is hashed to obtain the vertex ID, the vertex ID recording an attribute of one natural user on a given channel type;
    the multiple "channel type field values" appearing in a single record of a material table are taken as an edge, an edge being a connection between vertices that records the association of one natural user across different channel types;
    the vertices and edges are connected to construct the user information graph; the user information graph is then split using the connected components algorithm to obtain, for each vertex, the minimum vertex ID of the user information graph component that contains it, and the vertices are grouped and aggregated by minimum vertex ID to obtain all vertices of each independent user connectivity graph, that is, each independent user connectivity graph.
  4. The user information fusion method under multi-layer association according to claim 3, characterized in that, in step S4, if each user entity unique ID appears in only one user connectivity graph, the multiple natural users produced by the newly added user information are also independent natural users in the historical material tables, and user information is fused as needed for each user connectivity graph.
  5. The user information fusion method under multi-layer association according to claim 4, characterized in that the entity unique ID connectivity graph is constructed with each entity unique ID as a vertex and with the relationship of multiple entity unique IDs appearing in the same user connectivity graph as an edge.
  6. The user information fusion method under multi-layer association according to claim 5, characterized in that the method used in step S5 to determine the entity unique IDs that are connected together in the entity unique ID connectivity graph is the connected components algorithm; after the user connectivity graphs corresponding to the connected entity unique IDs are associated, each user connectivity graph corresponds to one natural user.
  7. The user information fusion method under multi-layer association according to claim 6, characterized in that the deduplication of entity unique IDs in step S6 includes:
    a. the user connectivity graph has no entity unique ID: the current natural user is a new user, and an entity unique ID is generated for the user connectivity graph, using a UUID to guarantee its uniqueness;
    b. the user connectivity graph has exactly one entity unique ID: the old and new user information of the current natural user all belong to the same natural user, and the old entity unique ID continues to be used;
    c. the user connectivity graph has multiple entity unique IDs: the newly added user information has caused user information fusion, and only one of the entity unique IDs, chosen arbitrarily, is retained.
  8. A terminal, characterized by comprising a processor, a memory, and a computer program stored in the memory, the processor executing the computer program stored in the memory to carry out the steps of the user information fusion method under multi-layer association according to any one of claims 1 to 7.
  9. A computer storage medium, characterized in that computer program instructions are stored on the computer-readable storage medium, and when the instructions are executed by a processor, the steps of the user information fusion method under multi-layer association according to any one of claims 1 to 7 are carried out.
  10. A user information fusion system under multi-layer association, the system being used to implement the user information fusion method under multi-layer association according to claim 1, characterized by comprising:
    a channel type information acquisition module, which selects the data sources of the user information to be integrated, each data source corresponding to one material table, and determines the fields in the material tables that can identify a user and their corresponding channel types, to obtain channel type information;
    a user connectivity graph construction module, which determines vertices and edges according to the channel type information, constructs a user information graph from the vertices and edges, and then splits the user information graph into independent user connectivity graphs;
    a historical data association module, which uses the channel type association tables to query the historical data of the entity unique ID corresponding to each vertex of a user connectivity graph, obtains all entity unique IDs corresponding to the user connectivity graph, and then, taking the entity unique IDs as vertices and the relationships among multiple entity unique IDs within a user connectivity graph as edges, constructs an entity unique ID connectivity graph, thereby using the historical data to associate user connectivity graphs;
    a judging module, which judges whether each user entity unique ID appears in only one user connectivity graph; if so, user information is fused as needed for each user connectivity graph; otherwise, the entity unique ID connectivity graph is constructed;
    a user connectivity graph association module, which determines the entity unique IDs that are connected together in the entity unique ID connectivity graph and associates the user connectivity graphs corresponding to those connected entity unique IDs;
    a deduplication module, which reads the entity unique IDs corresponding to all vertices of a user connectivity graph and deduplicates the entity unique IDs;
    an update module, which updates the channel type association tables using the deduplicated entity unique IDs.
PCT/CN2022/098808 2021-10-19 2022-06-15 User information fusion method and system under multilayer association, and terminal and storage medium WO2023065691A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111216588.0 2021-10-19
CN202111216588.0A CN114064705A (en) 2021-10-19 2021-10-19 User information fusion method, terminal, storage medium and system under multilayer association

Publications (1)

Publication Number Publication Date
WO2023065691A1 true WO2023065691A1 (en) 2023-04-27

Family

ID=80234917

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/098808 WO2023065691A1 (en) 2021-10-19 2022-06-15 User information fusion method and system under multilayer association, and terminal and storage medium

Country Status (2)

Country Link
CN (1) CN114064705A (en)
WO (1) WO2023065691A1 (en)

Also Published As

Publication number Publication date
CN114064705A (en) 2022-02-18

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22882314

Country of ref document: EP

Kind code of ref document: A1