CN112307297A

CN112307297A - User identification unification method and system based on priority rule

Info

Publication number: CN112307297A
Application number: CN202011321712.5A
Authority: CN
Inventors: 秦秀磊; 李丹丹
Original assignee: Sunshine Insurance Group Co Ltd
Current assignee: Sunshine Insurance Group Co Ltd
Priority date: 2020-11-23
Filing date: 2020-11-23
Publication date: 2021-02-02
Anticipated expiration: 2040-11-23
Also published as: CN112307297B

Abstract

The application provides a method and a system for unifying user identifications based on priority rules. The method comprises: acquiring a first user information table of each platform in the current generation cycle, determining the attribute priority order of each platform based on a priority rule, and merging and de-duplicating each first user information table by its highest-priority non-empty attribute to generate a second user information table for each platform; merging and de-duplicating all the second user information tables by the highest-priority non-empty attribute to generate a third user information table for the platform set, comparing the third user information table with the third user information table of the previous generation cycle, screening out information change users, and putting them into a temporary change table; and constructing a funnel-shaped processing chain according to the attribute priority order, distinguishing information update users from new users in the temporary change table, unifying both with the user identifications of the previous generation cycle, and generating the unified identification table of the current generation cycle. The method improves the scalability of user identification unification and offers better identification accuracy and identification stability.

Description

User identification unification method and system based on priority rule

Technical Field

The application relates to the field of big data analysis and processing, in particular to a user identification unifying method and system based on a priority rule.

Background

A unified identifier generation mechanism for internet users is valuable for digital marketing activities such as customer insight, intelligent recommendation, and marketing effect evaluation. By systematically integrating user information from multiple platforms and channels, linking the accounts of each user, and linking all touchpoints between the user's devices and the enterprise's systems, a unified user view can be established and digital marketing can be made more precise.

In the prior art, user identifications are generally unified with reference to one or more fixed attributes, such as a user's certificate number or mobile phone number. However, user information in internet scenarios is diverse and changeable, and the quality of user data is uneven. Methods that rely on fixed attributes are therefore limited: they suffer from low accuracy, cannot associate user behavior data, and struggle to meet the requirements of varied marketing scenarios.

Disclosure of Invention

In view of the above, an object of the present application is to provide a method and a system for unifying user identities based on priority rules, so as to solve the deficiencies of the prior art.

In a first aspect, an embodiment of the present application provides a method for unifying user identities based on a priority rule, where the method includes:

acquiring buried point data and service data of each platform in a current generation cycle, and generating a first user information table corresponding to each platform according to the buried point data and the service data;

determining an attribute priority order of the first user information table of each platform based on a priority rule, and merging and de-duplicating the first user information table by using a non-empty attribute with the highest priority in the attribute priority order to generate a second user information table corresponding to each platform;

merging and de-duplicating the second user information tables of all platforms based on the non-empty attribute with the highest priority in each second user information table to generate a third user information table aiming at a platform set, comparing each attribute data in the third user information table with the third user information table of the previous generation cycle, screening out information change users in the current generation cycle, and putting the information change users into a temporary change table;

determining an attribute priority order of the unified identification table of the previous generation cycle based on the priority rule, constructing a funnel-shaped processing chain according to that attribute priority order, comparing the temporary change table with the unified identification table of the previous generation cycle through the funnel-shaped processing chain to determine the information update users and the new users among the information change users, unifying, based on the data corresponding to the information update users and the data corresponding to the new users, the user identification of each information change user in the current generation cycle with the corresponding user identification in the previous generation cycle, and generating the unified identification table of the current generation cycle.

Optionally, the determining, based on a priority rule, an attribute priority order of the first user information table of each platform, and combining and de-duplicating the first user information table by using a non-empty attribute with a highest priority in the attribute priority order to generate a second user information table corresponding to each platform includes:

based on the priority rule, calculating the priority order of the attributes in the first user information table according to the attribute value characteristics of each attribute in the first user information table;

determining the head attribute of each record according to the attribute priority order, wherein the head attribute is the non-empty attribute with the highest priority;

and merging and de-duplicating the first user information table of each platform based on the head attribute to generate a second user information table corresponding to each platform.
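The three steps above can be illustrated with a minimal Python sketch. It assumes each record is a dictionary, that the attribute priority order has already been calculated, and that two records are merged when they share the same head attribute and the same value for it. The attribute names `phone` and `device_id` and the function names are hypothetical, and keeping the first non-empty value per attribute is an assumed merge policy, not one stated in the application.

```python
def head_attribute(record, priority_order):
    """Return the highest-priority attribute whose value is non-empty, or None."""
    for attr in priority_order:
        if record.get(attr):
            return attr
    return None


def merge_by_head_attribute(records, priority_order):
    """Merge records that share the same (head attribute, value) pair."""
    merged = {}  # dicts preserve insertion order, so output order is stable
    for record in records:
        head = head_attribute(record, priority_order)
        if head is None:
            continue  # no usable attribute; skipped in this sketch
        target = merged.setdefault((head, record[head]), {})
        for attr in priority_order:
            value = record.get(attr)
            # Assumed policy: keep the first non-empty value seen per attribute.
            if value and not target.get(attr):
                target[attr] = value
    return list(merged.values())


# Hypothetical first user information table of one platform.
priority_order = ["phone", "device_id"]
first_table = [
    {"phone": "13800000000", "device_id": "dev-1"},
    {"phone": "13800000000", "device_id": None},
    {"phone": None, "device_id": "dev-2"},
]
second_table = merge_by_head_attribute(first_table, priority_order)
```

In this example the first two records share the head attribute `phone` and collapse into one, while the third record falls back to `device_id` as its head attribute.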

Optionally, the priority order of each attribute in the first user information table is determined by the following steps:

determining the priority order of the attributes according to the attribute value characteristics of each attribute in the first user information table by using the following formula;

wherein c_m is the m-th attribute in the first user information table; K is the number of attributes remaining after removing c_m and the attributes that have already been prioritized; P(c_m|c_i) represents the amount of data for which attribute c_i is also non-empty when c_m is non-empty; H(c_m|c_i) represents the amount of data for which c_m and attribute c_i have a one-to-many relationship; α and β represent weights; and F(c_m) is the calculated priority value of c_m.

Optionally, the merging and deduplication are performed on the second user information tables of all the platforms based on the non-empty attribute with the highest priority in each second user information table, a third user information table for a platform set is generated, each attribute data in the third user information table is compared with the third user information table in the previous generation cycle, information change users in the current generation cycle are screened out, and the information change users are put into a temporary change table, including:

combining and de-duplicating all the second user information tables based on the non-empty attribute with the highest priority to obtain a third user information table corresponding to a platform set in the current generation period;

and comparing each attribute data in the third user information table with each attribute data in the third user information table in the previous generation period, screening out information change users in the generation period according to the comparison result, and putting the information change users into the temporary change table.

Optionally, the comparing the attribute data in the third user information table with the attribute data in the third user information table in the previous generation cycle, screening the information change users in the generation cycle according to the comparison result, and placing the information change users in the temporary change table, includes:

concatenating the attribute data corresponding to each user in the third user information table in a fixed order, and computing the MD5 value of the result to generate first fingerprint information corresponding to each user;

concatenating the attribute data corresponding to each user in the third user information table of the previous generation cycle in the same fixed order, and computing the MD5 value of the result to generate second fingerprint information corresponding to each user;

and screening out information change users in the current generation period based on the comparison result of the first fingerprint information and the second fingerprint information, and putting the information change users into the temporary change table.
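The fingerprint comparison can be sketched in Python with the standard `hashlib` module. The attribute names, the `|` separator, and the use of set membership to compare fingerprints are assumptions made for illustration; the application only specifies concatenation in a fixed order followed by MD5.

```python
import hashlib

# Hypothetical fixed attribute order used for concatenation.
ATTRIBUTE_ORDER = ["phone", "device_id", "email"]


def fingerprint(record, attribute_order=ATTRIBUTE_ORDER):
    """Concatenate attribute values in a fixed order and return the MD5 hex digest."""
    # Assumed separator: avoids ambiguity such as ("ab", "c") versus ("a", "bc").
    joined = "|".join(str(record.get(attr) or "") for attr in attribute_order)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def changed_users(current_table, previous_table):
    """Return current records whose fingerprint is absent from the previous cycle."""
    previous_fingerprints = {fingerprint(row) for row in previous_table}
    return [row for row in current_table
            if fingerprint(row) not in previous_fingerprints]


previous = [{"phone": "13800000000", "device_id": "dev-1", "email": None}]
current = [
    {"phone": "13800000000", "device_id": "dev-1", "email": None},       # unchanged
    {"phone": "13900000000", "device_id": "dev-2", "email": "a@x.com"},  # changed or new
]
temporary_change_table = changed_users(current, previous)
```

Only the second record reaches the temporary change table, so later stages compare a small subset of users instead of the full table.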

Optionally, the determining, based on the priority rule, an attribute priority order of a unified identifier table in a previous generation cycle, constructing a funnel-shaped processing chain according to the attribute priority order of the unified identifier table in the previous generation cycle, comparing, by the funnel-shaped processing chain, the temporary change table with the unified identifier table in the previous generation cycle, determining an information update user and a new user in the information change users, unifying, based on data corresponding to the information update user and data corresponding to the new user, a user identifier corresponding to each information change user in a current generation cycle with a user identifier corresponding to each information change user in the previous generation cycle, and generating the unified identifier table in the current generation cycle includes:

determining the attribute priority order of the unified identification table of the previous generation period, and constructing the funnel-shaped processing chain based on the attribute priority order;

comparing each attribute value in the temporary change table with the unified identification table of the previous generation cycle in sequence according to the funnel-shaped processing chain, wherein if an attribute value in the temporary change table is associated with a record in the unified identification table, the user corresponding to the attribute value is an information update user, and a user that remains unassociated after all comparisons are finished is a new user;

and unifying the identifications of the information update users and of the new users with the user identifications of the previous generation cycle, wherein the existing identification is retained for each information update user and a new unified user identification is generated for each new user, thereby generating the unified identification table of the current generation cycle.
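A funnel-shaped processing chain of this kind might be sketched as follows in Python. The sketch assumes that "associated" means an exact match on a non-empty attribute value, that the previous unified identification table stores its identifier under a hypothetical `uid` key, and that the attribute priority order is supplied by the caller.

```python
def funnel_match(change_table, previous_unified, priority_order):
    """Classify changed records as information update users or new users.

    Returns (updated, new), where updated is a list of (record, uid) pairs
    and new is the list of records that no stage could associate.
    """
    remaining = list(change_table)
    updated = []
    for attr in priority_order:  # one funnel stage per attribute, highest priority first
        # Index the previous cycle's identifiers by this attribute's non-empty values.
        index = {row[attr]: row["uid"] for row in previous_unified if row.get(attr)}
        still_unmatched = []
        for record in remaining:
            value = record.get(attr)
            uid = index.get(value) if value else None
            if uid is not None:
                updated.append((record, uid))   # associated: information update user
            else:
                still_unmatched.append(record)  # passes on to the next stage
        remaining = still_unmatched  # matched records leave the funnel
    return updated, remaining


previous_unified = [{"uid": "u-1", "phone": "13800000000", "device_id": "dev-1"}]
change_table = [
    {"phone": "13800000000", "device_id": "dev-9"},  # matches on phone
    {"phone": None, "device_id": "dev-1"},           # matches on device_id
    {"phone": "13700000000", "device_id": "dev-7"},  # matches nothing
]
updated, new_users = funnel_match(change_table, previous_unified, ["phone", "device_id"])
```

Because each stage only sees records left unmatched by the earlier stages, the number of comparisons shrinks from stage to stage.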

Optionally, the method further includes:

according to the funnel-shaped processing chain, comparing the temporary change table with the unified identification table of the previous generation period to determine the information updating user and the newly added user;

and generating a new user identifier for the new user, and adding the new user identifier into the unified identifier table of the current generation period.

Optionally, the priority order of the attributes of the unified identification table of the previous generation cycle is determined by the following steps:

calculating the priority of each attribute according to the attribute data characteristics in the unified identification table of the previous generation cycle by using the following formula:

D(c_m) = α·W(c_m) - β·R(c_m);

wherein c_m is the m-th attribute in the unified identification table of the previous generation cycle; W(c_m) represents the proportion of non-empty data in attribute c_m; R(c_m) represents the proportion of records in c_m that have been merged; α and β represent weights; and D(c_m) is the calculated priority value of the m-th attribute.
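As an illustration of the formula for D(c_m), the following Python sketch computes W(c_m) from a table and takes R(c_m) as a caller-supplied value, because the text does not specify how merged records are flagged. It also assumes that a larger D(c_m) means a higher priority. The function names, attribute names, and default weights are hypothetical.

```python
def non_empty_ratio(table, attr):
    """W(c_m): share of records whose value for attr is non-empty."""
    if not table:
        return 0.0
    return sum(1 for row in table if row.get(attr)) / len(table)


def priority_order(table, attributes, merged_ratio, alpha=1.0, beta=1.0):
    """Rank attributes by D(c_m) = alpha * W(c_m) - beta * R(c_m), highest first.

    merged_ratio maps each attribute to its R(c_m) value (assumed precomputed).
    """
    scores = {
        attr: alpha * non_empty_ratio(table, attr) - beta * merged_ratio.get(attr, 0.0)
        for attr in attributes
    }
    ranked = sorted(attributes, key=lambda attr: scores[attr], reverse=True)
    return ranked, scores


# Hypothetical unified identification table of the previous generation cycle.
unified_prev = [
    {"phone": "13800000000", "device_id": "dev-1"},
    {"phone": "13900000000", "device_id": "dev-2"},
    {"phone": None, "device_id": "dev-3"},
    {"phone": "13700000000", "device_id": "dev-4"},
]
merged_ratio = {"phone": 0.25, "device_id": 0.75}  # assumed R(c_m) values
order, scores = priority_order(unified_prev, ["device_id", "phone"], merged_ratio)
```

Here `device_id` is always populated but is heavily merged, so `phone` ranks first despite its lower non-empty ratio.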

Optionally, the attribute includes any one or more of the following information:

an authentication center account attribute, a social account attribute, an APP account attribute, a device identification attribute, a mobile phone number attribute, an identity document attribute, a mailbox attribute, a license plate attribute, and the like.

In a second aspect, the present application provides a system for unifying user identities based on priority rules, including:

the data acquisition system is used for acquiring buried point data and service data of each platform in a current generation cycle and generating a first user information table corresponding to each platform according to the buried point data and the service data;

the in-platform de-duplication module is used for determining the attribute priority order of the first user information table of each platform based on a priority rule, and combining and de-duplicating the first user information table by using a non-empty attribute with the highest priority in the attribute priority order to generate a second user information table corresponding to each platform;

the new and changed user screening module is used for merging and removing duplication of the second user information tables of all platforms based on the non-empty attribute with the highest priority in each second user information table to generate a third user information table aiming at a platform set, comparing each attribute data in the third user information table with the third user information table in the previous generation period, screening out information changed users in the current generation period, and putting the information changed users into a temporary change table;

the user identification unifying module is used for determining the attribute priority order of the unified identification table of the previous generation cycle based on the priority rule, constructing a funnel-shaped processing chain according to that attribute priority order, comparing the temporary change table with the unified identification table of the previous generation cycle through the funnel-shaped processing chain to determine the information update users and the new users among the information change users, unifying, based on the data corresponding to the information update users and the data corresponding to the new users, the user identification of each information change user in the current generation cycle with the corresponding user identification in the previous generation cycle, and generating the unified identification table of the current generation cycle.

The method for unifying user identifications based on priority rules provided by the embodiments of the application comprehensively considers the data characteristics of each user attribute, dynamically locates the head attribute of each record, and merges and de-duplicates the first user information table with the head attribute as a reference to generate the second user information table. Comparing the third user information table with that of the previous generation cycle filters out the majority of users, whose information is unchanged, and leaves only the small number of new users and information change users; this minimizes the subsequent comparison overhead and improves the scalability of the method, especially with massive numbers of users. A funnel-shaped processing chain is then constructed based on the attribute priority order, and each attribute of the information change users is compared in sequence with the unified identification table of the previous generation cycle to determine the new users and the information update users, which are then unified with the user identifications of the previous generation cycle. The method therefore has better identification accuracy and identification stability.

In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.

Fig. 1 is a schematic flowchart of a method for unifying user identities based on priority rules according to an embodiment of the present application;

Fig. 2 is a schematic flowchart of another method for unifying user identities based on priority rules according to an embodiment of the present application;

Fig. 3 is a schematic flowchart of another method for unifying user identities based on priority rules according to an embodiment of the present application;

Fig. 4 is a schematic flowchart of another method for unifying user identities based on priority rules according to an embodiment of the present application;

Fig. 5 is a schematic structural diagram of a system for unifying user identities based on priority rules according to an embodiment of the present application;

Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.

The embodiment of the application provides a method for unifying user identifications based on priority rules, as shown in fig. 1, the method comprises the following steps:

S101, acquiring buried point data and service data of each platform in the current generation cycle, and generating a first user information table corresponding to each platform according to the buried point data and the service data.

In step S101, the buried point data refers to user behavior data collected by the front-end buried point system of each platform, and the service data refers to user-related transaction data recorded by each platform. The first user information table is the user information table generated by one platform in the current generation cycle, and comprises attributes and attribute records. An attribute record is the record corresponding to one user, containing all attributes and the attribute data corresponding to each attribute. The attributes include any one or more of the following: an authentication center account attribute, a social account attribute, an APP account attribute, a device identification attribute, a mobile phone number attribute, an identity document attribute, a mailbox attribute, a license plate attribute, and the like. The attribute records include any one or more of the following data: an authentication center account identifier, a social account identifier, an APP account identifier, a device identifier, a mobile phone number identifier, an identity document identifier, a mailbox identifier, a license plate identifier, and the like.
The attribute data corresponding to the authentication center account attribute is the identifier returned by the authentication center for each user; the attribute data corresponding to the social account attribute is the identifier of each user in an official account or mini program; the attribute data corresponding to the APP account attribute is the identifier of each user in the APP; the attribute data corresponding to the device identification attribute is the device identifier of each user; the attribute data corresponding to the mobile phone number attribute is the mobile phone number identifier of each user; the attribute data corresponding to the identity document attribute is the identity document identifier of each user; the attribute data corresponding to the mailbox attribute is the mailbox identifier of each user; and the attribute data corresponding to the license plate attribute is the license plate identifier of each user. The generation cycle refers to the frequency at which user identification unification is performed; for example, the generation cycle may be one day. Each platform is a system on the internet that a user can browse or log in to, such as an APP, a WeChat official account, a WeChat mini program, or a website; when a user browses or logs in to such a platform, the platform's front-end buried point system records the user's device identifier and access behavior.

In a specific implementation, the buried point data and service data of each platform can be obtained according to a preset generation cycle, and the first user information table is generated by processing the user information with unified logic.

S102, determining the attribute priority order of the first user information table of each platform based on the priority rule, combining and de-duplicating the first user information table by using the non-empty attribute with the highest priority in the attribute priority order, and generating a second user information table corresponding to each platform.

In step S102, the second user information table refers to the user information table generated after the first user information table of each platform is merged and de-duplicated in the current generation cycle. For the first user information table of each platform, the priority order of the attributes in that table is determined based on the priority rule, the highest-priority non-empty attribute of each user is determined based on the priority order, and the data in the first user information table is merged and de-duplicated by that attribute to obtain the second user information table corresponding to each platform in the current generation cycle.

S103, merging and de-duplicating the second user information tables of all the platforms based on the non-empty attribute with the highest priority in each second user information table to generate a third user information table for the platform set, comparing each attribute data in the third user information table with the third user information table in the previous generation cycle, screening out information change users in the current generation cycle, and putting the information change users into a temporary change table.

In step S103, the third user information table refers to an information table that contains the user data integrated from all platforms in the current generation cycle; this table does not contain the final unified user identifier of each user. The platform set refers to the set of all platforms. A big data platform can be provided for the platform set, and the user data of all platforms can be integrated in that big data platform. The temporary change table is a data table that stores the screened-out information change users.

In specific implementation, the non-empty attribute with the highest priority determined in step S102 is used to merge and deduplicate the second user information tables of all the platforms again to generate a third user information table for the platform set, each attribute data in the third user information table is compared with the third user information table of the previous generation cycle, information change users in the current generation cycle are screened out, and the information change users are placed in the temporary change table.

S104, determining an attribute priority sequence of the unified identification list in the previous generation cycle based on a priority rule, constructing a funnel-shaped processing chain according to the attribute priority sequence of the unified identification list in the previous generation cycle, comparing the temporary change list with the unified identification list in the previous generation cycle through the funnel-shaped processing chain to determine information updating users and new users in the information changing users, unifying user identifications corresponding to each information changing user in the current generation cycle with user identifications corresponding to each information changing user in the previous generation cycle based on data corresponding to the information updating users and data corresponding to the new users, and generating the unified identification list of the current generation cycle.

In step S104, the unified identification table refers to the finally generated user information table, which contains the unified user identifier and the attribute record of each user; the unified user identifier uniquely identifies each user and carries no business meaning. The funnel-shaped processing chain refers to a data chain that compares the temporary change table with the unified identification table of the previous generation cycle and screens the comparison results. Before each round of attribute comparison, the data already associated in the previous round is removed, which is why the chain is described as funnel-shaped. The priority order of the attributes of the unified identification table of the previous generation cycle is determined according to the data characteristics of that table, and is used for the subsequent data comparison and for constructing the funnel-shaped processing chain. The funnel-shaped processing chain then distinguishes which users are information update users and which are new users relative to the unified identification table of the previous generation cycle. An information update user is a user for whom at least one item of attribute data appeared in the third user information table of the previous generation cycle, but whose attribute data is not completely consistent with the corresponding attribute data in that table. A new user is a user for whom none of the attribute data appeared in the third user information table of the previous generation cycle.
An information update user keeps the same unified user identifier in two adjacent generation cycles, and association is performed based on that identifier, so that the user identification of the information update user in the current generation cycle is unified with that in the previous generation cycle. A new unified user identifier is generated for each new user. Finally, the data of users whose information is unchanged between the two adjacent generation cycles is added, thereby generating the unified identification table of the current generation cycle.
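The final assembly described above might look like the following Python sketch. It assumes that the classification into unchanged users, information update users, and new users has already been done, that identifiers live under a hypothetical `uid` key, and that a random UUID is an acceptable identifier with no business meaning; the application does not prescribe how identifiers are generated.

```python
import uuid


def build_unified_table(unchanged, updated, new_users):
    """Assemble the unified identification table for the current generation cycle.

    unchanged: rows carried over from the previous cycle (each already has "uid")
    updated:   (record, uid) pairs for information update users
    new_users: records with no association in the previous cycle
    """
    table = [dict(row) for row in unchanged]
    for record, uid in updated:
        table.append({"uid": uid, **record})  # reuse the existing identifier
    for record in new_users:
        # Assumed choice: a random UUID serves as an identifier without business meaning.
        table.append({"uid": uuid.uuid4().hex, **record})
    return table


unchanged = [{"uid": "u-0", "phone": "13600000000"}]
updated = [({"phone": "13800000000", "device_id": "dev-9"}, "u-1")]
new_users = [{"phone": "13700000000", "device_id": "dev-7"}]
unified_table = build_unified_table(unchanged, updated, new_users)
```

Reusing the identifier of an information update user is what keeps a user's unified identifier stable from one generation cycle to the next.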

Through the above four steps, the data characteristics of each user attribute are first comprehensively considered, the highest-priority non-empty attribute of each record is dynamically located, and the first user information table is merged and de-duplicated with that attribute as a reference to generate the second user information table. Comparing the third user information table with that of the previous generation cycle filters out the majority of users, whose information is unchanged, and leaves only the small number of information change users; this minimizes the subsequent comparison overhead and improves the scalability of the method, especially with massive numbers of users. A funnel-shaped processing chain is then constructed based on the attribute priority order, and each attribute of the information change users is compared in sequence with the unified identification table of the previous generation cycle to determine the new users and the information update users, which are then unified with the user identifications of the previous generation cycle. The method therefore has better identification accuracy and identification stability.

Further, as shown in fig. 2, in the method for unifying user identifiers based on a priority rule provided in the embodiment of the present application, the determining, based on the priority rule, an attribute priority order of the first user information table of each platform, and combining and de-duplicating the first user information table using a non-empty attribute with a highest priority to generate a second user information table corresponding to each platform includes:

S201, based on the priority rule, calculating the priority order of the attributes in the first user information table according to the attribute value characteristics of each attribute in the first user information table.

S202, determining the head attribute of each record according to the attribute priority order, wherein the head attribute is the non-empty attribute with the highest priority.

S203, merging and de-duplicating the first user information table of each platform based on the head attribute to generate a second user information table corresponding to each platform.

In steps S201 to S203, the priority order of the attributes is first calculated according to the attribute value characteristics of each attribute in the first user information table of each platform; because these characteristics differ, the attribute priority order may differ from platform to platform. The head attribute of each record in the first user information table is then determined according to the calculated priority order. The head attribute of a record is its highest-priority non-empty attribute, so the head attribute may also differ from record to record. Finally, the first user information table is merged and de-duplicated according to the head attribute of each record to obtain the second user information table corresponding to each platform.

The priority order of each of the attributes in the first user information table in step S201 is determined by the following steps:

For a single platform, the user data in the first user information table needs to be merged and de-duplicated with the head attribute as the reference to generate the second user information table corresponding to that platform. Across multiple platforms, a user may browse, log in, or complete transactions on several platforms, so the user's attribute identification data may be stored on different platforms; the attribute data of the users of each platform therefore also needs to be de-duplicated based on the head attributes.

Further, as shown in fig. 3, in the method for unifying user identifiers based on priority rules provided in this embodiment of the present application, merging and de-duplicating the second user information tables of all platforms based on the non-empty attribute with the highest priority in each second user information table to generate a third user information table for the platform set, comparing each attribute data in the third user information table with the third user information table of the previous generation cycle, screening out the information change users in the current generation cycle, and putting the information change users into a temporary change table, includes:

S301, based on the non-empty attribute with the highest priority, merging and de-duplicating all the second user information tables to obtain a third user information table corresponding to the platform set in the current generation period.

In the step S301, according to the head attributes of each record determined in the step S202, all the second user information tables are merged and deduplicated to obtain a third user information table corresponding to the platform set in the current generation period.

S302, comparing each attribute data in the third user information table with each attribute data in the third user information table in the previous generation period, screening out information change users in the generation period according to the comparison result, and putting the information change users in the temporary change table.

After the data corresponding to the platform set has been merged and de-duplicated, the information change users of the current generation cycle relative to the previous generation cycle still need to be found, so that a new unified user identifier can be generated for these users or a historical unified user identifier can be associated with them. S302 further includes:

Step 3021, connecting the attribute data corresponding to each user in the third user information table in a fixed order, and taking the MD5 value to generate first fingerprint information corresponding to each user.

In step 3021, the first fingerprint information is the MD5 (Message-Digest Algorithm 5) value of all of a user's attribute data connected in a fixed order: the attribute data of each user is connected in a fixed order to obtain a string, and an MD5 function is then called on the connected string to generate the MD5 value. The fixed order means that the order in which the data is connected is the same in every generation cycle.
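The fingerprint of step 3021 can be sketched in Python with the standard `hashlib` module. The attribute names, the fixed order `ATTRIBUTE_ORDER`, and the separator character are assumptions for illustration; the text only requires that the connection order be identical in every generation cycle.

```python
import hashlib
from typing import Dict, Optional, Sequence

ATTRIBUTE_ORDER = ("cid", "openid", "devid")  # hypothetical fixed order
SEPARATOR = "\x1f"  # assumption: a separator keeps "ab"+"c" distinct from "a"+"bc"


def fingerprint(record: Dict[str, Optional[str]],
                order: Sequence[str] = ATTRIBUTE_ORDER) -> str:
    """Connect attribute values in a fixed order and return the MD5 hex digest."""
    joined = SEPARATOR.join(record.get(name) or "" for name in order)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


same_a = fingerprint({"cid": "cid1", "openid": "openid1", "devid": "devid1"})
same_b = fingerprint({"devid": "devid1", "openid": "openid1", "cid": "cid1"})
changed = fingerprint({"cid": "cid1", "openid": "openid1", "devid": "devid2"})
```

Because the values are read in the fixed order rather than in dictionary order, `same_a` and `same_b` are equal, while changing any single attribute value, as in `changed`, yields a different fingerprint.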

Step 3022, connecting the attribute data corresponding to each user in the third user information table of the previous generation period according to a fixed sequence, and taking the MD5 value to generate second fingerprint information corresponding to each user.

In step 3022, the second fingerprint information is generated for the third user information table of the previous generation cycle, in the same way as the first fingerprint information is generated in step 3021.

Step 3023, based on the comparison result between the first fingerprint information and the second fingerprint information, screening out information change users in the current generation cycle, and putting the information change users into the temporary change table.

In step 3023, the first fingerprint information corresponding to each user in the third user information table is compared with the second fingerprint information corresponding to each user in the third user information table of the previous generation cycle. If the first fingerprint information is the same as the second fingerprint information, the user's data in the third user information table has not changed relative to the third user information table of the previous generation cycle. If the first fingerprint information differs from the second fingerprint information, the user's data has changed, or the user is newly added, relative to the third user information table of the previous generation cycle. The user data whose first fingerprint information differs from all of the second fingerprint information of the third user information table of the previous generation cycle is screened out and put into the temporary change table.

In specific implementation, if the number of users in the platform set is large, for example tens of millions, comparing every attribute value of every user across two adjacent generation cycles requires a very large amount of computation. Connecting each user's attribute data in a fixed order and taking the MD5 value allows users whose information is unchanged to be filtered out to the greatest extent by comparing information fingerprints, which improves comparison efficiency. A change in any attribute value of a user changes the MD5 value, so the small number of newly added users and information change users can be located directly through fingerprint comparison.
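The screening of steps 3022 and 3023 can be sketched as a set-membership test: the fingerprints of the previous generation cycle are collected once, and each current record is kept only if its fingerprint is absent from that set. The helper names, attribute names, and separator below are illustrative assumptions, not part of the described method.

```python
import hashlib
from typing import Dict, List, Optional, Sequence

Record = Dict[str, Optional[str]]


def fingerprint(record: Record, order: Sequence[str]) -> str:
    """MD5 of the attribute values connected in a fixed order (assumed separator)."""
    joined = "\x1f".join(record.get(name) or "" for name in order)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def screen_changed(current: List[Record], previous: List[Record],
                   order: Sequence[str]) -> List[Record]:
    """Return current records whose fingerprint matches no previous fingerprint."""
    previous_prints = {fingerprint(r, order) for r in previous}
    return [r for r in current if fingerprint(r, order) not in previous_prints]


order = ("cid", "openid")  # hypothetical fixed attribute order
previous_table = [
    {"cid": "cid1", "openid": "openid1"},
    {"cid": "cid2", "openid": "openid2"},
]
current_table = [
    {"cid": "cid1", "openid": "openid1"},      # unchanged
    {"cid": "cid2", "openid": "openid2-new"},  # changed attribute value
    {"cid": "cid3", "openid": "openid3"},      # not present before
]
change_table = screen_changed(current_table, previous_table, order)
```

With a hash set, each lookup takes constant time on average, so the cost grows with the number of records rather than with the number of record pairs. At this stage changed users and newly added users are not yet distinguished; both land in the temporary change table.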

Further, as shown in fig. 4, the determining an attribute priority order of a unified identifier table in a previous generation cycle based on the priority rule, constructing a funnel-shaped processing chain according to the attribute priority order of the unified identifier table in the previous generation cycle, comparing the temporary change table with the unified identifier table in the previous generation cycle by using the funnel-shaped processing chain, determining an information updating user and a new user in the information changing users, unifying a user identifier corresponding to each information changing user in a current generation cycle with a user identifier corresponding to each information changing user in the previous generation cycle based on data corresponding to the information updating user and data corresponding to the new user, and generating the unified identifier table in the current generation cycle, includes:

S401, determining the attribute priority order of the unified identification table of the previous generation cycle, and constructing a funnel-shaped processing chain based on the attribute priority order.

The funnel-shaped processing chain can be constructed through the priority sequence of the attributes and compared with the unified identification table of the previous generation cycle.

Calculating the priority of the attribute according to the attribute data characteristics in the unified identification table of the previous generation period by using the following formula:

D(c_m) = α·W(c_m) − β·R(c_m);

A funnel-shaped processing chain is constructed according to the calculated attribute priority, and each user attribute value in the temporary change table is compared in turn with the unified identification table of the previous generation cycle.
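The ordering step can be sketched as follows. This passage does not define W(c_m), R(c_m), α, or β, so the sketch treats W and R as precomputed per-attribute scores supplied by the caller and α and β as plain weights; it also assumes that a larger D(c_m) means a higher priority. All numeric values are invented for illustration.

```python
from typing import Dict, List


def attribute_priority(w: Dict[str, float], r: Dict[str, float],
                       alpha: float, beta: float) -> List[str]:
    """Order attributes by D(c) = alpha * W(c) - beta * R(c), largest first."""
    scores = {name: alpha * w[name] - beta * r[name] for name in w}
    # Assumption: a larger D value places the attribute earlier in the chain.
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical scores; the real W and R come from the attribute data characteristics.
w_scores = {"cid": 0.9, "openid": 0.7, "devid": 0.8}
r_scores = {"cid": 0.1, "openid": 0.2, "devid": 0.6}
chain_order = attribute_priority(w_scores, r_scores, alpha=1.0, beta=1.0)
```

With these invented scores the D values are roughly 0.8, 0.5, and 0.2, so the chain would compare `cid` first, then `openid`, then `devid`.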

S402, according to the funnel-shaped processing chain, sequentially comparing each attribute value in the temporary change table with the unified identification table of the previous generation cycle; if an attribute value in the temporary change table is associated with the unified identification table through the funnel-shaped processing chain, the user corresponding to the attribute value is an information updating user, and users that remain unassociated after all comparisons are finished are new users.

In step S402, the user attribute values in the temporary change table are compared in turn with the previous day's version according to the constructed funnel-shaped processing chain, so as to find all data changes. Attribute labels with higher priority are compared first; once they have been compared, attribute labels with lower priority are compared, with one round of comparison per attribute label. Before each round of comparison, the attribute data already associated in the previous round is removed, which is why the chain is described as funnel-shaped. Users associated in this step are information updating users, and users not associated are new users.
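One possible reading of the funnel-shaped processing chain is sketched below: one round per attribute in priority order, with records matched in a round removed before the next round begins. Treating "removed" as dropping the whole matched record is an interpretation, and the attribute names and sample data are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[str]]


def funnel_classify(change_table: List[Record], previous_table: List[Record],
                    priority: List[str]) -> Tuple[List[Record], List[Record]]:
    """Split the change table into (information updating users, new users)."""
    remaining = list(change_table)
    updated: List[Record] = []
    for attr in priority:  # one comparison round per attribute, highest priority first
        known = {p[attr] for p in previous_table if p.get(attr)}
        unmatched: List[Record] = []
        for record in remaining:
            if record.get(attr) and record[attr] in known:
                updated.append(record)    # associated in this round
            else:
                unmatched.append(record)  # passed down to the next, narrower round
        remaining = unmatched
    return updated, remaining  # whatever is left after all rounds is new


previous_ids = [
    {"uid": "uid1", "cid": "cid1", "openid": "openid1", "devid": None},
    {"uid": "uid2", "cid": None, "openid": None, "devid": "devid1"},
]
changes = [
    {"cid": "cid1", "openid": "openid9", "devid": None},  # matches on cid
    {"cid": None, "openid": None, "devid": "devid1"},     # matches on devid
    {"cid": "cid7", "openid": None, "devid": "devid7"},   # matches nothing
]
updated_users, new_users = funnel_classify(changes, previous_ids, ["cid", "openid", "devid"])
```

Each round works on fewer records than the one before it, which is where the funnel shape comes from: most records are resolved on the highest-priority attributes, and only the remainder reaches the later rounds.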

S403, respectively unifying the identification of the information updating user and the new user with the identification of the user in the previous generation cycle, generating a new unified user identification for the new user, and generating a unified identification table of the current generation cycle by using the existing unified user identification for the information updating user.

In step S403, according to the attribute data corresponding to an information updating user, the records of the previous generation cycle that share one or more attribute values with that user are found, and the user is unified under the user identifier of the matched records. For a new user, a new unified user identifier is generated. Users whose information is unchanged are written directly into the unified identification table of the current generation cycle.

In specific implementation, if a record in the unified identification table of the previous generation cycle matches a user record in the temporary change table on one or more attribute values, the two records need to be associated and unified under the existing unified user identifier. For example, the record of an information updating user in the temporary change table contains non-null information such as the authentication center account identification cid1, the social account identification openid1, and the device identification devid1, as shown in the following table.

Authentication center account identification	Social account identification	Device identification	...
cid1	openid1	devid1	...

Through the funnel-shaped processing chain, a total of two user records of the previous generation cycle are matched that share attribute values with the information updating user. One record contains the unified user identifier uid1, the authentication center account identification cid1, and the social account identification openid1, where the authentication center account identification and the social account identification are the same as those of the user in the current generation cycle. The other record contains the unified user identifier uid2 and the device identification devid1, which is the same as the device identification of the user in the current generation cycle, as shown in the following table.

Unified user identification	Authentication center account identification	Social account identification	Device identification	…
uid1	cid1	openid1	-	…
uid2	-	-	devid1	…

The information updating user takes the unified user identifier of either of the two records as its unified identifier in the current generation cycle. Assuming uid1 is selected, in the current generation cycle the uid1 record is updated with the latest record in the temporary change table, while the previous uid2 record is no longer written into the unified identification table of the current generation cycle, because it has been absorbed and overwritten by the new record. The mapping from the absorbed unified user identifier uid2 to the new unified user identifier uid1 is then written into an association table for subsequent activity evaluation and user tracking, as shown in the following table.

Serial number	uid_new	uid_old
1	uid1	uid2
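The worked example above can be reproduced with a short Python sketch. It collects every previous record that shares at least one attribute value with the information updating user, keeps one unified identifier, and records the absorbed identifiers in an association list. Keeping the first matched identifier is an assumption (the text allows either one), and the function and field names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[str]]


def unify(record: Record, previous_table: List[Record],
          attributes: List[str]) -> Tuple[Optional[Record], List[Dict[str, str]]]:
    """Return the unified record and the uid_new/uid_old association rows."""
    matched = [
        prev["uid"] for prev in previous_table
        if any(record.get(a) and record.get(a) == prev.get(a) for a in attributes)
    ]
    if not matched:
        return None, []  # no association: the caller treats this as a new user
    kept, absorbed = matched[0], matched[1:]  # assumption: keep the first match
    unified = {"uid": kept, **{a: record.get(a) for a in attributes}}
    associations = [{"uid_new": kept, "uid_old": old} for old in absorbed]
    return unified, associations


attributes = ["cid", "openid", "devid"]
previous_unified = [
    {"uid": "uid1", "cid": "cid1", "openid": "openid1", "devid": None},
    {"uid": "uid2", "cid": None, "openid": None, "devid": "devid1"},
]
update_user = {"cid": "cid1", "openid": "openid1", "devid": "devid1"}
unified_record, association_rows = unify(update_user, previous_unified, attributes)
```

Here `unified_record` carries uid1 together with all three attribute values, and `association_rows` holds the single mapping from uid2 to uid1, matching the association table above.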

S403 further includes:

Step 4031, according to the funnel-shaped processing chain, comparing the temporary change table with the unified identification table of the previous generation cycle, and determining the information updating users and the new users.

Step 4032, generating a new unified user identifier for each new user, and adding it to the unified identification table of the current generation cycle.

In specific implementation, users that remain unassociated after the funnel-shaped processing chain comparison is complete are determined to be new users, and a new unified user identifier is generated for each of them. The unified user identifier is generated by calling a UUID (Universally Unique Identifier) function built into the platform set; it uniquely identifies the user, and the user is added to the unified identification table of the current generation cycle.
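Step 4032 can be sketched as follows. Python's standard `uuid.uuid4` is used here only as a stand-in for the UUID function built into the platform set, which the text does not name; the record fields are hypothetical.

```python
import uuid
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def add_new_users(new_users: List[Record], unified_table: List[Record]) -> List[Record]:
    """Append each new user to the unified table with a freshly generated uid."""
    for record in new_users:
        # Stand-in for the platform's built-in UUID function (assumption).
        unified_table.append({"uid": uuid.uuid4().hex, **record})
    return unified_table


current_unified = [{"uid": "uid1", "cid": "cid1", "devid": "devid1"}]
fresh_users = [
    {"cid": "cid7", "devid": "devid7"},
    {"cid": None, "devid": "devid8"},
]
current_unified = add_new_users(fresh_users, current_unified)
```

Existing rows keep their identifiers, and each appended row receives a 32-character hexadecimal identifier of its own.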

Referring to fig. 5, fig. 5 is a schematic structural diagram of a system for unifying user identifications based on priority rules according to an embodiment of the present application, where the system includes:

the data acquisition system 501 is configured to acquire buried point data and service data of each platform in a current generation cycle, and generate a first user information table corresponding to each platform according to the buried point data and the service data;

an in-platform deduplication module 502, configured to determine an attribute priority order of the first user information table of each platform based on a priority rule, and merge deduplication is performed on the first user information table by using a non-empty attribute with a highest priority in the attribute priority order, so as to generate a second user information table corresponding to each platform;

a newly added and changed user screening module 503, configured to merge and deduplicate the second user information tables of all platforms based on the non-empty attribute with the highest priority in each second user information table, generate a third user information table for a platform set, compare attribute data in the third user information table with a third user information table of a previous generation cycle, screen out information changed users in a current generation cycle, and place the information changed users in a temporary change table;

the user identifier unifying module 504 is configured to determine an attribute priority order of a unified identifier table in a previous generation cycle based on the priority rule, construct a funnel-shaped processing chain according to the attribute priority order of the unified identifier table in the previous generation cycle, compare the temporary change table with the unified identifier table in the previous generation cycle through the funnel-shaped processing chain, determine an information update user and a new user among the information change users, unify a user identifier corresponding to each information change user in a current generation cycle with a user identifier corresponding to each information change user in the previous generation cycle based on data corresponding to the information update user and data corresponding to the new user, and generate the unified identifier table in the current generation cycle.

Optionally, the system for unifying user identities based on priority rules further includes:

a first calculating module, configured to calculate, based on the priority rule, a priority order of the attributes in the first user information table according to attribute value characteristics of each attribute in the first user information table;

the first determining module is used for determining the head attribute of each record according to the attribute priority order, where the head attribute is the non-empty attribute with the highest priority;

and the first generation module is used for merging and de-duplicating the first user information table of each platform based on the head attribute to generate the second user information table corresponding to each platform.

Optionally, when the first calculating module calculates the priority order of the attributes in the first user information table according to the attribute value feature of each attribute in the first user information table, the method includes:

the second generation module is used for combining and de-duplicating all the second user information tables based on the non-empty attribute with the highest priority to obtain the third user information table corresponding to the platform set in the current generation period;

and the data comparison module is used for comparing each attribute data in the third user information table with each attribute data in the third user information table in the previous generation period, screening out information change users in the generation period according to a comparison result, and putting the information change users into the temporary change table.

a third generating module, configured to connect in a fixed order according to the attribute data corresponding to each user in the third user information table, and take an MD5 value to generate first fingerprint information corresponding to each user;

a fourth generation module, configured to connect in a fixed order according to the attribute data corresponding to each user in the third user information table in the previous generation cycle, and take an MD5 value to generate second fingerprint information corresponding to each user;

and the fifth generation module is used for screening out information change users in the current generation period based on the comparison result of the first fingerprint information and the second fingerprint information and putting the information change users into the temporary change table.

the chain construction module is used for determining the attribute priority order of the unified identification table of the previous generation cycle and constructing the funnel-shaped processing chain based on the attribute priority order;

a first comparing unit, configured to sequentially compare, according to the funnel-shaped processing chain, each attribute value in the temporary change table with the unified identifier table of the previous generation cycle, where if an attribute value in the temporary change table is associated through the funnel-shaped processing chain, the user corresponding to the attribute value is an information update user, and users that remain unassociated after all comparisons are completed are new users;

and the sixth generation module is used for unifying the identification of the information updating user and the identification of the newly added user with the identification of the user in the previous generation period respectively, generating a new unified user identification for the newly added user, and generating a unified identification table of the current generation period by using the existing unified user identification for the information updating user.

the second comparison unit is used for comparing the temporary change table with the unified identification table of the previous generation period according to the funnel-shaped processing chain to determine the information updating user and the newly added user;

and the seventh generating module is used for generating a new unified user identifier for the new user and adding the new unified user identifier into the unified identifier table of the current generating period.

Optionally, the determining, by the chain construction module, the attribute priority order of the unified identification table of the previous generation cycle, and constructing the funnel-shaped processing chain based on the attribute priority order, includes:

D(c_m) = α·W(c_m) − β·R(c_m);

Corresponding to the method for unifying user identifications based on priority rules in fig. 1, an embodiment of the present application further provides a schematic diagram of an electronic device 600. As shown in fig. 6, the electronic device 600 includes a processor 610, a memory 620, and a bus 630. The memory 620 stores machine-readable instructions executable by the processor 610. When the electronic device 600 runs, the processor 610 communicates with the memory 620 through the bus 630, and when the machine-readable instructions are executed by the processor 610, the method for unifying user identifications based on priority rules can be performed, thereby overcoming defects of the prior art such as low accuracy and the inability to associate user behavior data.

Corresponding to the method for unifying user identities based on priority rules in fig. 1, an embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the steps of the method for unifying user identities based on priority rules.

Specifically, the storage medium can be a general-purpose storage medium, such as a removable disk or a hard disk. When the computer program on the storage medium is executed, the method for unifying user identifiers based on the priority rule can be performed, thereby overcoming defects of the prior art such as low accuracy and the inability to associate user behavior data. The data characteristics of each user attribute are considered comprehensively, the head attribute of each record is located dynamically, and the first user information table is merged and de-duplicated with the head attribute as the reference to generate the second user information table. By comparing the third user information table with the third user information table of the previous generation cycle, most users whose information is unchanged can be filtered out and the small number of users whose information has changed can be screened out, which minimizes the subsequent comparison overhead and improves the scalability of the method, especially in scenarios with massive numbers of users. A funnel-shaped processing chain is then constructed based on the attribute priority order, and each attribute of the users whose data has changed is compared in turn with the unified identification table of the previous generation cycle to determine the new users and the information updating users, which are then unified with the user identifications of the previous generation cycle. The method therefore has better identification accuracy and identification stability.

In the embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments provided in the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, or the part of it that contributes over the prior art, may be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.

It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus once an item is defined in one figure, it need not be further defined and explained in subsequent figures, and moreover, the terms "first", "second", "third", etc. are used merely to distinguish one description from another and are not to be construed as indicating or implying relative importance.

Finally, it should be noted that the above embodiments are only specific embodiments of the present application, used to illustrate its technical solutions rather than to limit them, and the protection scope of the present application is not limited thereto. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone skilled in the art may, within the technical scope disclosed in the present application, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes, or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application and are intended to be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A method for unifying user identities based on priority rules is characterized by comprising the following steps:

2. The method of claim 1, wherein the determining an attribute priority order of the first user information table for each platform based on a priority rule, and performing merge deduplication on the first user information table using a non-empty attribute with a highest priority in the attribute priority order to generate a second user information table corresponding to each platform comprises:

3. The method of claim 2, wherein the priority order of each of the attributes in the first user information table is determined by:

4. The method according to claim 1, wherein the merging and deduplication are performed on the second user information tables of all platforms based on the non-empty attribute with the highest priority in each second user information table to generate a third user information table for a platform set, each attribute data in the third user information table is compared with the third user information table in the previous generation cycle, information change users in the current generation cycle are screened out, and the information change users are placed into a temporary change table, including:

5. The method according to claim 4, wherein the comparing the attribute data in the third user information table with the attribute data in the third user information table of the previous generation cycle, and screening the information change users in the generation cycle according to the comparison result, and placing the information change users in the temporary change table comprises:

6. The method according to claim 1, wherein the determining of the attribute priority order of the unified identification table of the previous generation cycle based on the priority rule, the constructing of a funnel-shaped processing chain according to that attribute priority order, the comparing of the temporary change table with the unified identification table of the previous generation cycle by the funnel-shaped processing chain to determine the information updating users and the new users among the information change users, and the unifying of the user identifier corresponding to each information change user in the current generation cycle with the user identifier corresponding to that user in the previous generation cycle, based on the data corresponding to the information updating users and the data corresponding to the new users, to generate the unified identification table of the current generation cycle, comprise:

and respectively unifying the identification of the information updating user and the identification of the newly added user with the identification of the user in the previous generation period, generating a new unified user identification for the newly added user, and generating a unified identification table of the current generation period by using the existing unified user identification for the information updating user.

7. The method of claim 6, further comprising:

and generating a new unified user identifier for the new user, and adding the new unified user identifier into the unified identifier table of the current generation period.

8. The method of claim 6, wherein the priority order of the attributes of the unified identification list of the previous generation cycle is determined by:

D(c_m) = α·W(c_m) − β·R(c_m);

9. The method of claim 1, wherein the attributes comprise any one or more of the following:

10. A system for unifying user identities based on priority rules, comprising: