CN114862449A - Method and device for calculating unique natural person identifier, electronic equipment and storage medium

Info

Publication number
CN114862449A
Authority
CN
China
Prior art keywords
uniqueid
user
aggregation
natural person
column
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210376670.8A
Other languages
Chinese (zh)
Inventor
宋亚恒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Hujin Information Technology Co ltd
Original Assignee
Shanghai Hujin Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Hujin Information Technology Co ltd filed Critical Shanghai Hujin Information Technology Co ltd
Priority to CN202210376670.8A priority Critical patent/CN114862449A/en
Publication of CN114862449A publication Critical patent/CN114862449A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0251 Targeted advertisements
    • G06Q30/0255 Targeted advertisements based on user history
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G06F16/24575 Query processing with adaptation to user needs using context
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474 Sequence data queries, e.g. querying versioned data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Accounting & Taxation (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Mathematical Physics (AREA)
  • Game Theory and Decision Science (AREA)
  • Fuzzy Systems (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a method, a device, electronic equipment and a storage medium for calculating a unique natural person identifier. The method comprises the following steps: generating a user-device wide table from information about users and devices, wherein the wide table comprises fields related to the users and the devices and their corresponding values; performing a first iterative computation, in which the first non-empty value of each row of the wide table is selected as a UniqueId and placed in a separate column, and the original content of the row is taken as a Context, thereby forming a first iterative computation table; performing a first aggregation, in which rows of the first iterative computation table that have the same UniqueId are merged and the UniqueId column is removed, thereby forming a first aggregation table; performing loop iteration, in which each column of the first aggregation table is taken in turn as the UniqueId and the iterative computation and the aggregation are repeated, thereby forming a final aggregation table; and marking each row of the final aggregation table with a unique natural person identifier.

Description

Method and device for calculating unique natural person identifier, electronic equipment and storage medium
Technical Field
The invention relates to the field of crowd selection computation.
Background
The unique natural person identification algorithm mainly provides a single, unique dimension on which crowd selection logic can operate, so that crowd packages can be output in a targeted way.
In fine-grained business operations, crowd selection may have to be computed over several kinds of IDs at once. For example, suppose one needs to select all users who interacted with tire advertising within the past 30 days (such users may be unregistered and have only device information) and who are male and aged 30 or under. This selection clearly has to span two dimensions, the device ID and the user ID. Implementing it directly in that way makes the selection process complex and slow. The data therefore needs to be associated to a single dimension on which the selection is performed, which is much easier to implement. This is the application scenario of the unique natural person identification algorithm.
Most current industry practice performs this computation with the graph computing capability of GraphX. However, graph algorithms based on Spark GraphX support only nodes of the Long integer type and do not support string identifiers such as device IDs (GUID type). Using GraphX therefore requires several layers of mapping: the IDs must be converted once before the computation and converted back afterwards. The main disadvantages are as follows:
first, computation directly on string IDs is not supported;
second, a mapping relationship must be maintained;
third, computation on large volumes of data is slow because of the intermediate mapping steps.
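To make the cost of the mapping layer concrete, the following minimal Python sketch shows the encode and decode round trip that an integer-only graph engine imposes on string identifiers. It is not GraphX code; the function names `encode_ids` and `decode_ids` and the sample edges are hypothetical.

```python
from typing import Dict, List, Tuple

def encode_ids(edges: List[Tuple[str, str]]) -> Tuple[List[Tuple[int, int]], Dict[int, str]]:
    """Replace every string ID with an integer before the graph computation."""
    forward: Dict[str, int] = {}
    encoded = []
    for a, b in edges:
        # setdefault assigns the next free integer the first time an ID is seen.
        ia = forward.setdefault(a, len(forward))
        ib = forward.setdefault(b, len(forward))
        encoded.append((ia, ib))
    # The reverse mapping must be kept around to translate results back.
    reverse = {number: string for string, number in forward.items()}
    return encoded, reverse

def decode_ids(ids: List[int], reverse: Dict[int, str]) -> List[str]:
    """Translate integer node IDs back to the original strings after the computation."""
    return [reverse[i] for i in ids]

# Hypothetical edges linking user IDs to a shared device ID.
edges = [("U1", "D1"), ("U3", "D1")]
encoded, reverse = encode_ids(edges)
decoded = decode_ids([0, 1, 2], reverse)
```

Both the forward pass and the reverse lookup touch every identifier, which is the extra work the background section attributes to the mapping approach.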
Disclosure of Invention
The following presents a summary of various exemplary aspects. Some simplifications and omissions may be made in the following summary, which is intended to highlight and introduce some aspects of the various exemplary embodiments, but not to limit the scope of the invention. Detailed descriptions of exemplary embodiments sufficient to enable those of ordinary skill in the art to make and use the inventive concepts will be presented in the following sections.
The technical solution of the invention provides a method for calculating a unique natural person identifier, comprising: generating a user-device wide table from information about users and devices, wherein the wide table comprises fields related to the users and the devices and their corresponding values; performing a first iterative computation, in which the first non-empty value of each row of the wide table is selected as a UniqueId and placed in a separate column, and the original content of the row is taken as a Context, thereby forming a first iterative computation table; performing a first aggregation, in which rows of the first iterative computation table that have the same UniqueId are merged and the UniqueId column is removed, thereby forming a first aggregation table; performing loop iteration, in which each column of the first aggregation table is taken in turn as the UniqueId and the iterative computation and the aggregation are repeated, thereby forming a final aggregation table; and marking each row of the final aggregation table with a unique natural person identifier.
Optionally, the user-device wide table includes at least one of: UserId, DeviceId, phone, imei, idfa, openid, Tags.
Optionally, the method further comprises: during the loop iteration, redistributing the data of the iterative computation table or the aggregation table so that it is spread evenly across different machines.
Optionally, the method further comprises: during the iterative computation, serializing and deserializing the data of the iterative computation table.
Optionally, the aggregation table includes the Context; it is determined whether the contents of each column of the Context contain duplicates and, if so, the duplicates are merged.
Another technical solution of the present invention provides a device for calculating a unique natural person identifier, including: a table building module configured to generate a user-device wide table from information about users and devices, wherein the wide table includes fields related to the users and the devices and their corresponding values; and a calculation module configured to: perform a first iterative computation, in which the first non-empty value of each row of the wide table is selected as a UniqueId and placed in a separate column, and the original content of the row is taken as a Context, thereby forming a first iterative computation table; perform a first aggregation, in which rows of the first iterative computation table that have the same UniqueId are merged and the UniqueId column is removed, thereby forming a first aggregation table; perform loop iteration, in which each column of the first aggregation table is taken in turn as the UniqueId and the iterative computation and the aggregation are repeated, thereby forming a final aggregation table; and mark each row of the final aggregation table with a unique natural person identifier.
Optionally, the user-device wide table includes at least one of: UserId, DeviceId, phone, imei, idfa, openid, Tags.
Optionally, the calculation module is further configured to redistribute, during the loop iteration, the data of the iterative computation table or the aggregation table so that it is spread evenly across different machines.
Optionally, the calculation module is further configured to serialize and deserialize the data of the iterative computation table during the iterative computation.
Optionally, the aggregation table includes the Context; it is determined whether the contents of each column of the Context contain duplicates and, if so, the duplicates are merged.
Another technical solution of the present invention also provides an electronic device, including: a processor, a memory, and a computer program stored in the memory for execution by the processor, wherein the processor implements the steps of the method according to any of the above aspects when executing the computer program.
Another technical solution of the present invention further provides a computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of the above solutions.
The technical scheme of the invention mainly has the following beneficial effects:
First, a self-developed large-scale connected graph algorithm computes the connected graph directly on the native GUID type, quickly aggregates the connected graphs across all ID dimensions, and assigns a unique ID. Compared with the Spark GraphX computation commonly adopted in the industry, no ID mapping layer is needed: the algorithm operates directly on the native data to obtain the result.
Second, all users that can be associated, both inside and outside the site, are marked with a unique ID. The generated OneID is an auto-incrementing numeric value, which facilitates subsequent operations (such as BitMap) and supports an efficient BitMap-based crowd selection scheme.
Third, the algorithm greatly reduces the time needed to generate the OneID and improves efficiency.
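The benefit of a numeric OneID for BitMap operations can be illustrated with a minimal Python sketch. It assumes a plain Python integer as the bitmap, which is a simplification and not the bitmap structure of any particular library; the segment names and OneID values are hypothetical.

```python
def to_bitmap(one_ids):
    """Pack a collection of numeric OneIDs into an integer used as a bitmap."""
    bitmap = 0
    for one_id in one_ids:
        bitmap |= 1 << one_id  # set the bit at position one_id
    return bitmap

def from_bitmap(bitmap):
    """List the OneIDs whose bits are set, in ascending order."""
    ids = []
    position = 0
    while bitmap:
        if bitmap & 1:
            ids.append(position)
        bitmap >>= 1
        position += 1
    return ids

# Hypothetical segments, each stored as one bitmap keyed by OneID.
clicked_ad = to_bitmap([0, 2, 5])
male_under_30 = to_bitmap([2, 3, 5, 7])

# Crowd selection across two conditions becomes a single bitwise AND.
selected = from_bitmap(clicked_ad & male_under_30)
```

Because every associated user and device collapses to one small integer, conditions that originally lived on different ID dimensions can be intersected on a single dimension.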
Drawings
For a better understanding of various exemplary embodiments, reference may be made to the accompanying drawings in which:
FIG. 1 is a flow chart diagram illustrating a method for unique natural person identification calculation provided by an embodiment;
FIG. 2 is a diagram illustrating some steps in a method for unique natural person identification calculation provided by an embodiment;
FIG. 3 is a diagram illustrating some steps in a method for unique natural person identification calculation provided by an embodiment;
FIG. 4 is a schematic structural diagram of a device for calculating a unique natural person identifier according to an embodiment.
To facilitate understanding, the same reference numerals have been used to designate elements having substantially the same or similar structure and/or substantially the same or similar function.
Detailed Description
The description and drawings illustrate the principles of the invention. It will thus be appreciated that those skilled in the art will be able to devise various arrangements which, although not explicitly described or shown herein, embody the principles of the invention and are included within its scope. Furthermore, all examples cited herein are principally intended expressly to be only for pedagogical purposes to aid the reader in understanding the principles of the invention and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions. Further, as used herein, the term "or" refers to a non-exclusive or (i.e., and/or) unless otherwise indicated (e.g., "or otherwise" or in the alternative). Moreover, the various embodiments described herein are not necessarily mutually exclusive, as some embodiments may be combined with one or more other embodiments to form new embodiments.
Interpretation of terms:
Graph computation: computing the maximal connected subgraphs within a graph.
Unique natural person identifier: in internet e-commerce, one person may be identified by a user ID, a device ID, an OpenID, a mobile phone number and so on, and these identifiers are scattered across different business domains. In some scenarios they need to be recognized as belonging to one person. A unique ID assigned to that person by associating these accounts is the unique natural person identifier.
A first embodiment provides a method for calculating a unique natural person identifier. FIG. 1 shows a flow chart of the method, which includes:
S101, generating a user-device wide table from information about users and devices, wherein the wide table comprises fields related to the users and the devices and their corresponding values.
The fields related to the user and the device may include: user ID, device ID, mobile phone number, mobile phone serial number, advertising identifier, openid (digital identity framework), user attribute tags and device behavior tags; other information related to the user and the device may also be included.
Users and devices may be related one-to-many and many-to-one. First, a relationship wide table (covering the relationships of all dimensions) is generated from the user and device information by a data warehouse system. For example, as shown in FIG. 2, three tables record the relationships between users and devices: the first is a user tag table, which includes UserId (user ID), Tags (for example, attribute tags) and so on; the second is a user device table, which includes UserId, DeviceId (device ID), phone (mobile phone number), imei (mobile phone serial number), idfa (advertising identifier), openid (digital identity framework) and so on; the third is a device behavior table, which includes DeviceId, Tags (device behaviors, for example clicking on an advertisement) and so on. The user tag table is joined to the user device table on UserId, and the user device table is joined to the device behavior table on DeviceId, forming the user-device wide table, which includes UserId, DeviceId, phone, imei, idfa, openid, Tags (attributes), Tags (device behaviors) and so on.
For example, in the user-device wide table of FIG. 2, the first row has UniqueId = UID and Context: DeviceId, imei, idfa, openid, Tags; the second row has UniqueId = DeviceId and Context: null, idfa, null, T1, T2; the third row has UniqueId = DeviceId and Context: imei, null, null, T1, T3.
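The two joins of step S101 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the warehouse job itself: the function name `build_wide_table`, the dictionary layout of the input tables and the sample values are all hypothetical.

```python
def build_wide_table(user_tags, user_devices, device_behaviors):
    """Join the three source tables into one wide row per user-device record.

    user_tags:        UserId   -> list of attribute tags
    user_devices:     list of rows holding UserId, DeviceId, phone, ...
    device_behaviors: DeviceId -> list of behavior tags
    """
    wide = []
    for row in user_devices:
        tags = []
        # Join on UserId; an unregistered user (UserId is None) has no attribute tags.
        tags += user_tags.get(row.get("UserId"), [])
        # Join on DeviceId to pick up the device behavior tags.
        tags += device_behaviors.get(row.get("DeviceId"), [])
        wide.append({**row, "Tags": tags})
    return wide

# Hypothetical sample data.
user_tags = {"U1": ["T1", "T2"]}
user_devices = [
    {"UserId": "U1", "DeviceId": "D1", "phone": "153"},
    {"UserId": None, "DeviceId": "D4", "phone": "166"},
]
device_behaviors = {"D1": ["T9"], "D4": ["T11"]}

wide = build_wide_table(user_tags, user_devices, device_behaviors)
```

Rows are driven by the user device table, so a device with no registered user still appears in the wide table with an empty UserId.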
S102, performing a first iterative computation (a Map computation, which is a distributed algorithm): for each row of the user-device wide table, the first non-empty value is selected as the UniqueId and placed in a separate column, and the original content of the row is taken as the Context, forming a first iterative computation table.
The UniqueId serves as the temporary aggregation Key. For example, in the user-device wide table of FIG. 2, the first non-empty value of the first row is the UserId (UID); in the second row the UserId is null, so the first non-empty value is the DeviceId; and the first non-empty value of the third row is likewise the DeviceId. Each of these values is taken as the UniqueId of its row. The wide-table data is thus converted into data of the form (UniqueId, Context), where the Context holds the combined information of all dimensions of the user and the device, such as UserId, DeviceId, imei, idfa, openid, Tags and device behaviors.
Serialization and deserialization of the data format (Serialize and DeSerialize) may be performed in this step.
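A minimal Python sketch of this Map step follows. The column order in `FIELDS`, the helper names and the use of JSON as the serialization format are assumptions made for illustration; the source does not specify a format.

```python
import json

FIELDS = ["UserId", "DeviceId", "phone"]  # assumed left-to-right column order

def first_non_empty(row):
    """Return the first non-null value of the row, scanning FIELDS left to right."""
    for field in FIELDS:
        value = row.get(field)
        if value is not None:
            return value
    return None

def to_keyed_record(row):
    """Map one wide-table row to (UniqueId, serialized Context)."""
    return first_non_empty(row), json.dumps(row, sort_keys=True)

rows = [
    {"UserId": "U1", "DeviceId": "D1", "phone": "153"},
    {"UserId": None, "DeviceId": "D4", "phone": "166"},
]

keyed = [to_keyed_record(row) for row in rows]
keys = [key for key, _ in keyed]

# Deserializing the Context recovers the original row content.
restored = [json.loads(context) for _, context in keyed]
```

The second row has no UserId, so its key falls through to the DeviceId, matching the rule described above.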
S103, performing a first aggregation: rows with the same UniqueId in the first iterative computation table are merged and the UniqueId column is removed, forming a first aggregation table (Context table).
S104, performing loop iteration: each column of the first aggregation table is taken in turn as the UniqueId, and the iterative computation and the aggregation are repeated, forming a final aggregation table (Context table).
After the first iterative computation on the user-device wide table, the fields of the wide table, such as UserId, DeviceId, imei, idfa, openid, Tags and device behaviors, are scanned in sequence. The second iterative computation and aggregation take the UserId as the UniqueId, the third take the DeviceId as the UniqueId, and so on until every field has been through iterative computation and aggregation. When a null value is encountered, a random value may be used in its place.
It may also be determined whether each column of the Context contains duplicates, in which case the Contexts are merged: they are merged by Key (Reduce By Key), so that records with the same Key are combined into a new Context whose fields hold the merged series of values.
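The merge-by-Key aggregation can be sketched in Python as follows. Representing a Context as a dictionary that maps each field to a set of values is an assumption made for this illustration, and the names `merge_contexts` and `reduce_by_key` are hypothetical.

```python
def merge_contexts(left, right):
    """Union the values of every field of two Contexts."""
    fields = left.keys() | right.keys()
    return {f: left.get(f, set()) | right.get(f, set()) for f in fields}

def reduce_by_key(keyed_contexts):
    """Combine all Contexts that share a key into one merged Context per key."""
    merged = {}
    for key, context in keyed_contexts:
        if key in merged:
            merged[key] = merge_contexts(merged[key], context)
        else:
            merged[key] = context
    return merged

# Two records keyed by device D1 and one keyed by D2 (sample values).
keyed = [
    ("D1", {"UserId": {"U1"}, "DeviceId": {"D1"}, "phone": {"153"}}),
    ("D2", {"UserId": {"U2"}, "DeviceId": {"D2"}, "phone": {"177"}}),
    ("D1", {"UserId": {"U3"}, "DeviceId": {"D1"}, "phone": {"166"}}),
]

result = reduce_by_key(keyed)
```

Using sets per field means a value that appears in both Contexts is stored once, which is the duplicate handling described above.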
To address the data imbalance caused by the explosion of records, a data redistribution step is added in between, so that the data is spread evenly across different machines. This effectively resolves data skew and lays the foundation for the subsequent re-aggregation of Contexts. The iteration continues until the last dimension.
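The source does not say how the redistribution is carried out. One simple possibility, shown here purely as an assumption, is to deal records out round-robin so that a heavily repeated key does not pile up on one machine; the function name `redistribute` and the sample records are hypothetical.

```python
def redistribute(records, num_partitions):
    """Deal records out round-robin so each partition receives a near-equal share."""
    partitions = [[] for _ in range(num_partitions)]
    for index, record in enumerate(records):
        partitions[index % num_partitions].append(record)
    return partitions

# A skewed input: the key "166" dominates after the records are expanded.
skewed = [("166", i) for i in range(7)] + [("153", 7), ("177", 8)]

parts = redistribute(skewed, 3)
sizes = [len(part) for part in parts]
```

Partitioning by position rather than by key trades locality for balance, so the following aggregation still has to bring equal keys together.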
S105, marking each row of the final aggregation table with a unique natural person identifier.
After the iteration is complete, the Context is decomposed back into structured data, and each row of structured data is assigned an auto-incrementing unique natural person identifier, namely the OneID. This may be done with ZipWithIndex. The result is the data structure of the bottom table shown in FIG. 2 and FIG. 3, in which the identifiers that can be associated are associated across all dimensions.
The whole process may be called a distributed connected graph algorithm, and it finally produces a connected-graph wide table.
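Step S105 can be sketched in Python as follows, in the spirit of the ZipWithIndex operation named above. The helper `zip_with_index` is a local stand-in written for this illustration, and the sample rows are hypothetical.

```python
def zip_with_index(rows):
    """Pair each row with its position in the sequence."""
    return [(row, index) for index, row in enumerate(rows)]

def assign_one_id(final_contexts):
    """Attach an auto-incrementing OneID to every row of the final aggregation table."""
    return [{**context, "OneID": index} for context, index in zip_with_index(final_contexts)]

# Hypothetical final aggregation table, already decomposed into structured rows.
final_contexts = [
    {"UserId": ["U1", "U3"], "DeviceId": ["D1", "D4"], "phone": ["153", "166"]},
    {"UserId": ["U2"], "DeviceId": ["D2"], "phone": ["177"]},
]

labelled = assign_one_id(final_contexts)
one_ids = [row["OneID"] for row in labelled]
```

Because the identifiers are consecutive integers starting at zero, they can be used directly as bit positions in a bitmap.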
A second embodiment provides a worked example of the method for calculating a unique natural person identifier:
The user tag table is shown in Table 1. The UserId values are U1, U2 and U3. Tags (attributes) may include user age, gender and so on, represented here by T1, T2, T3 and T4. The ellipsis indicates that other fields, features and so on may also be present.
UserId    Tags (attributes)    ……
U1        T1,T2                ……
U2        T4                   ……
U3        T1,T3                ……
TABLE 1
The combination of the user device table and the device behavior table is shown in Table 2. DeviceId is the device ID, Phone is the mobile phone number, Tags (behavior) may include device behaviors such as clicking on an advertisement, and Null denotes an empty value.
UserId    DeviceId    Phone    ……    Tags (behavior)
U1        D1          153      ……    T9
U2        D2          177      ……    Null
U3        D1          166      ……    T10
Null      D4          166      ……    T11
TABLE 2
The user-device wide table is obtained by joining Table 1 and Table 2 on the UserId dimension, as shown in Table 3:
UserId    DeviceId    Phone    ……    Tags
U1        D1          153      ……    T1,T2,T9
U2        D2          177      ……    T4
U3        D1          166      ……    T1,T3,T11
Null      D4          166      ……    T11
TABLE 3
The first iteration selects, for each row of Table 3, the first non-empty field from left to right as the UniqueID. In the last row the first field is empty and the second is D4, so the first non-empty value is D4. The resulting table of keys and values is shown in Table 4:
Figure BDA0003590959640000071
TABLE 4
The first aggregation is then performed (no UniqueID is repeated, so nothing needs to be merged and the aggregation result is the original values), as shown in Table 5:
Figure BDA0003590959640000081
TABLE 5
The Context finally obtained by the first iteration and aggregation is shown in table 6:
Figure BDA0003590959640000082
TABLE 6
A second iteration is then performed (taking UserId, the first column of the Context in Table 6, as the dimension, and replacing the null value with a random value), giving Table 7:
Figure BDA0003590959640000083
TABLE 7
Second aggregation (no UniqueID is repeated, so the result is the original values), as shown in Table 8:
Figure BDA0003590959640000084
TABLE 8
Aggregation is performed again on the UserId of the Context (no UserId is repeated, so the values are unchanged), giving the values shown in Table 9:
Figure BDA0003590959640000091
TABLE 9
A third iteration is then performed (taking DeviceId, the second column of the Context in Table 9, as the dimension), giving Table 10:
Figure BDA0003590959640000092
TABLE 10
Third aggregation (aggregating on the UniqueID dimension, which merges the first and third rows, both keyed by D1), giving the values shown in Table 11:
Figure BDA0003590959640000093
TABLE 11
Aggregation is performed again on the DeviceId of the Context (no DeviceId is repeated, so the values are unchanged), as shown in Table 12:
Figure BDA0003590959640000094
TABLE 12
A fourth iteration is then performed (expanding on the phone dimension, the third column of Table 12), as shown in Table 13:
Figure BDA0003590959640000101
TABLE 13
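The expansion performed in the fourth iteration can be sketched in Python as follows. Tables 12 and 13 are available only as images, so the rows below are reconstructed from the surrounding prose and should be read as an assumption; the function name `explode_by_field` and the set-valued Context layout are also hypothetical.

```python
def explode_by_field(contexts, field):
    """Emit one (key, Context) record for every value held in the chosen field."""
    keyed = []
    for context in contexts:
        for value in sorted(context[field]):
            keyed.append((value, context))
    return keyed

# Rows as they stand after the third aggregation, reconstructed from the text.
table_12 = [
    {"UserId": {"U1", "U3"}, "DeviceId": {"D1"}, "Phone": {"153", "166"}},
    {"UserId": {"U2"}, "DeviceId": {"D2"}, "Phone": {"177"}},
    {"UserId": set(), "DeviceId": {"D4"}, "Phone": {"166"}},
]

keyed = explode_by_field(table_12, "Phone")
keys = [key for key, _ in keyed]
```

The merged row carries two phone numbers, so it produces two records; the key 166 then appears in the second and fourth records, which are the ones the following aggregation merges.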
Fourth aggregation (aggregating on the phone number, which merges the second and fourth rows, both keyed by 166), as shown in Table 14:
Figure BDA0003590959640000102
TABLE 14
Aggregation is performed again on the Phone of the Context in Table 14 (used as the Key, this column contains a duplicate: the value 153,166 appears twice), and the duplicate rows are merged to give the result shown in Table 15:
Figure BDA0003590959640000103
TABLE 15
At this point every field has taken part in the iterative computation, so the iteration is complete. Each Context is marked with a unique value (OneKey, OneID) to give the final result shown in Table 16:
Figure BDA0003590959640000104
TABLE 16
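The whole walkthrough can be reproduced with a single-machine Python sketch. It follows the steps described in this embodiment (first iteration, then expand, aggregate and re-aggregate per column, then numbering) under several assumptions: Contexts are dictionaries of sets, null keys are replaced by random UUID strings, the Tags column is omitted for brevity, and all function names are hypothetical. It is an illustration of the described flow, not the distributed implementation.

```python
import uuid

FIELDS = ["UserId", "DeviceId", "Phone"]  # Tags omitted to keep the sketch short

def to_context(row):
    """Wrap each value in a set so that merged rows can hold several values."""
    return {f: set() if row[f] is None else {row[f]} for f in FIELDS}

def merge(a, b):
    """Union two Contexts field by field."""
    return {f: a[f] | b[f] for f in FIELDS}

def aggregate(keyed):
    """Merge the Contexts that share a key, then drop the key column."""
    merged = {}
    for key, ctx in keyed:
        merged[key] = merge(merged[key], ctx) if key in merged else ctx
    return list(merged.values())

def random_key():
    """Placeholder key for a null value, so that nulls never merge with each other."""
    return uuid.uuid4().hex

def first_iteration(rows):
    """Key each row by its first non-empty field, left to right, then aggregate."""
    keyed = []
    for row in rows:
        key = next((row[f] for f in FIELDS if row[f] is not None), None)
        keyed.append((key if key is not None else random_key(), to_context(row)))
    return aggregate(keyed)

def iterate(contexts, field):
    """One loop iteration: expand on the field, aggregate, then aggregate again."""
    keyed = []
    for ctx in contexts:
        values = sorted(ctx[field]) or [random_key()]
        keyed.extend((value, ctx) for value in values)
    merged = aggregate(keyed)
    # Re-aggregate on the whole merged field value to fold duplicate rows together.
    return aggregate((frozenset(c[field]) or random_key(), c) for c in merged)

def compute_one_ids(rows):
    contexts = first_iteration(rows)
    for field in FIELDS:
        contexts = iterate(contexts, field)
    return [{"OneID": index, **ctx} for index, ctx in enumerate(contexts)]

# Identifier columns of Table 3.
TABLE_3 = [
    {"UserId": "U1", "DeviceId": "D1", "Phone": "153"},
    {"UserId": "U2", "DeviceId": "D2", "Phone": "177"},
    {"UserId": "U3", "DeviceId": "D1", "Phone": "166"},
    {"UserId": None, "DeviceId": "D4", "Phone": "166"},
]

people = compute_one_ids(TABLE_3)
```

On this input the sketch yields two natural persons: U1, U3, D1 and D4 are linked through the shared device D1 and the shared phone number 166, while U2 with D2 remains separate.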
A third embodiment provides a device for calculating a unique natural person identifier. Referring to FIG. 4, the device comprises a table building module 201 and a calculation module 202.
The table building module 201 is configured to generate a user-device wide table from information about users and devices, wherein the wide table includes fields related to the users and the devices and their corresponding values.
The fields related to the user and the device may include: user ID, device ID, mobile phone number, mobile phone serial number, advertising identifier, openid (digital identity framework), user attribute tags and device behavior tags; other information related to the user and the device may also be included.
Users and devices may be related one-to-many and many-to-one. First, a relationship wide table (covering the relationships of all dimensions) is generated from the user and device information by a data warehouse system. For example, as shown in FIG. 2, three tables record the relationships between users and devices: the first is a user tag table, which includes UserId (user ID), Tags (for example, attribute tags) and so on; the second is a user device table, which includes UserId, DeviceId (device ID), phone (mobile phone number), imei (mobile phone serial number), idfa (advertising identifier), openid (digital identity framework) and so on; the third is a device behavior table, which includes DeviceId, Tags (device behaviors, for example clicking on an advertisement) and so on. The user tag table is joined to the user device table on UserId, and the user device table is joined to the device behavior table on DeviceId, forming the user-device wide table, which includes UserId, DeviceId, phone, imei, idfa, openid, Tags (attributes), Tags (device behaviors) and so on.
For example, in the user-device wide table of FIG. 2, the first row has UniqueId = UID and Context: DeviceId, imei, idfa, openid, Tags; the second row has UniqueId = DeviceId and Context: null, idfa, null, T1, T2; the third row has UniqueId = DeviceId and Context: imei, null, null, T1, T3.
The calculation module 202 is configured to perform a first iterative computation (a Map computation, which is a distributed algorithm): for each row of the user-device wide table, the first non-empty value is selected as the UniqueId and placed in a separate column, and the original content of the row is taken as the Context, forming a first iterative computation table.
The UniqueId serves as the temporary aggregation Key. For example, in the user-device wide table of FIG. 2, the first non-empty value of the first row is the UserId (UID); in the second row the UserId is null, so the first non-empty value is the DeviceId; and the first non-empty value of the third row is likewise the DeviceId. Each of these values is taken as the UniqueId of its row. The wide-table data is thus converted into data of the form (UniqueId, Context), where the Context holds the combined information of all dimensions of the user and the device, such as UserId, DeviceId, imei, idfa, openid, Tags and device behaviors.
Serialization and deserialization of the data format (Serialize and DeSerialize) may be performed in this step.
The calculation module 202 is further configured to perform a first aggregation: rows with the same UniqueId in the first iterative computation table are merged and the UniqueId column is removed, forming a first aggregation table (Context table).
The calculation module 202 is further configured to perform loop iteration: each column of the first aggregation table is taken in turn as the UniqueId, and the iterative computation and the aggregation are repeated, forming a final aggregation table (Context table).
After the first iterative computation on the user-device wide table, the fields of the wide table, such as UserId, DeviceId, imei, idfa, openid, Tags and device behaviors, are scanned in sequence. The second iterative computation and aggregation take the UserId as the UniqueId, the third take the DeviceId as the UniqueId, and so on until every field has been through iterative computation and aggregation. When a null value is encountered, a random value may be used in its place.
It may also be determined whether each column of the Context contains duplicates, in which case the Contexts are merged: they are merged by Key (Reduce By Key), so that records with the same Key are combined into a new Context whose fields hold the merged series of values.
To address the data imbalance caused by the explosion of records, a data redistribution step is added in between, so that the data is spread evenly across different machines. This effectively resolves data skew and lays the foundation for the subsequent re-aggregation of Contexts. The iteration continues until the last dimension.
The calculation module 202 is further configured to mark each row of the final aggregation table with a unique natural person identifier.
After the iteration is complete, the Context is decomposed back into structured data, and each row of structured data is assigned an auto-incrementing unique natural person identifier, namely the OneID. This may be done with ZipWithIndex. The result is the data structure of the bottom table shown in FIG. 2 and FIG. 3, in which the identifiers that can be associated are associated across all dimensions.
The whole process may be called a distributed connected graph algorithm, and it finally produces a connected-graph wide table.
A fourth embodiment also provides an electronic device including: a processor, a memory and a computer program stored in the memory and executable by the processor, wherein the processor, when executing the computer program, implements the steps of the method of any one of the above embodiments, such as steps S101 to S105, or implements the functions of the modules/units in each of the above embodiments, such as the functions of modules 201 and 202. The computer program may be divided into one or more modules/units, which are stored in the memory and executed by the processor. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program in the electronic device.
The electronic device may be a mobile terminal such as a smart phone, or a computing device such as a desktop computer, a notebook, a palmtop computer or a cloud server. The electronic device may include, but is not limited to, a processor and a memory; it may include more or fewer components, or may combine certain components. For example, the electronic device may also include an input-output device, a network access device, a bus and so on. The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc. A general-purpose processor may be a microprocessor or any conventional processor. The memory may be an internal storage unit of the electronic device, such as a hard disk or a memory of the electronic device. The memory may also be an external storage device of the electronic device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a flash memory card (Flash Card) provided on the electronic device. Further, the memory may include both an internal storage unit and an external storage device of the electronic device.
The fifth embodiment also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method of any of the above embodiments.
Each functional unit in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit can be realized in the form of hardware or in the form of a software functional unit. The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow in the methods of the embodiments described above can be realized by a computer program, which can be stored in a computer-readable storage medium and which realizes the steps of the method embodiments described above when executed by a processor. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content covered by the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunications signals.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment. Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed system, electronic device and method may be implemented in other ways. The above-described embodiments of systems and electronic devices are merely illustrative. For example, the division of the modules or units is only one logical division, and other divisions are possible in an actual implementation: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, systems or units, and may be in an electrical, mechanical or other form. The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one place or distributed across a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (12)

1. A method of unique natural person identification computation, comprising:
generating a user device wide table according to information of a user and a device, wherein the user device wide table comprises fields related to the user and the device and their corresponding values;
performing a first iterative computation: for each row in the user device wide table, selecting the first non-empty value as a UniqueId to form a separate column, and taking the original content of the row as a Context, thereby forming a first iterative computation table;
performing a first aggregation: merging the rows having the same UniqueId in the first iterative computation table, and removing the UniqueId column to form a first aggregation table;
performing loop iteration: taking each column of fields of the first aggregation table as the UniqueId in turn, and repeating the iterative computation and the aggregation to form a final aggregation table;
wherein each row of the final aggregation table is marked with a unique natural person identifier.
2. The method of unique natural person identification computation of claim 1, wherein the user device wide table comprises at least one of: UserId, DeviceId, phone, imei, idfa, openid, Tags.
3. The method of unique natural person identification computation of claim 1, further comprising: in the loop iteration, redistributing the data of the iterative computation table or the aggregation table so that it is evenly distributed across different machines.
4. The method of unique natural person identification computation of claim 1, further comprising: in the iterative computation, serializing and deserializing the data of the iterative computation table.
5. The method of claim 4, wherein the aggregation table includes Contexts, and wherein, for each column of a Context, it is determined whether the content is duplicated, and if so, the duplicated content is merged.
6. An apparatus for unique natural person identification computation, comprising:
a table building module configured to generate a user device wide table according to information of a user and a device, wherein the user device wide table includes fields related to the user and the device and their corresponding values;
a computing module configured to:
perform a first iterative computation: for each row in the user device wide table, select the first non-empty value as a UniqueId to form a separate column, and take the original content of the row as a Context, thereby forming a first iterative computation table;
perform a first aggregation: merge the rows having the same UniqueId in the first iterative computation table, and remove the UniqueId column to form a first aggregation table;
perform loop iteration: take each column of fields of the first aggregation table as the UniqueId in turn, and repeat the iterative computation and the aggregation to form a final aggregation table;
wherein each row of the final aggregation table is marked with a unique natural person identifier.
7. The apparatus according to claim 6, wherein the user device wide table comprises at least one of: UserId, DeviceId, phone, imei, idfa, openid, Tags.
8. The apparatus according to claim 6, wherein the computing module is further configured to redistribute the data of the iterative computation table or the aggregation table evenly across different machines in the loop iteration.
9. The apparatus according to claim 6, wherein the computing module is further configured to serialize and deserialize the data of the iterative computation table in the iterative computation.
10. The apparatus of claim 6, wherein the aggregation table comprises Contexts, and wherein, for each column of a Context, it is determined whether the content is duplicated, and if so, the duplicated content is merged.
11. An electronic device, comprising: a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the method of any of claims 1-5.
12. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method of any of claims 1-5.
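
For illustration only, the iterate-then-aggregate flow recited in claim 1 can be sketched in Python. This is a minimal single-machine sketch under stated assumptions, not the applicant's implementation: the function names, the representation of a Context as a mapping from column name to a set of values, the repetition of the column sweep until no further rows merge, and the "person-N" label format are all hypothetical choices made here. The redistribution across machines (claim 3) and the serialization steps (claim 4) are omitted.

```python
COLUMNS = ["UserId", "DeviceId", "phone", "imei"]  # subset of the fields listed in claim 2


def to_context(row):
    """Keep a row's original content as a Context: column -> set of non-empty values."""
    return {col: ({row[col]} if row.get(col) else set()) for col in COLUMNS}


def merge_contexts(a, b):
    """Merge two Contexts column by column; using sets drops duplicated values."""
    return {col: a[col] | b[col] for col in COLUMNS}


def first_iteration(wide_table):
    """Pair each row with its first non-empty value (the UniqueId) and its Context."""
    table = []
    for row in wide_table:
        unique_id = next((row[col] for col in COLUMNS if row.get(col)), None)
        if unique_id is not None:  # assumption: rows with no identifier are skipped
            table.append((unique_id, to_context(row)))
    return table


def first_aggregation(iteration_table):
    """Merge rows with the same UniqueId, then drop the UniqueId column."""
    groups = {}
    for unique_id, ctx in iteration_table:
        if unique_id in groups:
            groups[unique_id] = merge_contexts(groups[unique_id], ctx)
        else:
            groups[unique_id] = ctx
    return list(groups.values())


def aggregate_on(contexts, column):
    """Treat `column` as the UniqueId and merge every Context sharing a value in it."""
    merged = []
    for ctx in contexts:
        untouched = []
        for other in merged:
            if ctx[column] & other[column]:
                ctx = merge_contexts(ctx, other)
            else:
                untouched.append(other)
        merged = untouched + [ctx]
    return merged


def compute_natural_persons(wide_table):
    contexts = first_aggregation(first_iteration(wide_table))
    while True:  # assumption: sweep all columns again until nothing merges
        before = len(contexts)
        for column in COLUMNS:
            contexts = aggregate_on(contexts, column)
        if len(contexts) == before:
            break
    # Hypothetical label format; the claims do not prescribe one.
    return {f"person-{i}": ctx for i, ctx in enumerate(contexts)}


wide_table = [
    {"UserId": "u1", "DeviceId": "d1", "phone": "", "imei": ""},
    {"UserId": "", "DeviceId": "d1", "phone": "p1", "imei": ""},
    {"UserId": "u2", "DeviceId": "", "phone": "p1", "imei": "i1"},
    {"UserId": "u3", "DeviceId": "d3", "phone": "", "imei": ""},
]
persons = compute_natural_persons(wide_table)
```

In this sample, the first three rows are linked through the shared DeviceId d1 and the shared phone p1 and collapse into one row of the final table, while the fourth row remains a separate natural person. Storing each Context column as a set is one simple way to merge duplicated content in the manner claim 5 describes.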
CN202210376670.8A 2022-04-12 2022-04-12 Method and device for calculating unique natural person identifier, electronic equipment and storage medium Pending CN114862449A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210376670.8A CN114862449A (en) 2022-04-12 2022-04-12 Method and device for calculating unique natural person identifier, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114862449A true CN114862449A (en) 2022-08-05

Family

ID=82629217

Country Status (1)

Country Link
CN (1) CN114862449A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7139765B1 (en) * 2000-04-03 2006-11-21 Alan Balkany Hierarchical method for storing data with improved compression
CN108064069A (en) * 2018-01-16 2018-05-22 深圳市和讯华谷信息技术有限公司 A kind of unique mark of WiFi subnet network determines method, medium and equipment
CN108804670A (en) * 2018-06-11 2018-11-13 腾讯科技(深圳)有限公司 Data recommendation method, device, computer equipment and storage medium
US20190187689A1 (en) * 2016-05-09 2019-06-20 Strong Force Iot Portfolio 2016, Llc Methods and devices for user directed data collection
CN110532479A (en) * 2019-09-05 2019-12-03 北京思维造物信息科技股份有限公司 A kind of information recommendation method, device and equipment
CN113297288A (en) * 2021-04-28 2021-08-24 上海淇玥信息技术有限公司 User real-time label generation method and device and electronic equipment

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115757396A (en) * 2022-10-25 2023-03-07 杭州比智科技有限公司 Oneid project implementation method and oneid project implementation system
CN117349358A (en) * 2023-12-04 2024-01-05 中国电子投资控股有限公司 Data matching and merging method and system based on distributed graph processing framework
CN117349358B (en) * 2023-12-04 2024-02-20 中国电子投资控股有限公司 Data matching and merging method and system based on distributed graph processing framework

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination