CN112148981A - Method, device, equipment and storage medium for identifying the same person - Google Patents


Info

Publication number
CN112148981A
Authority
CN
China
Prior art keywords
information data
user information
undirected
unique identifier
connected graph
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011052993.9A
Other languages
Chinese (zh)
Inventor
钟奇
孙昌青
蔡龙颜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Xiaopeng Autopilot Technology Co Ltd
Original Assignee
Guangzhou Xiaopeng Autopilot Technology Co Ltd
Application filed by Guangzhou Xiaopeng Autopilot Technology Co Ltd
Priority to CN202011052993.9A
Publication of CN112148981A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/953: Querying, e.g. by the use of web search engines
    • G06F16/9535: Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the invention relate to a method, a device, equipment and a storage medium for identifying the same person. The method comprises: acquiring user information data of each service domain and the association relations between the user information data; constructing an undirected connected graph according to the user information data and the association relations, wherein each node in the undirected connected graph corresponds to one piece of user information data and each undirected edge corresponds to one association relation; and assigning a unique identifier to each connected subgraph in the undirected connected graph. By using an undirected connected graph for same-person identification, the embodiments link the behavior characteristics of the same user across the whole company and eliminate data silos. The approach is easy to understand and deploy, highly extensible, and computationally cheap; it effectively addresses the prior-art problems of a complex identification process, a high technical threshold, and poor deployability, and has high value for wider adoption and application.

Description

Method, device, equipment and storage medium for identifying the same person
Technical Field
The embodiment of the invention relates to the technical field of big data processing, in particular to a method, a device, equipment and a storage medium for identifying the same person.
Background
As internet technology has matured, people's consumption and behavior habits have changed greatly, and the ways people connect to the internet have diversified. In daily life, a user can access the business systems of a company's different domains at any time and place through a mobile phone APP, a PC, a WeChat applet, H5, or O2O to browse, query, or consult content of interest. As a result, the same user generates different behavior feature data in the business systems of the company's different domains. Before processing, these data are isolated from one another, which makes them both unusable and hard to manage. To strengthen data management, many companies therefore build a person-centered "one person, one file" data management service: the behavior feature data from the various business systems are aggregated, and the data belonging to the same user across the whole company are linked together to eliminate data silos. This process relies on same-person identification, i.e., quickly determining which of a large volume of behavior feature data belong to the same user. In the prior art, same-person identification is mostly implemented with MapReduce, based on confidence scores and multiple rounds of iteration until convergence. MapReduce is a programming model for parallel computation on large-scale data sets (larger than 1 TB). Its core concepts, "Map" and "Reduce", are borrowed from functional programming languages, along with some features borrowed from vector programming languages. It greatly helps programmers run programs on a distributed system even without expertise in distributed parallel programming.
However, same-person identification performed in this way has many drawbacks, such as a complicated implementation process, a high technical threshold, and poor deployability.
Therefore, the existing same-person identification technology needs to be improved to overcome these drawbacks, or a new same-person identification technology needs to be developed.
Disclosure of Invention
The embodiments of the invention disclose a method, a device, equipment and a storage medium for identifying the same person, which can identify the same person quickly and efficiently across the company's service domains.
In a first aspect, a same-person identification method is provided, the method comprising:
acquiring user information data of each service domain and an association relation between the user information data;
constructing an undirected connected graph according to the user information data and the association relations, wherein each node in the undirected connected graph corresponds to one piece of user information data, and each undirected edge corresponds to one association relation;
assigning a unique identifier to each connected subgraph in the undirected connected graph.
Further, in the same-person identification method, after the step of assigning a unique identifier to each connected subgraph in the undirected connected graph, the method further comprises:
periodically acquiring newly added user information data of each service domain and newly added association relation among the user information data;
adding the newly added user information data as a new node into the undirected connected graph;
connecting the user information data associated with each other in the undirected connected graph through the undirected edges according to the newly added association relationship;
assigning a unique identifier to a connected subgraph in the undirected connected graph that is not assigned a unique identifier;
judging whether a connected subgraph with two or more unique identifiers exists in the undirected connected graph;
if yes, selecting one of the two or more unique identifiers as a final unique identifier according to a set rule.
Further, in the method for identifying the same person, the step of selecting one of the two or more unique identifiers as a final unique identifier according to a set rule includes:
selecting, from the two or more unique identifiers, the one with the earliest allocation time as the final unique identifier;
alternatively,
selecting, from the two or more unique identifiers, the one with the latest allocation time as the final unique identifier;
alternatively,
randomly selecting one of the two or more unique identifiers as the final unique identifier.
Further, in the same-person identification method, after the step of assigning a unique identifier to each connected subgraph in the undirected connected graph, the method further comprises:
according to the generation time of the user information data, performing statistical analysis on the behavior track of the user corresponding to each connected subgraph to determine the interest degree of each user;
and pushing corresponding content information to a corresponding user or providing corresponding customer service according to the interest degree.
In a second aspect, a same-person identification device is provided, the device comprising:
the first acquisition module is used for acquiring user information data of each service domain and an association relation between the user information data;
the construction module is configured to construct an undirected connected graph according to the user information data and the association relations, wherein each node in the undirected connected graph corresponds to one piece of user information data, and each undirected edge corresponds to one association relation;
a first assignment module, configured to assign a unique identifier to each connected subgraph in the undirected connected graph.
Further, the same-person identification device further includes:
a second acquisition module, configured to periodically acquire newly added user information data of each service domain and newly added association relations between the user information data after a unique identifier has been assigned to each connected subgraph in the undirected connected graph;
an adding module, configured to add the newly added user information data as a new node to the undirected connected graph;
the association module is configured to connect mutually associated user information data in the undirected connected graph through undirected edges according to the newly added association relations;
the second assignment module is configured to assign a unique identifier to each connected subgraph in the undirected connected graph that has not yet been assigned one;
the judging module is configured to judge whether a connected subgraph with two or more unique identifiers exists in the undirected connected graph, and if so, to select one of the two or more unique identifiers as the final unique identifier according to a set rule.
Further, in the same-person identification device, the judgment module includes a judgment submodule, and the judgment submodule is configured to:
select, from the two or more unique identifiers, the one with the earliest allocation time as the final unique identifier;
alternatively,
select, from the two or more unique identifiers, the one with the latest allocation time as the final unique identifier;
alternatively,
randomly select one of the two or more unique identifiers as the final unique identifier.
Further, the same-person identification device further includes:
the analysis module is configured to, after a unique identifier has been assigned to each connected subgraph in the undirected connected graph, perform statistical analysis on the behavior trace of the user corresponding to each connected subgraph according to the generation time of the user information data, so as to determine each user's degree of interest;
and the service module is used for pushing corresponding content information to a corresponding user or providing corresponding customer service according to the interest degree.
In a third aspect, a computer device is provided, comprising a memory storing a computer program and a processor, wherein the processor, when executing the computer program, implements the same-person identification method according to any one of the above.
In a fourth aspect, a storage medium is provided, containing computer-executable instructions which, when executed by a computer processor, implement the same-person identification method according to any one of the above.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
using an undirected connected graph for same-person identification links the behavior characteristics of the same user across the whole company and eliminates data silos; the method is easy to understand and deploy, highly extensible, and computationally cheap, effectively solves the prior-art problems of a complex identification process, a high technical threshold, and poor deployability, and has high value for wider adoption and application.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without inventive exercise.
Fig. 1 is a schematic flow chart of a same-person identification method according to the first embodiment of the present invention;
Fig. 2 is a schematic diagram of the construction process of an undirected connected graph according to the first embodiment of the present invention;
Fig. 3 is a schematic flow chart of a same-person identification method according to the second embodiment of the present invention;
Fig. 4 is a schematic diagram of the construction process of an undirected connected graph according to the second embodiment of the present invention;
Fig. 5 is a schematic diagram of the construction process of an undirected connected graph according to the second embodiment of the present invention;
Fig. 6 is a schematic flow chart of a same-person identification method according to the third embodiment of the present invention;
Fig. 7 is a functional block diagram of a same-person identification device according to the fourth embodiment of the present invention;
Fig. 8 is a schematic structural diagram of a computer device according to the fifth embodiment of the present invention.
Detailed Description
The embodiments of the invention disclose a method, a device, equipment and a storage medium for identifying the same person, which can identify the same person quickly and efficiently and provide a complete user-level information view, so as to better support downstream needs such as user profiling and a potential-customer mining engine.
The technical solution of the present invention will be further described in detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
In view of the above drawbacks of existing same-person identification, the inventors, drawing on many years of practical experience and professional knowledge of big data algorithms and combining theory with application, actively researched and innovated to create a same-person identification method that is easy to understand, easy to deploy, highly extensible, and computationally cheap, and therefore more practical. After continuous research, design, and repeated trials and improvements, the present invention with practical value was finally created.
Referring to fig. 1, fig. 1 is a schematic flow chart of a same-person identification method according to the first embodiment of the present invention. As shown in fig. 1, the same-person identification method may include the following steps:
S101, obtaining user information data of each service domain and the association relations between the user information data.
It should be noted that the user information data refers to a user identifier (ID), such as a device ID, a mobile phone number, or an IP address, and may also include browsing time, browsing content, and the like as needed.
Because the same user can reach the business systems of the company's different domains through a mobile phone APP, a PC, a WeChat applet, H5, O2O, and so on, and each business system assigns the user its own identifier to distinguish users, one user ends up with several different identifiers across the business systems of different domains. To eliminate data silos, this embodiment links the user information data of each user across the whole company.
In addition, massive data inevitably contain noise besides the data actually needed, so after acquisition the data should be cleaned to remove noise and ensure their reliability. Many noise-cleaning methods exist, such as binning, clustering, and regression; each has its own advantages, and generally any of them can be used to clean the noise in the data.
S102, constructing an undirected connected graph according to the user information data and the association relations, wherein each node in the undirected connected graph corresponds to one piece of user information data, and each undirected edge corresponds to one association relation.
It should be noted that, in this embodiment, each piece of user information data is treated as one node of the undirected connected graph. If an association relation exists between any two pieces of user information data, their two nodes are connected by an undirected edge, forming the complete undirected connected graph structure.
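As a minimal sketch (not the patent's own implementation), the graph described in S102 can be held in Python as an adjacency map from each node to the set of its neighbors. Node names such as `d1` follow the notation of the later example; the function name `build_graph` is hypothetical.

```python
def build_graph(nodes, relations):
    """Return an undirected graph as {node: set of neighbouring nodes}."""
    # Every piece of user information data becomes a node, even with no relation yet.
    adj = {n: set() for n in nodes}
    for a, b in relations:
        # One association relation = one undirected edge, stored on both endpoints.
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


graph = build_graph(["d1", "u1", "d3"], [("d1", "u1")])
```

Storing each edge on both endpoints is what makes the graph undirected; an isolated node such as `d3` simply has an empty neighbor set.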
S103, assigning a unique identifier to each connected subgraph in the undirected connected graph.
It should be noted that once every pair of nodes having an association relation has been joined by an undirected edge, nodes that were originally independent of one another form connected subgraphs. A connected subgraph may contain two or more nodes joined by undirected edges, or only a single node; in either case, all nodes in a connected subgraph are regarded as traces left by the same user, through different access channels, in the business systems of the company's different domains.
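One common way to find the connected subgraphs (connected components) described above is a breadth-first traversal. The sketch below assumes the adjacency-map representation `{node: set_of_neighbours}`; `connected_subgraphs` is an illustrative name, not a function defined by the patent.

```python
from collections import deque


def connected_subgraphs(adj):
    """Return the connected subgraphs of an undirected adjacency map as a list of sets."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from `start`.
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for nxt in adj.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(component)  # a single isolated node is its own subgraph
    return components


adj = {"d1": {"u1"}, "u1": {"d1"}, "d3": set()}
subgraphs = connected_subgraphs(adj)
```

Each node is visited once and each edge is examined twice, so the traversal runs in time linear in the size of the graph.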
To make different connected subgraphs easy to distinguish, this embodiment uses a GID generator to assign each connected subgraph a stable, unique and persistent GID (Group ID) that identifies it. GID generation may involve operations such as addition and fusion. Fusion means that two originally independent connected subgraphs are merged into one after an association relation appears between them, so that their two GIDs are consolidated into one. The full history of these operations can be traced back, providing an accurate and unified view for subsequent data analysis and mining.
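The patent does not specify how the GID generator works internally. A hypothetical in-memory version is sketched below: it issues identifiers G1, G2, ... in allocation order and keeps an append-only history of add and fuse operations so they can be traced back. A production generator would also need durable storage for the counter and the history, which is omitted here.

```python
import itertools


class GidGenerator:
    """Hypothetical GID generator: issues G1, G2, ... and logs every operation."""

    def __init__(self):
        self._seq = itertools.count(1)  # allocation order doubles as allocation time
        self.history = []               # append-only log for tracing operations back

    def add(self):
        """Allocate a brand-new GID for a subgraph that has none."""
        gid = f"G{next(self._seq)}"
        self.history.append(("add", gid))
        return gid

    def fuse(self, gids, keep):
        """Record that several GIDs were consolidated into `keep`."""
        if keep not in gids:
            raise ValueError("the kept GID must be one of the fused GIDs")
        retired = sorted(g for g in gids if g != keep)  # sorted only for a stable log
        self.history.append(("fuse", keep, retired))
        return keep


gen = GidGenerator()
g1, g2 = gen.add(), gen.add()
gen.fuse([g1, g2], keep=g1)  # two subgraphs joined by a new association relation
```

Because GIDs are never reused and retired GIDs stay in the log, any earlier identifier can still be mapped to the GID that replaced it.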
In order to more clearly show the implementation of the embodiments of the present invention, a detailed description is provided below with reference to a specific example.
As shown in fig. 2, assume that the service starts on June 1, 2020, and that on that day 8 pieces of user information data (d1, d3, ls1, ls2, ls3, m2, u1, and u2) and 6 corresponding association relations are obtained from channels such as APP logs, CRM data tables, web page logs, and mobile phone binding relations. Here d denotes a device, ls denotes the unique identifier of a user logged in to a platform from the web side, m denotes a mobile phone number, and u denotes a user ID.
Constructing an undirected connected graph from these 8 pieces of user information data and 6 association relations shows that d3 and ls3 are, for the time being, isolated nodes; d1, u1, m2, and ls1 are connected to one another; and ls2 and u2 are connected to each other. There are therefore four connected subgraphs on that day, and the GID generator assigns each of them a unique identifier (GID):
G1(d1,u1,m2,ls1),G2(ls2,u2),G3(d3),G4(ls3);
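The day-one example can be reproduced with a short union-find (disjoint-set) sketch, used here as an alternative to graph traversal. The text counts six association relations but does not list them, so the sketch assumes four relations that yield the same grouping (d3 and ls3 stay isolated). Handing out GIDs in the order each subgraph is first met in the node list is also an assumption.

```python
def find(parent, x):
    """Return the representative node of x's connected subgraph."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
        x = parent[x]
    return x


nodes = ["d1", "u1", "m2", "ls1", "ls2", "u2", "d3", "ls3"]
# Assumed relations: the text counts six but does not list them.
relations = [("d1", "u1"), ("u1", "m2"), ("m2", "ls1"), ("ls2", "u2")]

parent = {n: n for n in nodes}
for a, b in relations:
    parent[find(parent, a)] = find(parent, b)  # union the two subgraphs

# Allocate G1, G2, ... in the order each subgraph is first met in `nodes`.
gid_of_root, groups = {}, {}
for n in nodes:
    root = find(parent, n)
    if root not in gid_of_root:
        gid_of_root[root] = f"G{len(gid_of_root) + 1}"
    groups.setdefault(gid_of_root[root], []).append(n)

print(groups)
```

Running it prints `{'G1': ['d1', 'u1', 'm2', 'ls1'], 'G2': ['ls2', 'u2'], 'G3': ['d3'], 'G4': ['ls3']}`, matching G1 to G4 above. Extra relations among the same nodes would not change this grouping.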
thus, all behavior of the same user across the business systems of the company's different domains can be represented by the same GID, such as G1, so that the user's behaviors are linked together, giving a global view for subsequent data analysis and mining.
This embodiment of the invention provides a same-person identification method that uses an undirected connected graph to identify the same person, linking the behavior characteristics of the same user across the whole company and eliminating data silos. It is easy to understand and deploy, highly extensible, and computationally cheap, effectively solves the prior-art problems of a complex identification process, a high technical threshold, and poor deployability, and has high value for wider adoption and application.
Example two
Referring to fig. 3, fig. 3 is a schematic flow chart of another method for identifying the same person according to the embodiment of the invention. In this embodiment, on the basis of the technical solution provided in the first embodiment, after a unique identifier is assigned to each connected subgraph in the undirected connected graph, the method is further optimized. The explanation of the same or corresponding terms as those in the above embodiments is not repeated herein, and specifically, the method provided in this embodiment may further include the following steps:
periodically acquiring newly added user information data of each service domain and newly added association relation among the user information data;
adding the newly added user information data as a new node into the undirected connected graph;
connecting the user information data associated with each other in the undirected connected graph through the undirected edges according to the newly added association relationship;
assigning a unique identifier to a connected subgraph in the undirected connected graph that is not assigned a unique identifier;
judging whether a connected subgraph with two or more unique identifiers exists in the undirected connected graph;
if yes, selecting one of the two or more unique identifiers as a final unique identifier according to a set rule.
Based on the above optimization, as shown in fig. 3, the method for identifying the same person provided in this embodiment may specifically include the following steps:
S201, obtaining user information data of each service domain and the association relations between the user information data.
S202, constructing an undirected connected graph according to the user information data and the association relations, wherein each node in the undirected connected graph corresponds to one piece of user information data, and each undirected edge corresponds to one association relation.
S203, assigning a unique identifier to each connected subgraph in the undirected connected graph.
S204, periodically acquiring newly added user information data of each service domain and newly added association relation among the user information data.
It should be noted that new user information data are added to the business systems of the company's different domains every day and must be maintained periodically, i.e., incorporated into the undirected connected graph. The acquisition period may be one hour, twelve hours, or one day; this application does not limit its value.
S205, adding the newly added user information data as a new node into the undirected connected graph.
It should be noted that the periodically acquired data may include user information data and/or association relations that have already been classified. These are removed first to obtain the unclassified user information data and/or association relations, which are then integrated; specifically, the unclassified user information data are added to the undirected connected graph as new nodes.
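Removing already-classified items before integration might look like the sketch below. It assumes relations arrive as `(a, b)` pairs and treats `(a, b)` and `(b, a)` as the same undirected edge; `split_new` is a hypothetical helper name. Endpoints of new relations are also treated as candidate nodes, since every piece of user information data is a node.

```python
def split_new(existing_nodes, existing_edges, batch_nodes, batch_edges):
    """Keep only the unclassified part of a periodic batch.

    existing_nodes: set of node IDs already in the graph.
    existing_edges: iterable of (a, b) relations already in the graph.
    """
    known_edges = {frozenset(e) for e in existing_edges}  # (a, b) == (b, a)
    # Relation endpoints are user information data too, so they are candidate nodes.
    candidates = list(batch_nodes) + [x for edge in batch_edges for x in edge]
    new_nodes = [n for n in dict.fromkeys(candidates) if n not in existing_nodes]
    new_edges = []
    for a, b in batch_edges:
        key = frozenset((a, b))
        if key not in known_edges:
            known_edges.add(key)  # also drops repeats inside the same batch
            new_edges.append((a, b))
    return new_nodes, new_edges


existing = {"d1", "u1", "m2", "ls1", "ls2", "u2", "d3", "ls3"}
nodes, edges = split_new(existing, [("d1", "u1")], ["m5"],
                         [("d3", "u3"), ("ls3", "u2"), ("u1", "d1")])
```

Here `("u1", "d1")` is dropped because it is the already-known edge `("d1", "u1")`, and only `m5` and `u3` are new nodes.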
S206, connecting the user information data associated with each other in the undirected connected graph through the undirected edges according to the newly added association relationship.
It should be noted that after the unclassified user information data have been added to the undirected connected graph as new nodes, the nodes that have an association relation are connected by undirected edges according to the unclassified association relations. As a result, smaller connected subgraphs that were originally isolated (including those consisting of a single node) are merged into a larger connected subgraph, or two or more larger connected subgraphs are merged into one.
S207, assigning a unique identifier to each connected subgraph in the undirected connected graph that has not yet been assigned one.
It should be noted that a connected subgraph without a GID is one composed of newly added user information data and/or association relations. Likewise, the GID generator assigns each such new connected subgraph a stable, unique and persistent GID to identify it.
S208, judging whether a connected subgraph with two or more unique identifiers exists in the undirected connected graph; if so, step S209 is executed, otherwise, the flow ends.
It should be noted that because each originally isolated connected subgraph already has its own GID, a merged connected subgraph may carry several GIDs, which is very inconvenient for subsequent data integration and analysis; therefore, only one of them should be retained.
S209, selecting one of the two or more unique identifiers as the final unique identifier according to a set rule.
In this embodiment, which GID to retain is configurable, for example based on the GID's allocation time (generation time) or on some other predetermined logic.
Preferably, the step S209 may further include the steps of:
selecting, from the two or more unique identifiers, the one with the earliest allocation time as the final unique identifier;
alternatively,
selecting, from the two or more unique identifiers, the one with the latest allocation time as the final unique identifier;
alternatively,
randomly selecting one of the two or more unique identifiers as the final unique identifier.
It should be noted that, taking allocation time as an example, since GIDs are allocated sequentially, the logic "allocated earlier, higher priority" can be followed: in subsequent merges of connected subgraphs, the GID with the earliest allocation time, i.e., the highest priority, is kept as the final GID.
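The three selection rules of step S209 could be expressed as a small function. Allocation time is modeled here as an integer sequence number per GID, which is an assumption (the patent only says allocation is sequential). `pick_final_gid` and its `rule` parameter are hypothetical names.

```python
import random


def pick_final_gid(gids, allocated_at, rule="earliest", rng=random):
    """Choose the single GID to keep for a subgraph that carries several.

    allocated_at maps each GID to its allocation sequence number (smaller = earlier).
    """
    candidates = sorted(set(gids), key=lambda g: allocated_at[g])
    if rule == "earliest":
        return candidates[0]
    if rule == "latest":
        return candidates[-1]
    if rule == "random":
        return rng.choice(candidates)
    raise ValueError(f"unknown rule: {rule!r}")


allocated_at = {"G1": 1, "G2": 2, "G3": 3, "G4": 4}
kept = pick_final_gid(["G2", "G4"], allocated_at)  # G2 and G4 being merged
```

With the default "earliest" rule, merging G2 and G4 keeps G2, which matches the June 2 example below. Passing a seeded `random.Random` as `rng` makes the random rule reproducible.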
To show the implementation of this embodiment more clearly, the example given in the first embodiment is extended below.
First, assume that the GIDs in the example of the first embodiment were assigned in the order G1, G2, G3, G4;
as shown in fig. 4, the nodes and association relations newly added on June 2, 2020 are (d3, u3), (m5), and (ls3, u2); these now need to be added to the undirected connected graph persisted on the previous day.
First, because the association relation (d3, u3) is newly added as an undirected edge, d3 is no longer an isolated node; since d3 already has a GID (G3), the existing G3 is reused instead of allocating a new GID, giving G3(d3, u3);
second, because the relation (ls3, u2) is newly added, ls3 is no longer an isolated node; but ls3 appeared on the previous day and already has a GID (G4), so G2(ls2, u2) and G4(ls3) must now be merged. For example, since G2 was allocated earlier than G4, and to keep GIDs as stable as possible, G2 is retained as the GID of the merged connected subgraph (ls2, u2, ls3);
finally, the newly added m5 has no association relation with any other node on that day, so it is placed in the undirected connected graph as an isolated node for now, and the GID generator allocates it a GID, giving the connected subgraph G5(m5);
thus, the same-person identification result for that day is: G1(d1, u1, m2, ls1), G2(ls2, u2, ls3), G3(d3, u3), G5(m5);
as shown in fig. 5, the nodes and association relations newly added on June 3, 2020 are (ls1, u2), (m5, u3), (ls3, u7), and (d6, u6); these now need to be added to the undirected connected graph persisted on the previous day.
First, the relation (ls1, u2) fuses the connected subgraphs G1(d1, u1, m2, ls1) and G2(ls2, u2, ls3) into a larger connected subgraph; since G1 was allocated earlier than G2, G1 is retained as the GID of the fused subgraph (d1, u1, m2, ls1, ls2, u2, ls3);
second, the relation (m5, u3) fuses G3(d3, u3) with G5(m5), which was an isolated node, into a larger connected subgraph; since G3 was allocated earlier than G5, G3 is retained as the GID of the fused subgraph (m5, u3, d3);
furthermore, for the relation (ls3, u7), since ls3 is already a node of connected subgraph G1, the newly added node u7 joins G1;
finally, for the relation (d6, u6), both d6 and u6 are newly added nodes, so the GID generator allocates a new GID to the subgraph they form, i.e., G6(d6, u6);
thus, the same-person identification result for that day is: G1(d1, u1, m2, ls1, ls2, u2, ls3, u7), G3(m5, u3, d3), G6(d6, u6).
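The two daily updates above can be replayed with the sketch below, which follows S205 to S209. It seeds a union-find structure with the previous day's groups (nodes sharing a GID are already connected, so the original edges are not needed), adds the new nodes and relations, allocates a fresh GID only to subgraphs that have none, and keeps the earliest-allocated GID when a subgraph carries several. The data structures, the choice of the "earliest" rule, and the names `apply_batch` and `by_gid` are assumptions for illustration.

```python
def apply_batch(gid_of, alloc, new_nodes, new_edges):
    """Fold one periodic batch into the previous result (steps S205 to S209).

    gid_of: node -> GID from the previous run.
    alloc:  GID -> allocation sequence number; updated in place.
    Returns a new node -> GID mapping.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    anchor = {}
    for node, gid in gid_of.items():   # nodes sharing a GID are already connected
        union(node, anchor.setdefault(gid, node))
    for node in new_nodes:             # S205: new (possibly isolated) nodes
        find(node)
    for a, b in new_edges:             # S206: new undirected edges
        union(a, b)

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), []).append(node)

    result = {}
    for members in groups.values():
        old = {gid_of[n] for n in members if n in gid_of}
        if old:                        # S208-S209: keep the earliest-allocated GID
            gid = min(old, key=alloc.__getitem__)
        else:                          # S207: a subgraph without a GID gets a new one
            seq = max(alloc.values(), default=0) + 1
            gid = f"G{seq}"
            alloc[gid] = seq
        for n in members:
            result[n] = gid
    return result


def by_gid(gid_of):
    """Group nodes by GID for display."""
    out = {}
    for node, gid in gid_of.items():
        out.setdefault(gid, set()).add(node)
    return out


alloc = {"G1": 1, "G2": 2, "G3": 3, "G4": 4}  # day-one GIDs in allocation order
day1 = {"d1": "G1", "u1": "G1", "m2": "G1", "ls1": "G1",
        "ls2": "G2", "u2": "G2", "d3": "G3", "ls3": "G4"}

day2 = apply_batch(day1, alloc, ["m5"], [("d3", "u3"), ("ls3", "u2")])
day3 = apply_batch(day2, alloc, [],
                   [("ls1", "u2"), ("m5", "u3"), ("ls3", "u7"), ("d6", "u6")])
print(by_gid(day2))
print(by_gid(day3))
```

Printing `by_gid(day2)` and `by_gid(day3)` should show the same groupings as the June 2 and June 3 results listed above (the order of nodes inside each printed set may vary). Retired GIDs such as G4 stay in `alloc`, so they are never handed out again.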
This embodiment has the beneficial effects of the first embodiment and additionally solves the problem of classifying newly added data every day after the initial day: newly added user information data and/or association relations are periodically merged into the previously persisted undirected connected graph, so that each person's behavior traces are eventually consolidated into a single connected subgraph as far as possible, which improves the accuracy of data analysis.
EXAMPLE three
Referring to Fig. 6, Fig. 6 is a schematic flow chart of another same-person identification method according to an embodiment of the present invention. This embodiment builds on the technical solution of the first embodiment and further extends the method after a unique identifier has been assigned to each connected subgraph in the undirected connected graph. Terms that are the same as or correspond to those in the above embodiments are not explained again. Specifically, the method provided in this embodiment may further include the following steps:
performing, according to the generation time of the user information data, statistical analysis on the behavior track of the user corresponding to each connected subgraph to determine each user's degree of interest;
and pushing corresponding content information to the corresponding user, or providing corresponding customer service, according to the degree of interest.
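A minimal sketch of the statistical analysis step, under loudly stated assumptions: the patent gives no scoring formula, so the `Event` fields, the 7-day half-life, and the score itself (a recency-weighted visit count multiplied by the number of distinct service domains and the number of distinct access channels) are hypothetical choices. They only mirror the kind of signals this embodiment mentions, namely recent, frequent visits across domains and access channels.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Event:
    gid: str          # identifier of the person's connected subgraph
    time: datetime    # generation time of the user information data
    domain: str       # service domain (hypothetical labels below)
    channel: str      # access channel, e.g. app, web, phone (hypothetical)


def interest_scores(events, now, half_life_days=7.0):
    """Hypothetical score: recency-weighted visit count x #domains x #channels."""
    weight = defaultdict(float)
    domains = defaultdict(set)
    channels = defaultdict(set)
    for e in events:
        age_days = (now - e.time).total_seconds() / 86400
        if age_days < 0:
            continue                       # ignore events stamped after `now`
        weight[e.gid] += 0.5 ** (age_days / half_life_days)
        domains[e.gid].add(e.domain)
        channels[e.gid].add(e.channel)
    return {g: weight[g] * len(domains[g]) * len(channels[g]) for g in weight}


now = datetime(2020, 6, 3)
events = [
    Event("G1", datetime(2020, 6, 3), "app", "phone"),
    Event("G1", datetime(2020, 5, 27), "store", "visit"),
    Event("G3", datetime(2020, 5, 20), "app", "phone"),
]
scores = interest_scores(events, now)
print(scores)   # G1: (1 + 0.5) * 2 * 2 = 6.0, G3: 0.25 * 1 * 1 = 0.25
```

The exponential decay makes a visit one half-life old count half as much as a visit today; any other recency weighting could be substituted.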
Based on the above extension, as shown in Fig. 6, the same-person identification method provided in this embodiment may specifically include the following steps:
S301, obtaining user information data of each service domain and the associations between the user information data.
S302, constructing an undirected connected graph according to the user information data and the associations, where each node in the undirected connected graph corresponds to one piece of user information data and each undirected edge corresponds to one association.
S303, assigning a unique identifier to each connected subgraph in the undirected connected graph.
S304, performing, according to the generation time of the user information data, statistical analysis on the behavior track of the user corresponding to each connected subgraph to determine each user's degree of interest.
It should be noted that if statistical analysis of a user's behavior track shows that the user has recently and frequently visited the company's business systems in different domains through various access channels, the user is considered interested and can therefore be treated as a high-potential customer.
S305, pushing corresponding content information to the corresponding user, or providing corresponding customer service, according to the degree of interest.
It should be noted that the mapping from a degree of interest to the content pushed or the customer service provided may use threshold matching: when the degree of interest reaches a given threshold, the customer service operation corresponding to that threshold is executed, such as pushing a car-purchase coupon message, pushing information about the nearest outlet, and/or contacting the user by phone through a human customer service agent.
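The threshold matching described above might look like the following sketch. The numeric thresholds and the pairing of each action with a threshold are invented for illustration; only the action types (coupon push, nearest-outlet information, human customer-service call) come from the text.

```python
# Hypothetical rule table: (minimum degree of interest, customer-service action).
# The action types come from the text; the numbers are invented for illustration.
RULES = [
    (5.0, "human customer-service phone call"),
    (2.0, "push nearest-outlet information"),
    (0.5, "push car-purchase coupon"),
]


def action_for(score, rules=RULES):
    """Return the action of the highest threshold that `score` reaches, else None."""
    for threshold, action in sorted(rules, reverse=True):
        if score >= threshold:
            return action
    return None


for s in (6.0, 2.0, 0.25):
    print(s, action_for(s))
```

Checking thresholds from highest to lowest triggers exactly one action per user; returning every action whose threshold is reached would be an equally valid reading of the text.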
This embodiment of the invention retains the beneficial effects of the first embodiment; in addition, analyzing the user's behavior track yields the user's degree of interest, so that content can be pushed and customer service provided in a more targeted way, improving the user experience.
EXAMPLE four
Fig. 7 is a schematic diagram of the functional modules of a same-person identification apparatus according to the fourth embodiment of the present invention; the apparatus is adapted to execute the same-person identification method provided by any embodiment of the present invention. The apparatus specifically comprises the following functional modules:
a first obtaining module 41, configured to obtain user information data of each service domain and an association relationship between the user information data;
a construction module 42, configured to construct an undirected connected graph according to the user information data and the associations, where each node in the undirected connected graph corresponds to one piece of user information data and each undirected edge corresponds to one association;
a first assigning module 43 for assigning a unique identifier to each connected subgraph in the undirected connected graph.
Preferably, the above apparatus further comprises:
a second obtaining module, configured to periodically obtain, after the step of assigning a unique identifier to each connected subgraph in the undirected connected graph, newly added user information data of each service domain and newly added associations between the user information data;
an adding module, configured to add the newly added user information data as new nodes to the undirected connected graph;
an association module, configured to connect, through undirected edges, the user information data in the undirected connected graph that are associated with each other according to the newly added associations;
a second assigning module, configured to assign a unique identifier to each connected subgraph in the undirected connected graph that has not yet been assigned one;
a judging module, configured to judge whether the undirected connected graph contains a connected subgraph with two or more unique identifiers and, if so, to select one of the two or more unique identifiers as the final unique identifier according to a set rule.
Preferably, in the above apparatus, the judging module includes a determining submodule configured for:
selecting, from the two or more unique identifiers, the one with the earliest allocation time as the final unique identifier;
or,
selecting, from the two or more unique identifiers, the one with the latest allocation time as the final unique identifier;
or,
randomly selecting one of the two or more unique identifiers as the final unique identifier.
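The three selection rules can be expressed as one small function. This is a sketch: the `(gid, allocation_time)` candidate format, the function name, and the timestamps in the usage lines are hypothetical.

```python
import random
from datetime import datetime


def pick_final_gid(candidates, rule="earliest", rng=None):
    """Choose the final unique identifier of a merged connected subgraph.

    candidates: list of (gid, allocation_time) pairs found in the subgraph.
    rule: "earliest", "latest" or "random", the three options listed above.
    """
    if not candidates:
        raise ValueError("no candidate identifiers")
    if rule == "earliest":
        return min(candidates, key=lambda c: c[1])[0]
    if rule == "latest":
        return max(candidates, key=lambda c: c[1])[0]
    if rule == "random":
        return (rng or random).choice(candidates)[0]
    raise ValueError(f"unknown rule: {rule!r}")


# Hypothetical allocation times for two identifiers found in one subgraph.
cands = [("G3", datetime(2020, 6, 1)), ("G5", datetime(2020, 6, 2))]
print(pick_final_gid(cands))                              # G3
print(pick_final_gid(cands, "latest"))                    # G5
print(pick_final_gid(cands, "random", random.Random(0)))  # G3 or G5
```

The earliest rule keeps identifiers stable over time, which is why it is the one applied in the worked example; passing a seeded `random.Random` makes the random rule reproducible.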
Preferably, the above apparatus further comprises:
an analysis module, configured to perform, after the step of assigning a unique identifier to each connected subgraph in the undirected connected graph, statistical analysis on the behavior track of the user corresponding to each connected subgraph according to the generation time of the user information data, so as to determine each user's degree of interest;
and a service module, configured to push corresponding content information to the corresponding user, or provide corresponding customer service, according to the degree of interest.
This embodiment of the invention provides a same-person identification apparatus that performs same-person identification with an undirected connected graph. It links together the behavior features of the same user across the whole company and eliminates data silos; it is easy to understand and to put into production, highly extensible, and computationally cheap. It thus addresses the complex identification process, high technical barrier, and poor practicality of the prior art, and has considerable value for wider application.
The apparatus can execute the method provided by any embodiment of the invention and has the functional modules and beneficial effects corresponding to that method.
EXAMPLE five
Fig. 8 is a schematic structural diagram of a computer device according to a fifth embodiment of the present invention. FIG. 8 illustrates a block diagram of an exemplary computer device 12 suitable for use in implementing embodiments of the present invention. The computer device 12 shown in fig. 8 is only an example and should not bring any limitations to the functionality or scope of use of the embodiments of the present invention.
As shown in FIG. 8, computer device 12 is in the form of a general purpose computing device. The components of computer device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the system memory 28 and the processing unit 16.
Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Computer device 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by computer device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. Computer device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 8, and commonly referred to as a "hard drive"). Although not shown in FIG. 8, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. Program modules 42 generally carry out the functions and/or methodologies of the described embodiments of the invention.
Computer device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with computer device 12, and/or with any devices (e.g., network card, modem, etc.) that enable computer device 12 to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 22. Also, computer device 12 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via network adapter 20. As shown, network adapter 20 communicates with the other modules of computer device 12 via bus 18. It should be appreciated that although not shown in FIG. 8, other hardware and/or software modules may be used in conjunction with computer device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processing unit 16 runs the programs stored in the system memory 28 to execute various functional applications and data processing, for example to implement the same-person identification method provided by the embodiments of the present invention.
That is, when executing the program, the processing unit implements: acquiring user information data of each service domain and the associations between the user information data; constructing an undirected connected graph according to the user information data and the associations, where each node in the undirected connected graph corresponds to one piece of user information data and each undirected edge corresponds to one association; and assigning a unique identifier to each connected subgraph in the undirected connected graph.
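The three steps above can be sketched as building an adjacency list and labelling each connected subgraph with a breadth-first search. The patent does not prescribe a traversal algorithm, so BFS is an assumption here (union-find would work equally well), as are the function name, the hypothetical `G<n>` identifier format, and the sample node names, which only imitate the naming style of the worked example.

```python
from collections import defaultdict, deque


def assign_unique_ids(nodes, associations):
    """Label each connected subgraph of an undirected graph with one identifier.

    nodes: user-information keys (device IDs, accounts, phone numbers, ...)
    associations: (a, b) pairs, each treated as one undirected edge
    Returns {node: "G<n>"}; the "G<n>" format is a hypothetical choice.
    """
    adj = defaultdict(set)
    for n in nodes:
        adj.setdefault(n, set())        # keep isolated nodes in the graph
    for a, b in associations:
        adj[a].add(b)                   # undirected: store both directions
        adj[b].add(a)

    labels, counter = {}, 0
    for start in adj:
        if start in labels:
            continue
        counter += 1
        label = f"G{counter}"
        labels[start] = label
        queue = deque([start])
        while queue:                    # breadth-first walk of one subgraph
            cur = queue.popleft()
            for nxt in adj[cur]:
                if nxt not in labels:
                    labels[nxt] = label
                    queue.append(nxt)
    return labels


labels = assign_unique_ids(
    ["d1", "u1", "m2", "ls1", "d3", "u3", "m5"],
    [("d1", "u1"), ("u1", "m2"), ("m2", "ls1"), ("d3", "u3")],
)
print(labels)
```

Each node and edge is visited once, so labelling runs in time linear in the size of the graph; an isolated node such as `m5` still receives its own identifier.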
EXAMPLE six
The sixth embodiment of the present invention provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the same-person identification method provided by any embodiment of this application;
that is, when the instructions are executed, the following is implemented: acquiring user information data of each service domain and the associations between the user information data; constructing an undirected connected graph according to the user information data and the associations, where each node in the undirected connected graph corresponds to one piece of user information data and each undirected edge corresponds to one association; and assigning a unique identifier to each connected subgraph in the undirected connected graph.
Any combination of one or more computer-readable media may be employed. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

1. A same-person identification method, the method comprising:
acquiring user information data of each service domain and an association relation between the user information data;
constructing an undirected connected graph according to the user information data and the association relation, wherein each node in the undirected connected graph corresponds to one piece of user information data, and each undirected edge corresponds to one association relation;
assigning a unique identifier to each connected subgraph in the undirected connected graph.
2. The same-person identification method of claim 1, wherein after the step of assigning a unique identifier to each connected subgraph in the undirected connected graph, the method further comprises:
periodically acquiring newly added user information data of each service domain and newly added association relation among the user information data;
adding the newly added user information data as a new node into the undirected connected graph;
connecting the user information data associated with each other in the undirected connected graph through the undirected edges according to the newly added association relationship;
assigning a unique identifier to a connected subgraph in the undirected connected graph that is not assigned a unique identifier;
judging whether a connected subgraph with two or more unique identifiers exists in the undirected connected graph;
if yes, selecting one of the two or more unique identifiers as a final unique identifier according to a set rule.
3. The method of claim 2, wherein the step of selecting one of the two or more unique identifiers as a final unique identifier according to a set rule comprises:
selecting, from the two or more unique identifiers, the one with the earliest allocation time as the final unique identifier;
or,
selecting, from the two or more unique identifiers, the one with the latest allocation time as the final unique identifier;
or,
randomly selecting one of the two or more unique identifiers as the final unique identifier.
4. The same-person identification method of claim 1, wherein after the step of assigning a unique identifier to each connected subgraph in the undirected connected graph, the method further comprises:
performing, according to the generation time of the user information data, statistical analysis on the behavior track of the user corresponding to each connected subgraph to determine the degree of interest of each user;
and pushing corresponding content information to a corresponding user or providing corresponding customer service according to the degree of interest.
5. A same-person identification device, the device comprising:
the first acquisition module is used for acquiring user information data of each service domain and an association relation between the user information data;
the construction module is used for constructing an undirected connected graph according to the user information data and the association relation, wherein each node in the undirected connected graph corresponds to one piece of user information data, and each undirected edge corresponds to one association relation;
a first assignment module, used for assigning a unique identifier to each connected subgraph in the undirected connected graph.
6. The same-person identification device of claim 5, further comprising:
a second obtaining module, configured to periodically obtain, after the step of assigning a unique identifier to each connected subgraph in the undirected connected graph, newly added user information data of each service domain and newly added association relations between the user information data;
an adding module, configured to add the newly added user information data as a new node to the undirected connected graph;
the association module is used for connecting the user information data associated with each other in the undirected connected graph through the undirected edge according to the newly added association relation;
the second allocating module is used for allocating a unique identifier to each connected subgraph in the undirected connected graph that has not been allocated one;
the judging module is used for judging whether a connected subgraph with two or more unique identifiers exists in the undirected connected graph; if yes, selecting one of the two or more unique identifiers as a final unique identifier according to a set rule.
7. The device of claim 6, wherein the judging module comprises a determining submodule configured for:
selecting, from the two or more unique identifiers, the one with the earliest allocation time as the final unique identifier;
or,
selecting, from the two or more unique identifiers, the one with the latest allocation time as the final unique identifier;
or,
randomly selecting one of the two or more unique identifiers as the final unique identifier.
8. The same-person identification device of claim 5, further comprising:
the analysis module is used for performing, after the step of assigning a unique identifier to each connected subgraph in the undirected connected graph, statistical analysis on the behavior track of the user corresponding to each connected subgraph according to the generation time of the user information data, so as to determine the degree of interest of each user;
and the service module is used for pushing corresponding content information to a corresponding user or providing corresponding customer service according to the degree of interest.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the same-person identification method as claimed in any one of claims 1 to 4.
10. A storage medium containing computer-executable instructions which, when executed by a computer processor, implement the same-person identification method according to any one of claims 1 to 4.
CN202011052993.9A 2020-09-29 2020-09-29 Method, device, equipment and storage medium for identifying same Pending CN112148981A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011052993.9A CN112148981A (en) 2020-09-29 2020-09-29 Method, device, equipment and storage medium for identifying same

Publications (1)

Publication Number Publication Date
CN112148981A true CN112148981A (en) 2020-12-29

Family

ID=73894310

Country Status (1)

Country Link
CN (1) CN112148981A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103902977A (en) * 2014-03-31 2014-07-02 华为技术有限公司 Face identification method and device based on Gabor binary mode
US8972398B1 (en) * 2011-02-28 2015-03-03 Google Inc. Integrating online search results and social networks
CN108985954A (en) * 2018-07-02 2018-12-11 武汉斗鱼网络科技有限公司 Method for establishing association relationships between identifiers, and related device
CN110533085A (en) * 2019-08-12 2019-12-03 大箴(杭州)科技有限公司 Same-person identification method and device, storage medium, and computer equipment
KR20200010658A (en) * 2018-06-29 2020-01-31 한양대학교 산학협력단 Method for identifing person, computing system and program using the same
CN110825919A (en) * 2018-07-23 2020-02-21 阿里巴巴集团控股有限公司 ID data processing method and device
CN110929173A (en) * 2019-12-05 2020-03-27 深圳前海微众银行股份有限公司 Method, device, equipment and medium for identifying same person
CN111444350A (en) * 2020-03-20 2020-07-24 支付宝(杭州)信息技术有限公司 Method and device for predicting identity label of user and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wu Meimei (吴梅梅): "Machine Learning Algorithms and Their Applications", 31 May 2020, China Machine Press *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113328888A (en) * 2021-05-31 2021-08-31 上海明略人工智能(集团)有限公司 Private domain flow ID processing method, system, medium and equipment
CN113486218A (en) * 2021-09-06 2021-10-08 北京世纪好未来教育科技有限公司 Data processing method and device, electronic equipment and storage medium
CN113486218B (en) * 2021-09-06 2021-12-14 北京世纪好未来教育科技有限公司 Data processing method and device, electronic equipment and storage medium
CN115730251A (en) * 2022-12-06 2023-03-03 贝壳找房(北京)科技有限公司 Relationship recognition method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination