CN113177064A - Data management method and device - Google Patents



Publication number
CN113177064A
Authority
CN
China
Prior art keywords
data
network data
user
temporary
data group
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110483979.2A
Other languages
Chinese (zh)
Inventor
俞子轩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dazhu Hangzhou Technology Co ltd
Original Assignee
Dazhu Hangzhou Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dazhu Hangzhou Technology Co ltd
Priority to CN202110483979.2A
Publication of CN113177064A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/245: Query processing

Abstract

The application discloses a data management method and device. The method comprises: acquiring each network data group of a user; performing data matching between the feature data in each network data group and the target feature data of each historical user in a database; when at least one item of feature data in a first network data group matches an item of target feature data of a first historical user in the database, acquiring identification information of the first historical user so as to associate the first network data group with that identification information; and when none of the feature data in the first network data group matches any target feature data of any historical user in the database, generating identification information of the user based on the first network data group, wherein the identification information characterizes the identity of the user. The method and device thereby enable unified management of the network data groups belonging to the same user.

Description

Data management method and device
Technical Field
The present application relates to the field of information processing technologies, and in particular, to a data management method and apparatus.
Background
In the internet era, users usually have to register an account on each application platform in order to log in, so each user has multiple online virtual identities or accounts. The IDs of these accounts differ (for example, QQ nicknames and WeChat names), different information is stored in different accounts, and the accounts are independent of and unrelated to one another, so it cannot be accurately determined whether they belong to the same user. As a result, the online network data of the same user cannot be managed accurately and comprehensively.
Disclosure of Invention
The embodiments of the application adopt the following technical solution: a data management method and device, whose main purpose is to solve the prior-art problem that the online data of the same user cannot be managed accurately and comprehensively.
In order to solve the above problem, the present application provides a data management method, including:
acquiring each network data group of a user;
performing data matching based on the feature data in each network data group and the target feature data of each historical user in a database;
under the condition that at least one feature data in a first network data group is matched with a target feature data of a first historical user in the database, acquiring identification information of the first historical user so as to associate the first network data group with the identification information;
under the condition that all characteristic data in a first network data group are not matched with all target characteristic data of all first historical users in the database, generating identification information of the users based on the first network data group; wherein the identification information is used for characterizing the identity of the user.
Optionally, the method further includes: generating temporary identifications corresponding to the network data groups;
associating the first network data group with the identification information specifically includes: and establishing an association relation between the temporary identifier and the identification information so as to associate the first network data group with the identification information.
Optionally, the network data group includes any one or more of the following data: account information, network address of the account, contact information, user identity information, user occupation information and user preference information;
the characteristic data comprises one or more of the following data: user name, contact information and user identification card information.
Optionally, the method further includes:
establishing a first temporary data table based on each network data group and the temporary identifier corresponding to each network data group;
acquiring a data matching table based on a database; the data matching table comprises historical network data sets of historical users and temporary identifications of the historical network data sets, and the temporary identifications of the historical network data sets are associated with identification information of the users;
the data matching based on the feature data in each network data group and the target feature data of each historical user in the database specifically comprises:
and matching based on the characteristic data in each network data group in the first temporary data table and the target characteristic data in each historical network data group in the data matching table.
Optionally, when at least one feature data in the first network data group is matched with a target feature data of a first historical user in the database, acquiring identification information of the first historical user to associate the first network data group with the identification information, specifically including:
under the condition that each feature data in a first network data group in the first temporary data table is matched with each target feature data of a first historical user in the data matching table, performing first marking on a temporary identifier of the first network data group, and generating a second temporary data table based on the temporary identifier subjected to the first marking and the corresponding first network data group; the label information of the first mark comprises identification information of a first historical user matched with a first network data group;
under the condition that partial feature data in a first network data group in the first temporary data table are matched with partial target feature data of a first historical user in the data matching table, performing second marking on a temporary identifier of the first network data group, and generating a third temporary data table based on the temporary identifier subjected to the second marking and the corresponding first network data group; the label information of the second mark comprises identification information of a first historical user matched with the first network data group;
updating the data matching table based on the second temporary data table and the third temporary data table;
and grouping the temporary identifications based on the identification information of the first historical user in the label information of each temporary identification in the data matching table, and grouping the temporary identifications containing the identification information of the same first historical user to associate each first network data group with the identification information of the first historical user.
Optionally, the generating, based on the first network data group, the identification information of the user when each feature data in the first network data group is not matched with each target feature data of each first historical user in the database specifically includes:
under the condition that all feature data in a first network data group in the first temporary data table are not matched with all target feature data of a first historical user in the data matching table, performing third marking on the temporary identifier of the first network data group, and adding the temporary identifier subjected to the third marking and the corresponding first network data group to a third temporary data table; the label information of the third label comprises prompt information used for representing that the first network data group is not matched;
updating the data matching table based on the third temporary data table;
and generating identification information of the user for the corresponding first network data group based on prompt information in the label information of each temporary identification in the data matching table.
Optionally, the method further includes:
determining the confidence of each feature data in the network data set based on the source of the network data set;
in the case that the first part of feature data in the first network data group in the first temporary data table matches with the partial target feature data of the first historical user in the data matching table, and the second part of feature data in the network data group matches with the partial target feature data of the second historical user, the method further comprises:
acquiring a first confidence coefficient of target characteristic data of a first historical user in a data matching table;
acquiring a second confidence coefficient of target characteristic data of a second historical user in the data matching table;
and determining that the first network data group is matched with a first historical user or a second historical user based on the first confidence degree and the second confidence degree so as to perform second marking on the temporary identification of the first network data group.
In order to solve the above problem, the present application provides a data management apparatus, including:
the acquisition module is used for acquiring each network data group of the user;
the matching module is used for performing data matching on the characteristic data in each network data group and the target characteristic data of each historical user in the database;
the association module is used for acquiring the identification information of the first historical user under the condition that at least one feature data in a first network data group is matched with a target feature data of the first historical user in the database so as to associate the first network data group with the identification information;
the first generation module is used for generating identification information of the user based on the first network data group under the condition that all feature data in the first network data group are not matched with all target feature data of all first historical users in the database; wherein the identification information is used for characterizing the identity of the user.
Optionally, the data management apparatus in this embodiment further includes:
the second generation module is used for generating temporary identifications corresponding to the network data groups;
the association module is specifically configured to: and establishing an association relation between the temporary identifier and the identification information so as to associate the first network data group with the identification information.
Optionally, the data management apparatus in this embodiment further includes: a first establishing module and an obtaining module;
the first establishing module is used for: establishing a first temporary data table based on each network data group and the temporary identifier corresponding to each network data group;
the obtaining module is configured to: acquiring a data matching table based on a database; the data matching table comprises historical network data sets of historical users and temporary identifications of the historical network data sets, and the temporary identifications of the historical network data sets are associated with identification information of the users;
the matching module is specifically configured to: and matching based on the characteristic data in each network data group in the first temporary data table and the target characteristic data in each historical network data group in the data matching table.
In the present application, a network data group of a user is obtained from the internet, and the feature data in the network data group is matched against the feature data of the historical users in a database to judge whether the user is the same as a historical user. If so, the user's network data group is associated with the identification information of the corresponding user in the database. If nothing matches, the user does not yet exist in the database, so identification information is created for the user and the user's network data group is associated with it. In this way, multiple network data groups are associated with the identification information of the same user, which enables unified management of the network data groups of the same user.
Drawings
FIG. 1 is a flowchart of a data management method according to an embodiment of the present application;
FIG. 2 is a block diagram of a data management apparatus according to another embodiment of the present application.
Detailed Description
Various aspects and features of the present application are described herein with reference to the drawings.
It will be understood that various modifications may be made to the embodiments of the present application. Accordingly, the foregoing description should not be construed as limiting, but merely as exemplifications of embodiments. Those skilled in the art will envision other modifications within the scope and spirit of the application.
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate embodiments of the application and, together with a general description of the application given above and the detailed description of the embodiments given below, serve to explain the principles of the application.
These and other characteristics of the present application will become apparent from the following description of preferred forms of embodiment, given as non-limiting examples, with reference to the attached drawings.
It is also to be understood that although the present application has been described with reference to some specific examples, those skilled in the art are able to ascertain many other equivalents to the practice of the present application.
The above and other aspects, features and advantages of the present application will become more apparent in view of the following detailed description when taken in conjunction with the accompanying drawings.
Specific embodiments of the present application are described hereinafter with reference to the accompanying drawings; however, it is to be understood that the disclosed embodiments are merely exemplary of the application, which can be embodied in various forms. Well-known and/or repeated functions and constructions are not described in detail, to avoid obscuring the application with unnecessary detail. Therefore, specific structural and functional details disclosed herein are not to be interpreted as limiting, but merely as a basis for the claims and as a representative basis for teaching one skilled in the art to variously employ the present application in virtually any appropriately detailed structure.
The specification may use the phrases "in one embodiment," "in another embodiment," "in yet another embodiment," or "in other embodiments," which may each refer to one or more of the same or different embodiments in accordance with the application.
An embodiment of the present application provides a data management method, which may be specifically applied to the field of internet to implement management of network data in the internet, and as shown in fig. 1, the data management method in the embodiment includes the following steps:
Step S101, acquiring each network data group of a user;
In this embodiment, a network data group refers to a group of network data, which may include any one or more of the following: account information, the network address of the account, contact information, user identity information, user occupation information, user preference information, and the like. A network data group may be obtained as follows: when a user registers an account in an application, the network data group is obtained from the registration information; alternatively, the network data group may be obtained when the user logs in to a website with an account.
Step S102, performing data matching based on the feature data in each network data group and the target feature data of each historical user in a database;
In this embodiment, the database stores target feature data of a plurality of historical users, so that after a network data group is obtained, the feature data in it can be matched against the target feature data of each historical user. The target feature data includes any one or more of the following: user name, contact information, and user ID card information. For example, the database stores the feature data of 3 historical users. The target feature data of user A comprises: ID card number 111, name Zhang San, contact 123. The target feature data of user B comprises: ID card number 222, name Li Si, contact 456. The target feature data of user C comprises: ID card number 333, name Wang Wu, contact 789. Network data groups of 3 users are then acquired from the internet. The first network data group contains: account A, ID card number 111, contact 890, hobby basketball. The second contains: account B, ID card number 222, name Li Si, contact 456, hobby travel. The third contains: account C, ID card number 444, contact 901, hobby music. The feature data in the three network data groups, namely the ID card number, contact and name, can therefore be matched against the ID card number, contact and name of each historical user in the database.
Step S103, under the condition that at least one feature data in a first network data group is matched with one target feature data of a first historical user in the database, acquiring identification information of the first historical user so as to associate the first network data group with the identification information;
In this embodiment, for example, when matching shows that the ID card number 111 in the first network data group is consistent with the ID card number of historical user A in the database, it can be determined that this user and user A in the database are the same person. The identification information of historical user A can therefore be obtained, and the first network data group associated with it. The identification information is a unique identifier that represents the identity of the user on the internet, and may specifically be an ID number. Likewise, if matching shows that the ID card number, name and contact in the second network data group are consistent with those of user B in the database, the user of the second network data group and user B can be determined to be the same person, so the identification information of historical user B can be obtained and the second network data group associated with it.
Step S104, under the condition that each feature data in a first network data group is not matched with each target feature data of each first historical user in the database, generating identification information of the user based on the first network data group; wherein the identification information is used for characterizing the identity of the user.
In a specific implementation of this embodiment, for example, when matching shows that the feature data in the third network data group (ID card number 444, contact 901) is inconsistent with the feature data (ID card number and contact) of all three users in the database, the user of the third network data group does not exist in the database. A user identifier can therefore be generated for that user, and the third network data group is then associated with it.
In the present application, a network data group of a user is obtained from the internet, and the feature data in the network data group is matched against the feature data of the historical users in a database to judge whether the user is the same as a historical user. If so, the user's network data group is associated with the identification information of the corresponding user in the database. If nothing matches, the user does not yet exist in the database, so identification information is created for the user and the user's network data group is associated with it. In this way, multiple network data groups are associated with the identification information of the same user, which enables unified management of the network data groups of the same user.
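Steps S101 to S104 of this embodiment can be illustrated with a minimal Python sketch built on the three-user example above. The field names (`id_card`, `name`, `contact`), the identifier strings such as `user-A`, and the helper `match_or_create` are hypothetical and chosen only for illustration; the names are romanized here as Zhang San, Li Si and Wang Wu. The application does not prescribe any particular data structure or identifier format.

```python
from itertools import count

# Feature fields used for matching (assumed keys for ID card number, name, contact).
FEATURE_KEYS = ("id_card", "name", "contact")

# Hypothetical historical database: identification info -> target feature data.
history = {
    "user-A": {"id_card": "111", "name": "Zhang San", "contact": "123"},
    "user-B": {"id_card": "222", "name": "Li Si", "contact": "456"},
    "user-C": {"id_card": "333", "name": "Wang Wu", "contact": "789"},
}

_new_ids = count(1)  # assumption: new identifiers are simple sequential strings


def match_or_create(group, history):
    """Return identification info for one network data group (S102 to S104)."""
    for user_id, target in history.items():
        # S103: a single matching feature is enough to reuse the historical identifier.
        if any(k in group and group[k] == target.get(k) for k in FEATURE_KEYS):
            return user_id
    # S104: nothing matched, so generate new identification info and record it.
    user_id = f"user-new-{next(_new_ids)}"
    history[user_id] = {k: group[k] for k in FEATURE_KEYS if k in group}
    return user_id


# S101: the three network data groups from the example.
groups = [
    {"account": "A", "id_card": "111", "contact": "890", "hobby": "basketball"},
    {"account": "B", "id_card": "222", "name": "Li Si", "contact": "456", "hobby": "travel"},
    {"account": "C", "id_card": "444", "contact": "901", "hobby": "music"},
]
result = [match_or_create(g, history) for g in groups]
print(result)
```

In this sketch the first two groups resolve to the existing users A and B, while the third group receives a newly generated identifier that is also stored back into the database so that later groups can match it.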
Another embodiment of the present application provides a data management method, including the steps of:
Step S201, acquiring each network data group of a user;
Step S202, determining the confidence of each feature data in the network data group based on the source of the network data group;
In this embodiment, the confidence may be set according to the scenario. For example, when the network data group is obtained from an application platform such as an entertainment website, the confidence of its feature data may be set to a low level, level one. If the network data group is obtained from a social networking site or application, such as WeChat or a microblog service, the confidence may be set to level two. If the network data group is obtained from a transaction scenario such as a bank, the confidence may be set to a high level, level three. The confidence level for each scenario may be set according to actual needs, or a numeric confidence value may be used instead: for example, 0.1 for application platforms such as entertainment websites, 0.2 for application platforms such as social networking sites, and 0.3 for transaction platforms such as banks.
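The source-based confidence assignment of step S202 can be sketched as a simple lookup. The numeric values 0.1, 0.2 and 0.3 come from the example in the text; the category keys, the default for unknown sources, and the function name `assign_confidence` are assumptions made for illustration.

```python
# Confidence per source category, using the example values from the text.
SOURCE_CONFIDENCE = {
    "entertainment": 0.1,  # e.g. entertainment websites
    "social": 0.2,         # e.g. WeChat or a microblog service
    "transaction": 0.3,    # e.g. banks
}
DEFAULT_CONFIDENCE = 0.1   # assumption: unknown sources get the lowest value

FEATURE_KEYS = ("id_card", "name", "contact")  # assumed feature field names


def assign_confidence(group, source):
    """Give every feature present in the group the confidence of its source."""
    conf = SOURCE_CONFIDENCE.get(source, DEFAULT_CONFIDENCE)
    return {k: conf for k in FEATURE_KEYS if k in group}


bank_group = {"account": "A", "id_card": "111", "contact": "890"}
print(assign_confidence(bank_group, "transaction"))
```

Non-feature fields such as the account name or hobbies receive no confidence in this sketch, because only feature data takes part in matching.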
Step S203, generating temporary identifications corresponding to each network data set;
In this step, after a plurality of network data groups are obtained, a temporary identifier (tmpid, i.e. a temporary ID) may be generated for each network data group to uniquely identify it.
Step S204, establishing a first temporary data table based on each network data group and the temporary identifier corresponding to each network data group;
In this step, after a confidence has been assigned to each network data group and a temporary ID generated for it, the first temporary data table Tmp1 may be created.
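Steps S203 and S204 can be sketched as follows. The text does not say how a tmpid is generated or how Tmp1 is laid out, so the use of a random UUID and a list of dictionaries with the keys `tmpid`, `source` and `group` is an assumption for illustration only.

```python
import uuid


def build_tmp1(groups_with_source):
    """Build the first temporary data table: one row per network data group."""
    tmp1 = []
    for group, source in groups_with_source:
        tmp1.append({
            "tmpid": uuid.uuid4().hex,  # assumption: a random UUID serves as tmpid
            "source": source,           # kept so a confidence can be derived later
            "group": group,
        })
    return tmp1


tmp1 = build_tmp1([
    ({"account": "A", "id_card": "111", "contact": "890"}, "social"),
    ({"account": "B", "id_card": "222", "contact": "456"}, "transaction"),
])
for row in tmp1:
    print(row["tmpid"], row["source"])
```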
Step S205, acquiring a data matching table based on a database; the data matching table comprises historical network data sets of historical users and temporary identifications of the historical network data sets, and the temporary identifications of the historical network data sets are associated with identification information of the users;
In a specific implementation of this embodiment, the data matching table T1 includes each network data group of the historical users and the tmpid of each network data group.
Step S206, matching is carried out on the basis of the feature data in each network data group in the first temporary data table and the target feature data in each historical network data group in the data matching table;
Step S207, when every feature data in a first network data group in the first temporary data table matches the corresponding target feature data of a first historical user in the data matching table, applying a first mark to the temporary identifier of the first network data group, and generating a second temporary data table Tmp2 based on the first-marked temporary identifier and the corresponding first network data group; the label information of the first mark includes the identification information of the first historical user matched with the first network data group;
when only part of the feature data in a first network data group in the first temporary data table matches part of the target feature data of a first historical user in the data matching table, applying a second mark to the temporary identifier of the first network data group, and generating a third temporary data table Tmp3 based on the second-marked temporary identifier and the corresponding first network data group; the label information of the second mark includes the identification information of the first historical user matched with the first network data group;
when none of the feature data in a first network data group in the first temporary data table matches any target feature data of a historical user in the data matching table, applying a third mark to the temporary identifier of the first network data group, and adding the third-marked temporary identifier and the corresponding first network data group to the third temporary data table Tmp3; the label information of the third mark includes prompt information indicating that the first network data group is unmatched;
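The three marking cases of step S207 can be sketched as below. This is one possible reading: a full match is taken to mean that all three assumed feature fields agree, a partial match that at least one but not all agree. The row layout, the `label` dictionary, and the `oneid` values are hypothetical.

```python
FEATURE_KEYS = ("id_card", "name", "contact")  # assumed feature field names


def classify(group, target):
    """Return 'full', 'partial' or 'none' for one group against one historical user."""
    hits = sum(1 for k in FEATURE_KEYS if k in group and group[k] == target.get(k))
    if hits == len(FEATURE_KEYS):
        return "full"
    return "partial" if hits else "none"


def mark_rows(tmp1, t1):
    """Split Tmp1 rows into Tmp2 (first mark) and Tmp3 (second and third marks)."""
    tmp2, tmp3 = [], []
    for row in tmp1:
        kind, oneid = "none", None
        for hist in t1:
            result = classify(row["group"], hist["group"])
            if result == "full":
                kind, oneid = "full", hist["oneid"]
                break
            if result == "partial" and kind == "none":
                kind, oneid = "partial", hist["oneid"]
        if kind == "full":
            tmp2.append({**row, "label": {"mark": 1, "oneid": oneid}})
        elif kind == "partial":
            tmp3.append({**row, "label": {"mark": 2, "oneid": oneid}})
        else:
            tmp3.append({**row, "label": {"mark": 3, "note": "unmatched"}})
    return tmp2, tmp3


t1 = [
    {"oneid": "oneid-A", "group": {"id_card": "111", "name": "Zhang San", "contact": "123"}},
    {"oneid": "oneid-B", "group": {"id_card": "222", "name": "Li Si", "contact": "456"}},
]
tmp1 = [
    {"tmpid": "t1", "group": {"id_card": "111", "contact": "890"}},
    {"tmpid": "t2", "group": {"id_card": "222", "name": "Li Si", "contact": "456"}},
    {"tmpid": "t3", "group": {"id_card": "444", "contact": "901"}},
]
tmp2, tmp3 = mark_rows(tmp1, t1)
print([r["tmpid"] for r in tmp2], [(r["tmpid"], r["label"]["mark"]) for r in tmp3])
```

When a group partially matches more than one historical user, this sketch simply keeps the first partial match; the confidence-based choice described later in the embodiment would replace that rule.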
Step S208, updating the data matching table T1 based on the second temporary data table Tmp2 and the third temporary data table Tmp3;
Step S209, grouping the temporary identifiers based on the identification information of the first historical user in the label information of each temporary identifier in the data matching table T1, with temporary identifiers that carry the identification information of the same first historical user placed in one group, so as to associate each first network data group with the identification information of the first historical user; and generating identification information of the user for each corresponding first network data group based on the prompt information in the label information of each temporary identifier in the data matching table.
In a specific implementation of this step, an identification information (Oneid) result table T2 and a mapping table T3 between temporary IDs (tmpid) and identification information (Oneid) may additionally be created from the database. The result table T2 records the Oneid of each historical user, and the mapping table T3 records the correspondence between each tmpid and Oneid, so that the network data group corresponding to each tmpid is associated with an Oneid. The network data groups in the updated data matching table T1 can then be grouped based on T2 and T3. Specifically, the network data groups associated with the same Oneid are first placed in one group using the label information of each network data group. Parallel computation is then performed on a MapReduce parallel computing platform, combined with PySpark and the MinHash (minimum hash) algorithm, to further determine whether each network data group in the group belongs to the same Oneid. Once the network data groups are confirmed to belong to the same Oneid, the mapping between the tmpid of each network data group and that Oneid is stored in table T3, so that the network data groups belonging to the same user are accurately associated. Tables T2 and T3 may also be updated after association based on the association results.
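The text names MinHash but gives no details of how it is applied. The following single-process sketch shows only the general MinHash idea of estimating the Jaccard similarity between two network data groups; it does not reproduce the MapReduce or PySpark pipeline. The tokenization of a group into `key=value` strings, the signature length of 64, and the use of seeded SHA-1 as the hash family are all assumptions.

```python
import hashlib

NUM_HASHES = 64  # assumption: signature length chosen for illustration


def _hash(token, seed):
    """Seeded 64-bit hash of a token, derived from SHA-1."""
    digest = hashlib.sha1(f"{seed}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def tokens_of(group):
    """Turn a network data group into a set of 'key=value' tokens (assumed scheme)."""
    return {f"{k}={v}" for k, v in group.items()}


def minhash_signature(tokens, num_hashes=NUM_HASHES):
    """For each seeded hash function, keep the minimum hash over all tokens."""
    tokens = set(tokens)
    if not tokens:
        raise ValueError("cannot build a signature from an empty token set")
    return [min(_hash(t, seed) for t in tokens) for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree approximates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


group_1 = {"id_card": "111", "name": "Zhang San", "contact": "123"}
group_2 = {"id_card": "111", "name": "Zhang San", "contact": "890"}
sig_1 = minhash_signature(tokens_of(group_1))
sig_2 = minhash_signature(tokens_of(group_2))
print(estimated_jaccard(sig_1, sig_2))
```

A threshold on the estimated similarity could then decide whether two groups in the same candidate set are kept under one Oneid; the choice of threshold is not specified in the text.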
Based on table T3, the temporary IDs (tmpid) in table T1 that do not appear in table T3 are found in order to determine the unmatched network data groups; alternatively, the completely unmatched network data groups may be determined directly from the prompt information in the label information of each temporary ID in table T1. A corresponding Oneid is then generated for each completely unmatched network data group, the newly generated Oneid is added to table T2, and the mapping between the newly generated Oneid and the temporary ID of the corresponding network data group is stored in table T3.
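The bookkeeping on tables T2 and T3 described above can be sketched as follows, assuming T2 is held as a set of Oneids and T3 as a dictionary from tmpid to Oneid. The row layout, the function name `finalize_ids`, and the UUID-based format of new Oneids are hypothetical.

```python
import uuid


def finalize_ids(t1_rows, t2, t3):
    """Update T2 (set of Oneids) and T3 (tmpid -> Oneid) from marked T1 rows."""
    # Rows whose label carries an Oneid (first or second mark) map to that Oneid.
    for row in t1_rows:
        oneid = row["label"].get("oneid")
        if oneid is not None:
            t3[row["tmpid"]] = oneid
    # Any tmpid still absent from T3 is unmatched: generate a new Oneid for it.
    for row in t1_rows:
        if row["tmpid"] not in t3:
            new_oneid = "oneid-" + uuid.uuid4().hex  # assumption: UUID-based format
            t2.add(new_oneid)
            t3[row["tmpid"]] = new_oneid
    # Group tmpids by Oneid so that each user's network data groups sit together.
    grouped = {}
    for tmpid, oneid in t3.items():
        grouped.setdefault(oneid, []).append(tmpid)
    return grouped


t2 = {"oneid-A", "oneid-B"}
t3 = {"h1": "oneid-A", "h2": "oneid-B"}
rows = [
    {"tmpid": "t1", "label": {"mark": 2, "oneid": "oneid-A"}},
    {"tmpid": "t2", "label": {"mark": 1, "oneid": "oneid-B"}},
    {"tmpid": "t3", "label": {"mark": 3, "note": "unmatched"}},
]
grouped = finalize_ids(rows, t2, t3)
print(grouped)
```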
In a specific implementation of this embodiment, the confidence of the corresponding target feature data in the data matching table T1 may also be updated based on the feature data in the second temporary data table Tmp2. Because Tmp2 is created only when the feature data in a network data group completely matches the feature data of a historical user in T1, such a match indicates that the corresponding target feature data in T1 is more credible, so its confidence can be raised. Further, when a first part of the feature data in a first network data group in the first temporary data table matches part of the target feature data of a first historical user in the data matching table, and a second part of the feature data in that network data group matches part of the target feature data of a second historical user, the method further includes: acquiring a first confidence of the target feature data of the first historical user in the data matching table; acquiring a second confidence of the target feature data of the second historical user in the data matching table; and determining, based on the first confidence and the second confidence, whether the first network data group matches the first historical user or the second historical user, so as to apply the second mark to the temporary identifier of the first network data group.
For example, suppose the data matching table contains the network data groups of three users: network data group a of user A contains the identity card information 111, the name Zhang San and the contact information 123; network data group b of user B contains the identity card information 222, the name Li Si and the contact information 456; and network data group c of user C contains the identity card information 333, the name Wang Wu and the contact information 789. Suppose the first temporary data table Tmp1 contains one network data group with the following information: account a, identity card number 111, contact information 456, and a preference for basketball. When the feature data in the data matching table T1 and in the first temporary data table Tmp1 are matched, the identity card number 111 of the user is found to be consistent with the identity card information in network data group a of user A in table T1, while the contact information 456 of the user is consistent with the contact information in network data group b of user B in table T1. That is, the user may be the same person as user A or as user B. Whether the user is user A or user B can then be confirmed according to the confidence of the matched target features. For example, since the confidence of the identity card number is greater than that of the contact information, the user can be determined to be user A, and the identification information of user A is added to the label when the second marking is performed on the temporary ID of the network data group. The marking result is therefore more accurate, which lays a foundation for subsequently associating the network data group accurately with the user's unique identification information Oneid, where the Oneid represents the unique identity of the user in the internet field.
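The example above can be reproduced with a short sketch. The numeric confidence values, the dictionary layout and the function name resolve_user are assumptions made for illustration; the application only states that the identity card number is more credible than the contact information. Names are omitted from the historical records for brevity.

```python
# Hypothetical confidence per feature type; only the ordering
# (identity card > contact) comes from the example in the text.
CONFIDENCE = {"id_card": 0.9, "contact": 0.6}

# Target feature data of the historical users in the data matching table.
HISTORICAL = {
    "A": {"id_card": "111", "contact": "123"},
    "B": {"id_card": "222", "contact": "456"},
    "C": {"id_card": "333", "contact": "789"},
}


def resolve_user(group, historical=HISTORICAL, confidence=CONFIDENCE):
    """Return the historical user whose matched feature has the highest
    confidence, or None when no feature matches at all."""
    best_user, best_score = None, 0.0
    for user, features in historical.items():
        for key, value in features.items():
            score = confidence.get(key, 0.0)
            if group.get(key) == value and score > best_score:
                best_user, best_score = user, score
    return best_user


incoming = {"account": "a", "id_card": "111", "contact": "456", "preference": "basketball"}
print(resolve_user(incoming))  # A
```

The identity card match with user A outranks the contact match with user B, so the network data group is labelled with the identification information of user A.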
After the marking is completed, a third temporary data table T3 may be created based on the network data group and the temporary ID corresponding to the network data group.
In this embodiment, after the network data group of an online user is collected from the internet in real time, the network data group may be matched against the existing users in the database and associated with the matching user. After the association, related content may be pushed to the online user according to information such as the preferences recorded in the user's other network data groups in the database. Unified management of the network data groups of the same user is thereby realized, and information can be pushed more reasonably and accurately.
Another embodiment of the present application provides a data management apparatus, as shown in fig. 2, including:
the acquisition module is used for acquiring each network data group of the user;
the matching module is used for performing data matching on the characteristic data in each network data group and the target characteristic data of each historical user in the database;
the association module is used for acquiring the identification information of the first historical user under the condition that at least one feature data in a first network data group is matched with a target feature data of the first historical user in the database so as to associate the first network data group with the identification information;
the first generation module is used for generating identification information of the user based on the first network data group under the condition that all feature data in the first network data group are not matched with all target feature data of all first historical users in the database; wherein the identification information is used for characterizing the identity of the user.
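The cooperation of the four modules can be sketched as a single function. This is an illustrative sketch rather than the apparatus itself: the key names in FEATURE_KEYS, the function name manage, the caller-supplied new_id generator and the choice to store a new user's feature data back into the database are all assumptions made for the example.

```python
# Feature data named in the description: user name, contact information,
# identity card information. The key names themselves are hypothetical.
FEATURE_KEYS = ("name", "contact", "id_card")


def manage(groups, database, new_id):
    """Associate each network data group with identification information.

    groups   -- list of dicts, one per network data group (acquisition module output)
    database -- dict: identification information -> target feature data
    new_id   -- hypothetical callable returning fresh identification information
    Returns {index of group: identification information}.
    """
    result = {}
    for i, group in enumerate(groups):
        # Matching module: at least one feature equal to a target feature.
        matched = next(
            (uid for uid, target in database.items()
             if any(k in group and group[k] == target.get(k) for k in FEATURE_KEYS)),
            None,
        )
        if matched is None:
            # First generation module: no feature matched any historical user.
            matched = new_id()
            database[matched] = {k: group[k] for k in FEATURE_KEYS if k in group}
        # Association module: link the group to the identification information.
        result[i] = matched
    return result
```

Because a newly generated user is written back to the database in this sketch, later network data groups of the same user are associated with the same identification information.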
In a specific implementation of this embodiment, the data management apparatus further includes a second generation module, which is used for generating the temporary identifiers corresponding to the network data groups. The association module is specifically configured to establish an association relation between the temporary identifier and the identification information, so as to associate the first network data group with the identification information.
In the embodiment, in the specific implementation process, the network data group includes any one or more of the following data: account information, network address of the account, contact information, user identity information, user occupation information and user preference information; the characteristic data comprises one or more of the following data: user name, contact information and user identification card information.
In this embodiment, the data management apparatus in the specific implementation process further includes: a first establishing module and an obtaining module; the first establishing module is used for establishing a first temporary data table based on each network data group and the temporary identifier corresponding to each network data group. The obtaining module is used for obtaining a data matching table based on the database; the data matching table comprises historical network data sets of historical users and temporary identifications of the historical network data sets, and the temporary identifications of the historical network data sets are associated with identification information of the users; the matching module in this embodiment is specifically configured to: and matching based on the characteristic data in each network data group in the first temporary data table and the target characteristic data in each historical network data group in the data matching table.
In the specific implementation process of the embodiment, the system further comprises a marking module and an updating module; the marking module is used for carrying out first marking on the temporary identifier of the first network data group under the condition that each feature data in the first network data group in the first temporary data table is matched with each target feature data of the first historical user in the data matching table, and generating a second temporary data table based on the temporary identifier after the first marking and the corresponding first network data group; the label information of the first mark comprises identification information of a first historical user matched with a first network data group;
the marking module is further configured to: under the condition that partial feature data in a first network data group in the first temporary data table are matched with partial target feature data of a first historical user in the data matching table, performing second marking on a temporary identifier of the first network data group, and generating a third temporary data table based on the temporary identifier subjected to the second marking and the corresponding first network data group; the label information of the second mark comprises identification information of a first historical user matched with the first network data group;
the update module is to: updating the data matching table based on the second temporary data table and the third temporary data table;
the association module is configured to: and grouping the temporary identifications based on the identification information of the first historical user in the label information of each temporary identification in the data matching table, and grouping the temporary identifications containing the identification information of the same first historical user to associate each first network data group with the identification information of the first historical user.
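The grouping performed by the association module can be sketched as follows. The label layout (a dict with an optional "oneid" key) and the function name group_by_oneid are assumptions made for the example.

```python
from collections import defaultdict


def group_by_oneid(labels):
    """Group temporary identifiers by the historical user's identification
    information stored in their label information.

    labels -- dict: temporary identifier -> label dict; entries without an
              "oneid" key (for example unmatched groups) are skipped.
    """
    groups = defaultdict(list)
    for tmpid, label in labels.items():
        oneid = label.get("oneid")
        if oneid is not None:
            groups[oneid].append(tmpid)
    return dict(groups)
```

Every temporary identifier that ends up in the same list belongs to a network data group of the same historical user.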
In this embodiment, the marking module is further configured to: under the condition that all feature data in a first network data group in the first temporary data table are not matched with all target feature data of a first historical user in the data matching table, performing third marking on the temporary identifier of the first network data group, and adding the temporary identifier subjected to the third marking and the corresponding first network data group to a third temporary data table; the label information of the third label comprises prompt information used for representing that the first network data group is not matched;
in a specific implementation of this embodiment, the apparatus further includes an updating module, which is configured to update the data matching table based on the third temporary data table;
the first generation module is specifically configured to: and generating identification information of the user for the corresponding first network data group based on prompt information in the label information of each temporary identification in the data matching table.
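The three kinds of marking handled by the marking module can be illustrated with a small classifier. The sketch rests on several assumptions: the key names, the label layout, the rule that a complete match means every feature present in the group equals the target feature, and the rule that the candidate with the most matching features is chosen (the description itself resolves competing partial matches by confidence).

```python
FEATURE_KEYS = ("name", "contact", "id_card")  # hypothetical key names


def mark(group, historical):
    """Return the label information for one network data group.

    historical -- dict: identification information -> target feature data
    """
    present = [k for k in FEATURE_KEYS if k in group]
    best_uid, best_hits = None, 0
    for uid, target in historical.items():
        hits = sum(1 for k in present if group[k] == target.get(k))
        if hits > best_hits:
            best_uid, best_hits = uid, hits
    if best_hits == 0:
        # Third marking: prompt information showing the group is unmatched.
        return {"mark": "third", "hint": "unmatched"}
    # First marking for a complete match, second marking for a partial match.
    kind = "first" if best_hits == len(present) else "second"
    return {"mark": kind, "oneid": best_uid}
```

Groups labelled "third" are the ones for which the first generation module would later generate new identification information.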
The data management device in this embodiment further includes a determining module, configured to determine, based on a source of the network data group, a confidence level of each feature data in the network data group;
in this embodiment, when a first part of feature data in a first network data set in the first temporary data table matches with a part of target feature data of a first historical user in the data matching table, and a second part of feature data of the network data set matches with a part of target feature data of a second historical user, the tagging module is specifically configured to: acquiring a first confidence coefficient of target characteristic data of a first historical user in a data matching table; acquiring a second confidence coefficient of target characteristic data of a second historical user in the data matching table; and determining that the first network data group is matched with a first historical user or a second historical user based on the first confidence degree and the second confidence degree so as to perform second marking on the temporary identification of the first network data group.
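How the determining module and the marking module might cooperate can be sketched as follows. The source names, the numeric confidence levels, the tie-breaking rule and the function names are all hypothetical; the application only states that confidence is determined from the source of the network data group and that the two confidences are compared.

```python
# Hypothetical mapping from the source of a network data group to a confidence.
SOURCE_CONFIDENCE = {"real_name_registration": 0.9, "self_reported_profile": 0.5}
DEFAULT_CONFIDENCE = 0.1
FEATURE_KEYS = ("name", "contact", "id_card")  # hypothetical key names


def feature_confidences(group, source):
    """Assign the source's confidence to every feature present in the group."""
    level = SOURCE_CONFIDENCE.get(source, DEFAULT_CONFIDENCE)
    return {k: level for k in FEATURE_KEYS if k in group}


def pick_user(first, second):
    """Choose between two candidate historical users.

    first, second -- (identification information, confidence) pairs.
    Ties go to the first candidate; this rule is an assumption.
    """
    return first[0] if first[1] >= second[1] else second[0]
```

The identification information returned by pick_user is what would be written into the label information during the second marking.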
According to the method and the device, a network data group of a user is obtained from the Internet, and whether the user is the same as a historical user in the database is judged by matching the feature data in the network data group with the feature data of the historical users in the database. If a match is found, the network data group of the user is associated with the identification information of the corresponding user in the database. If no match is found, the user does not yet exist in the database, so identification information is created for the user and the user's network data group is associated with it. In this way, multiple network data groups are associated with the identification information of the same user, and unified management of the network data groups of the same user is achieved.
The above embodiments are only exemplary embodiments of the present application, and are not intended to limit the present application, and the protection scope of the present application is defined by the claims. Various modifications and equivalents may be made by those skilled in the art within the spirit and scope of the present application and such modifications and equivalents should also be considered to be within the scope of the present application.

Claims (10)

1. A method for managing data, comprising:
acquiring each network data group of a user;
performing data matching based on the feature data in each network data group and the target feature data of each historical user in a database;
under the condition that at least one feature data in a first network data group is matched with a target feature data of a first historical user in the database, acquiring identification information of the first historical user so as to associate the first network data group with the identification information;
under the condition that all characteristic data in a first network data group are not matched with all target characteristic data of all first historical users in the database, generating identification information of the users based on the first network data group; wherein the identification information is used for characterizing the identity of the user.
2. The method of claim 1, wherein the method further comprises: generating temporary identifications corresponding to the network data groups;
associating the first network data group with the identification information specifically includes: and establishing an association relation between the temporary identifier and the identification information so as to associate the first network data group with the identification information.
3. The method of claim 1, wherein the network data set includes any one or more of the following data: account information, network address of the account, contact information, user identity information, user occupation information and user preference information;
the characteristic data comprises one or more of the following data: user name, contact information and user identification card information.
4. The method of claim 2, wherein the method further comprises:
establishing a first temporary data table based on each network data group and the temporary identifier corresponding to each network data group;
acquiring a data matching table based on a database; the data matching table comprises historical network data sets of historical users and temporary identifications of the historical network data sets, and the temporary identifications of the historical network data sets are associated with identification information of the users;
the data matching based on the feature data in each network data group and the target feature data of each historical user in the database specifically comprises:
and matching based on the characteristic data in each network data group in the first temporary data table and the target characteristic data in each historical network data group in the data matching table.
5. The method as claimed in claim 4, wherein said obtaining identification information of the first historical user in case that at least one feature data in the first network data group matches a target feature data of the first historical user in the database, so as to associate the first network data group with the identification information, specifically comprises:
under the condition that each feature data in a first network data group in the first temporary data table is matched with each target feature data of a first historical user in the data matching table, performing first marking on a temporary identifier of the first network data group, and generating a second temporary data table based on the temporary identifier subjected to the first marking and the corresponding first network data group; the label information of the first mark comprises identification information of a first historical user matched with a first network data group;
under the condition that partial feature data in a first network data group in the first temporary data table are matched with partial target feature data of a first historical user in the data matching table, performing second marking on a temporary identifier of the first network data group, and generating a third temporary data table based on the temporary identifier subjected to the second marking and the corresponding first network data group; the label information of the second mark comprises identification information of a first historical user matched with the first network data group;
updating the data matching table based on the second temporary data table and the third temporary data table;
and grouping the temporary identifications based on the identification information of the first historical user in the label information of each temporary identification in the data matching table, and grouping the temporary identifications containing the identification information of the same first historical user to associate each first network data group with the identification information of the first historical user.
6. The method according to claim 4, wherein, in the case that each feature data in the first network data group does not match each target feature data of each first historical user in the database, generating the identification information of the user based on the first network data group specifically includes:
under the condition that all feature data in a first network data group in the first temporary data table are not matched with all target feature data of a first historical user in the data matching table, performing third marking on the temporary identifier of the first network data group, and adding the temporary identifier subjected to the third marking and the corresponding first network data group to a third temporary data table; the label information of the third label comprises prompt information used for representing that the first network data group is not matched;
updating the data matching table based on the third temporary data table;
and generating identification information of the user for the corresponding first network data group based on prompt information in the label information of each temporary identification in the data matching table.
7. The method of claim 5, wherein the method further comprises:
determining the confidence of each feature data in the network data set based on the source of the network data set;
in the case that the first part of feature data in the first network data group in the first temporary data table matches with the partial target feature data of the first historical user in the data matching table, and the second part of feature data in the network data group matches with the partial target feature data of the second historical user, the method further comprises:
acquiring a first confidence coefficient of target characteristic data of a first historical user in a data matching table;
acquiring a second confidence coefficient of target characteristic data of a second historical user in the data matching table;
and determining that the first network data group is matched with a first historical user or a second historical user based on the first confidence degree and the second confidence degree so as to perform second marking on the temporary identification of the first network data group.
8. A data management apparatus, comprising:
the acquisition module is used for acquiring each network data group of the user;
the matching module is used for performing data matching on the characteristic data in each network data group and the target characteristic data of each historical user in the database;
the association module is used for acquiring the identification information of the first historical user under the condition that at least one feature data in a first network data group is matched with a target feature data of the first historical user in the database so as to associate the first network data group with the identification information;
the first generation module is used for generating identification information of the user based on the first network data group under the condition that all feature data in the first network data group are not matched with all target feature data of all first historical users in the database; wherein the identification information is used for characterizing the identity of the user.
9. The apparatus of claim 8, further comprising:
the second generation module is used for generating temporary identifications corresponding to the network data groups;
the association module is specifically configured to: and establishing an association relation between the temporary identifier and the identification information so as to associate the first network data group with the identification information.
10. The apparatus of claim 9, further comprising: a first establishing module and an obtaining module;
the first establishing module is used for: establishing a first temporary data table based on each network data group and the temporary identifier corresponding to each network data group;
the obtaining module is configured to: acquiring a data matching table based on a database; the data matching table comprises historical network data sets of historical users and temporary identifications of the historical network data sets, and the temporary identifications of the historical network data sets are associated with identification information of the users;
the matching module is specifically configured to: and matching based on the characteristic data in each network data group in the first temporary data table and the target characteristic data in each historical network data group in the data matching table.
CN202110483979.2A 2021-04-30 2021-04-30 Data management method and device Pending CN113177064A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110483979.2A CN113177064A (en) 2021-04-30 2021-04-30 Data management method and device

Publications (1)

Publication Number Publication Date
CN113177064A true CN113177064A (en) 2021-07-27

Family

ID=76925898

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110483979.2A Pending CN113177064A (en) 2021-04-30 2021-04-30 Data management method and device

Country Status (1)

Country Link
CN (1) CN113177064A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120174203A1 (en) * 2010-12-29 2012-07-05 Frank Jonathan H Identifying a user account in a social networking system
CN105376287A (en) * 2014-08-29 2016-03-02 优视科技有限公司 Identification data processing method and system, and server
CN105978717A (en) * 2016-05-09 2016-09-28 深圳市永兴元科技有限公司 Network account recognition method and device
CN108632367A (en) * 2018-04-18 2018-10-09 家园网络科技有限公司 Account correlating method and information-pushing method
CN110557466A (en) * 2019-09-11 2019-12-10 北京明略软件系统有限公司 data processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210727
