WO2017107367A1 - Method, terminal, and non-volatile computer-readable storage medium for user identifier processing - Google Patents


Info

Publication number
WO2017107367A1
Authority
WO
WIPO (PCT)
Prior art keywords
field
data
user identifier
data corresponding
feature
Application number
PCT/CN2016/082414
Other languages
English (en)
French (fr)
Inventor
姚乾乾
叶幸春
刘鹤
张海川
Original Assignee
腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.)
Application filed by 腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.)
Priority to EP16877181.4A (patent EP3396558B1)
Publication of WO2017107367A1
Priority to US15/667,023 (patent US10878121B2)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6227 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database where protection concerns the structure of data, e.g. records, types, queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31 User authentication
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G06F21/6254 Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models
    • G06N5/046 Forward inferencing; Production systems
    • G06N5/047 Pattern matching networks; Rete networks
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/60 General implementation details not specific to a particular type of compression
    • H03M7/6047 Power optimization with respect to the encoder, decoder, storage or transmission
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H04L61/30 Managing network names, e.g. use of aliases or nicknames
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/04 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0407 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the identity of one or more communicating identities is hidden
    • H04L63/0421 Anonymous communication, i.e. the party's identifiers are hidden from the other party or parties, e.g. using an anonymizer
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W12/00 Security arrangements; Authentication; Protecting privacy or anonymity
    • H04W12/02 Protecting privacy or anonymity, e.g. protecting personally identifiable information [PII]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L2209/00 Additional information or applications relating to cryptographic mechanisms or cryptographic arrangements for secret or secure communication H04L9/00
    • H04L2209/42 Anonymization, e.g. involving pseudonyms

Definitions

  • The present invention relates to the field of data identification, and in particular, to a method, a terminal, and a non-volatile computer-readable storage medium for user identifier processing.
  • Traditional user identifier recognition methods mainly identify user identifier fields by fuzzy search, by limiting the value range of user identifier data, or by matching against the full set of registration data.
  • Fuzzy search matching has a high error rate. With value-range matching, the value ranges of the data vary widely, so the correct user identifier cannot be captured accurately.
  • Matching against the full registration data is inefficient and offers low security for user identifiers.
  • A method for user identifier processing comprises the following steps:
  • Matching the feature of the data corresponding to each field with the feature rule of the user identifier; if the feature of the data corresponding to a field matches the feature rule of the user identifier, the data corresponding to the field is a user identifier; if the feature of the data corresponding to the field fails to match the feature rule of the user identifier, the data corresponding to the field is not a user identifier;
  • Converting the data of fields in the source data table that are user identifiers into third-party user accounts, and keeping unchanged the data of fields in the source data table that are not user identifiers.
  • A terminal comprises a memory and a processor, wherein the memory stores computer-readable instructions that, when executed by the processor, cause the processor to perform the following steps:
  • Matching the feature of the data corresponding to each field with the feature rule of the user identifier; if the feature of the data corresponding to a field matches the feature rule of the user identifier, the data corresponding to the field is a user identifier; if the feature of the data corresponding to the field fails to match the feature rule of the user identifier, the data corresponding to the field is not a user identifier;
  • Converting the data of fields in the source data table that are user identifiers into third-party user accounts, and keeping unchanged the data of fields in the source data table that are not user identifiers.
  • One or more non-transitory computer-readable storage media contain computer-executable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:
  • Matching the feature of the data corresponding to each field with the feature rule of the user identifier; if the feature of the data corresponding to a field matches the feature rule of the user identifier, the data corresponding to the field is a user identifier; if the feature of the data corresponding to the field fails to match the feature rule of the user identifier, the data corresponding to the field is not a user identifier;
  • Converting the data of fields in the source data table that are user identifiers into third-party user accounts, and keeping unchanged the data of fields in the source data table that are not user identifiers.
  • FIG. 1A is a schematic diagram of the internal structure of a terminal in an embodiment;
  • FIG. 1B is a schematic diagram of the internal structure of a server in an embodiment;
  • FIG. 2 is a flowchart of a method for user identifier processing in an embodiment;
  • FIG. 3 is a schematic structural diagram of a source data table in an embodiment;
  • FIG. 4 is a flowchart of a method for user identifier processing in another embodiment;
  • FIG. 5 is a structural block diagram of an apparatus for user identifier processing in an embodiment;
  • FIG. 6 is a structural block diagram of an apparatus for user identifier processing in another embodiment;
  • FIG. 7 is a structural block diagram of an apparatus for user identifier processing in another embodiment;
  • FIG. 8 is a structural block diagram of an apparatus for user identifier processing in another embodiment.
  • For example, a first client may be referred to as a second client, and similarly, a second client may be referred to as a first client, without departing from the scope of the present invention.
  • FIG. 1A is a schematic diagram showing the internal structure of a terminal in an embodiment.
  • the terminal includes a processor, a storage medium, a memory, a network interface, a display screen, and an input device connected through a system bus.
  • The storage medium of the terminal stores an operating system and an apparatus for user identifier processing, which is used to implement a method for user identifier processing.
  • the processor is used to provide computing and control capabilities to support the operation of the entire terminal.
  • The memory in the terminal provides an environment for running the user identifier processing apparatus stored in the storage medium. The network interface is used for network communication with the server, for example sending a data request to the server and receiving data returned by the server.
  • the display screen of the terminal may be a liquid crystal display or an electronic ink display screen.
  • The input device may be a touch layer covering the display screen, a button, trackball, or touchpad provided on the terminal housing, or an external keyboard, trackpad, or mouse.
  • The terminal can be a mobile phone, a tablet computer, or a personal digital assistant. A person skilled in the art can understand that the structure shown in FIG. 1A is only a block diagram of part of the structure related to the solution of the present application, and does not constitute a limitation on the terminal to which the solution of the present application is applied.
  • A specific terminal may include more or fewer components than shown in the figure, combine some components, or have a different component arrangement.
  • FIG. 1B is a schematic diagram showing the internal structure of a server in an embodiment.
  • The server includes a processor, a non-volatile storage medium, a memory, a network interface, a display screen, and an input device connected through a system bus.
  • The non-volatile storage medium of the server stores an operating system, a database, and an apparatus for user identifier processing.
  • The database stores various data, including user identifiers, third-party user account data, and the correspondence between them; the apparatus for user identifier processing is used to implement the method for user identifier processing.
  • the server's processor is used to provide computing and control capabilities that support the operation of the entire server.
  • The memory of the server provides an environment for running the user identifier processing apparatus stored in the non-volatile storage medium.
  • the display screen of the server may be a liquid crystal display or an electronic ink display.
  • The input device may be a touch layer covering the display screen, a button, trackball, or touchpad provided on the housing, or an external keyboard, trackpad, or mouse.
  • The network interface of the server is used to communicate with external terminals over a network connection, for example receiving a user identifier request sent by a terminal and returning a third-party user account to the terminal.
  • The server can be implemented as a stand-alone server or a server cluster consisting of multiple servers. A person skilled in the art can understand that the structure shown in FIG. 1B is only a block diagram of part of the structure related to the solution of the present application, and does not constitute a limitation on the server to which the solution of the present application is applied.
  • A specific server may include more or fewer components than shown in the figure, combine some components, or have a different component arrangement.
  • FIG. 2 is a flowchart of a method for user identifier processing in an embodiment. As shown in FIG. 2, the method may be performed on the terminal in FIG. 1A or the server in FIG. 1B, and includes the following steps:
  • Step 202: Scan the source data table to obtain the features of the data corresponding to each field of the source data table.
  • the source data table refers to data acquired from the network, which is generally stored in a tabular form. There are one or more fields in the source data table, each field representing a type of data, such as a sequence number field, a name field, a user identification field, a gender field, an age field, an address field, and the like.
  • FIG. 3 is a schematic diagram showing the structure of a source data table in an embodiment.
  • the sequence number field, the name field, the gender field, the user identification field, the age field, the address field, and the like are included in the first row of the source data table.
  • Each field corresponds to a column.
  • The data corresponding to the sequence number field may be natural numbers starting from 1 and incrementing by 1.
  • the data corresponding to the name field can be various names, such as Wang Huaweing, Li Xiaobai, Zhao Xiaohong, and so on.
  • The data corresponding to the gender field can be "male", "female", or "unknown".
  • The data corresponding to the user identifier field may be data that conforms to the user identifier rule, such as instant messaging account numbers ranging from 12345 to 9999999999.
  • The data corresponding to the age field may range from 0 to 150.
  • The data corresponding to the address field can be various addresses.
  • The source data table in this embodiment may be data generated by user behavior on various websites.
  • The features of the data corresponding to each field in the source data table are calculated.
  • These features can include the mean and the standard deviation. The mean and standard deviation are relatively stable, so checking them in combination is more reliable. In addition, the features can include the maximum and the minimum.
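A minimal Python sketch of the feature calculation in step 202 follows. It assumes the source data table is held in memory as a dict mapping each field name to its column of string values, and that only columns whose values all parse as numbers are profiled; the text does not say how text fields such as names or addresses are handled, nor whether the population or sample standard deviation is meant (population is assumed here). The function names are hypothetical.

```python
import statistics
from typing import Dict, List, Optional

Table = Dict[str, List[str]]  # hypothetical layout: field name -> column values


def column_features(values: List[str]) -> Optional[Dict[str, float]]:
    """Mean, standard deviation, maximum and minimum of one column.

    Assumption: text columns (names, genders, addresses) are skipped and
    return None, because the numeric features are undefined for them.
    """
    try:
        nums = [float(v) for v in values]
    except ValueError:
        return None  # at least one value is not numeric
    if not nums:
        return None  # empty column, nothing to profile
    return {
        "mean": statistics.fmean(nums),
        "std": statistics.pstdev(nums),  # population std (assumed)
        "max": max(nums),
        "min": min(nums),
    }


def scan_table(table: Table) -> Dict[str, Optional[Dict[str, float]]]:
    """Step 202: obtain the features of the data in every field."""
    return {field: column_features(col) for field, col in table.items()}
```

For example, `scan_table({"seq": ["1", "2", "3"], "gender": ["male", "female", "unknown"]})` profiles the sequence number column and returns `None` for the gender column.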
  • Step 204: Match the feature of the data corresponding to each field with the feature rule of the user identifier.
  • The feature rule of the user identifier can be obtained in advance from statistics over massive data.
  • The feature rule of the user identifier may be that the mean and the standard deviation lie within certain ranges.
  • Different kinds of user identifiers have different feature rules, so the feature rule of each kind must be obtained by statistical analysis of massive data.
  • A user ID is an identifier used to indicate the uniqueness of a user's identity.
  • The user ID can be an instant messaging account, a mobile communication identifier, an email address, an ID card number, or a payment account.
  • The feature of the data corresponding to the field is matched with the feature rule of the user identifier.
  • For example, suppose the feature rule of the user identifier is that the mean lies in [100000, 110000] and the standard deviation lies in [1, 2]. If the calculated features of the data corresponding to a field have a mean in [100000, 110000] and a standard deviation in [1, 2], the features of that field's data successfully match the feature rule of the user identifier. If either the mean or the standard deviation falls outside the range given by the feature rule, the match fails.
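The interval check described above can be sketched as follows. `FeatureRule` is a hypothetical name; the closed intervals and the rule values come from the example in the text, whereas a real deployment would load rules obtained from statistics over massive data.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class FeatureRule:
    """Hypothetical feature rule: closed intervals for the mean and the std."""
    mean_range: Tuple[float, float]
    std_range: Tuple[float, float]

    def matches(self, features: Dict[str, float]) -> bool:
        """Succeeds only if BOTH the mean and the std lie inside their intervals."""
        lo_m, hi_m = self.mean_range
        lo_s, hi_s = self.std_range
        return lo_m <= features["mean"] <= hi_m and lo_s <= features["std"] <= hi_s


# The example rule from the text: mean in [100000, 110000], std in [1, 2].
rule = FeatureRule(mean_range=(100000, 110000), std_range=(1, 2))
```

A field with mean 105000 and standard deviation 1.5 matches; the same mean with standard deviation 3.0 fails, because a single feature outside its interval is enough to reject the field.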
  • Step 206: If the feature of the data corresponding to the field matches the feature rule of the user identifier, the data corresponding to the field is a user identifier; then perform step 210.
  • Step 208: If the feature of the data corresponding to the field fails to match the feature rule of the user identifier, the data corresponding to the field is not a user identifier; then perform step 212.
  • Step 210: Convert the data corresponding to the field in the source data table into third-party user identifiers.
  • The third-party user identifier is an open user identifier, i.e., an openid. It is a secure way of opening the user identifier to third parties: the user can log in to a third-party platform with the user identifier, but the user identifier itself is not disclosed to the third party.
  • A mapping between user identifiers and third-party user accounts is established in advance. For each user identifier, the corresponding third-party user account is looked up in this mapping, and the user identifier is replaced with that third-party user account.
  • Step 212: Keep unchanged the data corresponding to the field in the source data table that is not a user identifier.
  • the source data table is kept unchanged.
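Steps 210 and 212 amount to a column-wise substitution through the pre-built mapping. In this hedged sketch, the table layout, the field name `uin`, and the openid strings are all hypothetical, and raising an error for an identifier with no mapping entry is an assumption (the text does not say what happens in that case); it avoids writing an unconverted identifier into the output.

```python
from typing import Dict, List, Set

Table = Dict[str, List[str]]  # hypothetical layout: field name -> column values


def convert_user_id_fields(
    table: Table,
    user_id_fields: Set[str],
    id_to_openid: Dict[str, str],
) -> Table:
    """Step 210: map user-identifier columns to third-party accounts.
    Step 212: copy every other column unchanged."""
    result: Table = {}
    for field, values in table.items():
        if field in user_id_fields:
            missing = [v for v in values if v not in id_to_openid]
            if missing:
                # Assumed policy: refuse instead of leaking an unmapped identifier.
                raise KeyError(f"{len(missing)} unmapped value(s) in field {field!r}")
            result[field] = [id_to_openid[v] for v in values]
        else:
            result[field] = list(values)
    return result
```

The function builds a new table rather than editing the input in place, so the original source data table is still available if the matching later turns out to be wrong.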
  • In this method for user identifier processing, the features of the data corresponding to each field in the source data table are obtained and matched with the feature rule of the user identifier. If the match succeeds, the data corresponding to the field is converted into a third-party user account; if it fails, the data corresponding to the field is kept unchanged. Matching by the feature rule of the user identifier improves the accuracy and efficiency of user identifier recognition, and converting the user identifier into a third-party user account prevents the third-party platform from obtaining the user identifier, which improves its security.
  • a method for user identification processing includes the following steps:
  • Step 402: Select the data of user identifier fields from the test data as positive sample data, and select the data of non-user-identifier fields as negative sample data.
  • test data can be a large amount of network data.
  • a user ID is an identifier used to indicate the uniqueness of a user's identity.
  • The user ID can be an instant messaging account, a mobile communication identifier, an email address, an ID card number, or a payment account.
  • For example, when the user identifier is an instant messaging account, the test data may include instant messaging account data, advertisement exposure data, product order path data, web browsing data, user search data, and the like.
  • The data corresponding to the instant messaging account field is extracted from the test data as positive sample data.
  • Data corresponding to non-instant-messaging-account fields is randomly selected as negative sample data, which serves as the control group.
  • Step 404: Perform feature calculation on the positive sample data and the negative sample data separately.
  • The mean and the standard deviation are calculated for the positive sample data and the negative sample data separately; the maximum and the minimum can also be calculated.
  • The calculated features of the positive and negative sample data are compiled into a summary table in which each sample set is a row and each feature is a column. Calculating the features of the positive and negative sample data separately yields the statistical distribution of the user identifier's features, which can then be analyzed to find its distribution pattern.
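The summary table described above can be sketched as follows, assuming each sample set is available as a list of numbers. The function names and the sample values are hypothetical, and the use of the population standard deviation is an assumption.

```python
import statistics
from typing import Dict, List

FEATURES = ["mean", "std", "max", "min"]


def summarize(samples: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """One row per sample set (e.g. "positive", "negative"), one column per feature."""
    table: Dict[str, Dict[str, float]] = {}
    for name, values in samples.items():
        table[name] = {
            "mean": statistics.fmean(values),
            "std": statistics.pstdev(values),  # population std (assumed)
            "max": max(values),
            "min": min(values),
        }
    return table


def print_summary(table: Dict[str, Dict[str, float]]) -> None:
    """Print the summary table with aligned columns."""
    print("sample".ljust(10) + "".join(c.rjust(14) for c in FEATURES))
    for name, row in table.items():
        print(name.ljust(10) + "".join(f"{row[c]:14.2f}" for c in FEATURES))
```

Keeping the table as plain nested dicts makes it easy to feed the same rows into a bar chart or into an automated rule-derivation step.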
  • Step 406: Compare the features of the positive sample data with the features of the negative sample data to obtain the feature rule of the positive sample data.
  • The features of the positive and negative sample data may be summarized and compared in bar charts. Where the two are clearly distinguishable, the trusted range of user identifier data is extracted, and the feature rule of the user identifier is then formed.
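The text extracts the trusted range by visually comparing bar charts. As one possible automated stand-in (an assumption, not the procedure stated in the text), the sketch below takes the [min, max] of each feature over the positive-sample fields as the trusted range and counts how many negative-sample fields would fall inside it, which gives a rough check of how well the two groups are separated. All names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Features = Dict[str, float]
Rule = Dict[str, Tuple[float, float]]


def derive_rule(positive: List[Features], keys: Sequence[str] = ("mean", "std")) -> Rule:
    """Trusted range of each feature = [min, max] over the positive-sample fields."""
    return {k: (min(f[k] for f in positive), max(f[k] for f in positive)) for k in keys}


def in_rule(rule: Rule, f: Features) -> bool:
    """True if every feature of f lies inside its trusted range."""
    return all(lo <= f[k] <= hi for k, (lo, hi) in rule.items())


def overlap_with_negatives(rule: Rule, negative: List[Features]) -> int:
    """Number of negative-sample fields the rule would wrongly accept."""
    return sum(in_rule(rule, f) for f in negative)
```

A non-zero overlap suggests the chosen features do not separate user identifiers from other fields well enough, so the range should be narrowed or another feature added.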
  • Step 408: Scan the source data table to obtain the features of the data corresponding to each field of the source data table.
  • the source data table refers to data acquired from the network, which is generally stored in a tabular form. There are one or more fields in the source data table, each field representing a type of data, such as a sequence number field, a name field, a user identification field, a gender field, an age field, an address field, and the like.
  • Step 410: Match the feature of the data corresponding to each field with the feature rule of the user identifier.
  • The feature rule of the user identifier can be obtained in advance from statistics over massive data.
  • The feature rule of the user identifier may be that the mean and the standard deviation lie within certain ranges.
  • Different kinds of user identifiers have different feature rules, so the feature rule of each kind must be obtained by statistical analysis of massive data.
  • a user ID is an identifier used to indicate the uniqueness of a user's identity.
  • The user ID can be an instant messaging account, a mobile communication identifier, an email address, an ID card number, or a payment account.
  • the feature of the data corresponding to the field is matched with the feature rule of the user identifier.
  • For example, suppose the feature rule of the user identifier is that the mean lies in [100000, 110000] and the standard deviation lies in [1, 2]. If the calculated features of the data corresponding to a field have a mean in [100000, 110000] and a standard deviation in [1, 2], the features of that field's data successfully match the feature rule of the user identifier. If either the mean or the standard deviation falls outside the range given by the feature rule, the match fails.
  • Step 412: If the feature of the data corresponding to the field matches the feature rule of the user identifier, the data corresponding to the field is a user identifier. If the feature of the data corresponding to the field fails to match the feature rule of the user identifier, the data corresponding to the field is not a user identifier.
  • Step 414: Convert the data of fields in the source data table that are user identifiers into third-party user identifiers, keep unchanged the data of fields that are not user identifiers, and then perform step 418.
  • The third-party user identifier is an open user identifier, i.e., an openid. It is a secure way of opening the user identifier to third parties: the user can log in to a third-party platform with the user identifier, but the user identifier itself is not disclosed to the third party.
  • A mapping between user identifiers and third-party user accounts is established in advance. For each user identifier, the corresponding third-party user account is looked up in this mapping, and the user identifier is replaced with that third-party user account.
  • Step 416: If none of the data corresponding to the fields in the source data table is a user identifier, the source data table is kept unchanged.
  • Step 418: Correct the feature rule of the user identifier according to the data corresponding to the fields that were successfully matched and the data corresponding to the fields that contain user identifiers but were not successfully matched.
  • Specifically, the data corresponding to the successfully matched fields and the data corresponding to the fields that contain user identifiers but were not successfully matched are obtained.
  • The feature rule of the user identifier is then corrected according to both sets of data.
  • The data corresponding to the fields that contain user identifiers but were not successfully matched can be analyzed to obtain the naming rules and/or types of the user identifier, which are added to the feature rule of the user identifier so that such data is not missed in the next matching.
  • If the data corresponding to a successfully matched field is found to be a misidentification, the feature rule of the user identifier can be corrected according to the features of that field's data.
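Step 418 can be sketched as below. The text does not specify the correction procedure, so this is an assumed approach: each interval of the rule is widened just enough to cover the features of fields that contain user identifiers but were missed, and known misidentified fields that the widened rule would still accept are returned for manual review rather than corrected automatically. Naming rules and identifier types are not modeled, and all names are hypothetical.

```python
from typing import Dict, List, Tuple

Features = Dict[str, float]
Rule = Dict[str, Tuple[float, float]]


def accepted(rule: Rule, f: Features) -> bool:
    """True if every feature of f lies inside its interval."""
    return all(lo <= f[k] <= hi for k, (lo, hi) in rule.items())


def correct_rule(
    rule: Rule, missed: List[Features], misidentified: List[Features]
) -> Tuple[Rule, List[Features]]:
    """Widen intervals to cover missed user-identifier fields, then report
    known misidentified fields that the corrected rule would still accept."""
    new_rule: Rule = dict(rule)  # leave the caller's rule untouched
    for f in missed:
        for k, (lo, hi) in list(new_rule.items()):
            new_rule[k] = (min(lo, f[k]), max(hi, f[k]))
    conflicts = [f for f in misidentified if accepted(new_rule, f)]
    return new_rule, conflicts
```

Widening can only reduce misses; any returned conflicts show where the widened rule now accepts fields that are already known to be misidentified, which is where manual adjustment is needed.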
  • In this method for user identifier processing, user identifier data is selected as positive sample data and non-user-identifier data as negative sample data; the features of both are calculated and compared to obtain a more accurate feature rule of the user identifier. The features of the data corresponding to each field in the source data table are then obtained and matched with the feature rule of the user identifier, and if the match succeeds, the data corresponding to the field is converted into a third-party user account.
  • If the match fails, the data corresponding to the field is kept unchanged. Matching by the feature rule of the user identifier improves the accuracy and efficiency of user identifier recognition, and converting the user identifier into a third-party user account prevents the third-party platform from obtaining the user identifier, which improves its security.
  • Correcting the feature rule of the user identifier according to the data of successfully matched fields and of fields that contain user identifiers but were not successfully matched continuously refines the feature rule and improves recognition accuracy.
  • In an embodiment, after the step of converting the data of user identifier fields in the source data table into third-party user accounts and keeping the data of non-user-identifier fields unchanged, the method for user identifier processing further includes: obtaining the data corresponding to fields in the source data table that contain user identifiers but were not successfully matched; and correcting the feature rule of the user identifier according to that data.
  • The data corresponding to the fields that contain user identifiers but were not successfully matched may be analyzed to obtain the naming rules and/or types of the user identifier, which are added to the feature rule so that such data is not missed in the next matching.
  • The implementation of the method for user identifier processing is described below with reference to a specific application scenario.
  • In this scenario, the method is used to identify instant messaging accounts in the data of a third-party platform and load the data into the database for storage.
  • The specific process includes (1) to (5):
  • The registration data of the instant messaging application and the user behavior data of the third-party platform are used as test data.
  • the data of the instant messaging application QQ includes registered user data.
  • the user behavior data of the third-party platform such as the Jingdong website, has 5 data volumes, including advertisement exposure data, product order path data, web browsing data, and user search data.
  • From the registration data of the instant messaging application and the user behavior data of the third-party platform, the data of the instant messaging account field is selected as positive sample data, and the data of non-instant-messaging-account fields is selected as negative sample data.
  • Partition sampling statistics refers to extracting a portion of the data for statistics.
  • Full-table statistics refers to analyzing and computing statistics over all data tables.
  • the average value and the standard deviation are respectively obtained for the positive sample data and the negative sample data.
  • the maximum and minimum values can be determined.
  • the feature of the calculated positive sample data and the feature of the negative sample data are taken as a row, and the feature is a summary table of the columns.
  • the characteristics of the instant messaging account and the characteristics of the non-instant messaging account can be summarized, and the bar graphs are displayed for comparison, and the features between the two are obviously distinguished, and the authenticity of the instant communication account data is refined.
  • the scope then forms the characteristic rules for the instant messaging account.
  • the instant messaging account in the source data table of the third-party platform is scanned, and the instant messaging account is converted into a third-party user account, that is, openid, according to the mapping relationship between the instant messaging account and the third-party user account. Then save the complete data table to the library.
  • the third-party platform cannot directly obtain the user ID, but only obtains the openid, that is, implements a secure account open mode.
  • the manual only needs to configure the corresponding feature rules and the list of data tables that need to be migrated, and then automatically identify and convert them by the big data platform, and merge them into the library to save labor.
  • the feature rule of the instant messaging account is corrected according to the data corresponding to the successfully matched field and the data corresponding to the field that is not successfully matched by the instant messaging account.
  • the case where the data is identified may be recorded, and may include that the existing instant messaging account field is identified and the instant messaging account field is not recognized.
  • the feature rules of the instant messaging account are corrected according to the existing instant messaging account field being identified and the instant messaging account field is not recognized, and the improvement is continued to improve the accuracy of the identification.
  • the data corresponding to the field that contains the instant messaging account and is not successfully matched is obtained, and the naming rules and/or types of the instant messaging account are obtained, and added to the feature rule of the instant messaging account, and the next time the matching is performed, Will be missed.
  • the instant messaging account can be a QQ number or WeChat or other instant messaging account.
  • an instant messaging account is identified and processed, but is not limited thereto.
  • the method for processing the user identifier may also be applied to an ID card number, a mobile communication identifier, a payment account, an email address, and the like. Identification processing.
  • FIG. 5 is a structural block diagram of an apparatus for user identifier processing in an embodiment.
  • the apparatus for user identifier processing includes a scanning module 510, a matching module 520, and a processing module 530, wherein:
  • the scanning module 510 is configured to scan the source data table to obtain the features of the data corresponding to each field of the source data table.
  • the source data table refers to data obtained from the network, generally stored in tabular form. It contains one or more fields, each representing one type of data, such as a sequence number field, a name field, a user identifier field, a gender field, an age field, and an address field.
  • the matching module 520 is configured to match the features of the data corresponding to each field against the feature rule of the user identifier. If the features of a field's data successfully match the feature rule, the field's data are user identifiers. If they fail to match, the field's data are not user identifiers.
  • the feature rule of the user identifier can be derived in advance from statistics over massive data.
  • for example, the feature rule may require the mean and the standard deviation to lie within certain ranges.
  • different user identifiers have different feature rules, each of which must be derived by statistical analysis of massive data.
  • a user identifier is an identifier that uniquely indicates a user's identity.
  • the user identifier may be an instant messaging account, a mobile communication identifier, an email address, an ID card number, or a payment account.
  • the source data table in this embodiment may be data generated by user behavior on websites and the like.
  • features are computed for the data corresponding to each field in the source data table.
  • the features may include the mean and the standard deviation. Both are relatively stable, and checking them in combination gives higher reliability. The features may also include the maximum and minimum values.
  • the processing module 530 is configured to convert, in the source data table, the data of fields whose data are user identifiers into third-party user accounts. It keeps unchanged the data of fields whose data are not user identifiers.
  • the third-party user identifier is an open user identifier, i.e., an openid. It is a secure mechanism for opening up user identifiers: it allows users to log in to third-party platforms with their user identifier without disclosing that identifier to the third party.
  • a mapping between user identifiers and third-party user accounts is established in advance. The third-party user account corresponding to each user identifier is looked up in this mapping, and the user identifier is replaced with that account.
  • if none of the fields in the source data table contain user identifiers, the source data table is kept unchanged.
  • the apparatus for user identifier processing obtains the features of the data corresponding to each field in the source data table and matches them against the feature rule of the user identifier. If the match succeeds, the field's data are converted into third-party user accounts. If it fails, the field's data are kept unchanged. Matching by the feature rule improves the accuracy and efficiency of identifying user identifiers. Converting user identifiers into third-party user accounts prevents third-party platforms from obtaining them, which improves their security.
  • FIG. 6 is a structural block diagram of an apparatus for user identifier processing in another embodiment.
  • in addition to the scanning module 510, the matching module 520, and the processing module 530, the apparatus includes a selection module 540, a calculation module 550, and a feature rule extraction module 560, wherein:
  • the selection module 540 operates before the source data table is scanned and the features of the data corresponding to each field are obtained. It is configured to select data of user identifier fields from test data as positive sample data, and data of non-user-identifier fields as negative sample data.
  • the test data may be massive network data.
  • a user identifier is an identifier that uniquely indicates a user's identity.
  • the user identifier may be an instant messaging account, a mobile communication identifier, an email address, an ID card number, or a payment account.
  • for example, if the user identifier is an instant messaging account, the test data may include instant messaging account data, advertisement exposure data, product order path data, web browsing data, user search data, and the like.
  • the data corresponding to instant messaging account fields are extracted from the test data as positive sample data.
  • the data corresponding to non-instant-messaging-account fields are randomly selected as negative sample data, which serve as a reference group.
  • the calculation module 550 is configured to compute features for the positive sample data and the negative sample data separately.
  • the mean and the standard deviation are computed for the positive sample data and the negative sample data separately. The maximum and minimum values may also be computed.
  • the computed features of the positive sample data and of the negative sample data are compiled into a summary table with fields as rows and features as columns.
  • the feature rule extraction module 560 is configured to compare the features of the positive sample data with the features of the negative sample data to obtain the feature rule of the positive sample data.
  • the features of the positive and negative sample data may be summarized and compared in bar charts. The two groups show clearly distinguishable features, and the trusted range of user identifier data is extracted from this comparison.
  • this trusted range then forms the feature rule of the user identifier.
  • user identifiers are selected as positive sample data and non-user-identifier data as negative sample data. The features of each are computed and compared to derive the feature rule of the positive sample data, which yields a more accurate feature rule of the user identifier.
  • FIG. 7 is a structural block diagram of an apparatus for user identifier processing in another embodiment.
  • in addition to the scanning module 510, the matching module 520, and the processing module 530, the apparatus includes a first obtaining module 570 and a first correction module 580, wherein:
  • the first obtaining module 570 operates after the data of fields whose data are user identifiers have been converted into third-party user accounts and the data of the other fields kept unchanged. It is configured to obtain the data of successfully matched fields in the source data table and the data of fields that contain user identifiers but were not successfully matched.
  • the first correction module 580 is configured to correct the feature rule of the user identifier according to the data of successfully matched fields and the data of fields that contain user identifiers but were not successfully matched.
  • the data of fields that contain user identifiers but were not successfully matched may be analyzed to obtain naming rules and/or types of the user identifier. These are added to the feature rule of the user identifier so that such fields are not missed the next time matching is performed.
  • FIG. 8 is a structural block diagram of an apparatus for user identifier processing in another embodiment.
  • in addition to the scanning module 510, the matching module 520, and the processing module 530, the apparatus includes a second obtaining module 590 and a second correction module 592, wherein:
  • the second obtaining module 590 operates after the data of fields whose data are user identifiers have been converted into third-party user accounts and the data of the other fields kept unchanged. It is configured to obtain the data of fields in the source data table that contain user identifiers but were not successfully matched.
  • the second correction module 592 is configured to correct the feature rule of the user identifier according to the data of fields that contain user identifiers but were not successfully matched.
  • the data of fields that contain user identifiers but were not successfully matched may be analyzed to obtain naming rules and/or types of the user identifier. These are added to the feature rule of the user identifier so that such fields are not missed the next time matching is performed.
  • the storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

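A method for user identifier processing includes the following steps.
(1) Scan a source data table to obtain the features of the data corresponding to each field of the source data table.
(2) Match the features of the data corresponding to each field against a feature rule of a user identifier. If the features of a field's data successfully match the feature rule, that field's data are user identifiers. If they fail to match, that field's data are not user identifiers.
(3) In the source data table, convert the data of fields whose data are user identifiers into third-party user accounts, and keep unchanged the data of fields whose data are not user identifiers.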

Description

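Method, terminal, and non-volatile computer-readable storage medium for user identifier processing
This application claims priority to Chinese Patent Application No. 201510980369.8, entitled "Method and apparatus for user identifier processing" and filed with the Chinese Patent Office on December 23, 2015. That application is incorporated herein by reference in its entirety.
[Technical Field]
The present invention relates to the field of data identification, and in particular to a method, a terminal, and a non-volatile computer-readable storage medium for user identifier processing.
[Background]
With the development of computer and Internet technologies, more and more users enjoy the convenience of the Internet, and this use generates massive amounts of data. The data contains a great deal of user identifier information, which concerns user privacy and must be protected. However, the user identifier information is spread across a large number of tables with complex structures, and storage reaches hundreds of TB (terabytes), so manual identification cannot cover all of the data.

Conventional methods identify user identifiers in one of three ways: by identifying fields through fuzzy search, by restricting the value range of user identifier data, or by matching against the full set of registration data. Each has drawbacks. Fuzzy-search matching has a high error rate. Value-range matching cannot reliably capture the correct user identifiers because the data ranges vary widely. Matching against the full registration data is inefficient and offers low security for user identifiers.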
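[Summary]
Accordingly, it is necessary to provide a method for user identifier processing that improves the accuracy and efficiency of identification as well as the security of user identifiers.
In addition, it is necessary to provide a terminal and a non-volatile computer-readable storage medium that improve the accuracy and efficiency of identification as well as the security of user identifiers.
A method for user identifier processing includes the following steps:
scanning a source data table to obtain the features of the data corresponding to each field of the source data table;
matching the features of the data corresponding to each field against a feature rule of a user identifier, wherein if the features of a field's data successfully match the feature rule, the field's data are user identifiers, and if they fail to match, the field's data are not user identifiers; and
converting, in the source data table, the data of fields whose data are user identifiers into third-party user accounts, and keeping unchanged the data of fields whose data are not user identifiers.
A terminal includes a memory and a processor. The memory stores computer-readable instructions which, when executed by the processor, cause the processor to perform the following steps:
scanning a source data table to obtain the features of the data corresponding to each field of the source data table;
matching the features of the data corresponding to each field against a feature rule of a user identifier, wherein if the features of a field's data successfully match the feature rule, the field's data are user identifiers, and if they fail to match, the field's data are not user identifiers; and
converting, in the source data table, the data of fields whose data are user identifiers into third-party user accounts, and keeping unchanged the data of fields whose data are not user identifiers.
One or more non-volatile computer-readable storage media contain computer-executable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
scanning a source data table to obtain the features of the data corresponding to each field of the source data table;
matching the features of the data corresponding to each field against a feature rule of a user identifier, wherein if the features of a field's data successfully match the feature rule, the field's data are user identifiers, and if they fail to match, the field's data are not user identifiers; and
converting, in the source data table, the data of fields whose data are user identifiers into third-party user accounts, and keeping unchanged the data of fields whose data are not user identifiers.
Details of one or more embodiments of the present invention are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the present invention will become apparent from the description, the drawings, and the claims.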
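[Brief Description of the Drawings]
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the present invention. A person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1A is a schematic diagram of the internal structure of a terminal in an embodiment;
FIG. 1B is a schematic diagram of the internal structure of a server in an embodiment;
FIG. 2 is a flowchart of a method for user identifier processing in an embodiment;
FIG. 3 is a schematic diagram of the structure of a source data table in an embodiment;
FIG. 4 is a flowchart of a method for user identifier processing in another embodiment;
FIG. 5 is a structural block diagram of an apparatus for user identifier processing in an embodiment;
FIG. 6 is a structural block diagram of an apparatus for user identifier processing in another embodiment;
FIG. 7 is a structural block diagram of an apparatus for user identifier processing in another embodiment;
FIG. 8 is a structural block diagram of an apparatus for user identifier processing in another embodiment.
[Detailed Description]
To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. The specific embodiments described here merely explain the invention and are not intended to limit it.
The terms "first", "second", and the like may be used herein to describe various elements, but these elements are not limited by these terms. The terms only distinguish one element from another. For example, without departing from the scope of the invention, a first client could be termed a second client, and similarly a second client could be termed a first client.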
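FIG. 1A is a schematic diagram of the internal structure of a terminal in an embodiment. As shown in FIG. 1A, the terminal includes a processor, a storage medium, a memory, a network interface, a display screen, and an input device, all connected via a system bus.

The storage medium of the terminal stores an operating system and an apparatus for user identifier processing, which implements a method for user identifier processing. The processor provides computing and control capabilities and supports the operation of the entire terminal. The memory provides an environment for running the user identifier processing apparatus stored in the storage medium. The network interface is used for network communication with a server, for example to send data requests to the server and receive the data it returns.

The display screen may be a liquid crystal display, an electronic ink display, or the like. The input device may be a touch layer covering the display screen; a key, trackball, or touchpad on the terminal housing; or an external keyboard, touchpad, mouse, or the like. The terminal may be a mobile phone, a tablet computer, or a personal digital assistant.

A person skilled in the art will understand that FIG. 1A is only a block diagram of the part of the structure related to the solution of this application. It does not limit the terminal to which the solution is applied. A specific terminal may include more or fewer components than shown, combine some components, or arrange them differently.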
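FIG. 1B is a schematic diagram of the internal structure of a server in an embodiment. As shown in FIG. 1B, the server includes a processor, a non-volatile storage medium, a memory, a network interface, a display screen, and an input device, all connected via a system bus.

The non-volatile storage medium of the server stores an operating system, a database, and an apparatus for user identifier processing. The database stores various data, together with user identifiers, third-party user account data, and the correspondence between them. The apparatus implements a method for user identifier processing suitable for the server. The processor of the server provides computing and control capabilities and supports the operation of the entire server. The memory of the server provides an environment for running the user identifier processing apparatus stored in the non-volatile storage medium.

The display screen of the server may be a liquid crystal display, an electronic ink display, or the like. The input device may be a touch layer covering the display screen; a key, trackball, or touchpad on the housing; or an external keyboard, touchpad, mouse, or the like. The network interface of the server communicates with external terminals over a network connection, for example to receive user identifier requests from terminals and return third-party user accounts to them. The server may be implemented as an independent server or as a cluster of multiple servers.

A person skilled in the art will understand that FIG. 1B is only a block diagram of the part of the structure related to the solution of this application. It does not limit the server to which the solution is applied. A specific server may include more or fewer components than shown, combine some components, or arrange them differently.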
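FIG. 2 is a flowchart of a method for user identifier processing in an embodiment. As shown in FIG. 2, the method may run on the server or the terminal of FIG. 1 and includes the following steps:
Step 202: scan the source data table to obtain the features of the data corresponding to each field of the source data table.
Specifically, the source data table refers to data obtained from the network, generally stored in tabular form. The source data table contains one or more fields, each representing one type of data, such as a sequence number field, a name field, a user identifier field, a gender field, an age field, and an address field.
FIG. 3 is a schematic diagram of the structure of a source data table in an embodiment. As shown in FIG. 3, the first row of the source data table contains a sequence number field, a name field, a gender field, a user identifier field, an age field, an address field, and so on, and each field corresponds to one column. Typical values for each field are as follows:
- Sequence number: natural numbers starting at 1 and incrementing by 1.
- Name: various names, such as Wang Xiaoming (王小明), Li Xiaobai (李小白), and Zhao Xiaohong (赵小红).
- Gender: "male", "female", "unknown", and so on.
- User identifier: data that conforms to the rules for user identifiers, for example instant messaging accounts from 12345 to 9999999999.
- Age: values from 0 to 150.
- Address: various addresses.
The source data table in this embodiment may be data generated by user behavior on websites and the like.
Features are computed for the data corresponding to each field in the source data table. The features may include the mean and the standard deviation. Both are relatively stable, and checking them in combination gives higher reliability. The features may also include the maximum and minimum values.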
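The per-field feature computation in step 202 can be illustrated with a short Python sketch. It is an illustration only, not the patent's implementation, and it rests on several assumptions:
- The source data table is held in memory as a dict that maps each field name to a list of string cell values.
- Cells that do not parse as numbers are skipped.
- A field needs at least two numeric cells to get features.
- The population standard deviation is used.

`FieldFeatures`, `field_features`, and `scan_table` are hypothetical names.

```python
import statistics
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FieldFeatures:
    mean: float
    stdev: float
    minimum: float
    maximum: float


def field_features(values: List[str]) -> Optional[FieldFeatures]:
    """Compute mean, population standard deviation, min and max of one column.

    Assumption: cells that cannot be parsed as numbers are skipped, and a
    column with fewer than two numeric cells gets no features (None).
    """
    nums: List[float] = []
    for v in values:
        try:
            nums.append(float(v))
        except (TypeError, ValueError):
            continue
    if len(nums) < 2:
        return None
    return FieldFeatures(
        mean=statistics.fmean(nums),
        stdev=statistics.pstdev(nums),  # population std (assumption)
        minimum=min(nums),
        maximum=max(nums),
    )


def scan_table(table: Dict[str, List[str]]) -> Dict[str, Optional[FieldFeatures]]:
    """Step 202 sketch: compute the features of every field (column) of the table."""
    return {name: field_features(column) for name, column in table.items()}


if __name__ == "__main__":
    demo = {"id": ["1", "2", "3"], "gender": ["male", "female", "unknown"]}
    for name, feats in scan_table(demo).items():
        print(name, feats)
```

Textual fields such as gender yield `None` and can never satisfy a numeric feature rule. That is one simple way to keep them out of the matching step.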
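Step 204: match the features of the data corresponding to each field against the feature rule of the user identifier.
Specifically, the feature rule of the user identifier may be derived in advance from statistics over massive data. For example, the feature rule may require the mean and the standard deviation to lie within certain ranges. Different user identifiers have different feature rules, and each must be derived by statistical analysis of massive data.
A user identifier is an identifier that uniquely indicates a user's identity. The user identifier may be an instant messaging account, a mobile communication identifier, an email address, an ID card number, a payment account, or the like.
The features of a field's data are matched against the feature rule of the user identifier. For example, suppose the feature rule requires the mean to lie in [100000, 110000] and the standard deviation to lie in [1, 2]. If a field's data have a computed mean in [100000, 110000] and a computed standard deviation in [1, 2], the field's features successfully match the feature rule. If either the computed mean or the computed standard deviation falls outside its range in the feature rule, the match fails.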
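The matching rule in the example above can be expressed as a small predicate: the mean must lie in [100000, 110000] and the standard deviation in [1, 2], and the match fails if either is out of range. This is a sketch under assumptions of its own. The rule is modeled as two inclusive ranges, the population standard deviation is used, and `FeatureRule` and `matches_rule` are hypothetical names.

```python
import statistics
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class FeatureRule:
    mean_range: Tuple[float, float]  # inclusive [low, high] bounds for the mean
    std_range: Tuple[float, float]   # inclusive [low, high] bounds for the standard deviation


def matches_rule(values: Sequence[float], rule: FeatureRule) -> bool:
    """True only if both the mean and the standard deviation of the column fall
    inside the rule's ranges; either one outside the range means no match."""
    if len(values) < 2:
        return False
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population std: an assumption
    return (rule.mean_range[0] <= mean <= rule.mean_range[1]
            and rule.std_range[0] <= std <= rule.std_range[1])


# The example rule from the text: mean in [100000, 110000], std in [1, 2].
rule = FeatureRule(mean_range=(100000, 110000), std_range=(1, 2))
print(matches_rule([105000, 105001, 105002, 105003], rule))  # True (mean 105001.5, std ≈ 1.118)
print(matches_rule([1, 2, 3, 4], rule))                      # False (mean 2.5 is out of range)
```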
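Step 206: if the features of the data corresponding to a field successfully match the feature rule of the user identifier, the field's data are user identifiers; proceed to step 210.
Step 208: if the features of the data corresponding to a field fail to match the feature rule of the user identifier, the field's data are not user identifiers; proceed to step 212.
Step 210: in the source data table, convert the data of fields whose data are user identifiers into third-party user identifiers.
Specifically, the third-party user identifier (i.e., the third-party user account) is an open user identifier, or openid. It is a secure mechanism for opening up user identifiers: it allows users to log in to third-party platforms with their user identifier without disclosing that identifier to the third party. A mapping between user identifiers and third-party user accounts is established in advance. For each user identifier, the corresponding third-party user account is looked up in this mapping, and the user identifier is replaced with that account.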
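The replacement of user identifiers by openids in step 210 amounts to a lookup in a pre-built mapping. The sketch below assumes an in-memory dict from user identifier to openid, and a table stored as a mapping from field name to column values. The text does not say what to do with an identifier that has no mapping. This sketch assumes it is replaced by `None` rather than passed through, so that the raw identifier never leaves. `convert_identifier_fields` is a hypothetical name.

```python
from typing import Dict, List, Optional, Set


def convert_identifier_fields(
    table: Dict[str, List[str]],
    identifier_fields: Set[str],
    id_to_openid: Dict[str, str],
) -> Dict[str, List[Optional[str]]]:
    """Replace every value in the identifier fields with its openid.

    Fields not in identifier_fields are copied unchanged (step 212).
    Assumption: an identifier missing from the mapping becomes None, so the
    raw user identifier is never passed on to the third party.
    """
    result: Dict[str, List[Optional[str]]] = {}
    for name, column in table.items():
        if name in identifier_fields:
            result[name] = [id_to_openid.get(value) for value in column]
        else:
            result[name] = list(column)
    return result


table = {"qq": ["12345", "67890"], "age": ["30", "41"]}
mapping = {"12345": "oA1b2", "67890": "oC3d4"}  # hypothetical openids
print(convert_identifier_fields(table, {"qq"}, mapping))
# {'qq': ['oA1b2', 'oC3d4'], 'age': ['30', '41']}
```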
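Step 212: in the source data table, keep unchanged the data of fields whose data are not user identifiers.
If none of the fields in the source data table contain user identifiers, the source data table is kept unchanged.
In the above method, the features of the data corresponding to each field in the source data table are obtained and matched against the feature rule of the user identifier. If the match succeeds, the field's data are converted into third-party user accounts; if it fails, the field's data are kept unchanged. Matching by the feature rule improves the accuracy and efficiency of identifying user identifiers. Converting user identifiers into third-party user accounts prevents third-party platforms from obtaining the user identifiers, which improves their security.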
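FIG. 4 is a flowchart of a method for user identifier processing in another embodiment. As shown in FIG. 4, the method includes the following steps:
Step 402: select data of user identifier fields from test data as positive sample data, and select data of non-user-identifier fields as negative sample data.
Specifically, the test data may be massive network data. A user identifier is an identifier that uniquely indicates a user's identity. The user identifier may be an instant messaging account, a mobile communication identifier, an email address, an ID card number, a payment account, or the like.
For example, if the user identifier is an instant messaging account, the test data may include instant messaging account data, advertisement exposure data, product order path data, web browsing data, user search data, and the like. Data corresponding to instant messaging account fields are extracted from the test data as positive sample data. Data corresponding to non-instant-messaging-account fields are randomly selected as negative sample data, which serve as a reference group.
Step 404: compute features for the positive sample data and the negative sample data separately.
Specifically, the mean and standard deviation are computed for the positive sample data and the negative sample data separately, and the maximum and minimum values may also be computed. The computed features of the positive and negative sample data are compiled into a summary table with fields as rows and features as columns. Computing the features of each group yields the statistical distribution of the user identifier's features, which can then be analyzed to derive distribution rules.
Step 406: compare the features of the positive sample data with those of the negative sample data to obtain the feature rule of the positive sample data.
Specifically, the features of the positive and negative sample data may be summarized and compared in bar charts. The two groups show clearly distinguishable features, and a trusted range of user identifier data is extracted from this comparison. The trusted range then forms the feature rule of the user identifier.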
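The text derives the trusted range by inspecting bar charts. One way to automate a comparable step is sketched below. This is a hypothetical heuristic, not the method described:
- Each feature's range is the span of that feature over the positive-sample columns, widened by a fixed margin.
- The function also counts how many negative-sample columns still fall inside every range, as a rough check that the rule separates the two groups.

`extract_rule` and its `margin` parameter are assumptions.

```python
import statistics
from typing import Dict, List, Tuple

Range = Tuple[float, float]


def column_features(values: List[float]) -> Dict[str, float]:
    """Mean and population standard deviation of one sample column."""
    return {"mean": statistics.fmean(values), "std": statistics.pstdev(values)}


def extract_rule(positive: List[List[float]],
                 negative: List[List[float]],
                 margin: float = 0.1) -> Tuple[Dict[str, Range], int]:
    """Derive a trusted range per feature from positive-sample columns.

    Hypothetical heuristic: each range spans min..max of the feature over the
    positive columns, widened on both sides by margin * (range width).
    Also returns how many negative columns fall inside every range, i.e. how
    many non-identifier fields the rule would still wrongly accept.
    """
    pos = [column_features(c) for c in positive]
    neg = [column_features(c) for c in negative]
    rule: Dict[str, Range] = {}
    for feat in ("mean", "std"):
        low = min(f[feat] for f in pos)
        high = max(f[feat] for f in pos)
        pad = (high - low) * margin
        rule[feat] = (low - pad, high + pad)
    overlap = sum(
        all(rule[k][0] <= f[k] <= rule[k][1] for k in rule) for f in neg
    )
    return rule, overlap


positive = [[100000, 100002], [100010, 100012]]  # columns known to hold IDs
negative = [[1, 2, 3], [20, 40]]                 # e.g. sequence numbers, ages
rule, overlap = extract_rule(positive, negative)
print(rule, overlap)
```

An `overlap` of zero means that no negative-sample column would pass the derived rule. A nonzero value suggests that additional features, such as the maximum and minimum, are needed to separate the groups.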
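Step 408: scan the source data table to obtain the features of the data corresponding to each field of the source data table.
Specifically, the source data table refers to data obtained from the network, generally stored in tabular form. The source data table contains one or more fields, each representing one type of data, such as a sequence number field, a name field, a user identifier field, a gender field, an age field, and an address field.
Step 410: match the features of the data corresponding to each field against the feature rule of the user identifier.
Specifically, the feature rule of the user identifier may be derived in advance from statistics over massive data. For example, the feature rule may require the mean and the standard deviation to lie within certain ranges. Different user identifiers have different feature rules, and each must be derived by statistical analysis of massive data.
A user identifier is an identifier that uniquely indicates a user's identity. The user identifier may be an instant messaging account, a mobile communication identifier, an email address, an ID card number, a payment account, or the like.
The features of a field's data are matched against the feature rule of the user identifier. For example, suppose the feature rule requires the mean to lie in [100000, 110000] and the standard deviation to lie in [1, 2]. If a field's data have a computed mean in [100000, 110000] and a computed standard deviation in [1, 2], the field's features successfully match the feature rule. If either the computed mean or the computed standard deviation falls outside its range in the feature rule, the match fails.
Step 412: if the features of the data corresponding to a field successfully match the feature rule of the user identifier, the field's data are user identifiers. If they fail to match, the field's data are not user identifiers.
Step 414: in the source data table, convert the data of fields whose data are user identifiers into third-party user identifiers, and keep unchanged the data of fields whose data are not user identifiers; then proceed to step 418.
Specifically, the third-party user identifier (i.e., the third-party user account) is an open user identifier, or openid. It is a secure mechanism for opening up user identifiers: it allows users to log in to third-party platforms with their user identifier without disclosing that identifier to the third party. A mapping between user identifiers and third-party user accounts is established in advance. For each user identifier, the corresponding third-party user account is looked up in this mapping, and the user identifier is replaced with that account.
Step 416: if none of the fields in the source data table contain user identifiers, keep the source data table unchanged.
Step 418: correct the feature rule of the user identifier according to the data of successfully matched fields and the data of fields that contain user identifiers but were not successfully matched.
Specifically, matching field data against the feature rule of the user identifier may misidentify some fields or miss others. The method therefore obtains the data of successfully matched fields and the data of fields that contain user identifiers but were not matched, and corrects the feature rule according to both. Two cases arise:
- Missed fields: the data of fields that contain user identifiers but were not matched can be analyzed to obtain naming rules and/or types of the user identifier. These are added to the feature rule so that such fields are not missed the next time matching is performed.
- Misidentified fields: if a successfully matched field turns out to have been identified wrongly, the feature rule can be corrected according to the features of that field's data.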
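The correction in step 418 can be sketched as follows. The sketch makes two assumptions. First, the "naming rules" mentioned in the text are modeled as field-name keywords learned from missed fields. Second, misidentified fields are simply excluded by name. Both extensions are hypothetical, as are the names `FeatureRule`, `correct_rule`, `name_hints`, and `excluded_fields`. The text also allows the numeric ranges themselves to be corrected, which this sketch does not do.

```python
from dataclasses import dataclass, field
from typing import Iterable, Set, Tuple


@dataclass
class FeatureRule:
    mean_range: Tuple[float, float]
    std_range: Tuple[float, float]
    # Hypothetical extensions standing in for the "naming rules" learned by correction:
    name_hints: Set[str] = field(default_factory=set)       # field-name keywords that signal an identifier
    excluded_fields: Set[str] = field(default_factory=set)  # fields confirmed as misidentified

    def matches(self, field_name: str, mean: float, std: float) -> bool:
        name = field_name.lower()
        if name in self.excluded_fields:
            return False
        if any(hint in name for hint in self.name_hints):
            return True
        return (self.mean_range[0] <= mean <= self.mean_range[1]
                and self.std_range[0] <= std <= self.std_range[1])


def correct_rule(rule: FeatureRule,
                 missed_fields: Iterable[str],
                 misidentified_fields: Iterable[str]) -> FeatureRule:
    """Return a new rule that accepts the names of fields that held identifiers
    but were missed, and rejects fields that were wrongly matched."""
    return FeatureRule(
        mean_range=rule.mean_range,
        std_range=rule.std_range,
        name_hints=rule.name_hints | {f.lower() for f in missed_fields},
        excluded_fields=rule.excluded_fields | {f.lower() for f in misidentified_fields},
    )


rule = FeatureRule((100000, 110000), (1, 2))
print(rule.matches("qq_uin", 5e8, 1e7))        # False: statistics out of range
fixed = correct_rule(rule, missed_fields=["qq_uin"], misidentified_fields=["order_no"])
print(fixed.matches("QQ_UIN", 5e8, 1e7))       # True: learned from the missed field
print(fixed.matches("order_no", 105000, 1.5))  # False: excluded after review
```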
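The above method offers three benefits:
- More accurate feature rules: user identifiers are selected as positive sample data and non-user-identifier data as negative sample data, and their features are computed and compared to derive the feature rule of the positive sample data.
- Accuracy, efficiency, and security: the features of the data corresponding to each field in the source data table are matched against the feature rule. If the match succeeds, the field's data are converted into third-party user accounts; if it fails, they are kept unchanged. Matching by the feature rule improves the accuracy and efficiency of identifying user identifiers, and converting them into third-party user accounts prevents third-party platforms from obtaining them, which improves their security.
- Continuous refinement: correcting the feature rule according to the data of successfully matched fields and of fields that contain user identifiers but were not matched keeps improving the feature rule and the identification accuracy.
In an embodiment, the method further includes the following. After the step of converting, in the source data table, the data of fields whose data are user identifiers into third-party user accounts, and keeping unchanged the data of fields whose data are not user identifiers, the method obtains the data of fields in the source data table that contain user identifiers but were not successfully matched. It then corrects the feature rule of the user identifier according to that data.
Specifically, the data of fields that contain user identifiers but were not successfully matched can be analyzed to obtain naming rules and/or types of the user identifier. These are added to the feature rule of the user identifier so that such fields are not missed the next time matching is performed.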
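The implementation of the method for user identifier processing is described below with reference to a specific application scenario. In this scenario, the method identifies instant messaging accounts in the data of a third-party platform and stores the data in the database. The specific process includes steps (1) to (5):
(1) Select data corresponding to instant messaging account fields from test data as positive sample data, and select data corresponding to non-instant-messaging-account fields as negative sample data.
Specifically, the registration data of the instant messaging application and the user behavior data of the third-party platform are used as test data. For example, the data of the instant messaging application QQ includes registered user data. The user behavior data of a third-party platform such as the Jingdong (JD.com) website comprises five data sets, including advertisement exposure data, product order path data, web browsing data, and user search data. From these two sources, the data of instant messaging account fields are selected as positive sample data, and the data of non-instant-messaging-account fields are selected as negative sample data.
The features can be computed by partition sampling statistics, full-table statistics, or similar approaches. Partition sampling statistics means extracting a portion of the data for statistics; full-table statistics means analyzing and computing statistics over all data tables.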
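The difference between partition sampling and full-table statistics can be sketched as below. The sketch assumes that a column's values are grouped by storage partition, which is a hypothetical in-memory layout. It also assumes that sampling chooses whole partitions at random, with a fixed seed for reproducibility. `column_stats` and `sample_partitions` are illustrative names.

```python
import random
import statistics
from typing import List, Optional, Tuple


def column_stats(partitions: List[List[float]],
                 sample_partitions: Optional[int] = None,
                 seed: int = 0) -> Tuple[float, float]:
    """Mean and population standard deviation of one column.

    partitions: the column's values grouped by storage partition (assumed layout).
    sample_partitions: if given, only that many randomly chosen partitions are
    read (partition sampling); if None, every partition is read (full-table
    statistics).
    """
    chosen = partitions
    if sample_partitions is not None and sample_partitions < len(partitions):
        chosen = random.Random(seed).sample(partitions, sample_partitions)
    values = [v for part in chosen for v in part]
    return statistics.fmean(values), statistics.pstdev(values)


parts = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(column_stats(parts))                       # full-table statistics
print(column_stats(parts, sample_partitions=1))  # partition sampling: one partition
```

Sampling trades accuracy for speed. It is cheaper on tables of hundreds of TB, but its estimates can differ from the full-table values when partitions are not alike.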
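(2) Compute features for the positive sample data and the negative sample data separately; the features include the mean and the standard deviation.
Specifically, the mean and standard deviation are computed for the positive sample data and the negative sample data separately, and the maximum and minimum values may also be computed. The computed features are compiled into a summary table with fields as rows and features as columns.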
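A summary table with fields as rows and features as columns, as described in step (2), might be built as shown below. The column layout, the two-decimal formatting, and the names `summary_table` and `sample_label` are assumptions made for illustration.

```python
import statistics
from typing import Dict, List


def summary_table(columns: Dict[str, List[float]],
                  sample_label: Dict[str, str]) -> List[List[str]]:
    """One row per field, one column per feature (fields as rows, features as columns)."""
    rows = [["field", "sample", "mean", "std", "min", "max"]]
    for name, values in columns.items():
        rows.append([
            name,
            sample_label[name],  # "positive" or "negative" sample (assumed labels)
            f"{statistics.fmean(values):.2f}",
            f"{statistics.pstdev(values):.2f}",
            f"{min(values):.2f}",
            f"{max(values):.2f}",
        ])
    return rows


columns = {"qq": [100000, 100002], "age": [20, 40]}
labels = {"qq": "positive", "age": "negative"}
for row in summary_table(columns, labels):
    print("\t".join(row))
```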
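(3) Compare the features of instant messaging accounts with those of non-instant-messaging accounts to obtain the feature rule of instant messaging accounts.
Specifically, the features of instant messaging accounts and of non-instant-messaging accounts may be summarized and compared in bar charts. The two groups show clearly distinguishable features, and a trusted range of instant messaging account data is extracted from this comparison. The trusted range then forms the feature rule of instant messaging accounts.
(4) Scan the source data table of the third-party platform to obtain the features of the data corresponding to each field, and match them against the feature rule of instant messaging accounts. If a field's features match, the field's data are instant messaging accounts; if they fail to match, the field's data are not instant messaging accounts. The data of fields identified as instant messaging accounts are converted into third-party user identifiers, and the data of the other fields are kept unchanged.
Specifically, the instant messaging accounts in the source data table of the third-party platform are scanned and converted into third-party user accounts, i.e., openids, according to the mapping between instant messaging accounts and third-party user accounts. The complete data table is then stored in the database. The third-party platform cannot obtain user identifiers directly and obtains only openids, which implements a secure account-opening mode.
In this process, operators only need to configure the corresponding feature rules and the list of data tables to be migrated. The big data platform then automatically identifies, converts, and stores the data, saving labor.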
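The configuration-driven flow in step (4), in which operators supply only the feature rule and the list of tables to migrate, can be sketched end to end as below. Everything here is hypothetical:
- the config keys;
- the in-memory `warehouse` dict, which stands in for the database;
- the requirement that every cell of an identifier field be numeric;
- the `None` placeholder for identifiers that are missing from the mapping.

```python
import statistics
from typing import Dict, List, Optional, Tuple

Table = Dict[str, List[str]]


def looks_like_identifier(column: List[str],
                          mean_range: Tuple[float, float],
                          std_range: Tuple[float, float]) -> bool:
    """Feature-rule check; assumption: every cell must parse as a number."""
    try:
        nums = [float(v) for v in column]
    except (TypeError, ValueError):
        return False
    if len(nums) < 2:
        return False
    mean, std = statistics.fmean(nums), statistics.pstdev(nums)
    return (mean_range[0] <= mean <= mean_range[1]
            and std_range[0] <= std <= std_range[1])


def migrate(tables: Dict[str, Table], config: dict,
            id_to_openid: Dict[str, str],
            warehouse: Dict[str, Dict[str, List[Optional[str]]]]) -> None:
    """Identify, convert and store every table listed in the config."""
    for table_name in config["tables"]:
        converted: Dict[str, List[Optional[str]]] = {}
        for field_name, column in tables[table_name].items():
            if looks_like_identifier(column, config["mean_range"], config["std_range"]):
                # Identifier field: replace each value by its openid (None if unmapped).
                converted[field_name] = [id_to_openid.get(v) for v in column]
            else:
                converted[field_name] = list(column)  # keep other fields unchanged
        warehouse[table_name] = converted


config = {  # the only manual input: the rule and the list of tables to migrate
    "tables": ["orders"],
    "mean_range": (100000, 110000),
    "std_range": (1, 2),
}
tables = {"orders": {"buyer": ["105000", "105001", "105002", "105003"],
                     "item": ["pen", "cup", "hat", "bag"]}}
mapping = {"105000": "o1", "105001": "o2", "105002": "o3", "105003": "o4"}
warehouse: Dict[str, Dict[str, List[Optional[str]]]] = {}
migrate(tables, config, mapping, warehouse)
print(warehouse["orders"])
```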
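(5) Correct the feature rule of instant messaging accounts according to the data of successfully matched fields and the data of fields that contain instant messaging accounts but were not successfully matched.
Specifically, the identification results for converted data are recorded. They may include instant messaging account fields that were identified and fields containing instant messaging accounts that were not identified. The feature rule of instant messaging accounts is corrected according to both kinds of results and continuously refined to improve identification accuracy. The data of fields that contain instant messaging accounts but were not matched are processed to obtain naming rules and/or types of instant messaging accounts. These are added to the feature rule so that such fields are not missed the next time matching is performed.
After several rounds of correcting the feature rule derived from the positive and negative sample data, the identification accuracy for QQ numbers reached 94.5%. The instant messaging account may be a QQ number, a WeChat account, or another instant messaging account.
The above scenario describes identifying and processing instant messaging accounts, but the method is not limited thereto. It may also be applied to identifying and processing ID card numbers, mobile communication identifiers, payment accounts, email addresses, and the like.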
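FIG. 5 is a structural block diagram of an apparatus for user identifier processing in an embodiment. As shown in FIG. 5, the apparatus includes a scanning module 510, a matching module 520, and a processing module 530, wherein:
The scanning module 510 is configured to scan the source data table to obtain the features of the data corresponding to each field of the source data table.
Specifically, the source data table refers to data obtained from the network, generally stored in tabular form. The source data table contains one or more fields, each representing one type of data, such as a sequence number field, a name field, a user identifier field, a gender field, an age field, and an address field.
The matching module 520 is configured to match the features of the data corresponding to each field against the feature rule of the user identifier. If the features of a field's data successfully match the feature rule, the field's data are user identifiers. If they fail to match, the field's data are not user identifiers.
Specifically, the feature rule of the user identifier may be derived in advance from statistics over massive data. For example, the feature rule may require the mean and the standard deviation to lie within certain ranges. Different user identifiers have different feature rules, and each must be derived by statistical analysis of massive data.
A user identifier is an identifier that uniquely indicates a user's identity. The user identifier may be an instant messaging account, a mobile communication identifier, an email address, an ID card number, a payment account, or the like.
The source data table in this embodiment may be data generated by user behavior on websites and the like.
Features are computed for the data corresponding to each field in the source data table. The features may include the mean and the standard deviation. Both are relatively stable, and checking them in combination gives higher reliability. The features may also include the maximum and minimum values.
The processing module 530 is configured to convert, in the source data table, the data of fields whose data are user identifiers into third-party user accounts, and to keep unchanged the data of fields whose data are not user identifiers.
Specifically, the third-party user identifier (i.e., the third-party user account) is an open user identifier, or openid. It is a secure mechanism for opening up user identifiers: it allows users to log in to third-party platforms with their user identifier without disclosing that identifier to the third party. A mapping between user identifiers and third-party user accounts is established in advance. For each user identifier, the corresponding third-party user account is looked up in this mapping, and the user identifier is replaced with that account.
If none of the fields in the source data table contain user identifiers, the source data table is kept unchanged.
The above apparatus obtains the features of the data corresponding to each field in the source data table and matches them against the feature rule of the user identifier. If the match succeeds, the field's data are converted into third-party user accounts; if it fails, the field's data are kept unchanged. Matching by the feature rule improves the accuracy and efficiency of identifying user identifiers. Converting user identifiers into third-party user accounts prevents third-party platforms from obtaining the user identifiers, which improves their security.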
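FIG. 6 is a structural block diagram of an apparatus for user identifier processing in another embodiment. As shown in FIG. 6, in addition to the scanning module 510, the matching module 520, and the processing module 530, the apparatus includes a selection module 540, a calculation module 550, and a feature rule extraction module 560, wherein:
The selection module 540 operates before the source data table is scanned and the features of the data corresponding to each field are obtained. It is configured to select data of user identifier fields from test data as positive sample data, and data of non-user-identifier fields as negative sample data.
Specifically, the test data may be massive network data. A user identifier is an identifier that uniquely indicates a user's identity. The user identifier may be an instant messaging account, a mobile communication identifier, an email address, an ID card number, a payment account, or the like.
For example, if the user identifier is an instant messaging account, the test data may include instant messaging account data, advertisement exposure data, product order path data, web browsing data, user search data, and the like. Data corresponding to instant messaging account fields are extracted from the test data as positive sample data. Data corresponding to non-instant-messaging-account fields are randomly selected as negative sample data, which serve as a reference group.
The calculation module 550 is configured to compute features for the positive sample data and the negative sample data separately.
Specifically, the mean and standard deviation are computed for the positive sample data and the negative sample data separately, and the maximum and minimum values may also be computed. The computed features of the positive and negative sample data are compiled into a summary table with fields as rows and features as columns.
The feature rule extraction module 560 is configured to compare the features of the positive sample data with those of the negative sample data to obtain the feature rule of the positive sample data.
Specifically, the features of the positive and negative sample data may be summarized and compared in bar charts. The two groups show clearly distinguishable features, and a trusted range of user identifier data is extracted from this comparison. The trusted range then forms the feature rule of the user identifier.
User identifiers are selected as positive sample data and non-user-identifier data as negative sample data. Their features are computed and compared to derive the feature rule of the positive sample data, which yields a more accurate feature rule of the user identifier.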
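FIG. 7 is a structural block diagram of an apparatus for user identifier processing in another embodiment. As shown in FIG. 7, in addition to the scanning module 510, the matching module 520, and the processing module 530, the apparatus includes a first obtaining module 570 and a first correction module 580, wherein:
The first obtaining module 570 operates after the data of fields whose data are user identifiers have been converted into third-party user accounts and the data of the other fields kept unchanged. It is configured to obtain the data of successfully matched fields in the source data table and the data of fields that contain user identifiers but were not successfully matched.
The first correction module 580 is configured to correct the feature rule of the user identifier according to the data of successfully matched fields and the data of fields that contain user identifiers but were not successfully matched.
Specifically, the data of fields that contain user identifiers but were not successfully matched can be analyzed to obtain naming rules and/or types of the user identifier. These are added to the feature rule of the user identifier so that such fields are not missed the next time matching is performed.
FIG. 8 is a structural block diagram of an apparatus for user identifier processing in another embodiment. As shown in FIG. 8, in addition to the scanning module 510, the matching module 520, and the processing module 530, the apparatus includes a second obtaining module 590 and a second correction module 592, wherein:
The second obtaining module 590 operates after the data of fields whose data are user identifiers have been converted into third-party user accounts and the data of the other fields kept unchanged. It is configured to obtain the data of fields in the source data table that contain user identifiers but were not successfully matched.
The second correction module 592 is configured to correct the feature rule of the user identifier according to the data of fields that contain user identifiers but were not successfully matched.
Specifically, the data of fields that contain user identifiers but were not successfully matched can be analyzed to obtain naming rules and/or types of the user identifier. These are added to the feature rule of the user identifier so that such fields are not missed the next time matching is performed.
A person of ordinary skill in the art will understand that all or part of the processes of the methods in the above embodiments may be implemented by a computer program instructing the relevant hardware. The program may be stored in a non-volatile computer-readable storage medium, and when executed it may include the processes of the method embodiments described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or the like.
The above embodiments express only several implementations of the present invention. Although they are described in some detail, they should not be construed as limiting the scope of the patent. A person of ordinary skill in the art may make several variations and improvements without departing from the concept of the invention, and all of them fall within its scope of protection. The scope of protection of this patent is therefore defined by the appended claims.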

Claims (15)

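  1. A method for user identifier processing, comprising the following steps:
    scanning a source data table to obtain features of the data corresponding to each field of the source data table;
    matching the features of the data corresponding to each field against a feature rule of a user identifier, wherein if the features of the data corresponding to a field successfully match the feature rule of the user identifier, the data corresponding to the field are user identifiers, and if the features of the data corresponding to a field fail to match the feature rule of the user identifier, the data corresponding to the field are not user identifiers; and
    converting, in the source data table, the data of fields whose data are user identifiers into third-party user accounts, and keeping unchanged the data of fields whose data are not user identifiers.
  2. The method according to claim 1, wherein before the step of scanning the source data table to obtain the features of the data corresponding to each field of the source data table, the method further comprises:
    selecting data of user identifier fields from test data as positive sample data, and selecting data of non-user-identifier fields as negative sample data;
    computing features for the positive sample data and the negative sample data separately; and
    comparing the features of the positive sample data with the features of the negative sample data to obtain the feature rule of the positive sample data.
  3. The method according to claim 1, wherein after the step of converting, in the source data table, the data of fields whose data are user identifiers into third-party user accounts and keeping unchanged the data of fields whose data are not user identifiers, the method further comprises:
    obtaining the data corresponding to successfully matched fields in the source data table and the data corresponding to fields that contain user identifiers but were not successfully matched; and
    correcting the feature rule of the user identifier according to the data corresponding to the successfully matched fields and the data corresponding to the fields that contain user identifiers but were not successfully matched.
  4. The method according to claim 1, wherein after the step of converting, in the source data table, the data of fields whose data are user identifiers into third-party user accounts and keeping unchanged the data of fields whose data are not user identifiers, the method further comprises:
    obtaining the data corresponding to fields in the source data table that contain user identifiers but were not successfully matched; and
    correcting the feature rule of the user identifier according to the data corresponding to the fields that contain user identifiers but were not successfully matched.
  5. The method according to claim 1, wherein the features comprise a mean and a standard deviation, and the user identifier is an instant messaging account, a mobile communication identifier, an email address, an ID card number, or a payment account.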
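  6. A terminal, comprising a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the following steps:
    scanning a source data table to obtain features of the data corresponding to each field of the source data table;
    matching the features of the data corresponding to each field against a feature rule of a user identifier, wherein if the features of the data corresponding to a field successfully match the feature rule of the user identifier, the data corresponding to the field are user identifiers, and if the features of the data corresponding to a field fail to match the feature rule of the user identifier, the data corresponding to the field are not user identifiers; and
    converting, in the source data table, the data of fields whose data are user identifiers into third-party user accounts, and keeping unchanged the data of fields whose data are not user identifiers.
  7. The terminal according to claim 6, wherein before the step of scanning the source data table to obtain the features of the data corresponding to each field of the source data table, the processor further performs the following steps:
    selecting data of user identifier fields from test data as positive sample data, and selecting data of non-user-identifier fields as negative sample data;
    computing features for the positive sample data and the negative sample data separately; and
    comparing the features of the positive sample data with the features of the negative sample data to obtain the feature rule of the positive sample data.
  8. The terminal according to claim 6, wherein after the step of converting, in the source data table, the data of fields whose data are user identifiers into third-party user accounts and keeping unchanged the data of fields whose data are not user identifiers, the processor further performs the following steps:
    obtaining the data corresponding to successfully matched fields in the source data table and the data corresponding to fields that contain user identifiers but were not successfully matched; and
    correcting the feature rule of the user identifier according to the data corresponding to the successfully matched fields and the data corresponding to the fields that contain user identifiers but were not successfully matched.
  9. The terminal according to claim 6, wherein after the step of converting, in the source data table, the data of fields whose data are user identifiers into third-party user accounts and keeping unchanged the data of fields whose data are not user identifiers, the processor further performs the following steps:
    obtaining the data corresponding to fields in the source data table that contain user identifiers but were not successfully matched; and
    correcting the feature rule of the user identifier according to the data corresponding to the fields that contain user identifiers but were not successfully matched.
  10. The terminal according to claim 6, wherein the features comprise a mean and a standard deviation, and the user identifier is an instant messaging account, a mobile communication identifier, an email address, an ID card number, or a payment account.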
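  11. One or more non-volatile computer-readable storage media containing computer-executable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
    scanning a source data table to obtain features of the data corresponding to each field of the source data table;
    matching the features of the data corresponding to each field against a feature rule of a user identifier, wherein if the features of the data corresponding to a field successfully match the feature rule of the user identifier, the data corresponding to the field are user identifiers, and if the features of the data corresponding to a field fail to match the feature rule of the user identifier, the data corresponding to the field are not user identifiers; and
    converting, in the source data table, the data of fields whose data are user identifiers into third-party user accounts, and keeping unchanged the data of fields whose data are not user identifiers.
  12. The non-volatile computer-readable storage medium according to claim 11, wherein before the step of scanning the source data table to obtain the features of the data corresponding to each field of the source data table, the steps further comprise:
    selecting data of user identifier fields from test data as positive sample data, and selecting data of non-user-identifier fields as negative sample data;
    computing features for the positive sample data and the negative sample data separately; and
    comparing the features of the positive sample data with the features of the negative sample data to obtain the feature rule of the positive sample data.
  13. The non-volatile computer-readable storage medium according to claim 11, wherein after the step of converting, in the source data table, the data of fields whose data are user identifiers into third-party user accounts and keeping unchanged the data of fields whose data are not user identifiers, the steps further comprise:
    obtaining the data corresponding to successfully matched fields in the source data table and the data corresponding to fields that contain user identifiers but were not successfully matched; and
    correcting the feature rule of the user identifier according to the data corresponding to the successfully matched fields and the data corresponding to the fields that contain user identifiers but were not successfully matched.
  14. The non-volatile computer-readable storage medium according to claim 11, wherein after the step of converting, in the source data table, the data of fields whose data are user identifiers into third-party user accounts and keeping unchanged the data of fields whose data are not user identifiers, the steps further comprise:
    obtaining the data corresponding to fields in the source data table that contain user identifiers but were not successfully matched; and
    correcting the feature rule of the user identifier according to the data corresponding to the fields that contain user identifiers but were not successfully matched.
  15. The non-volatile computer-readable storage medium according to claim 11, wherein the features comprise a mean and a standard deviation, and the user identifier is an instant messaging account, a mobile communication identifier, an email address, an ID card number, or a payment account.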
PCT/CN2016/082414 2015-12-23 2016-05-17 Method, terminal, and non-volatile computer-readable storage medium for user identifier processing WO2017107367A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP16877181.4A EP3396558B1 (en) 2015-12-23 2016-05-17 Method for user identifier processing, terminal and nonvolatile computer readable storage medium thereof
US15/667,023 US10878121B2 (en) 2015-12-23 2017-08-02 Method and device for converting data containing user identity

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510980369.8 2015-12-23
CN201510980369.8A CN106909811B (zh) 2015-12-23 2015-12-23 Method and apparatus for user identifier processing

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/667,023 Continuation US10878121B2 (en) 2015-12-23 2017-08-02 Method and device for converting data containing user identity

Publications (1)

Publication Number Publication Date
WO2017107367A1 (zh) 2017-06-29

Family

ID=59088907

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/082414 WO2017107367A1 (zh) 2015-12-23 2016-05-17 Method, terminal, and non-volatile computer-readable storage medium for user identifier processing

Country Status (4)

Country Link
US (1) US10878121B2 (zh)
EP (1) EP3396558B1 (zh)
CN (1) CN106909811B (zh)
WO (1) WO2017107367A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109634951A (zh) * 2018-10-23 2019-04-16 Ping An Technology (Shenzhen) Co., Ltd. Big data acquisition method and apparatus, computer device, and storage medium
US10922306B2 (en) * 2016-12-21 2021-02-16 Aon Global Operations Plc, Singapore Branch Systems and methods for automated bulk user registration spanning both a content management system and any software applications embedded therein
US10990642B2 (en) 2016-12-21 2021-04-27 Aon Global Operations Se, Singapore Branch Methods and systems for securely embedding dashboards into a content management system
US11537272B2 (en) 2016-12-21 2022-12-27 Aon Global Operations Se, Singapore Branch Content management system extensions

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180285596A1 (en) * 2017-03-30 2018-10-04 Cisco Technology, Inc. System and method for managing sensitive data
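CN109961080B (zh) * 2017-12-26 2022-09-23 Tencent Technology (Shenzhen) Co., Ltd. Terminal identification method and apparatus
CN109388675A (zh) * 2018-10-12 2019-02-26 Ping An Technology (Shenzhen) Co., Ltd. Data analysis method and apparatus, computer device, and storage medium
CN110716927B (zh) * 2019-09-06 2023-10-24 Ping An Property & Casualty Insurance Company of China, Ltd. Data correction method, apparatus, and device, and computer-readable storage medium
CN111581512B (zh) * 2020-05-08 2023-06-02 Sun Yi Method and apparatus for counting the number of web page visitors
CN114253951B (zh) * 2020-09-21 2023-09-19 Tencent Technology (Shenzhen) Co., Ltd. Data processing method and system, and second server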

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020073138A1 (en) * 2000-12-08 2002-06-13 Gilbert Eric S. De-identification and linkage of data records
CN1670746A (zh) * 2004-03-19 2005-09-21 Hitachi, Ltd. Roster control method
US20100114607A1 (en) * 2008-11-04 2010-05-06 Sdi Health Llc Method and system for providing reports and segmentation of physician activities
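JP2011022826A (ja) * 2009-07-16 2011-02-03 Nippon Telegr & Teleph Corp <Ntt> Service providing system, user ID management method, and user ID management program
CN103067398A (zh) * 2012-12-31 2013-04-24 Beijing Baidu Netcom Science and Technology Co., Ltd. Method and device for enabling a third-party application to access user data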

Family Cites Families (58)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6011849A (en) * 1997-08-28 2000-01-04 Syndata Technologies, Inc. Encryption-based selection system for steganography
US7770016B2 (en) * 1999-07-29 2010-08-03 Intertrust Technologies Corporation Systems and methods for watermarking software and other media
US6397224B1 (en) * 1999-12-10 2002-05-28 Gordon W. Romney Anonymously linking a plurality of data records
US7269578B2 (en) * 2001-04-10 2007-09-11 Latanya Sweeney Systems and methods for deidentifying entries in a data source
WO2003021473A1 (en) * 2001-08-30 2003-03-13 Privasource, Inc. Data source privacy screening systems and methods
US7206791B2 (en) * 2002-01-17 2007-04-17 International Business Machines Corporation System and method for managing and securing meta data
US7024409B2 (en) * 2002-04-16 2006-04-04 International Business Machines Corporation System and method for transforming data to preserve privacy where the data transform module suppresses the subset of the collection of data according to the privacy constraint
US20060082592A1 (en) * 2004-10-19 2006-04-20 International Business Machines Corporation Mapping of a color to a treemap
US7672967B2 (en) * 2005-02-07 2010-03-02 Microsoft Corporation Method and system for obfuscating data structures by deterministic natural data substitution
US8015214B2 (en) * 2006-06-30 2011-09-06 Encapsa Technology, Llc Method of encapsulating information in a database and an encapsulated database
US20080240425A1 (en) * 2007-03-26 2008-10-02 Siemens Medical Solutions Usa, Inc. Data De-Identification By Obfuscation
US20090132419A1 (en) * 2007-11-15 2009-05-21 Garland Grammer Obfuscating sensitive data while preserving data usability
WO2009139650A1 (en) * 2008-05-12 2009-11-19 Business Intelligence Solutions Safe B.V. A data obfuscation system, method, and computer implementation of data obfuscation for secret databases
US8719233B2 (en) * 2008-06-24 2014-05-06 Emc Corporation Generic method and apparatus for database sanitizing
AU2010250042B2 (en) * 2009-05-21 2015-03-26 Intertrust Technologies Corporation Content delivery systems and methods
WO2011048551A1 (en) * 2009-10-19 2011-04-28 Nokia Corporation User identity management for permitting interworking of a bootstrapping architecture and a shared identity service
US20110113049A1 (en) * 2009-11-09 2011-05-12 International Business Machines Corporation Anonymization of Unstructured Data
US9442980B1 (en) * 2010-04-21 2016-09-13 Stan Trepetin Mathematical method for performing homomorphic operations
US8626749B1 (en) * 2010-04-21 2014-01-07 Stan Trepetin System and method of analyzing encrypted data in a database in near real-time
US8544104B2 (en) * 2010-05-10 2013-09-24 International Business Machines Corporation Enforcement of data privacy to maintain obfuscation of certain data
US20150088706A1 (en) * 2010-05-12 2015-03-26 Ontario Systems, Llc Method, system, and computer-readable medium for managing and collecting receivables
WO2012094602A1 (en) * 2011-01-07 2012-07-12 Interdigital Patent Holdings, Inc. Client and server group sso with local openid
US8972747B2 (en) * 2011-01-26 2015-03-03 Hewlett-Packard Development Company, L.P. Managing information in a document serialization
US8806223B2 (en) * 2011-05-03 2014-08-12 Douglas Norman Crowe System and method for management of encrypted data
US20120297017A1 (en) * 2011-05-20 2012-11-22 Microsoft Corporation Privacy-conscious personalization
US10044713B2 (en) * 2011-08-19 2018-08-07 Interdigital Patent Holdings, Inc. OpenID/local openID security
US8856157B2 (en) * 2011-08-23 2014-10-07 Business Objects Software Limited Automatic detection of columns to be obfuscated in database schemas
GB201115866D0 (en) * 2011-09-14 2011-10-26 Royal Holloway & Bedford New College Method and apparatus for enabling authorised users to access computer resources
WO2013075661A1 (zh) * 2011-11-23 2013-05-30 腾讯科技(深圳)有限公司 Login and open-platform identification method, open platform, and system
ES2565842T3 (es) * 2011-12-27 2016-04-07 Telecom Italia S.P.A. Dynamic pseudonym assignment method for user data profiling networks, and user data profiling network implementing the method
JP2015521406A (ja) * 2012-04-27 2015-07-27 InterDigital Patent Holdings, Inc. Systems and methods for personalizing and/or tailoring a service interface
US9197619B2 (en) * 2012-09-06 2015-11-24 Intel Corporation Management of multiple devices registered to a user
WO2014080297A2 (en) * 2012-11-12 2014-05-30 EPI-USE Systems, Ltd. Secure data copying
EP2932680A1 (en) * 2012-12-12 2015-10-21 Interdigital Patent Holdings, Inc. Independent identity management systems
US8893230B2 (en) * 2013-02-22 2014-11-18 Duo Security, Inc. System and method for proxying federated authentication protocols
CN104052612B (zh) * 2013-03-13 2017-08-25 中国移动通信集团广东有限公司 Method and system for fault identification and localization in telecommunication services
US9594878B2 (en) * 2013-03-15 2017-03-14 Rush University Medical Center Geographic utilization of artificial intelligence in real-time for disease identification and alert notification
US9721086B2 (en) * 2013-03-15 2017-08-01 Advanced Elemental Technologies, Inc. Methods and systems for secure and reliable identity-based computing
US9191384B2 (en) * 2013-06-26 2015-11-17 Vmware, Inc. Maintaining privacy in a multi-tenant cloud service participating in a federated identity platform
US9223995B1 (en) * 2013-12-10 2015-12-29 Progress Software Corporation Semantic obfuscation of data in real time
CN104601436B (zh) * 2013-12-31 2016-11-16 腾讯科技(深圳)有限公司 Account generation method, terminal, and back-end server
US20160323248A1 (en) * 2013-12-31 2016-11-03 Interdigital Patent Holdings Inc. Methods, apparatus, systems and mechanisms for secure attribute based friend find and proximity discovery
US10074374B2 (en) * 2014-04-07 2018-09-11 Barco N.V. Ad hoc one-time pairing of remote devices using online audio fingerprinting
US9641512B2 (en) * 2014-04-10 2017-05-02 EMC IP Holding Company LLC Identity protocol translation gateway
US10339341B2 (en) * 2014-05-07 2019-07-02 Hush Hush Methods and systems for obfuscating sensitive information in computer systems
US9419962B2 (en) * 2014-06-16 2016-08-16 Adobe Systems Incorporated Method and apparatus for sharing server resources using a local group
CN104184654A (zh) * 2014-07-30 2014-12-03 小米科技有限责任公司 Matching method and apparatus based on user identifiers
WO2016049227A1 (en) * 2014-09-23 2016-03-31 FHOOSH, Inc. Secure high speed data storage, access, recovery, and transmission
US10013576B2 (en) * 2014-12-12 2018-07-03 Panasonic Intellectual Property Management Co., Ltd. History information anonymization method and history information anonymization device for anonymizing history information
WO2016105553A1 (en) * 2014-12-26 2016-06-30 Interdigital Patent Holdings, Inc. Continuous device/uicc based authentication for lte systems
CN104573094B (zh) * 2015-01-30 2018-05-29 深圳市华傲数据技术有限公司 Network account identification and matching method
US10108306B2 (en) * 2015-02-24 2018-10-23 Axon Enterprise, Inc. Systems and methods for bulk redaction of recorded data
US9824236B2 (en) * 2015-05-19 2017-11-21 Accenture Global Services Limited System for anonymizing and aggregating protected information
CN104883259B (zh) * 2015-06-11 2018-05-01 郑存粮 Method for automatically registering a mobile phone number as a network application account
IN2015DE01753A (zh) * 2015-06-11 2015-08-28 Pradeep Varma
GB201512283D0 (en) * 2015-07-14 2015-08-19 Apical Ltd Track behaviour events
US9953176B2 (en) * 2015-10-02 2018-04-24 Dtex Systems Inc. Method and system for anonymizing activity records
US10754984B2 (en) * 2015-10-09 2020-08-25 Micro Focus Llc Privacy preservation while sharing security information

Cited By (5)

Publication number Priority date Publication date Assignee Title
US10922306B2 (en) * 2016-12-21 2021-02-16 Aon Global Operations Plc, Singapore Branch Systems and methods for automated bulk user registration spanning both a content management system and any software applications embedded therein
US10990642B2 (en) 2016-12-21 2021-04-27 Aon Global Operations Se, Singapore Branch Methods and systems for securely embedding dashboards into a content management system
US11537272B2 (en) 2016-12-21 2022-12-27 Aon Global Operations Se, Singapore Branch Content management system extensions
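CN109634951A (zh) * 2018-10-23 2019-04-16 平安科技(深圳)有限公司 Big data collection method and apparatus, computer device, and storage medium
CN109634951B (zh) * 2018-10-23 2023-12-22 平安科技(深圳)有限公司 Big data collection method and apparatus, computer device, and storage medium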

Also Published As

Publication number Publication date
US20170329993A1 (en) 2017-11-16
US10878121B2 (en) 2020-12-29
EP3396558A1 (en) 2018-10-31
EP3396558B1 (en) 2020-12-23
CN106909811B (zh) 2020-07-03
CN106909811A (zh) 2017-06-30
EP3396558A4 (en) 2018-10-31

Similar Documents

Publication Publication Date Title
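WO2017107367A1 (zh) Method for processing user identifiers, terminal, and non-volatile computer-readable storage medium
CN106874389B (zh) Data migration method and apparatus
WO2018113241A1 (zh) Page display method and apparatus, server, and storage medium
WO2018205373A1 (zh) Method, apparatus, server, and medium for estimating loss-assessment costs in personal injury claims
CN107688664B (zh) Chart generation method and apparatus, computer device, and storage medium
WO2020215681A1 (zh) Indication information generation method and apparatus, terminal, and storage medium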
US11438360B2 (en) Determining the intersection of a set of compromised credentials with a set of active credentials with data structures and architectures that expedite comparisons
WO2018058959A1 (zh) SQL audit method and apparatus, server, and storage device
US10382461B1 (en) System for determining anomalies associated with a request
US9632911B2 (en) Stack trace clustering
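WO2020119384A1 (zh) Medical insurance anomaly detection method, apparatus, device, and medium based on big data analysis
WO2020087981A1 (zh) Risk-control review model generation method, apparatus, device, and readable storage medium
WO2021072881A1 (zh) Object-storage-based request processing method, apparatus, device, and storage medium
WO2021135373A1 (zh) Method and device for presenting associated conflict blocks
WO2018035929A1 (zh) Verification code processing method and apparatus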
US11178160B2 (en) Detecting and mitigating leaked cloud authorization keys
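WO2020073494A1 (zh) Web page backdoor detection method, device, storage medium, and apparatus
WO2020233089A1 (zh) Test case generation method and apparatus, terminal, and computer-readable storage medium
CN112559526A (zh) Data table export method and apparatus, computer device, and storage medium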
US20200285625A1 (en) Data selection system and data selection method
WO2020186780A1 (zh) User operation recording and restoration method, apparatus, device, and readable storage medium
US20230267228A1 (en) Detection method and apparatus, and non-transitory computer readable storage medium
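WO2018149081A1 (zh) Method, apparatus, terminal, and storage medium for processing follow-up call voice information
WO2019198950A1 (ko) Apparatus and method for providing content information
WO2020237858A1 (zh) Breakpoint data transmission method, apparatus, device, and non-volatile storage medium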

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 16877181
Country of ref document: EP
Kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE