CN105894028B - User identification method and device - Google Patents

Info

Publication number
CN105894028B
Authority
CN
China
Prior art keywords
feature
user
attribute value
information set
weight
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610197077.1A
Other languages
Chinese (zh)
Other versions
CN105894028A (en)
Inventor
刘坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201610197077.1A
Publication of CN105894028A
Application granted
Publication of CN105894028B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/22: Matching criteria, e.g. proximity measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a user identification method and device. One embodiment of the method comprises: acquiring an attribute value of at least one feature of a user to be identified from a pre-collected user information set; obtaining, from a pre-established model, a weight matching each feature of the at least one feature of the user to be identified and the attribute value of that feature, wherein the model comprises the following information: a feature, a candidate attribute value associated with the feature, and a weight associated with the feature and the candidate attribute value, the weight being determined by comparing the proportions of users whose attribute value of the feature equals the candidate attribute value in a pre-stored target user information set and in a pre-stored base user information set; acquiring the sum of the weights matching each feature of the user to be identified and the attribute value of that feature; and identifying whether the user to be identified is a potential target user according to the magnitude of the sum of the weights. This embodiment enables more potential target users to be accurately identified.

Description

User identification method and device
Technical Field
The application relates to the technical field of computers, in particular to the technical field of user portrait, and particularly relates to a user identification method and device.
Background
With the rapid development of the Internet, the need to accurately analyze the attributes and relationships of each user through user portrait data has become increasingly evident. A user portrait is a virtual representation of a real user: a target user model built on a series of real data. Users are studied through user research and divided into different types according to differences in their goals, behaviors and viewpoints; typical features are then extracted from each type and supplemented with descriptions of demographic elements, scenarios and the like, thereby forming user portrait data. User portraits allow enterprises to conveniently obtain broader user feedback over the Internet, and provide a sufficient data basis for analyzing important business information, such as user behavior habits and consumption habits, accurately and quickly.
At present, user portrait data has a relatively successful application experience in the aspects of information recommendation, message push and the like. Before information recommendation and message pushing are carried out, potential target users need to be identified in a basic user information set, so that the information recommendation and the message pushing can be carried out more specifically. Prior art methods of identifying potential target users are typically based on how often a user uses a predetermined product.
However, the set of potential target users identified by the above prior art is generally small, and the approach has certain limitations, so that more potential target users cannot be accurately identified.
Disclosure of Invention
The present application aims to provide a user identification method and apparatus to solve the technical problems mentioned in the above background.
In a first aspect, the present application provides a user identification method, the method comprising: acquiring an attribute value of at least one feature of a user to be identified from a pre-collected user information set; obtaining, from a pre-established model, a weight matching each feature of the at least one feature of the user to be identified and the attribute value of that feature, wherein the model comprises the following information: a feature, a candidate attribute value associated with the feature, and a weight associated with the feature and the candidate attribute value, the weight being determined by comparing the proportions of users whose attribute value of the feature equals the candidate attribute value in a pre-stored target user information set and in a pre-stored base user information set; acquiring the sum of the weights matching each feature of the at least one feature of the user to be identified and the attribute value of that feature; and identifying whether the user to be identified is a potential target user according to the magnitude of the sum of the weights.
In some embodiments, comparing the proportions of users whose attribute value of the feature equals the candidate attribute value in the pre-stored target user information set and base user information set includes: obtaining, for the target user information set and the base user information set respectively, the proportion within the set of users whose attribute value of the feature equals the candidate attribute value; and taking the absolute value of the difference between the proportion in the target user information set and the proportion in the base user information set as the weight associated with the feature and the attribute value.
In some embodiments, the comparing further includes: correcting the weight according to the number of candidate attribute values associated with the feature, wherein the corrected weight is positively correlated with the number of candidate attribute values.
In some embodiments, the model further comprises precondition information, and the weight associated with the feature and the attribute value in the model is a weight associated with the precondition, the feature and the attribute value, wherein the weight is determined by comparing the proportions of the pre-stored users with the attribute value of the feature equal to the candidate attribute value in the target user information set and the base user information set which satisfy the precondition; and the user to be identified is a user to be identified which meets a predetermined precondition; and the obtaining of the weight matching each of the at least one feature of the user to be identified and the attribute value of the feature from the pre-established model comprises: and acquiring a weight matched with the preset precondition, each feature of the at least one feature of the user to be identified and the attribute value of the feature from a pre-established model.
In some embodiments, the method further comprises: after the potential target users are determined, acquiring the set of successfully identified users, that is, those of the potential target users who turned out to be target users; and recalculating and updating the weights in the model by comparing, for each candidate attribute value associated with each feature, the proportions of users having that value in the set of successfully identified users and in the base user information set.
In some embodiments, the obtaining an attribute value of at least one feature of the user to be identified from the pre-collected user information set includes: performing at least one of the following processes on a pre-collected set of user information: updating discrete original attribute values in the user information set into attribute values in each preset interval range for representing the original attribute values; setting the attribute value of the characteristic that the attribute value of the user in the user information set is empty as a preset default value; for each of the features, deleting attribute values whose corresponding weights are less than a predetermined threshold; and acquiring the attribute value of at least one characteristic of the user to be identified from the processed user information set.
In a second aspect, the present application provides a user identification device, the device comprising: a feature information acquisition unit, configured to acquire an attribute value of at least one feature of a user to be identified from a pre-collected user information set; a weight obtaining unit, configured to obtain, from a pre-established model, a weight matching each feature of the at least one feature of the user to be identified and the attribute value of that feature, wherein the model comprises the following information: a feature, a candidate attribute value associated with the feature, and a weight associated with the feature and the candidate attribute value, the weight being determined by comparing the proportions of users whose attribute value of the feature equals the candidate attribute value in a pre-stored target user information set and in a pre-stored base user information set; a summation unit, configured to obtain the sum of the weights matching each feature of the at least one feature of the user to be identified and the attribute value of that feature; and an identification unit, configured to identify whether the user to be identified is a potential target user according to the magnitude of the sum of the weights.
In some embodiments, the apparatus further comprises: a weight determination unit, configured to determine, for each feature of the at least one feature and each attribute value associated with the feature, a weight associated with the feature and a candidate attribute value by comparing ratios of users whose attribute values of the features in a pre-stored target user information set and a base user information set are equal to the candidate attribute value, the weight determination unit including: a proportion obtaining subunit, configured to obtain proportions, in the set, of users whose attribute values of the features in the target user information set and the basic user information set are equal to the candidate attribute values, respectively; a weight determining subunit, configured to obtain an absolute value of a difference between the proportions of the target user information set and the base user information set, as a weight associated with the feature and the attribute value.
In some embodiments, the weight determination unit further comprises: a weight modifying subunit, configured to modify the weight according to a number of candidate attribute values associated with the feature, where a positive correlation exists between the modified weight and the number of candidate attribute values.
In some embodiments, the model further includes precondition information, and the weight associated with the feature and the attribute value in the model is a weight associated with the precondition, the feature and the attribute value, wherein the weight is determined by the weight determination unit by comparing the occupation ratios of pre-stored users whose attribute values of the feature in the target user information set and the base user information set satisfy the precondition are equal to the candidate attribute value; and the user to be identified is a user to be identified which meets a predetermined precondition; and the weight obtaining unit is further used for obtaining the weight matched with the preset precondition, each feature of the at least one feature of the user to be identified and the attribute value of the feature from a pre-established model.
In some embodiments, the apparatus further comprises: the successful sample acquisition unit is used for acquiring an identification successful user set of the target users in the potential target users after the potential target users are determined; and the weight updating unit is used for recalculating and updating the weight in the model by comparing the proportion of the users of each candidate attribute value associated with each feature in the successfully identified user set and the basic user information set.
In some embodiments, the feature information acquiring unit includes: a preprocessing subunit, configured to perform at least one of the following processing on a pre-collected user information set: updating discrete original attribute values in the user information set into attribute values in each preset interval range for representing the original attribute values; setting the attribute value of the characteristic that the attribute value of the user in the user information set is empty as a preset default value; for each of the features, deleting attribute values whose corresponding weights are less than a predetermined threshold; and the characteristic information extraction subunit is used for acquiring the attribute value of at least one characteristic of the user to be identified from the user information set after the processing of the preprocessing subunit.
According to the user identification method and device, the weight matched with each feature in the at least one feature of the user to be identified and the attribute value of the feature is obtained from the pre-established model, whether the user to be identified is a potential target user is identified according to the sum of the weights, the user can be identified based on more feature attribute values, and therefore more potential target users can be accurately identified.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a user identification method according to the present application;
FIG. 3 is an exemplary diagram of data processing according to one embodiment of a user identification method of the present application;
FIG. 4 is a schematic diagram of an embodiment of a user identification device according to the present application;
FIG. 5 is a block diagram of a computer system suitable for use in implementing a server according to embodiments of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the user identification method or user identification apparatus of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages and the like. Various client applications, such as a taxi-hailing application or a map search service application, may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be various electronic devices that support communication of information, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server that provides various services. For example, it may store and analyze the user information transmitted by the taxi-hailing application, the map search service application and the like on the terminal devices 101, 102, 103, and may push messages to the corresponding users according to the processing result.
It should be noted that the user identification method provided in the embodiment of the present application is generally performed by the server 105. Accordingly, the user identification means is typically provided in the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, FIG. 2 illustrates a flow 200 of one embodiment of a user identification method according to the present application.
As shown in fig. 2, the user identification method of the present embodiment includes the following steps:
step 201, obtaining an attribute value of at least one feature of a user to be identified from a pre-collected user information set.
In this embodiment, the electronic device on which the user identification method runs (e.g., the server shown in fig. 1) may obtain, locally or remotely, the attribute value of at least one feature of the user to be identified from a pre-collected user information set. The user information set may be user portrait data; there may be one or more users to be identified; and the at least one feature may include, but is not limited to, one or more features that may affect the identification result, such as gender, age, income level, education level, industry, consumption habits and personal interests.
In some optional implementations of this embodiment, the electronic device may first perform at least one of the following processes on the pre-collected user information set: replacing discrete original attribute values in the user information set with attribute values that each stand for a predetermined interval containing the original value; setting any feature whose attribute value is empty for a user in the user information set to a preset default value; and, for each of the above features, deleting attribute values whose corresponding weight is less than a predetermined threshold. The attribute value of at least one feature of the user to be identified is then acquired from the processed user information set. Taking the age feature as an example, the discrete original attribute value is usually a specific age (e.g., 21, 22, 30); after the original values are replaced by interval attribute values, the attribute values associated with the age feature may include, for example: 20 to 25, 25 to 29, and 30 or more. This implementation can considerably improve the method in terms of identifiability, hit rate, stability and the like.
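The first two preprocessing steps described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the record layout (one dict per user), the interval boundaries and labels in `AGE_BINS`, and the placeholder `DEFAULT_VALUE` are all assumptions chosen for the example.

```python
# Hypothetical interval labels; the boundaries are illustrative only.
AGE_BINS = [(0, 20, "under 20"), (20, 25, "20-24"), (25, 30, "25-29")]
DEFAULT_VALUE = "unknown"  # assumed placeholder for empty attribute values


def bin_age(age):
    """Map a raw age to an interval label, or to the default when it is missing."""
    if age is None:
        return DEFAULT_VALUE
    for low, high, label in AGE_BINS:
        if low <= age < high:
            return label
    return "30 or more"


def preprocess(users):
    """Return new user records with empty values defaulted and ages binned."""
    cleaned = []
    for user in users:
        # Replace empty attribute values with the preset default value.
        record = {
            feature: DEFAULT_VALUE if value in (None, "") else value
            for feature, value in user.items()
        }
        # Replace the discrete age with the interval that represents it.
        record["age"] = bin_age(user.get("age"))
        cleaned.append(record)
    return cleaned
```

The third step, dropping attribute values whose weight falls below a threshold, depends on the weights in the model and is therefore not shown here.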
Step 202, obtaining a weight matched with each feature of the at least one feature of the user to be identified and the attribute value of the feature from a pre-established model.
The model comprises the following information: a feature, a candidate attribute value associated with the feature, and a weight associated with the feature and the candidate attribute value, wherein the weight is determined by comparing the proportions of users whose attribute value of the feature equals the candidate attribute value in a pre-stored target user information set and in a pre-stored base user information set.
In this embodiment, the features and attribute values in the model may be represented by feature identifiers and attribute value identifiers, respectively. For example, the i-th feature is denoted by $i$, and the j-th attribute value associated with feature $i$ is denoted by $j$, where $i$ and $j$ are positive integers, $i \in \{1, \ldots, M\}$ and $j \in \{1, \ldots, N_i + 1\}$, $M$ is the number of features in the at least one feature mentioned above, and $N_i$ is the number of attribute values associated with feature $i$. Let the weight associated in the model with feature $i$ and its attribute value $j$ be $S_{ij}$. For each feature of the at least one feature of the user to be identified and the attribute value of that feature, the electronic device may look up the matching $S_{ij}$ in the model, thereby obtaining the weight matching each feature and attribute value of the user to be identified.
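One way to picture the lookup in step 202 is a nested mapping from feature to attribute value to weight. The sketch below assumes that layout; the feature names, attribute values and weights in `MODEL` are invented for illustration and do not come from the patent.

```python
# Hypothetical model layout: MODEL[feature][attribute_value] -> weight S_ij.
MODEL = {
    "consumption_level": {"high": 0.30, "medium": 0.05, "low": 0.25},
    "age": {"20-24": 0.10, "25-29": 0.15, "30 or more": 0.05},
}


def matched_weights(model, user):
    """Return {feature: weight} for every feature/value pair found in the model."""
    result = {}
    for feature, value in user.items():
        weight = model.get(feature, {}).get(value)
        if weight is not None:
            result[feature] = weight
    return result
```

Features or attribute values that the model does not know are simply skipped in this sketch; a real system might instead fall back to a default weight.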
$S_{ij}$ is determined in advance by comparing the proportions of users whose attribute value of feature $i$ equals attribute value $j$ in the pre-stored target user information set and in the base user information set. The target users in the target user information set may be users who have actually used a certain product (e.g., a certain taxi-hailing application), and the users in the base user information set are general users for whom it has not been determined whether they are potential target users. Taking the consumption level feature as an example, the attribute values associated with the feature may include: high, medium and low. The electronic device may first obtain the proportions of users with a high, a medium and a low consumption level in the target user information set and in the base user information set, respectively, for example:
[Figures BDA0000955226550000071 and BDA0000955226550000081: example user proportion tables, not reproduced in this text.]
The electronic device then compares the proportion of users with a high consumption level in the target user information set with the proportion of users with a high consumption level in the base user information set; the larger the difference, the larger the weight associated with the attribute value "high" of the consumption level feature. The weights associated with the attribute values "medium" and "low" are obtained in the same way.
In some optional implementations of this embodiment, comparing the proportions of users whose attribute value of the feature equals the candidate attribute value in the pre-stored target user information set and base user information set may include: obtaining, for the target user information set and the base user information set respectively, the proportion of users whose attribute value of the feature equals the candidate attribute value; and taking the absolute value of the difference between the two proportions as the weight associated with the feature and the attribute value.
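The absolute-difference rule above can be sketched in a few lines. The record layout (one dict per user) and the function names `proportions` and `raw_weights` are assumptions made for this example.

```python
from collections import Counter


def proportions(users, feature):
    """Share of users per attribute value of `feature` within one user set."""
    counts = Counter(u[feature] for u in users if feature in u)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()} if total else {}


def raw_weights(target_users, base_users, feature):
    """Weight per attribute value: |proportion in target set - proportion in base set|."""
    p_target = proportions(target_users, feature)
    p_base = proportions(base_users, feature)
    values = set(p_target) | set(p_base)
    return {v: abs(p_target.get(v, 0.0) - p_base.get(v, 0.0)) for v in values}
```

With made-up data in which 60% of target users but only 20% of base users have a high consumption level, the attribute value "high" would receive a weight of about 0.4.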
In addition, when a feature with many candidate attribute values and a feature with few candidate attribute values yield the same weight, the weight of the feature with many candidate attribute values should be greater. Therefore, optionally, comparing the proportions of users whose attribute value of the feature equals the candidate attribute value in the pre-stored target user information set and base user information set may further include: correcting the weight according to the number of candidate attribute values associated with the feature, such that the corrected weight is positively correlated with the number of candidate attribute values. For example, if the original weight is $S'_{ij}$, the corrected weight may be $S_{ij} = \log(N_i) \times S'_{ij}$. With this correction, the weights in the model are positively correlated with the number of candidate attribute values, which makes them more reasonable and accurate.
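A minimal sketch of this correction follows. The text does not state the base of the logarithm, so the natural logarithm used here is an assumption, as is the function name `corrected_weights`.

```python
import math


def corrected_weights(raw, num_candidates=None):
    """Scale each raw weight S'_ij by log(N_i), N_i being the number of candidate values.

    `raw` maps attribute value -> uncorrected weight for a single feature.
    If `num_candidates` is omitted, N_i is taken to be len(raw).
    """
    n = num_candidates if num_candidates is not None else len(raw)
    factor = math.log(n)  # natural log: an assumption, the base is not specified
    return {value: factor * weight for value, weight in raw.items()}
```

Note that under this formula a feature with a single candidate attribute value gets a factor of log(1) = 0, so it contributes nothing to the sum.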
Step 203, obtaining the sum of the weight matched with each feature in the at least one feature of the user to be identified and the attribute value of the feature.
In this embodiment, the electronic device may add up the weights obtained in step 202, each of which matches one feature of the at least one feature of the user to be identified and the attribute value of that feature, to obtain the sum of the weights. The sum may be used to indicate how salient and how likely the user to be identified is as a potential target user.
And step 204, identifying whether the user to be identified is a potential target user or not according to the sum of the weights.
In this embodiment, the electronic device may compare the sum of the weights with a preset threshold and, if the sum of the weights is greater than the threshold, identify the user to be identified as a potential target user. In addition, if there are multiple users to be identified, the electronic device may rank them in descending order of the sum of the weights and select a predetermined number of them as potential target users.
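Steps 203 and 204 can be sketched as a sum followed by either a threshold test or a top-k selection. The model layout (`model[feature][value] -> weight`), the treatment of unknown values as weight 0, and all function names are assumptions for this illustration.

```python
def weight_sum(model, user):
    """Sum the weights matched by the user's feature/attribute-value pairs."""
    return sum(model.get(f, {}).get(v, 0.0) for f, v in user.items())


def is_potential_target(model, user, threshold):
    """Threshold variant: the weight sum must be greater than the threshold."""
    return weight_sum(model, user) > threshold


def top_k(model, users, k):
    """Ranking variant: ids of the k users with the largest weight sums.

    `users` maps a user id to that user's {feature: attribute_value} dict.
    """
    ranked = sorted(users, key=lambda uid: weight_sum(model, users[uid]), reverse=True)
    return ranked[:k]
```

The threshold variant suits a single user evaluated on demand, while the ranking variant suits a batch in which a fixed number of users is to be selected.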
In some optional implementations of this embodiment, the model further includes precondition information, and the weight associated with the feature and the attribute value in the model is a weight associated with the precondition, the feature, and the attribute value. The weight is determined by comparing the pre-stored proportions of the users whose attribute values of the features are equal to the candidate attribute values in the target user information set and the basic user information set which meet the precondition. And the user to be identified is the user to be identified which meets the preset precondition. And, the obtaining of the weight matching with each of the at least one feature of the user to be identified and the attribute value of the feature from the pre-established model may include: and acquiring a weight matched with the preset precondition, each feature of the at least one feature of the user to be identified and the attribute value of the feature from a pre-established model.
The precondition may be, for example, a region condition or a condition on the time at which the user information was collected. For example, the proportion distributions of features such as consumption level and education level generally differ between users in different cities; if users in a small city are identified using weights derived from the target user information set and base user information set of a first-tier city, the identification is generally less accurate than it would be for users in the first-tier city. With this implementation, the precondition satisfied by the user to be identified is the same as the precondition of the user information sets from which the model was generated, so the accuracy of user identification is higher.
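A precondition-aware model can be pictured as one weight table per precondition. The sketch below assumes that layout; the precondition names, the `city_tier` field and the weights are hypothetical values chosen for the example.

```python
# Hypothetical layout: MODELS[precondition][feature][attribute_value] -> weight.
MODELS = {
    "first_tier_city": {"consumption_level": {"high": 0.35, "low": 0.20}},
    "small_city": {"consumption_level": {"high": 0.10, "low": 0.30}},
}


def filter_by_precondition(users, key, expected):
    """Keep only the user records that satisfy the precondition `key == expected`."""
    return [u for u in users if u.get(key) == expected]


def weight_sum(models, precondition, user_features):
    """Sum weights using only the table built under the same precondition."""
    model = models.get(precondition, {})
    return sum(model.get(f, {}).get(v, 0.0) for f, v in user_features.items())
```

The same attribute value can then carry a different weight depending on the precondition, which is the point of keeping the tables separate.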
In some optional implementations of this embodiment, the user identification method may further include: after the potential target users are determined, acquiring the set of successfully identified users, that is, those of the potential target users who turned out to be target users; and recalculating and updating the weights in the model by comparing, for each candidate attribute value associated with each feature, the proportions of users having that value in the set of successfully identified users and in the base user information set. Taking as an example a target user defined as a user who has used a certain product, after a certain period of time following step 204 (for example, 1 month), the electronic device may first obtain the set of successfully identified users by collecting those potential target users identified in step 204 who used the product during that period. Then, using the method of step 202, it compares the proportions of users for each candidate attribute value associated with each feature in the set of successfully identified users and in the base user information set, and recalculates and updates the weights in the model. This implementation optimizes the model and improves the accuracy of subsequent user identification.
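The feedback step can be sketched as recomputing every feature's weights with the successfully identified users in place of the original target set. The model layout, the record layout and the name `update_model` are assumptions; the sketch uses the absolute difference of proportions described for step 202 and leaves out any optional correction.

```python
from collections import Counter


def _proportions(users, feature):
    """Share of users per attribute value of `feature` within one user set."""
    counts = Counter(u[feature] for u in users if feature in u)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()} if total else {}


def update_model(model, converted_users, base_users):
    """Return a new model with weights recomputed from the successfully identified users."""
    updated = {}
    for feature in model:
        p_conv = _proportions(converted_users, feature)
        p_base = _proportions(base_users, feature)
        updated[feature] = {
            value: abs(p_conv.get(value, 0.0) - p_base.get(value, 0.0))
            for value in set(p_conv) | set(p_base)
        }
    return updated
```

Returning a new model rather than mutating the old one makes it easy to compare the two before switching over.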
Referring now to fig. 3, fig. 3 illustrates an exemplary schematic diagram of a user identification method according to the present embodiment.
As shown in fig. 3, the electronic device may obtain, at the model layer, a base user sample 304 and a target user sample 305 through a filtering of preconditions in the precondition set 303 (or not through preconditions) based on a target user information set 301 and a base user information set 302 of the base data layer; then, the model 306 is obtained by performing the user feature extraction, the contrast ratio distribution calculation, the weight calculation, and the like (i.e., step 202 of the present embodiment) as shown in fig. 3. When the user is identified, the user to be identified 307 is subjected to the user feature extraction process shown in fig. 3 (i.e. step 201 of the embodiment); performing target user saliency calculation shown in fig. 3 based on the extracted attribute values of the features of the user to be identified and the model (i.e., step 203 in this embodiment); the set of potential target users 308 shown in fig. 3 is finally obtained (via step 204 of the present embodiment).
In the user identification method provided by this embodiment, the weight matching each feature of the at least one feature of the user to be identified and the attribute value of that feature is obtained from the pre-established model, and whether the user to be identified is a potential target user is determined based on the sum of these weights. The user can thus be identified from the attribute values of a larger number of features, so that more potential target users can be identified accurately.
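The lookup-and-sum identification summarized above can be illustrated with a short sketch. It assumes a model stored as a dictionary keyed by (feature, attribute value) and a fixed score threshold as the decision rule. The text only says that identification depends on the magnitude of the sum, so the threshold, like the names `score_user` and `identify_potential_targets`, is an assumption made for illustration.

```python
def score_user(attributes, model):
    """Sum the model weights matching each (feature, value) of one user.

    Pairs that are absent from the model contribute nothing.
    """
    return sum(model.get((f, v), 0.0) for f, v in attributes.items())


def identify_potential_targets(users, model, threshold):
    """Return the ids of users whose summed weight reaches `threshold`.

    `users` maps a user id to that user's {feature: value} dictionary.
    The threshold rule is an assumed decision rule, not taken from the text.
    """
    return [
        user_id
        for user_id, attributes in users.items()
        if score_user(attributes, model) >= threshold
    ]
```

Ranking users by score and keeping the top N would be an equally plausible reading of "according to the magnitude of the sum of the weights".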
With further reference to fig. 4, as an implementation of the methods shown in the above-mentioned figures, the present application provides an embodiment of a user identification apparatus, which corresponds to the method embodiment shown in fig. 2, and which can be specifically applied in a server.
As shown in fig. 4, the user identification apparatus 400 provided in this embodiment includes: a feature information acquisition unit 401, a weight acquisition unit 402, a summation unit 403, and an identification unit 404. The feature information acquisition unit 401 is configured to acquire an attribute value of at least one feature of a user to be identified from a pre-collected user information set. The weight acquisition unit 402 is configured to obtain, from a pre-established model, a weight matching each feature of the at least one feature of the user to be identified and the attribute value of the feature, where the model includes the following information: features, candidate attribute values associated with the features, and weights associated with the features and the candidate attribute values, each weight being determined by comparing the proportions of users, in a pre-stored target user information set and a pre-stored base user information set, whose attribute value of the feature is equal to the candidate attribute value. The summation unit 403 is configured to obtain the sum of the weights matching each feature of the at least one feature of the user to be identified and the attribute value of the feature. The identification unit 404 is configured to identify whether the user to be identified is a potential target user according to the magnitude of the sum of the weights.
In this embodiment, for the specific processing of the feature information acquisition unit 401, the weight acquisition unit 402, the summation unit 403, and the identification unit 404, reference may be made to the descriptions of step 201, step 202, step 203, and step 204, respectively, in the embodiment corresponding to fig. 2, which are not repeated here.
In some optional implementations of this embodiment, the feature information acquisition unit 401 may include a preprocessing subunit 4011 and a feature information extraction subunit 4012. The preprocessing subunit 4011 is configured to perform at least one of the following on the pre-collected user information set: replacing discrete original attribute values in the user information set with attribute values that denote the preset interval ranges into which the original values fall; setting to a preset default value the attribute value of any feature whose attribute value for a user in the user information set is empty; and, for each of the above features, deleting attribute values whose corresponding weight is less than a predetermined threshold. The feature information extraction subunit 4012 is configured to acquire the attribute value of at least one feature of the user to be identified from the user information set processed by the preprocessing subunit. For the specific processing of this implementation and its technical effects, reference may be made to the description of the corresponding optional implementation of step 201 in the embodiment corresponding to fig. 2, which is not repeated here.
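The three preprocessing operations can be sketched on a single user record as follows. This is a minimal illustration under stated assumptions: records are dictionaries, interval boundaries are supplied per feature as a sorted list, empty values are `None` or the empty string, and the default value `"unknown"` is arbitrary. The names `bucketize` and `preprocess` are hypothetical.

```python
import bisect


def bucketize(value, edges):
    """Map a numeric value to a label for the half-open interval it falls in."""
    i = bisect.bisect_right(edges, value)
    if i == 0:
        return f"<{edges[0]}"
    if i == len(edges):
        return f">={edges[-1]}"
    return f"{edges[i - 1]}-{edges[i]}"


def preprocess(record, bucket_edges, default="unknown", model=None, min_weight=0.0):
    """Apply interval mapping, default filling and low-weight pruning.

    bucket_edges: {feature: sorted list of interval boundaries} (assumed input).
    model: optional {(feature, value): weight}; when given, attribute values
    whose weight is below `min_weight` are dropped from the record.
    """
    cleaned = {}
    for feature, value in record.items():
        if value is None or value == "":
            value = default  # fill empty attribute values
        elif feature in bucket_edges:
            value = bucketize(value, bucket_edges[feature])  # interval label
        if model is not None and model.get((feature, value), 0.0) < min_weight:
            continue  # prune attribute values that carry too little weight
        cleaned[feature] = value
    return cleaned
```

For example, with boundaries `[20, 30, 40]` an age of 25 becomes the label `"20-30"`, so that nearby raw values share one candidate attribute value in the model.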
In some optional implementations of this embodiment, the user identification apparatus may further include: a weight determination unit 405, configured to determine, for each feature of the at least one feature and each candidate attribute value associated with the feature, the weight associated with the feature and the candidate attribute value by comparing the proportions of users, in the pre-stored target user information set and base user information set, whose attribute value of the feature is equal to the candidate attribute value. The weight determination unit 405 may include: a proportion acquisition subunit 4051, configured to obtain, for the target user information set and the base user information set respectively, the proportion within that set of users whose attribute value of the feature is equal to the candidate attribute value; and a weight determination subunit 4052, configured to take the absolute value of the difference between the proportion in the target user information set and the proportion in the base user information set as the weight associated with the feature and the attribute value. For the specific processing of this implementation, reference may be made to the description of the corresponding optional implementation of step 202 in the embodiment corresponding to fig. 2, which is not repeated here.
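The weight rule of subunits 4051 and 4052 (the absolute difference between the two proportions) can be sketched as below. The data layout is an assumption made for illustration: each user is a dictionary of feature values, and `candidates` lists the candidate attribute values per feature. The names `proportion` and `build_model` are hypothetical.

```python
def proportion(users, feature, value):
    """Proportion of users in the set whose `feature` equals `value`."""
    if not users:
        return 0.0
    return sum(1 for u in users if u.get(feature) == value) / len(users)


def build_model(target_users, base_users, candidates):
    """Build {(feature, value): weight} with
    weight = |proportion in target set - proportion in base set|.

    candidates: {feature: iterable of candidate attribute values}.
    """
    model = {}
    for feature, values in candidates.items():
        for value in values:
            p_target = proportion(target_users, feature, value)
            p_base = proportion(base_users, feature, value)
            model[(feature, value)] = abs(p_target - p_base)
    return model
```

As a worked case, if 75% of target users and 50% of base users have a given attribute value, that (feature, value) pair receives a weight of 0.25.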
Based on the foregoing implementation manner, in some optional implementation manners of this embodiment, the weight determining unit 405 may further include: a weight correction subunit 4053, configured to correct the weight according to the number of candidate attribute values associated with the feature, where there is a positive correlation between the corrected weight and the number of candidate attribute values. For the specific processing of this implementation and the technical effects brought by the processing, reference may be made to the description of relevant parts of the corresponding optional implementation in step 202 in the corresponding embodiment of fig. 2, and details are not described here again.
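The passage above only requires that the corrected weight be positively correlated with the number of candidate attribute values; it does not give a formula here. The sketch below uses a logarithmic scaling factor purely as an assumed example of such a correction. The function name `correct_weight` and the choice of `log2(1 + n)` are hypothetical.

```python
import math


def correct_weight(raw_weight, num_candidate_values):
    """Scale a raw weight by a factor that grows with the number of
    candidate attribute values of the feature.

    Assumption: the factor log2(1 + n) is an illustrative choice only; any
    increasing function of n would satisfy the positive-correlation condition.
    """
    if num_candidate_values < 1:
        raise ValueError("a feature needs at least one candidate attribute value")
    return raw_weight * math.log2(1 + num_candidate_values)
```

With this choice a feature with a single candidate value keeps its raw weight (the factor is 1), and a feature with three candidate values has its weight doubled.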
In some optional implementations of this embodiment, the model may further include precondition information, and the weight associated with a feature and an attribute value in the model may be a weight associated with a precondition, the feature, and the attribute value. Such a weight may be determined by comparing the proportions of users whose attribute value of the feature is equal to the candidate attribute value among the users who satisfy the precondition in the pre-stored target user information set and in the pre-stored base user information set. The user to be identified is a user who satisfies a predetermined precondition, and the weight acquisition unit 402 may be further configured to obtain, from the pre-established model, a weight matching the predetermined precondition, each feature of the at least one feature of the user to be identified, and the attribute value of the feature. For the specific processing of this implementation and its technical effects, reference may be made to the description of the corresponding optional implementation in the embodiment corresponding to fig. 2, which is not repeated here.
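A precondition-aware model of this kind can be sketched by extending the lookup key from (feature, value) to (precondition, feature, value). The representation below is an assumption made for illustration: a precondition is a name paired with a predicate over the user record, and a user who fails the predicate is simply not scored. The name `score_with_precondition` is hypothetical.

```python
def score_with_precondition(model, precondition_name, predicate, user):
    """Sum the weights keyed by (precondition, feature, value) for one user.

    model: {(precondition_name, feature, value): weight} (assumed layout).
    predicate: callable deciding whether the user satisfies the precondition.
    Returns None when the user does not satisfy the precondition.
    """
    if not predicate(user):
        return None
    return sum(
        model.get((precondition_name, feature, value), 0.0)
        for feature, value in user.items()
    )
```

Keeping one set of weights per precondition lets the same feature value carry different weights for different user populations.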
In some optional implementations of this embodiment, the user identification apparatus may further include: a successful sample acquisition unit 406, configured to acquire, after the potential target users are determined, the set of users among the potential target users who are successfully identified as target users (the successfully identified user set); and a weight updating unit 407, configured to recalculate and update the weights in the model by comparing, for each candidate attribute value associated with each feature, the proportions of users having that value in the successfully identified user set and in the base user information set. For the specific processing of this implementation and its technical effects, reference may be made to the description of the corresponding optional implementation in the embodiment corresponding to fig. 2, which is not repeated here.
In the user identification apparatus provided in this embodiment, the weight acquisition unit 402 obtains, from the pre-established model, the weight matching each feature of the at least one feature of the user to be identified and the attribute value of the feature, and the identification unit 404 identifies whether the user to be identified is a potential target user based on the sum of the weights calculated by the summation unit 403. The user can thus be identified from the attribute values of a larger number of features, so that more potential target users can be identified accurately.
Referring now to FIG. 5, a block diagram of a computer system 500 suitable for use in implementing a server according to embodiments of the present application is shown.
As shown in fig. 5, the computer system 500 includes a Central Processing Unit (CPU) 501 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage section 506 into a Random Access Memory (RAM) 503. The RAM 503 also stores various programs and data necessary for the operation of the system 500. The CPU 501, ROM 502, and RAM 503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
The following components are connected to the I/O interface 505: a storage section 506 including a hard disk and the like; and a communication section 507 including a network interface card such as a LAN card, a modem, or the like. The communication section 507 performs communication processing via a network such as the internet. A drive 508 is also connected to the I/O interface 505 as necessary. A removable medium 509, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 508 as necessary, so that a computer program read from it can be installed into the storage section 506 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 507 and/or installed from the removable medium 509. The computer program performs the above-described functions defined in the method of the present application when executed by the Central Processing Unit (CPU) 501.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in a processor, which may be described as: a processor including a feature information acquisition unit, a weight acquisition unit, a summation unit, and an identification unit. The names of these units do not, in some cases, limit the units themselves; for example, the feature information acquisition unit may also be described as "a unit that acquires an attribute value of at least one feature of a user to be identified from a pre-collected user information set".
As another aspect, the present application also provides a non-volatile computer storage medium, which may be the non-volatile computer storage medium included in the apparatus of the above embodiment, or a non-volatile computer storage medium that exists separately and is not incorporated into the terminal. The non-volatile computer storage medium stores one or more programs that, when executed by a device, cause the device to: acquire an attribute value of at least one feature of a user to be identified from a pre-collected user information set; obtain, from a pre-established model, a weight matching each feature of the at least one feature of the user to be identified and the attribute value of the feature, where the model includes the following information: features, candidate attribute values associated with the features, and weights associated with the features and the candidate attribute values, each weight being determined by comparing the proportions of users, in a pre-stored target user information set and a pre-stored base user information set, whose attribute value of the feature is equal to the candidate attribute value; obtain the sum of the weights matching each feature of the at least one feature of the user to be identified and the attribute value of the feature; and identify whether the user to be identified is a potential target user according to the magnitude of the sum of the weights.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (12)

1. A method for identifying a user, the method comprising:
acquiring an attribute value of at least one feature of a user to be identified from pre-collected user portrait data;
obtaining, from a pre-established model, a weight matching each feature of the at least one feature of the user to be identified and the attribute value of the feature, wherein the model comprises the following information: a feature, a candidate attribute value associated with the feature, and a weight associated with the feature and the candidate attribute value, the weight being determined by comparing the proportions of users, in pre-stored target user portrait data and base user portrait data, whose attribute value of the feature is equal to the candidate attribute value;
acquiring the sum of the weights matching each feature of the at least one feature of the user to be identified and the attribute value of the feature;
identifying whether the user to be identified is a potential target user according to the magnitude of the sum of the weights; and
pushing a message to the user based on the identification result.
2. The method according to claim 1, wherein comparing the proportions of users, in the pre-stored target user information set and the base user information set, whose attribute value of the feature is equal to the candidate attribute value comprises:
acquiring, for the target user information set and the base user information set respectively, the proportion within that set of users whose attribute value of the feature is equal to the candidate attribute value;
taking the absolute value of the difference between the proportion in the target user information set and the proportion in the base user information set as the weight associated with the feature and the attribute value.
3. The method according to claim 2, wherein comparing the proportions of users, in the pre-stored target user information set and the base user information set, whose attribute value of the feature is equal to the candidate attribute value further comprises:
correcting the weight according to the number of candidate attribute values associated with the feature, wherein a positive correlation exists between the corrected weight and the number of candidate attribute values.
4. The method according to any one of claims 1-3, wherein the model further comprises precondition information, and the weight associated with the feature and the attribute value in the model is a weight associated with the precondition, the feature and the attribute value, wherein the weight is determined by comparing the proportions of users whose attribute value of the feature is equal to the candidate attribute value among the users who satisfy the precondition in the pre-stored target user information set and base user information set; and
the user to be identified is a user who satisfies a predetermined precondition; and
the obtaining, from the pre-established model, of the weight matching each feature of the at least one feature of the user to be identified and the attribute value of the feature comprises:
obtaining, from the pre-established model, a weight matching the predetermined precondition, each feature of the at least one feature of the user to be identified, and the attribute value of the feature.
5. The method according to any one of claims 1-3, further comprising:
after the potential target users are determined, acquiring, from among the potential target users, the set of users successfully identified as target users;
recalculating and updating the weights in the model by comparing, for each candidate attribute value associated with each feature, the proportions of users having that value in the successfully identified user set and in the base user information set.
6. The method according to any one of claims 1 to 3, wherein the acquiring of the attribute value of the at least one feature of the user to be identified from the pre-collected user information set comprises:
performing at least one of the following on the pre-collected user information set: replacing discrete original attribute values in the user information set with attribute values that denote the preset interval ranges into which the original values fall; setting to a preset default value the attribute value of any feature whose attribute value for a user in the user information set is empty; and, for each of the features, deleting attribute values whose corresponding weight is less than a predetermined threshold;
acquiring the attribute value of at least one feature of the user to be identified from the processed user information set.
7. A user identification device, the device comprising:
a feature information acquisition unit, configured to acquire an attribute value of at least one feature of a user to be identified from pre-collected user portrait data;
a weight obtaining unit, configured to obtain, from a pre-established model, a weight matching each feature of the at least one feature of the user to be identified and the attribute value of the feature, wherein the model comprises the following information: a feature, a candidate attribute value associated with the feature, and a weight associated with the feature and the candidate attribute value, the weight being determined by comparing the proportions of users, in pre-stored target user portrait data and base user portrait data, whose attribute value of the feature is equal to the candidate attribute value;
a summation unit, configured to obtain the sum of the weights matching each feature of the at least one feature of the user to be identified and the attribute value of the feature;
an identification unit, configured to identify whether the user to be identified is a potential target user according to the magnitude of the sum of the weights; and
a pushing unit, configured to push a message to the user based on the identification result.
8. The apparatus of claim 7, further comprising:
a weight determination unit, configured to determine, for each feature of the at least one feature and each candidate attribute value associated with the feature, the weight associated with the feature and the candidate attribute value by comparing the proportions of users, in a pre-stored target user information set and a base user information set, whose attribute value of the feature is equal to the candidate attribute value, the weight determination unit including:
a proportion obtaining subunit, configured to obtain, for the target user information set and the base user information set respectively, the proportion within that set of users whose attribute value of the feature is equal to the candidate attribute value;
a weight determining subunit, configured to take the absolute value of the difference between the proportion in the target user information set and the proportion in the base user information set as the weight associated with the feature and the attribute value.
9. The apparatus of claim 8, wherein the weight determination unit further comprises:
a weight correction subunit, configured to correct the weight according to the number of candidate attribute values associated with the feature, wherein a positive correlation exists between the corrected weight and the number of candidate attribute values.
10. The apparatus according to any one of claims 7-9, wherein the model further comprises precondition information, and the weight associated with the feature and the attribute value in the model is a weight associated with the precondition, the feature and the attribute value, wherein the weight is determined by the weight determination unit by comparing the proportions of users whose attribute value of the feature is equal to the candidate attribute value among the users who satisfy the precondition in the pre-stored target user information set and base user information set; and
the user to be identified is a user who satisfies a predetermined precondition; and
the weight obtaining unit is further configured to obtain, from the pre-established model, a weight matching the predetermined precondition, each feature of the at least one feature of the user to be identified, and the attribute value of the feature.
11. The apparatus of any of claims 7-9, further comprising:
a successful sample acquisition unit, configured to acquire, after the potential target users are determined, the set of users among the potential target users who are successfully identified as target users;
a weight updating unit, configured to recalculate and update the weights in the model by comparing, for each candidate attribute value associated with each feature, the proportions of users having that value in the successfully identified user set and in the base user information set.
12. The apparatus according to any one of claims 7 to 9, wherein the feature information acquisition unit includes:
a preprocessing subunit, configured to perform at least one of the following on a pre-collected user information set: replacing discrete original attribute values in the user information set with attribute values that denote the preset interval ranges into which the original values fall; setting to a preset default value the attribute value of any feature whose attribute value for a user in the user information set is empty; and, for each of the features, deleting attribute values whose corresponding weight is less than a predetermined threshold;
a feature information extraction subunit, configured to acquire the attribute value of at least one feature of the user to be identified from the user information set processed by the preprocessing subunit.
CN201610197077.1A 2016-03-31 2016-03-31 User identification method and device Active CN105894028B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610197077.1A CN105894028B (en) 2016-03-31 2016-03-31 User identification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610197077.1A CN105894028B (en) 2016-03-31 2016-03-31 User identification method and device

Publications (2)

Publication Number Publication Date
CN105894028A CN105894028A (en) 2016-08-24
CN105894028B true CN105894028B (en) 2020-01-10

Family

ID=57011752

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610197077.1A Active CN105894028B (en) 2016-03-31 2016-03-31 User identification method and device

Country Status (1)

Country Link
CN (1) CN105894028B (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant