CN112287208B - User portrait generation method, device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN112287208B
Authority
CN
China
Prior art keywords
data
log data
weight
determining
target user
Legal status
Active
Application number
CN201910940066.1A
Other languages
Chinese (zh)
Other versions
CN112287208A (en
Inventor
余鑫
王蒙
王发庆
于亚男
阚景森
Current Assignee
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Wodong Tianjun Information Technology Co Ltd
Application filed by Beijing Wodong Tianjun Information Technology Co Ltd
Priority to CN201910940066.1A
Publication of CN112287208A
Application granted
Publication of CN112287208B
Active (current legal status)
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/1805 Append-only file systems, e.g. using logs or journals to store data
    • G06F16/1815 Journaling file systems
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application provides a user portrait generation method, a user portrait generation device, an electronic device, and a computer-readable storage medium. The method comprises: acquiring log data of a target user in one or more dimensions; determining feature tags of the target user in the one or more dimensions according to the time distribution of the log data; and generating a portrait of the target user according to the feature tags. Determining the feature tags according to the time distribution of the log data comprises: in each dimension, determining the weight of each data item in the log data according to the time distribution of the log data; and determining the feature tag of the target user in each dimension according to those weights. The method and device can improve the accuracy of the generated user portrait and make it more comprehensive.

Description

User portrait generation method, device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a user portrait generating method, apparatus, electronic device, and computer readable storage medium.
Background
With the increasingly widespread adoption of the Internet across industries, enterprises in fields such as e-commerce, Internet finance, lifestyle services, and games collect and analyze users' static attributes, social attributes, behavioral attributes, and other information over the Internet in order to abstract user portraits, thereby mining user needs and providing more targeted products or services.
Existing user portrait generation methods combine a user's most frequently used information. For example, they count the user's most frequently used payment method, ordering device, delivery address, and so on, take these as the user's feature tags, and combine the tags into a portrait of the user. However, such methods characterize the user only through frequently used information, which in some cases does not reflect the user's current state, so the generated portrait is one-sided and lacks objectivity. In addition, because these methods process user information only by counting frequently used items, they are simplistic and inflexible and cannot mine how the user's information changes over time; the resulting portrait is superficial and has low accuracy.
It should be noted that the information disclosed in the foregoing background section is only for enhancing understanding of the background of the present application and thus may include information that does not form the prior art that is already known to those of ordinary skill in the art.
Disclosure of Invention
The application provides a user portrait generation method, a device, an electronic device, and a computer-readable storage medium, which address the poor accuracy of user portraits generated by existing user portrait generation methods.
Other features and advantages of the present application will be apparent from the following detailed description, or may be learned in part by the practice of the application.
According to one aspect of the present application, there is provided a user portrait generation method, including: acquiring log data of a target user in one or more dimensions; determining feature tags of the target user in the one or more dimensions according to the time distribution of the log data; and generating a portrait of the target user according to the feature tags.
In an exemplary embodiment of the present application, determining the feature tags of the target user in the one or more dimensions according to the time distribution of the log data includes: in each dimension, determining the weight of each data item in the log data according to the time distribution of the log data; and determining the feature tag of the target user in each dimension according to those weights. Determining the feature tag in each dimension according to the weights includes: in each dimension, determining the feature tag of the target user in that dimension according to the data item with the largest weight in the log data.
In an exemplary embodiment of the present application, determining, in each dimension, the weight of each data item in the log data according to the time distribution of the log data includes: in each dimension, counting the log data by a preset period to obtain the period ordinal corresponding to each data item in the log data; and determining the weight of each data item according to its period ordinal.
In an exemplary embodiment of the present application, determining the weight of each data item according to its period ordinal includes: determining the weight of each data item through an exponential function of its period ordinal, where the period ordinal appears in the exponent of the exponential function and the base of the exponential function is a constant.
In an exemplary embodiment of the present application, the method further includes: determining the occurrence frequency of each data item in the log data. Determining the weight of each data item according to its period ordinals then includes: for any data item $D_i$ in the log data, calculating the weight of $D_i$ by the following formula: where $B$ denotes the weight, $S_{i1}, S_{i2}, \ldots, S_{im}$ are the period ordinals corresponding to $D_i$, $\mathrm{freq}(D_i)$ is the occurrence frequency of $D_i$ in the log data, and $k$ is an exponential constant.
In an exemplary embodiment of the present application, acquiring the log data of the target user in one or more dimensions includes: acquiring log data of the target user in the one or more dimensions within a preset time range.
According to one aspect of the present application, there is provided a user portrait generation apparatus, in which a plurality of feature tags of the target user in each of the one or more dimensions are determined according to the time distribution of the log data, and the portrait of the target user is generated according to the plurality of feature tags in each dimension. The label determining module of the apparatus includes: a weight determining unit, configured to determine, in each dimension, the weight of each data item in the log data according to the time distribution of the log data; and a tag processing unit, configured to determine the feature tag of the target user in each dimension according to those weights.
According to one aspect of the present application, there is provided an electronic device comprising: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform the method of any of the above via execution of the executable instructions.
According to one aspect of the present application, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of any of the above.
Exemplary embodiments of the present application have the following beneficial effects:
Log data of the target user is acquired, and feature tags are determined according to the time distribution of the log data, so as to generate the portrait of the target user. On the one hand, because different data items may have different time distributions, determining the feature tags from the time distribution of the log data lets the tags objectively reflect how the user's data differs across time, producing a more comprehensive user portrait. On the other hand, taking the time distribution into account enriches the factors considered when generating the portrait and makes it possible to determine how each data item in the target user's log data changes over time, so that the feature tag closest to the user's current state is selected and the accuracy of the user portrait is improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application. It is apparent that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art.
Fig. 1 schematically shows a flowchart of a user portrait creation method in the present exemplary embodiment;
Fig. 2 schematically illustrates a sub-flowchart of a user portrait creation method in the present exemplary embodiment;
Fig. 3 schematically shows a flowchart of another user portrait creation method in the present exemplary embodiment;
Fig. 4 is a block diagram schematically showing the configuration of a user portrait creation apparatus in the present exemplary embodiment;
Fig. 5 schematically shows an electronic device for implementing the above method in the present exemplary embodiment;
Fig. 6 schematically shows a computer readable storage medium for implementing the above method in the present exemplary embodiment.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
The exemplary embodiment of the application provides a user portrait generation method. A user portrait is a tagged representation of user information: by collecting and analyzing various information data about a user, an overall view of the user, namely the user portrait, is abstracted. User portraits can be used in big data applications such as personalized recommendation, in application programs for financial services, e-commerce, lifestyle services, games, social networking, music, and so on; the application does not limit the scenario.
The following describes the present exemplary embodiment with reference to fig. 1, and as shown in fig. 1, the user portrait creation method may include the following steps S110 to S130:
Step S110, obtaining log data of the target user in one or more dimensions.
The target user is a user whose portrait is to be generated, and the log data may be raw information data about the target user, such as the target user's age, address, occupation, payment method, and hobbies. A dimension is a category of target-user information contained in or reflected by the log data; typically one dimension reflects one aspect of the target user's attributes, for example personal information attributes, credit attributes, consumption attributes, or social information attributes. In the present exemplary embodiment, the server may acquire log data from the user terminal in real time, or may acquire log data from a specific data store such as HDFS (Hadoop Distributed File System). In the log data, the target user is typically identified by a unique identifier, such as the user's mobile phone number, App (application) account, or IP (Internet Protocol) address. For example, after user A logs in to the App, the App account of user A can be identified in order to generate user A's portrait, and the corresponding log data can be looked up in the log database by that account. All available log data may be acquired so as to describe the target user comprehensively, or only log data within a certain time range may be acquired so as to reduce the amount of data to be processed subsequently.
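The look-up described in step S110 can be sketched as a filter over log records keyed by the user's unique identifier, with an optional time range. This is a minimal in-memory sketch, not the patent's implementation: the `LogRecord` fields and the helper name `fetch_user_logs` are hypothetical, and a real deployment would query a log store such as HDFS instead of a Python list.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class LogRecord:
    user_id: str         # hypothetical field: unique identifier, e.g. an App account
    dimension: str       # hypothetical field: e.g. "delivery_address"
    value: str           # the raw data item recorded in the log
    timestamp: datetime


def fetch_user_logs(
    records: Iterable[LogRecord],
    user_id: str,
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
) -> List[LogRecord]:
    """Return the target user's records, optionally limited to [start, end]."""
    result = []
    for r in records:
        if r.user_id != user_id:
            continue  # not the target user
        if start is not None and r.timestamp < start:
            continue  # older than the preset time range
        if end is not None and r.timestamp > end:
            continue  # newer than the preset time range
        result.append(r)
    return result


logs = [
    LogRecord("userA", "delivery_address", "Haidian District, Beijing", datetime(2017, 10, 5)),
    LogRecord("userA", "delivery_address", "Xicheng District, Beijing", datetime(2018, 6, 2)),
    LogRecord("userB", "payment_method", "card", datetime(2018, 6, 3)),
]
recent = fetch_user_logs(logs, "userA", start=datetime(2018, 1, 1))
```

Passing no `start`/`end` corresponds to acquiring all log data; passing a range corresponds to the reduced-volume option.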
Step S120, determining feature labels of the target user in one or more dimensions according to the time distribution of the log data.
The time distribution of the log data refers to how the log data is distributed over different points in time. It may be expressed as the trend of the log data's values over time, or as the occurrence frequency, timeliness, and similar properties of each data item in the log data. In this exemplary embodiment, the occurrence frequency is how often a data item appears in all of the log data, and the timeliness is the distance between a data item and the current time: the closer a data item is to the current time, the more timely it is. Generally, data that is more timely and occurs more frequently better represents the current state of the target user. A feature tag is the data that best represents the real information or current state of the target user in a given dimension, and is generally an abstract summary of the target user's characteristics. A feature tag may be original data from the log data, for example the target user's delivery address "No. x, xx Road, xx District, xx City"; keywords or abstracted information from the log data, for example "xx District" or "xx Road"; or information calculated from the log data, for example the user's average monthly consumption.
In the present exemplary embodiment, the log data may be processed in a preset manner according to its time distribution. For example, the log data may be converted into feature vectors in time order and input into a pre-trained LSTM (Long Short-Term Memory) network that outputs the corresponding feature tags; the log data may be plotted as a curve with time and value as coordinates, a function may be fitted to the curve, and the feature tags may be determined from the fitting result; or the log data may be evaluated with a preset formula to obtain the corresponding feature tags.
In an exemplary embodiment, a time granularity may be chosen according to the time distribution of the target user's log data, and the log data may be divided so that each entry belongs to its corresponding time unit. For example, with a granularity of one month, each log entry is assigned to the month in which it occurred, so that the time distribution of the log data is expressed as the distribution of the data across months. The log data of each month is then analyzed to mine its distribution pattern and determine the feature tags of the target user in one or more dimensions.
In an exemplary embodiment, the occurrence frequency and timeliness of each data item in the log data may be computed: for example, the proportion of each data item among all the data can be taken as its occurrence frequency, and the time elapsed between each data item and the current time can be taken as its timeliness. The two indicators are then combined, by addition, multiplication, averaging, or a similar operation, into a weight for each data item, a weighted calculation is performed over the data items, and the result of that calculation is taken as the feature tag. For example, if the log data records the target user's consumption level, the consumption levels in the different time periods can be combined in a weighted calculation and the result used as the target user's feature tag. When the log data is text, it can first be converted into numerical form and then used in the weighted calculation.
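The curve-fitting option above can be sketched with an ordinary least-squares straight-line fit of value against time. This is an illustration under stated assumptions: the linear model, the helper names `fit_line` and `trend_tag`, and the "rising"/"falling"/"stable" tags are hypothetical choices, not specified by the text.

```python
from typing import List, Tuple


def fit_line(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * t + intercept to (t, y) points."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in points)
    var = sum((t - mean_t) ** 2 for t, _ in points)
    slope = cov / var  # requires at least two distinct time values
    return slope, mean_y - slope * mean_t


def trend_tag(points: List[Tuple[float, float]], tolerance: float = 1e-9) -> str:
    """Hypothetical mapping from the fitted slope to a coarse feature tag."""
    slope, _ = fit_line(points)
    if slope > tolerance:
        return "rising"
    if slope < -tolerance:
        return "falling"
    return "stable"


# Monthly consumption amounts for months 1..4 (illustrative values).
monthly_spend = [(1, 100.0), (2, 120.0), (3, 140.0), (4, 160.0)]
slope, intercept = fit_line(monthly_spend)  # slope 20.0, intercept 80.0
print(trend_tag(monthly_spend))  # rising
```

A richer fit (polynomial, seasonal) could replace the straight line; the line is used here only because it is short and easy to verify.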
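The month-granularity division can be sketched as bucketing (date, value) entries under a "YYYY-MM" key and counting each value per month. A minimal sketch assuming log entries are already reduced to (date, value) pairs; `bucket_by_month` is a hypothetical helper name.

```python
from collections import Counter, defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple


def bucket_by_month(events: Iterable[Tuple[date, str]]) -> Dict[str, Counter]:
    """Group (date, value) log entries into calendar-month buckets keyed 'YYYY-MM'."""
    buckets: Dict[str, Counter] = defaultdict(Counter)
    for day, value in events:
        buckets[f"{day.year:04d}-{day.month:02d}"][value] += 1
    # Zero-padded keys sort chronologically, so a plain sort orders the months.
    return dict(sorted(buckets.items()))


events = [
    (date(2018, 6, 2), "Xicheng District, Beijing"),
    (date(2018, 6, 20), "Xicheng District, Beijing"),
    (date(2018, 2, 11), "Haidian District, Beijing"),
]
monthly = bucket_by_month(events)
```

Each month's `Counter` can then be analyzed on its own, which is the per-month view the paragraph describes.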
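One way to combine the two indicators is by multiplication, as sketched below. The timeliness formula 1 / (1 + days since last seen) is an assumption chosen so that elapsed time lowers the weight, and the helper names are hypothetical; the source leaves the exact combination open (addition, multiplication, averaging). A weighted mean is also shown for numeric data such as consumption levels.

```python
from collections import Counter
from datetime import date
from typing import Dict, List, Tuple


def frequency_timeliness_weights(events: List[Tuple[date, str]], today: date) -> Dict[str, float]:
    """Combine occurrence frequency and timeliness into one weight per data item.

    frequency  = share of all log entries that carry this value
    timeliness = 1 / (1 + days since the value was last seen)   (assumed form)
    weight     = frequency * timeliness                         (multiplicative combination)
    """
    counts = Counter(value for _, value in events)
    total = len(events)
    last_seen: Dict[str, date] = {}
    for day, value in events:
        if value not in last_seen or day > last_seen[value]:
            last_seen[value] = day
    return {
        value: (counts[value] / total) / (1 + (today - last_seen[value]).days)
        for value in counts
    }


def weighted_average(values_and_weights: List[Tuple[float, float]]) -> float:
    """Weighted mean of numeric log data, e.g. consumption levels per period."""
    total_weight = sum(w for _, w in values_and_weights)
    return sum(v * w for v, w in values_and_weights) / total_weight


events = [
    (date(2018, 1, 1), "cash on delivery"),
    (date(2018, 1, 2), "cash on delivery"),
    (date(2018, 1, 3), "cash on delivery"),
    (date(2018, 3, 30), "mobile payment"),
]
weights = frequency_timeliness_weights(events, today=date(2018, 3, 31))
print(max(weights, key=weights.get))  # mobile payment
print(weighted_average([(100.0, 1.0), (300.0, 3.0)]))  # 250.0
```

With this particular timeliness form, a recent value outweighs a more frequent but stale one; a gentler decay would shift that balance.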
Step S130, generating the portrait of the target user according to the feature tags.
In the present exemplary embodiment, the determined feature tags may be grouped according to different classification criteria to generate the portrait of the target user. For example, classified by content, feature tags may include address, category of purchased goods, rating, and so on; classified by format, they may include short tags (e.g., up to 5 characters) and long tags (e.g., more than 5 characters); classified by form, they may include English tags, Simplified Chinese tags, Traditional Chinese tags, and so on. Other classification criteria may also be used, and this application does not limit them.
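Two of the classification criteria above can be sketched directly: format (short vs. long, using the 5-character example threshold) and a coarse form check (English vs. Chinese). Helper names are hypothetical, and the form check is a simplification: it only tests for CJK Unified Ideographs (U+4E00 to U+9FFF) and does not distinguish Simplified from Traditional Chinese.

```python
from typing import Dict, List


def group_tags_by_length(tags: List[str], max_short: int = 5) -> Dict[str, List[str]]:
    """Classify feature tags by format: 'short' (<= max_short characters) or 'long'."""
    groups: Dict[str, List[str]] = {"short": [], "long": []}
    for tag in tags:
        groups["short" if len(tag) <= max_short else "long"].append(tag)
    return groups


def group_tags_by_script(tags: List[str]) -> Dict[str, List[str]]:
    """Classify feature tags by form: all-ASCII, containing CJK ideographs, or other."""
    groups: Dict[str, List[str]] = {"english": [], "chinese": [], "other": []}
    for tag in tags:
        if tag.isascii():
            groups["english"].append(tag)
        elif any("\u4e00" <= ch <= "\u9fff" for ch in tag):
            groups["chinese"].append(tag)
        else:
            groups["other"].append(tag)
    return groups


tags = ["card", "Xicheng District", "西城区", "game"]
by_length = group_tags_by_length(tags)
by_script = group_tags_by_script(tags)
```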
In an exemplary embodiment, the step S120 may include the steps of:
step S121, under each dimension, determining the weight of each data in the log data according to the time distribution of the log data;
step S122, determining the feature labels of the target user in each dimension according to the weight of each data in the log data.
In order to effectively determine the feature tag of the target user, analysis of log data may be performed in the same dimension, for example, analysis of log data representing an address, analysis of log data of a payment method, and the like. In other embodiments, log data of different dimensions may be analyzed, for example, log data of multiple dimensions such as address, payment method, or hobbies.
The data in the log data are specific values of the target user's original information. For example, the data in age log data may be specific values such as 28 years old or 30 years old, and the data in hobby log data may be specific values such as basketball, games, or running. Because data items distributed at different times carry different significance, they should carry different weights when the target user's feature tag is determined. Therefore, in the present exemplary embodiment, the weight of each data item in the log data may be set according to the time distribution of the log data. More recent data may be given a larger weight and older data a smaller weight; alternatively, within a certain interval the weight may increase according to some function form, growing faster for more recent data, while data in a more distant interval may uniformly be given a small weight. The weights may be computed in various ways, for example by a specific formula or function such as a negative exponential function or a step function, or may be assigned by human experience.
The feature tags of the target user in each dimension can be determined from the calculated weights in several ways. In this exemplary embodiment, each data item in the log data has a corresponding weight, and a preset criterion may be set directly so that data items whose weights meet the criterion are selected as feature tags: for example, the data item with the largest weight may be taken as the feature tag, or the data items ranked in the top three by weight may be taken as feature tags. A weight mapping table recording the mapping between data items and weights may also be maintained. With this table, feature tags can be determined by ranking the weights, and when a weighted calculation is performed, the data item corresponding to the calculation result can be looked up in the table and determined as the feature tag.
Based on the above description, in the present exemplary embodiment, log data of the target user is acquired and feature tags are determined from the time distribution of the log data to generate the portrait of the target user. On the one hand, because different data items may have different time distributions, feature tags determined from the time distribution objectively reflect how the user's data differs across time, yielding a more comprehensive user portrait. On the other hand, combining the time distribution enriches the factors considered during portrait generation and allows the change pattern of each data item in the target user's log data to be determined, so that the feature tag closest to the user's current state is selected and the accuracy of the user portrait is improved.
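The two function shapes named above can be sketched as follows, with `age` counting periods back from the present (0 = current period). The decay rate, the breakpoints (3 and 12 periods), and the levels (1.0, 0.5, 0.1) are illustrative assumptions, not values from the source; the step function's last branch mirrors the "uniformly small weight" for distant data.

```python
import math


def negative_exponential_weight(age: int, decay: float = 0.5) -> float:
    """Smoothly decreasing weight exp(-decay * age); age counts periods back from now."""
    return math.exp(-decay * age)


def step_weight(age: int) -> float:
    """Piecewise-constant weight; the breakpoints and levels are illustrative only."""
    if age < 3:
        return 1.0   # the three most recent periods: full weight
    if age < 12:
        return 0.5   # the rest of the past year: half weight
    return 0.1       # older data: a uniformly small weight


print([round(negative_exponential_weight(a), 3) for a in (0, 1, 2)])  # [1.0, 0.607, 0.368]
print([step_weight(a) for a in (0, 5, 20)])  # [1.0, 0.5, 0.1]
```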
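The selection criteria above (largest weight, top three, or a preset threshold) can be sketched over a plain dict acting as the weight mapping table. The table contents and helper names are hypothetical.

```python
from typing import Dict, List

# A weight mapping table: data item -> weight (illustrative values).
weight_table: Dict[str, float] = {"basketball": 0.9, "game": 0.4, "running": 0.7, "chess": 0.1}


def top_tag(table: Dict[str, float]) -> str:
    """Feature tag = the data item with the largest weight."""
    return max(table, key=lambda item: table[item])


def top_k_tags(table: Dict[str, float], k: int = 3) -> List[str]:
    """Feature tags = the k data items ranked highest by weight."""
    return sorted(table, key=lambda item: table[item], reverse=True)[:k]


def tags_meeting(table: Dict[str, float], threshold: float) -> List[str]:
    """Feature tags = every data item whose weight reaches a preset threshold."""
    return [item for item, w in table.items() if w >= threshold]


print(top_tag(weight_table))            # basketball
print(top_k_tags(weight_table))         # ['basketball', 'running', 'game']
print(tags_meeting(weight_table, 0.5))  # ['basketball', 'running']
```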
In an exemplary embodiment, the step S122 may include:
In each dimension, determining the feature tag of the target user in that dimension according to the data item with the largest weight in the log data.
Because the target user's log data changes over time, in step S121 the weight of each data item can be determined according to the time distribution, and the feature tag of the target user in each dimension can then be determined from these weights, making the generated user portrait more reliable. In the present exemplary embodiment, the data item with the largest weight may be taken as the target user's feature tag in each dimension. For example, when the log data is the target user's delivery address, the user may change the delivery address over time because of moving house or a long business trip. As shown in Table 1, user A's delivery address "Haidian District, Beijing" occurred frequently before 2018-06, and a new address, "Xicheng District, Beijing", began to appear in 2018-06 but has occurred less often than "Haidian District, Beijing". If occurrence frequency alone were used to determine the feature tag, the generated user portrait would be inaccurate. The present exemplary embodiment can therefore assign higher weights to more recent delivery addresses according to their time distribution and take the address with the largest weight as the feature tag; in Table 1, "Xicheng District, Beijing" would be determined as the target user's feature tag in the delivery address dimension.
TABLE 1
| User | Delivery address | Order month | Number of orders |
|---|---|---|---|
| User A | Haidian District, Beijing | 2017-10 | 11 |
| User A | Haidian District, Beijing | 2017-11 | 9 |
| User A | Haidian District, Beijing | 2018-02 | 4 |
| User A | Xicheng District, Beijing | 2018-06 | 7 |
| User A | Xicheng District, Beijing | 2018-08 | 4 |
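The Table 1 example can be worked through numerically: raw order counts favor Haidian (24 vs. 11 orders), while recency weighting favors Xicheng. This sketch assumes a hypothetical weight of 2^(-age), where age is the number of months before the latest month in the table; the base 2 is an illustrative choice, not a constant from the source.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (delivery address, order month, number of orders) for User A, from Table 1
TABLE_1: List[Tuple[str, str, int]] = [
    ("Haidian District, Beijing", "2017-10", 11),
    ("Haidian District, Beijing", "2017-11", 9),
    ("Haidian District, Beijing", "2018-02", 4),
    ("Xicheng District, Beijing", "2018-06", 7),
    ("Xicheng District, Beijing", "2018-08", 4),
]


def months_between(earlier: str, later: str) -> int:
    """Whole months from 'YYYY-MM' earlier to 'YYYY-MM' later."""
    y1, m1 = map(int, earlier.split("-"))
    y2, m2 = map(int, later.split("-"))
    return (y2 - y1) * 12 + (m2 - m1)


def score(rows: List[Tuple[str, str, int]], base: float = 2.0) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (raw order counts, recency-weighted counts) per address."""
    latest = max(month for _, month, _ in rows)  # zero-padded strings compare chronologically
    raw: Dict[str, float] = defaultdict(float)
    weighted: Dict[str, float] = defaultdict(float)
    for address, month, count in rows:
        age = months_between(month, latest)  # 0 for the latest month
        raw[address] += count
        weighted[address] += count * base ** (-age)
    return dict(raw), dict(weighted)


raw, weighted = score(TABLE_1)
print(max(raw, key=raw.get))            # Haidian District, Beijing
print(max(weighted, key=weighted.get))  # Xicheng District, Beijing
```

Here Xicheng scores 7 x 0.25 + 4 x 1 = 5.75, while Haidian's older orders shrink to about 0.09.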
In an exemplary embodiment, step S121 may include the steps of:
step S210, under each dimension, counting log data according to a preset period to obtain a period ordinal corresponding to each data in the log data;
step S220, determining the weight of each data item according to its period ordinal in the log data.
The preset period is the time granularity used for the log data; for example, it may be one week or one month. According to the time distribution of the log data, each log entry is counted into its corresponding period. For example, in Table 1, with a one-month period, the log entries with the delivery address "Xicheng District, Beijing" are counted into the 2018-06 and 2018-08 periods, and those with the delivery address "Haidian District, Beijing" are counted into the 2017-10, 2017-11, and 2018-02 periods; data for other months is not listed here. The period ordinal is the number obtained by ordering the preset periods, in either reverse or forward chronological order. For example, with a one-month period, the most recent month may be given the smallest ordinal, with ordinals increasing for months further in the past; or, within a given time interval, months may be numbered in sequence from the earliest to the most recent. For instance, over a one-year interval with one-month periods, the first month may be numbered 1 and the last (most recent) month 12.
In the present exemplary embodiment, the weight of each data item in the log data can be determined from its period ordinal in various ways. The weight may be calculated by a specific function, such as a negative exponential function, or assigned empirically, for example by assigning a weight to each period ordinal. For example, for six monthly periods ordered from far to near as ordinals 1 to 6, each period ordinal is first assigned an initial value $a_i = i$, $i \in [1, 6]$. Considering that the most recent months should carry larger weights, the initial value of each period ordinal is inverted, and the weight of each period ordinal is then calculated by the following formula.
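Both numbering orders can be sketched for monthly periods. The helper `month_ordinals` is hypothetical; it numbers every calendar month in the window, including months with no log entries (such as 2018-07 in Table 1), which is one reasonable reading of the text.

```python
from typing import Dict


def month_ordinals(latest: str, n_months: int, nearest_first: bool = True) -> Dict[str, int]:
    """Number each calendar month in the n_months window ending at `latest` ('YYYY-MM').

    nearest_first=True : the most recent month is 1, older months get larger ordinals.
    nearest_first=False: the oldest month in the window is 1, the most recent is n_months.
    """
    year, month = map(int, latest.split("-"))
    months = []
    for _ in range(n_months):
        months.append(f"{year:04d}-{month:02d}")
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    # `months` now runs from nearest to farthest.
    if not nearest_first:
        months.reverse()
    return {m: i for i, m in enumerate(months, start=1)}


near = month_ordinals("2018-08", 12)
far = month_ordinals("2018-08", 12, nearest_first=False)
```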
In an exemplary embodiment, the weight of the log data may also be determined by:
in consideration of the influence of the time distribution of each data in the user log data on the user portraits, the time factor is increased when the weight of each data is calculated. For example, the user has changed the shipping address due to moving the home, and therefore, the most recent shipping address may be recently updated and used frequently later. In order to avoid the problem of feature tag extraction caused by the fact that the previous historical log data occupy larger weight, the data with the closer time can be set to have higher weight, the period ordinal number of the time is set to be x, the period ordinal number which is closer to the current time is smaller, the period ordinal number which is farther away is larger, the user log data in two years is analyzed, and the weight of each data in the log data can meet the formula:
B(x)>B(x+1); (2)
I.e. for data that is farther away from now, the smaller its weight, while for data that is closer, the greater its weight.
Meanwhile, the influence of data which occasionally occurs to the user on the weight calculation result is considered, for example, the user occasionally helps other people to purchase goods so that the receiving address changes. In calculating the weights, this should be excluded, and the weights of the latest time period may be limited to be lower than the sum of weights of the data that continuously appear before that, and the weights of the data may be made to satisfy the formula:
even if the weight of each data in the log data meeting the time period of the latest data is smaller than the sum of the weights of the time periods of the near t, t depends on the specific characteristics of the log data, for example, when the log data is a receiving address, the setting time is too long, the historical receiving address has no reference meaning, and the time is too short, so that the judged characteristic label can be possibly wrong, and therefore, a moderate tolerance time, such as 6 months, can be comprehensively set.
Further, if the same data item appears in the most recent consecutive time periods of the log data, its weight should exceed the sum of the weights of the preceding T consecutive time periods. Hence, for any data item that appears in two consecutive time periods, its weight is greater than the sum of the weights of the earlier time periods. This means the influence of the time factor on the weight calculation cannot be too small, i.e., the weight function must not decrease too slowly, and the weights can be made to satisfy formula (4):
In this exemplary embodiment, it may also be set that, beyond a certain time range, the weight of older data stays essentially unchanged; for example, the weight of a shipping address from 20 months ago and that of one from 21 months ago can be regarded as essentially equal. Formula (4) therefore need not hold for every period ordinal, but only for period ordinals within that time range; for period ordinals beyond it, it is sufficient for the weight to remain a decreasing function of the ordinal.
According to the above description, in an exemplary embodiment, step S320 may include:
determining the weight of each data item through an exponential function based on the period ordinal corresponding to each data item in the log data, wherein the period ordinal is the exponent of the exponential function and the base of the exponential function is a constant.
In the present exemplary embodiment, the weight of each data in the log data can be calculated by the formula (5):
$$B(x) = a^{-bx+c} \qquad (5)$$
With suitable constants, this function can satisfy the conditions described in formulas (2), (3), and (4) above. Here x is the period ordinal and a, b, and c are constant parameters. The time periods can be user-defined: if each month is one time period and half a year is analyzed, with the months numbered so that the most recent month has the smallest ordinal, x takes values in [1, 6]. This range is only illustrative; the actual period ordinals depend on the time periods that need to be calculated in practice and are not specifically limited in the present disclosure.
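A minimal Python sketch of formula (5), reading it as B(x) = a^(-bx+c). The constants a = 2, b = 0.5, c = 0 are illustrative assumptions, not values from the text. The sketch evaluates the weights for monthly ordinals 1 to 6 and checks condition (2); conditions (3) and (4) depend on t and T and are not checked here.

```python
def period_weight(x: int, a: float = 2.0, b: float = 0.5, c: float = 0.0) -> float:
    """Weight B(x) = a ** (-b * x + c) for period ordinal x (1 = most recent period).

    a, b and c are illustrative constants (assumptions, not values from the patent);
    with a > 1 and b > 0 the weight strictly decreases as x grows.
    """
    return a ** (-b * x + c)


def is_strictly_decreasing(values: list[float]) -> bool:
    """Check formula (2): B(x) > B(x + 1) for every consecutive pair."""
    return all(v > w for v, w in zip(values, values[1:]))


# Six monthly periods covering half a year, x in [1, 6].
weights = [period_weight(x) for x in range(1, 7)]
print([round(w, 4) for w in weights])
print(is_strictly_decreasing(weights))
```

Choosing a > 1 with a positive b is what makes the exponent -bx + c shrink the weight as the ordinal grows; b controls how fast older periods fade.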
In an exemplary embodiment, the user portrait creation method may further include the steps of:
determining the occurrence frequency of each data item in the log data;
step S220 may include:
for any data item D_i in the log data, calculating the weight of D_i by the following formula (6):
where B denotes the weight, S_i1, S_i2, …, S_im are the period ordinals corresponding to data D_i, freq(D_i) is the occurrence frequency of D_i in the log data, and k is an exponential constant. In this exemplary embodiment, k can be determined from the inequalities established by formulas (3) and (4). Formula (3) requires that the function not decrease too quickly, i.e., that the absolute value of its first derivative not be too large, whereas formula (4) requires that the function not decrease too slowly, i.e., that the absolute value of its first derivative not be too small. The choice of k is therefore determined by the values of t and T under these inequality constraints; for example, when t = 6 and T = 22, k = 0.66. Note that the inequality constraints may determine a range of k values, from which the final value is selected; for example, the minimum value in that range may be taken as k.
To make the generated user portrait more accurate, the weight of each data item can be determined jointly from its occurrence frequency and its time of occurrence in the log data. The occurrence frequency is an index describing how often each data item appears in the log data, and in this exemplary embodiment it can be obtained by several statistical methods. For example, Table 2 is an exemplary log of one user's shipping addresses from April to August 2018. The occurrence frequency of the data item "Chaoyang District, Beijing" can be computed as the number of its occurrences divided by the total number of shipping-address records over these five months, which is 4/10 according to Table 2. Alternatively, with each month taken as one time period, the frequency can be computed as the proportion of time periods in which "Chaoyang District, Beijing" appears; it appears in August, July, May, and April, so its occurrence frequency is 4/5. Other statistical methods may also be used depending on the log data in each time period, and the present application is not particularly limited in this respect.
TABLE 2
| Date | Shipping address |
|---|---|
| 2018-08-20 | Haidian District, Beijing |
| 2018-08-12 | Xicheng District, Beijing |
| 2018-08-05 | Chaoyang District, Beijing |
| 2018-07-22 | Haidian District, Beijing |
| 2018-07-15 | Chaoyang District, Beijing |
| 2018-06-16 | Haidian District, Beijing |
| 2018-05-28 | Xicheng District, Beijing |
| 2018-05-22 | Chaoyang District, Beijing |
| 2018-04-12 | Xicheng District, Beijing |
| 2018-04-02 | Chaoyang District, Beijing |
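The two counting methods described before Table 2 can be checked against the table with a short Python sketch. The record list transcribes Table 2; the function names are hypothetical, and the per-period method uses only the monthly periods observed in the records as its denominator, which is an assumption that happens to cover all five months here.

```python
from collections import Counter
from datetime import date
from fractions import Fraction

# Shipping-address records transcribed from Table 2 (April to August 2018).
RECORDS = [
    (date(2018, 8, 20), "Haidian District, Beijing"),
    (date(2018, 8, 12), "Xicheng District, Beijing"),
    (date(2018, 8, 5), "Chaoyang District, Beijing"),
    (date(2018, 7, 22), "Haidian District, Beijing"),
    (date(2018, 7, 15), "Chaoyang District, Beijing"),
    (date(2018, 6, 16), "Haidian District, Beijing"),
    (date(2018, 5, 28), "Xicheng District, Beijing"),
    (date(2018, 5, 22), "Chaoyang District, Beijing"),
    (date(2018, 4, 12), "Xicheng District, Beijing"),
    (date(2018, 4, 2), "Chaoyang District, Beijing"),
]


def freq_by_count(records, value):
    """Occurrences of `value` divided by the total number of records."""
    counts = Counter(addr for _, addr in records)
    return Fraction(counts[value], len(records))


def freq_by_period(records, value):
    """Share of monthly periods (year, month) in which `value` appears at least once.

    Only periods that appear in `records` count toward the denominator (an assumption).
    """
    all_periods = {(d.year, d.month) for d, _ in records}
    hit_periods = {(d.year, d.month) for d, addr in records if addr == value}
    return Fraction(len(hit_periods), len(all_periods))


target = "Chaoyang District, Beijing"
print(freq_by_count(RECORDS, target))   # 2/5, i.e. 4/10
print(freq_by_period(RECORDS, target))  # 4/5
```

`Fraction` keeps the ratios exact, so the results can be compared directly with the 4/10 and 4/5 quoted in the text.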
In formula (6), S_i1, S_i2, …, S_im denote the period ordinals of the time periods in which data D_i occurs. In this exemplary embodiment, a time interval can be set as the period, and the periods are numbered by their distance from the present: the period closest to the current time has the smallest ordinal, and the further back a period lies, the larger its ordinal. As a result, data whose occurrences are concentrated in recent periods receive large weights, and data whose occurrences lie further in the past receive small weights. In addition, a data item whose occurrences are not very recent may still deserve a relatively high weight if it occurs very frequently. To account for this frequency factor, formula (6) uses the occurrence frequency freq(D_i) as a coefficient that adjusts the final weight of each data item.
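To show how period ordinals, occurrence frequency, and the max-weight rule could fit together, the Python sketch below applies them to Table 2. Because the exact expression of formula (6) is not reproduced here, the sketch ASSUMES the form B(D_i) = freq(D_i) · Σ_j k^(S_ij), which matches the definitions above (freq as a coefficient, k as an exponential constant) but is not confirmed to be the patent's formula. It also assumes monthly periods anchored at the month of the latest record and uses k = 0.66 from the example in the text; all function names are hypothetical.

```python
from datetime import date

# Shipping-address log transcribed from Table 2.
RECORDS = [
    (date(2018, 8, 20), "Haidian District, Beijing"),
    (date(2018, 8, 12), "Xicheng District, Beijing"),
    (date(2018, 8, 5), "Chaoyang District, Beijing"),
    (date(2018, 7, 22), "Haidian District, Beijing"),
    (date(2018, 7, 15), "Chaoyang District, Beijing"),
    (date(2018, 6, 16), "Haidian District, Beijing"),
    (date(2018, 5, 28), "Xicheng District, Beijing"),
    (date(2018, 5, 22), "Chaoyang District, Beijing"),
    (date(2018, 4, 12), "Xicheng District, Beijing"),
    (date(2018, 4, 2), "Chaoyang District, Beijing"),
]

K = 0.66  # example value of k quoted in the text (t = 6, T = 22)


def period_ordinal(d: date, latest: date) -> int:
    """Monthly period ordinal: 1 for the month of `latest`, 2 for the month before, etc."""
    return (latest.year - d.year) * 12 + (latest.month - d.month) + 1


def weights(records: list[tuple[date, str]], k: float = K) -> dict[str, float]:
    """ASSUMED form of formula (6): B(D_i) = freq(D_i) * sum over j of k ** S_ij.

    S_ij are the distinct period ordinals in which D_i occurs; freq(D_i) is the
    share of observed periods that contain D_i. The exact expression is hypothetical.
    """
    latest = max(d for d, _ in records)
    ordinals: dict[str, set[int]] = {}
    for d, value in records:
        ordinals.setdefault(value, set()).add(period_ordinal(d, latest))
    n_periods = len({period_ordinal(d, latest) for d, _ in records})
    return {
        value: (len(s) / n_periods) * sum(k ** x for x in s)
        for value, s in ordinals.items()
    }


def feature_tag(records: list[tuple[date, str]]) -> str:
    """Take the data item with the greatest weight as the feature tag (see claim 2)."""
    w = weights(records)
    return max(w, key=w.get)


print(feature_tag(RECORDS))  # Chaoyang District, Beijing
```

Under these assumptions "Chaoyang District, Beijing" wins because it appears in four of the five months, even though the single most recent record is in Haidian District.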
In an exemplary embodiment, step S110 may include: acquiring log data of the target user in one or more dimensions and within a preset time range.
In this exemplary embodiment, the log data of the target user can be filtered when they are acquired. The preset time range is the time window used to screen the log data. For example, if it is set to 12 months, the log data of the most recent 12 months are extracted from the large volume of log data and aggregated to generate the user portrait, so the preset time range acts as a filter on the acquired log data. The preset time range can differ between dimensions. In the address dimension, for instance, users change addresses relatively rarely, so a longer range (e.g., 24 months) can be set; in the interest dimension, interests change more often under influences such as online trends, so a shorter range (e.g., 6 months) can be set.
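A minimal Python sketch of the dimension-dependent preset time range, using the 24-month (address) and 6-month (interest) examples above. The calendar-month counting convention, the function names, and the sample interest records are hypothetical.

```python
from datetime import date

# Preset time ranges per dimension, in months, using the example values from the text.
PRESET_MONTHS = {"address": 24, "interest": 6}


def months_ago(d: date, now: date) -> int:
    """Number of calendar-month boundaries between the month of `d` and the month of `now`."""
    return (now.year - d.year) * 12 + (now.month - d.month)


def filter_by_preset_range(records, dimension, now):
    """Keep (date, value) records lying within the dimension's preset time range.

    A record is kept when it is fewer than PRESET_MONTHS[dimension] calendar months
    old; counting by calendar month is an assumption, not something the text specifies.
    """
    window = PRESET_MONTHS[dimension]
    return [(d, v) for d, v in records if 0 <= months_ago(d, now) < window]


# Hypothetical interest-dimension log for illustration.
now = date(2019, 9, 30)
interest_log = [(date(2019, 8, 1), "running"), (date(2019, 1, 15), "photography")]
print(filter_by_preset_range(interest_log, "interest", now))
```

Keeping the windows in a per-dimension table means a new dimension only needs a new entry rather than new filtering logic.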
Fig. 3 schematically illustrates a flowchart of another user portrait creation method in this exemplary embodiment. First, step S310 is performed to obtain the log data of the target user; step S320 is then performed to obtain the log data of the target user in one or more dimensions; step S330 may then be performed to filter the log data and determine the log data within the preset time range; step S340 is then performed to determine the weight of each data item in the log data of the target user in the one or more dimensions according to the time distribution of the log data; finally, step S350 is performed to determine the feature tags according to the weights of the data items, thereby completing the creation of the user portrait.
An exemplary embodiment of the present application further provides a user portrait generating apparatus. Referring to Fig. 4, the apparatus 400 may include a data acquisition module 410, a tag determining module 420, and a portrait generation module 430. The data acquisition module 410 is configured to acquire log data of the target user in one or more dimensions; the tag determining module 420 is configured to determine the feature tags of the target user in the one or more dimensions according to the time distribution of the log data; and the portrait generation module 430 is configured to generate a portrait of the target user according to the feature tags. The tag determining module 420 may include a weight determining unit 421, configured to determine the weight of each data item in the log data according to the time distribution of the log data in each dimension, and a tag processing unit 422, configured to determine the feature tag of the target user in each dimension according to the weight of each data item in the log data.
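The module/unit split of apparatus 400 could be mirrored in code roughly as follows. This is an illustrative Python sketch: the class names follow the reference numerals, but the weighting rule (a plain sum of decay ** ordinal per occurrence, without the frequency coefficient) and the demo log source are simplified, hypothetical placeholders, not the patent's formula (6).

```python
from dataclasses import dataclass, field
from typing import Callable

# (period ordinal, data value); ordinal 1 is the most recent period.
Record = tuple[int, str]


@dataclass
class WeightDeterminingUnit:  # unit 421
    decay: float = 0.66  # hypothetical decay base; one occurrence adds decay ** ordinal

    def weights(self, records: list[Record]) -> dict[str, float]:
        totals: dict[str, float] = {}
        for ordinal, value in records:
            totals[value] = totals.get(value, 0.0) + self.decay ** ordinal
        return totals


class TagProcessingUnit:  # unit 422
    def tag(self, weights: dict[str, float]) -> str:
        # The data value with the greatest weight becomes the feature tag.
        return max(weights, key=weights.get)


@dataclass
class TagDeterminationModule:  # module 420
    weight_unit: WeightDeterminingUnit = field(default_factory=WeightDeterminingUnit)
    tag_unit: TagProcessingUnit = field(default_factory=TagProcessingUnit)

    def determine(self, log_by_dimension: dict[str, list[Record]]) -> dict[str, str]:
        return {
            dim: self.tag_unit.tag(self.weight_unit.weights(recs))
            for dim, recs in log_by_dimension.items()
            if recs  # skip dimensions without log data
        }


@dataclass
class UserPortraitApparatus:  # apparatus 400
    # Stands in for data acquisition module 410: user id -> log data per dimension.
    acquire: Callable[[str], dict[str, list[Record]]]
    tag_module: TagDeterminationModule = field(default_factory=TagDeterminationModule)

    def generate(self, user_id: str) -> dict[str, str]:
        # Portrait generation module 430: here the portrait is simply one tag per dimension.
        return self.tag_module.determine(self.acquire(user_id))


def demo_source(user_id: str) -> dict[str, list[Record]]:
    """Hypothetical log source used only for this example."""
    return {
        "address": [
            (1, "Haidian District, Beijing"),
            (2, "Chaoyang District, Beijing"),
            (3, "Chaoyang District, Beijing"),
        ]
    }


apparatus = UserPortraitApparatus(acquire=demo_source)
print(apparatus.generate("user-001"))
```

Injecting the acquisition step as a callable keeps the tag logic testable without a real log store; the weighting unit can be swapped for one that implements formula (6).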
In this exemplary embodiment, the tag determining module may be configured to determine, in each dimension, the feature tag of the target user in that dimension according to the data item with the greatest weight in the log data.
In the present exemplary embodiment, the weight determining unit may include: the period statistics subunit is used for counting the log data according to a preset period under each dimension to obtain a period ordinal corresponding to each data in the log data; and the weight determining subunit is used for determining the weight of each data according to the period ordinal corresponding to each data in the log data.
In this exemplary embodiment, the weight determining unit may be configured to determine the weight of each data item through an exponential function based on the period ordinal corresponding to each data item in the log data, wherein the period ordinal is the exponent of the exponential function and the base of the exponential function is a constant.
In this exemplary embodiment, the user portrait creation apparatus may further include a frequency determining unit configured to determine the occurrence frequency of each data item in the log data. The weight determining unit may then be configured to calculate, for any data item D_i in the log data, the weight of D_i by formula (6), where B denotes the weight, S_i1, S_i2, …, S_im are the period ordinals corresponding to data D_i, freq(D_i) is the occurrence frequency of D_i in the log data, and k is an exponential constant.
In this exemplary embodiment, the data acquisition module may be configured to acquire log data of the target user in one or more dimensions and within a preset time range.
The specific details of the above modules/units have been described in the corresponding method portion embodiments, and thus are not repeated here.
The exemplary embodiment of the application also provides electronic equipment capable of realizing the method.
Those skilled in the art will appreciate that the various aspects of the present application may be implemented as a system, method, or program product. Accordingly, aspects of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, all of which may generally be referred to herein as a "circuit," "module," or "system."
An electronic device 500 according to such an exemplary embodiment of the present application is described below with reference to fig. 5. The electronic device 500 shown in fig. 5 is merely an example, and should not be construed as limiting the functionality and scope of use of embodiments of the present application.
As shown in Fig. 5, the electronic device 500 is embodied in the form of a general-purpose computing device. The components of the electronic device 500 may include, but are not limited to: at least one processing unit 510, at least one storage unit 520, a bus 530 connecting the different system components (including the storage unit 520 and the processing unit 510), and a display unit 540.
Wherein the storage unit stores program code that is executable by the processing unit 510 such that the processing unit 510 performs steps according to various exemplary embodiments of the present application described in the above-described "exemplary methods" section of the present specification. For example, the processing unit 510 may execute steps S110 to S130 shown in fig. 1, may execute steps S210 to S220 shown in fig. 2, or the like.
The storage unit 520 may include readable media in the form of volatile storage units, such as Random Access Memory (RAM) 521 and/or cache memory 522, and may further include Read Only Memory (ROM) 523.
The storage unit 520 may also include a program/utility 524 having a set (at least one) of program modules 525, such program modules 525 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
Bus 530 may be one or more of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 500 may also communicate with one or more external devices 700 (e.g., keyboard, pointing device, bluetooth device, etc.), one or more devices that enable a user to interact with the electronic device 500, and/or any device (e.g., router, modem, etc.) that enables the electronic device 500 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 550. Also, electronic device 500 may communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet, through network adapter 560. As shown, network adapter 560 communicates with other modules of electronic device 500 over bus 530. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with electronic device 500, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
From the above description of the embodiments, those skilled in the art will readily appreciate that the exemplary embodiments described herein may be implemented in software, or in software combined with the necessary hardware. Thus, the technical solutions according to the embodiments of the present application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) or on a network, and which includes several instructions that cause a computing device (such as a personal computer, a server, a terminal device, or a network device) to perform the method according to the exemplary embodiments of the present application.
Exemplary embodiments of the present application also provide a computer readable storage medium having stored thereon a program product capable of implementing the method described above in the present specification. In some possible implementations, the various aspects of the present application may also be implemented in the form of a program product comprising program code for causing a terminal device to carry out the steps according to the various exemplary embodiments of the present application as described in the "exemplary methods" section of this specification, when the program product is run on the terminal device.
Referring to fig. 6, a program product 600 for implementing the above-described method according to an exemplary embodiment of the present application is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present application is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., connected via the Internet using an Internet service provider).
Furthermore, the above-described figures are only illustrative of the processes involved in the method according to exemplary embodiments of the present application, and are not intended to be limiting. It will be readily appreciated that the processes shown in the above figures do not indicate or limit the temporal order of these processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, for example, among a plurality of modules.
It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functions of two or more modules or units described above may be embodied in one module or unit in accordance with exemplary embodiments of the present application. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It is to be understood that the present application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (7)

1. A user portrait creation method, comprising:
acquiring log data of a target user in one or more dimensions;
determining feature tags of the target user in the one or more dimensions according to the time distribution of the log data;
generating a portrait of the target user according to the feature tag;
wherein said determining feature labels of said target user in said one or more dimensions from a temporal distribution of said log data comprises:
under each dimension, determining the weight of each data in the log data according to the time distribution of the log data;
determining a feature tag of the target user in each dimension according to the weight of each data in the log data;
the determining the weight of each data in the log data according to the time distribution of the log data in each dimension comprises the following steps:
under each dimension, counting the log data according to a preset period to obtain a period ordinal corresponding to each data in the log data;
determining the weight of each data according to the period ordinal corresponding to each data in the log data;
the method further comprises the steps of:
determining occurrence frequency of each data in the log data;
the determining the weight of each data according to the period ordinal corresponding to each data in the log data comprises the following steps:
for any data D_i in the log data, calculating the weight of data D_i by the following formula:
wherein B denotes the weight, S_i1, S_i2, …, S_im are the period ordinals corresponding to data D_i, freq(D_i) is the occurrence frequency of data D_i in the log data, and k is an exponential constant.
2. The method of claim 1, wherein said determining feature labels for the target user in each of the dimensions based on weights for each of the log data comprises:
under each dimension, determining the feature tag of the target user in that dimension according to the data with the largest weight among the data of the log data.
3. The method of claim 1, wherein determining the weight of the log data according to the ordinal number of the preset period corresponding to each data in the log data comprises:
determining the weight of each data through an exponential function based on the period ordinal corresponding to each data in the log data, wherein the period ordinal is the exponent of the exponential function and the base of the exponential function is a constant.
4. The method of claim 1, wherein the obtaining log data of the target user in one or more dimensions comprises:
acquiring log data of the target user in the one or more dimensions and within a preset time range.
5. A user portrait creation apparatus, comprising:
the data acquisition module is used for acquiring log data of a target user in one or more dimensions;
the tag determining module is used for determining feature tags of the target user in one or more dimensions according to the time distribution of the log data;
the portrait generation module is used for generating a portrait of the target user according to the feature tag;
wherein, the label determining module includes:
the weight determining unit is used for determining the weight of each data in the log data according to the time distribution of the log data under each dimension;
the tag processing unit is used for determining the characteristic tag of the target user in each dimension according to the weight of each data in the log data;
a weight determination unit configured to:
under each dimension, counting the log data according to a preset period to obtain a period ordinal corresponding to each data in the log data;
determining the weight of each data according to the period ordinal corresponding to each data in the log data;
the apparatus is further configured to:
determining occurrence frequency of each data in the log data;
the determining the weight of each data according to the period ordinal corresponding to each data in the log data is configured to:
for any data D_i in the log data, calculating the weight of data D_i by the following formula:
wherein B denotes the weight, S_i1, S_i2, …, S_im are the period ordinals corresponding to data D_i, freq(D_i) is the occurrence frequency of data D_i in the log data, and k is an exponential constant.
6. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the method of any of claims 1-4 via execution of the executable instructions.
7. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the method of any of claims 1-4.
CN201910940066.1A 2019-09-30 2019-09-30 User portrait generation method, device, electronic equipment and storage medium Active CN112287208B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910940066.1A CN112287208B (en) 2019-09-30 2019-09-30 User portrait generation method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910940066.1A CN112287208B (en) 2019-09-30 2019-09-30 User portrait generation method, device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112287208A CN112287208A (en) 2021-01-29
CN112287208B true CN112287208B (en) 2024-03-01

Family

ID=74418878

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910940066.1A Active CN112287208B (en) 2019-09-30 2019-09-30 User portrait generation method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112287208B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113035317A (en) * 2021-03-16 2021-06-25 北京懿医云科技有限公司 User portrait generation method and device, storage medium and electronic equipment
CN113051914A (en) * 2021-04-09 2021-06-29 淮阴工学院 Enterprise hidden label extraction method and device based on multi-feature dynamic portrait

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105005587A (en) * 2015-06-26 2015-10-28 深圳市腾讯计算机系统有限公司 User portrait updating method, apparatus and system
CN105608171A (en) * 2015-12-22 2016-05-25 青岛海贝易通信息技术有限公司 User portrait construction method
CN106446045A (en) * 2016-08-31 2017-02-22 上海交通大学 Method and system for building user portrait based on conversation interaction
WO2017092444A1 (en) * 2015-12-02 2017-06-08 中兴通讯股份有限公司 Log data mining method and system based on hadoop
CN108154401A (en) * 2018-01-15 2018-06-12 网易无尾熊(杭州)科技有限公司 User's portrait depicting method, device, medium and computing device
CN109063059A (en) * 2018-07-20 2018-12-21 腾讯科技(深圳)有限公司 User behaviors log processing method, device and electronic equipment
CN109767300A (en) * 2019-01-14 2019-05-17 博拉网络股份有限公司 Big data portrait and model building method based on user's habit

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
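Research on the Construction of E-commerce User Portraits Based on Big Data; 李佳慧 (Li Jiahui) et al.; 电子商务 (Electronic Commerce); full text *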

Also Published As

Publication number Publication date
CN112287208A (en) 2021-01-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant