CN112287208A - User portrait generation method and device, electronic equipment and storage medium - Google Patents
- Publication number
- CN112287208A (application CN201910940066.1A)
- Authority
- CN
- China
- Prior art keywords
- data
- log data
- weight
- determining
- target user
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/1805—Append-only file systems, e.g. using logs or journals to store data
- G06F16/1815—Journaling file systems
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application provides a user portrait generation method and device, electronic equipment and a computer readable storage medium. The method comprises the following steps: acquiring log data of a target user in one or more dimensions; determining feature labels of the target user under the one or more dimensions according to the time distribution of the log data; generating a portrait of the target user according to the feature tag; wherein the determining the feature labels of the target user in the one or more dimensions according to the time distribution of the log data comprises: under each dimensionality, determining the weight of each data in the log data according to the time distribution of the log data; and determining the feature labels of the target user under the dimensions according to the weight of each datum in the log data. The method and the device can improve the accuracy of user portrait generation, so that the generated user portrait is relatively comprehensive.
Description
Technical Field
The present application relates to the field of computer technologies, and in particular, to a user portrait generation method and apparatus, an electronic device, and a computer-readable storage medium.
Background
With the increasingly wide adoption of the internet across industries, enterprises in fields such as e-commerce, internet finance, life services, and games collect and analyze user information such as static attributes, social attributes, and behavior attributes through the internet in order to abstract user portraits, mine user needs, and provide more targeted products or services.
Most existing user portrait generation methods aggregate a user's commonly used information, for example by counting the user's commonly used payment methods, ordering devices, or receiving addresses, and combine this information as the user's feature tags to compose the user portrait. However, such methods represent the user's characteristics through commonly used information, which in some cases cannot represent the user's current state, so the generated user portrait is one-sided and lacks objectivity. In addition, such methods process user information only by counting commonly used information; this single approach is inflexible and cannot mine how the user information changes over time, so the generated user portrait is superficial and of low accuracy.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present application and therefore may include information that does not constitute prior art known to a person of ordinary skill in the art.
Disclosure of Invention
The application provides a user portrait generation method, a user portrait generation device, electronic equipment and a computer readable storage medium, and solves the problem that the accuracy of a user portrait generated by an existing user portrait generation method is poor.
Other features and advantages of the present application will be apparent from the following detailed description, or may be learned by practice of the application.
According to one aspect of the present application, there is provided a user representation generation method, comprising: acquiring log data of a target user in one or more dimensions; determining feature labels of the target user in the one or more dimensions according to the time distribution of the log data; and generating the portrait of the target user according to the feature tag.
In an exemplary embodiment of the present application, the determining, according to the time distribution of the log data, the feature labels of the target user in the one or more dimensions includes: under each dimensionality, determining the weight of each data in the log data according to the time distribution of the log data; determining feature labels of the target user under the dimensions according to the weight of each datum in the log data; determining feature labels of the target user under the dimensions according to the weight of each datum in the log data comprises: and under each dimension, determining the feature label of the target user under the dimension according to the data with the maximum weight in each data of the log data.
In an exemplary embodiment of the present application, the determining, in each of the dimensions, a weight of each data in the log data according to a time distribution of the log data includes: under each dimension, counting the log data according to a preset period to obtain a period ordinal number corresponding to each data in the log data; and determining the weight of each data according to the cycle ordinal number corresponding to each data in the log data.
In an exemplary embodiment of the present application, the determining the weight of the log data according to the ordinal number of the preset period corresponding to each data in the log data includes: determining the weight of each data through an exponential function based on the cycle ordinal number corresponding to each data in the log data; wherein, the period ordinal number is the exponent of the exponential function, and the base number of the exponential function is a constant.
In an exemplary embodiment of the present application, the method further comprises: determining the occurrence frequency of each data in the log data; the determining the weight of each data according to the cycle ordinal number corresponding to each data in the log data comprises: for any data D_i in the log data, calculating the weight of data D_i by the following formula: wherein B represents the weight, S_i1, S_i2, …, S_im are the cycle ordinals corresponding to data D_i, freq(D_i) is the occurrence frequency of data D_i in the log data, and k is an exponential constant.
In an exemplary embodiment of the present application, the acquiring log data of the target user in one or more dimensions includes: and acquiring the log data of the target user in the one or more dimensions and within a preset time range.
According to an aspect of the present application, there is provided a user representation generation apparatus comprising: determining a plurality of feature labels corresponding to the target user in each dimension under the one or more dimensions according to the time distribution of the log data; the generating the representation of the target user according to the feature tag comprises: generating the portrait of the target user according to the feature labels corresponding to the dimensions; wherein the tag determination module comprises: the weight determining unit is used for determining the weight of each data in the log data according to the time distribution of the log data under each dimension; and the label processing unit is used for determining the characteristic labels of the target user under the dimensions according to the weight of each datum in the log data.
According to an aspect of the present application, there is provided an electronic device including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform the method of any one of the above via execution of the executable instructions.
According to an aspect of the application, a computer-readable storage medium is provided, on which a computer program is stored, which computer program, when being executed by a processor, is adapted to carry out the method of any of the above.
The exemplary embodiments of the present application have the following advantageous effects:
the portrait of the target user is generated by acquiring the log data of the target user and determining the feature tag according to the time distribution of the log data. On one hand, in the exemplary embodiment, because the time distribution corresponding to each data may be different, the feature tag is determined according to the time distribution of the log data, so that the generated feature tag can objectively reflect the difference of the user feature data under different time distributions, and a relatively comprehensive user portrait is generated; on the other hand, the user portrait is determined by combining the time distribution, so that the factors considered during the generation of the user portrait can be richer, and the change rule of each data in the log data of the target user can be effectively determined according to the time distribution, so that the feature label closest to the current state is determined, and the accuracy of the user portrait generation is improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application. It is obvious that the drawings in the following description are only some embodiments of the application, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
FIG. 1 schematically illustrates a flow chart of a user representation generation method in the present exemplary embodiment;
FIG. 2 schematically illustrates a sub-flow diagram of a user representation generation method in the present exemplary embodiment;
FIG. 3 schematically illustrates a flow chart of another user representation generation method in the present exemplary embodiment;
FIG. 4 is a block diagram schematically showing a configuration of a user representation generating apparatus in the present exemplary embodiment;
FIG. 5 schematically illustrates an electronic device for implementing the above method in the present exemplary embodiment;
FIG. 6 schematically illustrates a computer-readable storage medium for implementing the above-described method in the present exemplary embodiment.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
The exemplary embodiment of the present application provides a user portrait generation method, wherein a user portrait refers to the tagging of user information: by collecting and analyzing various information data of a user, an overall view of the user, that is, the user portrait, is abstracted. The user portrait may be used in big data applications such as personalized recommendation, in application scenarios including financial services, e-commerce, life services, games, social networks, music, and the like, which is not particularly limited in this application.
The exemplary embodiment is further described with reference to fig. 1, and as shown in fig. 1, the user representation generating method may include the following steps S110 to S130:
step S110, acquiring log data of the target user in one or more dimensions.
The target user may be a user who needs to be portrayed, and the log data may be original information data about the target user, such as the age, address, occupation, payment method, or hobbies of the target user. The dimensions refer to the categories of target user information contained or reflected in the log data; generally, one dimension is one aspect of the attributes of the target user, for example, personal information attributes, credit attributes, consumption feature attributes, or social information attributes. In this exemplary embodiment, the server may obtain the log data from the user terminal in real time, or may obtain it from a specific data store, for example, HDFS (Hadoop Distributed File System). In log data, a target user is usually identified by a unique identifier, such as the user's mobile phone number, App (Application) account, or IP address (Internet Protocol address). For example, after user A logs in to the App, in order to generate a user portrait of user A, the App account of user A may be identified and the corresponding log data looked up in a log database according to that account. When the log data is acquired, all log data can be acquired so as to comprehensively count the information of the target user, or only the log data within a certain time range can be acquired so as to reduce the amount of data to be processed subsequently.
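As an illustration of step S110, the following is a minimal sketch of selecting one user's log records by identifier, dimension, and optional time range. The record layout, the `LOGS` sample data, and the `fetch_logs` helper are hypothetical and are used only for illustration; an actual log store such as HDFS would be queried through its own interface.

```python
from datetime import datetime

# Hypothetical in-memory log records: (user_id, dimension, value, timestamp).
LOGS = [
    ("user_a", "payment_method", "card", datetime(2018, 1, 5)),
    ("user_a", "shipping_address", "addr_1", datetime(2018, 2, 1)),
    ("user_b", "shipping_address", "addr_9", datetime(2018, 2, 3)),
    ("user_a", "shipping_address", "addr_2", datetime(2018, 6, 9)),
]


def fetch_logs(logs, user_id, dimensions, start=None, end=None):
    """Return (dimension, value, timestamp) records of one user.

    Only records in the requested dimensions are kept; when start/end are
    given, only records with start <= timestamp < end are kept.
    """
    selected = []
    for uid, dim, value, ts in logs:
        if uid != user_id or dim not in dimensions:
            continue
        if start is not None and ts < start:
            continue
        if end is not None and ts >= end:
            continue
        selected.append((dim, value, ts))
    return selected


# All records of user_a in two dimensions, then only recent address records.
all_records = fetch_logs(LOGS, "user_a", {"shipping_address", "payment_method"})
recent = fetch_logs(LOGS, "user_a", {"shipping_address"}, start=datetime(2018, 3, 1))
```

Restricting the time range, as in the second call, corresponds to the option of acquiring only log data within a preset time range.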
And step S120, determining feature labels of the target user in one or more dimensions according to the time distribution of the log data.
The time distribution of the log data refers to the distribution characteristics of the log data over different time nodes. It can be expressed as the trend of the log data values over time, or as the occurrence frequency and timeliness of each data in the log data. In the present exemplary embodiment, the occurrence frequency refers to how often each data appears in all the log data, and the timeliness refers to how far each data is from the current time: the closer to the current time, the stronger the timeliness. Generally, the higher the occurrence frequency and the stronger the timeliness, the better the data represents the current state of the target user. The feature tag refers to the data information that best represents the actual information or current state of the target user in each dimension, and is generally an abstract summary of the features of the target user. The feature tag may be original data in the log data, such as a receiving address of the target user "xx city xx, xx road, No. x"; it may be a keyword or summary of the log data, such as "xx area" or "xx road"; or it may be information calculated from the log data, such as the user's average monthly consumption amount.
In the present exemplary embodiment, the log data may be processed in a preset processing manner according to the time distribution of the log data. For example, the log data may be converted into feature vectors in time sequence, and the feature vectors input into a previously trained LSTM (Long Short-Term Memory) network, which outputs the corresponding feature labels; or the log data may be plotted as a graph with time and numerical value as coordinates, a function fitted, and the corresponding feature labels determined from the fitting result; or the log data may be processed with a preset calculation formula to obtain the corresponding feature labels.
In an exemplary embodiment, a time granularity may be determined and the log data of the target user divided according to its time distribution, so that each piece of log data belongs to its corresponding time unit. For example, with a month as the time granularity, each piece of log data may be assigned to its corresponding month according to the time distribution, so that the time distribution of the log data is expressed as the distribution of the data across different months. The log data of each month is then analyzed to mine its distribution pattern, and the feature tags of the target user in one or more dimensions are determined.
In an exemplary embodiment, the occurrence frequency and the timeliness of each data in the log data may be counted. For example, the proportion of each data among all data may be counted as the occurrence frequency, and the length of time between the occurrence of each data and the current time may be counted as the timeliness. The weight of each data is then obtained by combining the two indexes through a calculation such as addition, multiplication, or averaging. A weighted calculation is then performed on the data according to their weights, and the result is determined as the feature tag. For example, if the log data is the consumption level of the target user, a weighted calculation may be performed on the consumption levels under each time distribution and the result used as the feature tag of the target user. When the log data is text information, it may first be converted to numerical form and then subjected to the weighted calculation.
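The division by time granularity can be sketched as follows, assuming a month as the granularity. The `bucket_by_month` helper and the sample records are hypothetical names introduced only for this illustration.

```python
from collections import defaultdict
from datetime import date


def bucket_by_month(records):
    """Group (value, day) records under a 'YYYY-MM' key, one key per month."""
    buckets = defaultdict(list)
    for value, day in records:
        buckets[day.strftime("%Y-%m")].append(value)
    return dict(buckets)


# Hypothetical shipping-address records of one user.
records = [
    ("addr_1", date(2017, 10, 3)),
    ("addr_1", date(2017, 10, 20)),
    ("addr_2", date(2018, 6, 9)),
]
monthly = bucket_by_month(records)
```

Each month's bucket can then be analyzed on its own, for example by counting how often each value occurs in that month.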
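A minimal sketch of this combination is given below for numeric log data such as a consumption level. The timeliness function `1 / (1 + months_ago)` and the choice of multiplication to combine the two indexes are assumptions made for illustration; the text leaves both open.

```python
def weighted_label(observations):
    """Weighted average of numeric observations.

    observations: list of (value, months_ago) pairs.
    Weight = occurrence frequency * timeliness, where the timeliness
    1 / (1 + months_ago) is an assumed decay, not a prescribed one.
    """
    total = len(observations)
    counts = {}
    for value, _ in observations:
        counts[value] = counts.get(value, 0) + 1

    numerator = 0.0
    denominator = 0.0
    for value, months_ago in observations:
        frequency = counts[value] / total       # share of this value in all data
        timeliness = 1.0 / (1.0 + months_ago)   # newer data is closer to 1
        weight = frequency * timeliness         # combination by multiplication
        numerator += weight * value
        denominator += weight
    return numerator / denominator


# Hypothetical consumption levels: two older low values, one current high value.
observations = [(100, 5), (100, 4), (300, 0)]
label = weighted_label(observations)
plain_mean = sum(value for value, _ in observations) / len(observations)
```

With these sample values the weighted result lies closer to the most recent consumption level than the unweighted mean does, which is the intended effect of the time factor.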
Step S130, generating the portrait of the target user according to the feature tag.
In the present exemplary embodiment, the determined feature labels may be grouped according to different classification criteria to generate a representation of the target user. For example, classification by the content of the feature tag may include address, purchase category, rating, and the like; classification by form may include short tags (e.g., within 5 characters), long tags (e.g., more than 5 characters), and so forth; classification by language may include English tags, Simplified Chinese tags, Traditional Chinese tags, and the like. Other classification criteria may also be used, which is not particularly limited in this application.
In an exemplary embodiment, the step S120 may include the following steps:
step S121, determining the weight of each data in the log data according to the time distribution of the log data in each dimension;
and step S122, determining feature labels of the target user under each dimension according to the weight of each datum in the log data.
In order to effectively determine the feature tag of the target user, the log data may be analyzed within a single dimension; for example, the log data indicating the address and the log data indicating the payment method may each be analyzed separately. In other embodiments, log data of different dimensions may also be analyzed together, for example, log data of multiple dimensions such as address, payment method, or hobbies.
Each data in the log data may be a specific item in the original information data of the target user; for example, data in the age log data may be specific values such as 28 years old or 30 years old, and data in the hobby log data may be specific items such as basketball, games, or running. Data distributed at different times carry different weights, and this affects the determination of the target user's feature tag. Therefore, in the present exemplary embodiment, the weight of each data in the log data can be set according to the time distribution of the log data. More recent data can be given a larger weight and more distant data a smaller weight; alternatively, for log data within a certain time interval, the weights may increase according to a certain function, with more recent data receiving larger increases, while log data in more distant intervals may uniformly be given a smaller weight. The weight may be calculated in various ways: it may be set by a specific calculation formula or function, such as a negative exponential function or a step function, or set according to human experience.
According to the calculated weights, the feature label of the target user in each dimension can be determined in various ways. In this exemplary embodiment, each data in the log data has a corresponding weight, so a preset criterion may be set directly and the data whose weight meets the criterion selected as the feature tag. For example, the data with the largest weight may be used as the feature tag, or, after ranking by weight, the top three data may be used as feature tags. A weight mapping table may also be established to record the mapping between the data and the weights. Through the weight mapping table, the feature labels can be determined in order of weight; and when a weighted calculation is performed on the weights, the data corresponding to the calculation result can be looked up in the weight mapping table and determined as the feature label.
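The selection by preset criterion can be sketched as follows, with a plain dictionary standing in for the weight mapping table. The `top_labels` helper and the sample weights are hypothetical and serve only as an illustration.

```python
def top_labels(weights, n=1):
    """Return the n data items with the largest weights, highest first.

    weights: mapping from data item to its weight (the weight mapping table).
    """
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return [value for value, _ in ranked[:n]]


# Hypothetical weights in a hobby dimension.
weights = {"basketball": 0.9, "running": 0.4, "gaming": 0.7, "chess": 0.1}

single_label = top_labels(weights)        # data with the largest weight
three_labels = top_labels(weights, n=3)   # top three data after ranking
```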
Based on the above description, in the present exemplary embodiment, the representation of the target user is generated by acquiring log data of the target user and determining the feature tag from the time distribution of the log data. On one hand, in the exemplary embodiment, because the time distribution corresponding to each data may be different, the feature tag is determined according to the time distribution of the log data, so that the generated feature tag can objectively reflect the difference of the user feature data under different time distributions, and a relatively comprehensive user portrait is generated; on the other hand, the user portrait is determined by combining the time distribution, so that the factors considered during the generation of the user portrait can be richer, and the change rule of each data in the log data of the target user can be effectively determined according to the time distribution, so that the feature label closest to the current state is determined, and the accuracy of the user portrait generation is improved.
In an exemplary embodiment, the step S122 may include:
and under each dimension, determining the characteristic label of the target user under the dimension according to the data with the maximum weight in each data of the log data.
In consideration of the fact that each data in the log data of the target user changes with time, the weight of each data may be determined in step S121 according to the time distribution, and the feature label of the target user in each dimension may then be determined from the weights, so that the generated user portrait is more reliable. In the present exemplary embodiment, the data with the largest weight may be determined as the feature tag of the target user in each dimension. For example, when the log data of the target user is the shipping address, the target user may need to change the shipping address because of moving home or a long-term business trip. As shown in Table 1, the shipping address "Haidian District, Beijing" appears more frequently for user A before 2018-06, and the new address "Xicheng District, Beijing" appears from 2018-06, but with a lower occurrence frequency than "Haidian District, Beijing"; if only the occurrence frequency were used as the standard for determining the feature tag, the generated user portrait would be inaccurate. Therefore, the present exemplary embodiment may set a higher weight for more recent shipping addresses according to their time distribution, and the shipping address with the largest weight may be determined as the feature tag; for example, in Table 1, "Xicheng District, Beijing" may be determined as the feature tag of the target user in the shipping address dimension.
TABLE 1
User | Delivery address | Order time | Number of orders
User A | Federal Germany Munich Wittelsbach Square No. 2 | 2017-10 | 11
User A | Federal Germany Munich Wittelsbach Square No. 2 | 2017-11 | 9
User A | Federal Germany Munich Wittelsbach Square No. 2 | 2018-02 | 4
User A | 274700 No. 84, East Section, Jinhe Road, Tancheng Town, Tancheng County, Heze City, Shandong Province | 2018-06 | 7
User A | 274700 No. 84, East Section, Jinhe Road, Tancheng Town, Tancheng County, Heze City, Shandong Province | 2018-08 | 4
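The effect illustrated by Table 1 can be checked numerically with a small sketch. The rows are reduced to the keys `address_1` (the earlier address) and `address_2` (the later address). The weight used here, the number of orders multiplied by `base ** (-age in months)` with base 2 and 2018-08 as the reference month, is an assumed form chosen for illustration and is not the formula of the application.

```python
from collections import defaultdict

# Rows of Table 1 reduced to (address key, order month, number of orders).
ROWS = [
    ("address_1", "2017-10", 11),
    ("address_1", "2017-11", 9),
    ("address_1", "2018-02", 4),
    ("address_2", "2018-06", 7),
    ("address_2", "2018-08", 4),
]


def months_between(earlier, later):
    """Number of whole months from 'YYYY-MM' earlier to 'YYYY-MM' later."""
    ey, em = map(int, earlier.split("-"))
    ly, lm = map(int, later.split("-"))
    return (ly - ey) * 12 + (lm - em)


def decayed_weights(rows, reference_month, base=2.0):
    """Sum orders per address, each discounted by base ** (-age in months)."""
    weights = defaultdict(float)
    for key, month, orders in rows:
        age = months_between(month, reference_month)
        weights[key] += orders * base ** (-age)
    return dict(weights)


# Plain occurrence counts, without any time factor.
order_counts = defaultdict(int)
for key, _, orders in ROWS:
    order_counts[key] += orders

weights = decayed_weights(ROWS, "2018-08")
most_frequent = max(order_counts, key=order_counts.get)  # frequency only
label = max(weights, key=weights.get)                    # time-weighted
```

Under these assumptions the earlier address has more orders in total (24 against 11), but the later address carries the larger time-weighted value, so it would be chosen as the feature tag.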
In an exemplary embodiment, step S121 may include the steps of:
step S210, under each dimension, carrying out statistics on log data according to a preset period to obtain a period ordinal number corresponding to each data in the log data;
step S220, determining the weight of each data according to the cycle number corresponding to each data in the log data.
The preset period may be a set time granularity of the log data, for example, one week or one month. According to the time distribution of the log data, the log data can be counted into corresponding periods. For example, in Table 1, when the preset period is a month, the log data with the shipping address "Xicheng District, Beijing" are counted into the months 2018-06 and 2018-08, the log data with the shipping address "Haidian District, Beijing" are counted into the months 2017-10, 2017-11, and 2018-02, and the data of other months are not specifically counted. The cycle ordinal is the rank of a period among the preset periods and may be assigned in reverse or forward chronological order. For example, when the period is a month, the most recent month may be given the smallest ordinal, with more distant months given larger ordinals; alternatively, within a certain time interval, the months may be numbered from the most distant month up to the most recent month, for example, in a one-year interval with a monthly period, the ordinal of the first month may be set to 1 and the ordinal of the twelfth month to 12, and so on.
In the present exemplary embodiment, the weight of each data in the log data may be determined from the cycle ordinal in various ways. The weight may be calculated according to a specific function, such as a negative exponential function, or a weight may be assigned empirically to each cycle ordinal. For example, taking a set of cycle ordinals covering six months, the ordinals may run from 1 to 6 from far to near, and each cycle ordinal is assigned an initial value a_i = i, i ∈ [1, 6]. Considering that the weight of the nearest month should be large, the initial value of the cycle ordinal is inverted, and then the weight of each cycle ordinal is calculated by the following formula.
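The two numbering schemes can be sketched as follows for a monthly period. The `month_index` and `cycle_ordinal` helpers, and the convention that ordinals start at 1, are assumptions introduced for this illustration.

```python
def month_index(month):
    """Convert 'YYYY-MM' to a running month count so months can be subtracted."""
    year, mon = map(int, month.split("-"))
    return year * 12 + (mon - 1)


def cycle_ordinal(month, window_start, window_end, newest_first=True):
    """Ordinal of a month within the window [window_start, window_end].

    newest_first=True  : the most recent month gets ordinal 1 (reverse order).
    newest_first=False : the most distant month gets ordinal 1 (forward order).
    """
    idx = month_index(month)
    if newest_first:
        return month_index(window_end) - idx + 1
    return idx - month_index(window_start) + 1


# One-year window, forward order: first month -> 1, twelfth month -> 12.
first = cycle_ordinal("2018-01", "2018-01", "2018-12", newest_first=False)
twelfth = cycle_ordinal("2018-12", "2018-01", "2018-12", newest_first=False)

# Same window, reverse order: the most recent month gets the smallest ordinal.
latest = cycle_ordinal("2018-12", "2018-01", "2018-12")
oldest = cycle_ordinal("2018-01", "2018-01", "2018-12")
```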
In an exemplary embodiment, the weight of the log data may also be determined by:
In consideration of the influence of the temporal distribution of each data in the user log data on the user portrait, a time factor is added when calculating the weight of each data. For example, the user may have changed the shipping address after moving, so the most recent shipping address may have been added only recently and will be used more often afterwards. To avoid feature tag extraction errors caused by earlier historical log data carrying too large a weight, more recent data can be given a higher weight. Let the cycle ordinal of a time period be x, where a more recent period has a smaller cycle ordinal and a more distant period has a larger one. Taking the analysis of two years of user log data as an example, the weight of each data in the log data can satisfy the formula:
B(x)>B(x+1); (2)
That is, the older the data, the smaller its weight, and the more recent the data, the larger its weight.
Meanwhile, data that appears only occasionally can distort the weight calculation; for example, the user may occasionally purchase goods for someone else, so that the shipping address changes. Such cases should be excluded when calculating the weight. To this end, the weight of the most recent time period may be limited to be lower than the sum of the weights of the data that appeared continuously before it, so that the weight of the data satisfies formula (3).
That is, the weight of data appearing in the most recent time period is less than the sum of the weights of the preceding t time periods, where t depends on the specific characteristics of the log data. For example, when the log data is a receiving address, if t is set too long the historical receiving addresses have no reference value, and if it is set too short the determined feature tag may be wrong; a moderate tolerance time, such as 6 months, can therefore be chosen.
Further, if the same data occurs in the log data in the most recent consecutive time periods, the weight of that data may be greater than the weight of the data of the T consecutive time periods before. That is, for any data appearing continuously in the two most recent time periods, its weight is larger than the sum of the weights of the previous T time periods. The influence of the time factor on the weight calculation therefore cannot be too small, meaning that the weight function cannot decrease too slowly, so that the weight of the data satisfies formula (4).
In the present exemplary embodiment, for data older than a certain time range, the weight may be treated as substantially unchanged; for example, the weight of a receiving address from 20 months ago and that from 21 months ago may be considered substantially the same. Therefore, formula (4) need not hold for the ordinals of all time periods; it may be required only for the ordinals of time periods within a certain time range, while the ordinals of time periods beyond that range only need to satisfy a decreasing function.
According to the above description, in an exemplary embodiment, step S320 may include:
determining the weight of each data through an exponential function based on the cycle ordinal number corresponding to each data in the log data; wherein, the period ordinal number is the exponent of the exponential function, and the base number of the exponential function is a constant.
In the present exemplary embodiment, the weight of each data in the log data can be calculated by equation (5):
$B(x) = a^{-bx} + c$; (5)
That is, formula (5) can satisfy the above formulas (2), (3) and (4). Here x is the cycle ordinal, and a, b and c are constant parameters. The cycle ordinal can be user-defined: if each month is set as a time period and the months within half a year are ordered from far to near, the value range of x can be [1, 6]. It should be noted that this value range is only illustrative; the specific values of the cycle ordinal can be determined according to the time period that actually needs to be calculated, which is not limited by the present disclosure.
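An exponential weight with the cycle ordinal in the exponent can be sketched as below. The parameter values a = 2, b = 1, c = 0 are illustrative assumptions only, and the sketch assumes that ordinal 1 denotes the most recent period, as required for formula (2) to favor recent data.

```python
def exponential_weight(x: int, a: float = 2.0, b: float = 1.0, c: float = 0.0) -> float:
    # B(x) = a^(-b*x) + c: the cycle ordinal x is the exponent, the base a is constant.
    # With a > 1 and b > 0 the weight strictly decreases as x grows.
    return a ** (-b * x) + c

# Weights for six monthly periods, ordinal 1 being the most recent month.
weights = [exponential_weight(x) for x in range(1, 7)]

# Formula (2): B(x) > B(x + 1) for every pair of neighbouring ordinals.
is_decreasing = all(w1 > w2 for w1, w2 in zip(weights, weights[1:]))
```

Whether a given choice of a, b and c also meets the constraints described for formulas (3) and (4) has to be checked separately.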
In an exemplary embodiment, the user representation generation method may further include the steps of:
determining the occurrence frequency of each data in the log data;
step S220 may include:
for any data $D_i$ in the log data, the weight of data $D_i$ is calculated by the following formula (6):
wherein B represents the weight, $S_{i1}$, $S_{i2}$, …, $S_{im}$ are the cycle ordinals corresponding to data $D_i$, $freq(D_i)$ is the occurrence frequency of data $D_i$ in the log data, and k is an exponential constant. In the present exemplary embodiment, the exponential constant k may be determined from the inequalities given by formula (3) and formula (4). Formula (3) requires that the function not decrease too fast, i.e., the absolute value of its first derivative cannot be too large; formula (4) requires that the function not decrease too slowly, i.e., the absolute value of its first derivative cannot be too small. The value of k is therefore determined by the values of t and T under these inequality constraints. For example, when t is 6 and T is 22, k is 0.66. It should be noted that the inequality constraints determine a value range for k, and the final value of k is chosen from that range; for example, the minimum value in the range can be taken as k.
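The way two inequalities bound k can be sketched numerically. The forms below are assumptions reconstructed from the prose, not the formulas of the application: the weight is taken as B(x) = k^x, the constraint on occasional data is read as B(1) < B(2) + … + B(t+1), and the constraint on data seen in the two most recent periods is read as B(1) + B(2) > B(3) + … + B(T+2). The function names are hypothetical.

```python
def satisfies_constraints(k: float, t: int, T: int) -> bool:
    # Assumed weight function: B(x) = k ** x with 0 < k < 1.
    def B(x: int) -> float:
        return k ** x

    # Assumed reading of (3): one recent occurrence must not outweigh
    # data seen in each of the t periods before it (decay not too fast).
    not_too_fast = B(1) < sum(B(x) for x in range(2, t + 2))
    # Assumed reading of (4): two recent consecutive occurrences must
    # outweigh data seen in each of the T periods before them (decay not too slow).
    not_too_slow = B(1) + B(2) > sum(B(x) for x in range(3, T + 3))
    return not_too_fast and not_too_slow

def feasible_range(t: int, T: int, steps: int = 1000):
    # Scan k over a grid in (0, 1) and report the smallest and largest feasible values.
    feasible = [i / steps for i in range(1, steps)
                if satisfies_constraints(i / steps, t, T)]
    return (min(feasible), max(feasible)) if feasible else None

low, high = feasible_range(t=6, T=22)
```

Under these assumed forms, k = 0.66 with t = 6 and T = 22 passes both checks, while values such as 0.3 (too fast) or 0.9 (too slow) do not.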
In order to make the generated user portrait more accurate, the data weight may be determined based on both the occurrence frequency and the time of each data in the log data. In this exemplary embodiment, the occurrence frequency of each data may be obtained in several statistical ways. For example, table 2 is an exemplary list of a user's log data from April to August 2018, showing the occurrences of each data when the log data is a receiving address. The occurrence frequency of the data "Chaoyang District, Beijing" may be the ratio of the number of occurrences of "Chaoyang District, Beijing" to the number of all receiving addresses appearing in the 5 months, which is 4/10 in table 2. Alternatively, taking a month as the time period, the proportion of time periods in which "Chaoyang District, Beijing" appears to the total number of time periods may be counted: as shown in table 2, "Chaoyang District, Beijing" appears in the time periods of August, July, May and April, so its frequency may be 4/5. Other statistical methods may also be used according to the specific situation of the log data in each time period, which is not specifically limited in the present application.
TABLE 2
Date | Delivery address |
---|---|
2018-08-20 | Federal Germany Munich Wittelsbach Square No. 2 |
2018-08-12 | 274700 No. 84, East Section, Jinhe Road, Tancheng Town, Tancheng County, Heze City, Shandong Province |
2018-08-05 | Chaoyang District, Beijing |
2018-07-22 | Federal Germany Munich Wittelsbach Square No. 2 |
2018-07-15 | Room 2-602, Building 15, Tiantongyuan East 2nd District, Beijing City, 102218 |
2018-06-16 | Federal Germany Munich Wittelsbach Square No. 2 |
2018-05-28 | 274700 No. 84, East Section, Jinhe Road, Tancheng Town, Tancheng County, Heze City, Shandong Province |
2018-05-22 | Room 2-602, Building 15, Tiantongyuan East 2nd District, Beijing City, 102218 |
2018-04-12 | 274700 No. 84, East Section, Jinhe Road, Tancheng Town, Tancheng County, Heze City, Shandong Province |
2018-04-02 | Room 2-602, Building 15, Tiantongyuan East 2nd District, Beijing City, 102218 |
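The two ways of counting occurrence frequency described before the table can be sketched as follows. The records are hypothetical placeholders ("A" and "B" stand for two receiving addresses) rather than the rows of table 2, and the function names are illustrative.

```python
from collections import Counter

def frequency_by_count(records, value):
    # Share of all log records whose data equals `value`.
    counts = Counter(v for _, v in records)
    return counts[value] / len(records)

def frequency_by_period(records, value):
    # Share of distinct periods in which `value` appears at least once.
    periods = {p for p, _ in records}
    periods_with_value = {p for p, v in records if v == value}
    return len(periods_with_value) / len(periods)

# (period, data) pairs over five monthly periods; purely illustrative.
records = [
    ("2018-08", "A"), ("2018-08", "B"),
    ("2018-07", "A"),
    ("2018-06", "B"),
    ("2018-05", "A"),
    ("2018-04", "A"),
]

by_count = frequency_by_count(records, "A")    # 4 of 6 records
by_period = frequency_by_period(records, "A")  # 4 of 5 periods
```

The period-based ratio is less sensitive to a burst of repeated records inside a single month, which is one reason to prefer it when order volume varies strongly between periods.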
In formula (6), $S_{i1}$, $S_{i2}$, …, $S_{im}$ represent the cycle ordinals corresponding to data $D_i$. In the present exemplary embodiment, a time interval may be set, and the periods are ranked by their distance from the current time: the period closest to the current time has the smallest cycle ordinal, and the further back the period, the larger its cycle ordinal. As a result, log data distributed closer to the current time carries a heavier weight, and log data distributed further back carries a lighter weight. In addition, the occurrence frequency of the data is taken into account: if a certain data is not distributed very close to the current time but occurs very frequently, its weight can still be high. Formula (6) therefore counts the occurrence frequency $freq(D_i)$ of each data in the log data and uses it as a coefficient to adjust the final calculated weight of each data.
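A frequency-adjusted weight and the selection of the feature tag can be sketched as below. The exact form of formula (6) is not assumed to be known here: the sketch takes the weight to be freq(D_i) multiplied by the sum of k raised to each cycle ordinal of D_i, which is only one plausible reading of the description. The function names and sample values are hypothetical.

```python
def data_weight(ordinals, freq, k=0.66):
    # Assumed form: B(D_i) = freq(D_i) * sum(k ** S_ij for each cycle ordinal S_ij).
    # Smaller ordinals (more recent periods) contribute more because 0 < k < 1.
    return freq * sum(k ** s for s in ordinals)

def feature_label(observations, k=0.66):
    # observations maps each data value to (cycle ordinals, occurrence frequency).
    weights = {value: data_weight(ordinals, freq, k)
               for value, (ordinals, freq) in observations.items()}
    # The data with the largest weight is taken as the feature tag of the dimension.
    return max(weights, key=weights.get), weights

observations = {
    "home address": ([1, 2, 4, 5], 0.8),   # seen in four of five periods
    "one-off gift address": ([1], 0.2),    # seen once, in the latest period
}
label, weights = feature_label(observations)
```

In this example the address seen only once in the latest period does not displace the address that appears regularly, which is the behavior the occasional-data constraint is meant to produce.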
In an exemplary embodiment, step S110 may include: acquiring log data of the target user in one or more dimensions and within a preset time range.
In the present exemplary embodiment, when log data of a target user is acquired, data filtering may be performed on the log data. The preset time range is the time range used to filter the log data; for example, if the preset time is set to 12 months, the log data of the most recent 12 months may be extracted from a large amount of log data and aggregated to generate the user portrait. The preset time may be set differently for different dimensions. For example, in the address dimension, the preset time may be set longer (for example, 24 months), considering that users change addresses infrequently; in the interest dimension, the preset time may be set shorter (for example, 6 months), considering that interests are influenced by the network, family and the like and change more frequently.
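Filtering by a per-dimension preset time range can be sketched as follows. The dictionary `PRESET_MONTHS`, the function name `within_preset_range`, and the sample records are hypothetical; the 24-month and 6-month values simply mirror the examples given in the text.

```python
from datetime import date

# Hypothetical per-dimension look-back windows, in months.
PRESET_MONTHS = {"address": 24, "interest": 6}

def within_preset_range(records, dimension, today):
    # Keep only (date, data) records whose month lies inside the look-back window.
    limit = PRESET_MONTHS[dimension]
    now = today.year * 12 + today.month
    kept = []
    for d, value in records:
        age = now - (d.year * 12 + d.month)  # 0 for the current month
        if 0 <= age < limit:
            kept.append((d, value))
    return kept

records = [(date(2018, 8, 5), "recent"), (date(2017, 1, 10), "older")]
today = date(2018, 8, 31)
address_logs = within_preset_range(records, "address", today)
interest_logs = within_preset_range(records, "interest", today)
```

The same record can therefore survive filtering in a slowly changing dimension and be dropped in a fast-changing one.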
Fig. 3 schematically shows a flowchart of another user portrait generation method in this exemplary embodiment, which includes first performing step S310 to obtain log data of a target user, then performing step S320 to obtain log data of the target user in one or more dimensions, then performing step S330 of data filtering on the log data to determine the log data within a preset time range, then determining a weight of each data in the log data of the target user in one or more dimensions according to a time distribution of the log data through step S340, and finally performing step S350 to determine a feature tag according to the weight of each data, thereby completing generation of a user portrait.
Exemplary embodiments of the present application also provide a user representation generating apparatus. Referring to FIG. 4, the apparatus 400 may include a data acquisition module 410, a label determination module 420, and a representation generation module 430. The data acquisition module 410 is configured to acquire log data of a target user in one or more dimensions; the label determination module 420 is configured to determine feature labels of the target user in one or more dimensions according to the time distribution of the log data; the portrait generation module 430 is used for generating a portrait of the target user according to the feature tag; the tag determination module 420 may include: a weight determining unit 421, configured to determine, in each dimension, a weight of each data in the log data according to a time distribution of the log data; and the label processing unit 422 is configured to determine feature labels of the target user in each dimension according to the weight of each data in the log data.
In this exemplary embodiment, the tag determination module may be configured to determine, in each dimension, a feature tag of the target user in the dimension according to the data with the largest weight in each data of the log data.
In the present exemplary embodiment, the weight determination unit may include: the period counting subunit is used for counting the log data according to a preset period under each dimension so as to obtain a period ordinal number corresponding to each data in the log data; and the weight determining subunit is used for determining the weight of each data according to the cycle ordinal number corresponding to each data in the log data.
In this exemplary embodiment, the weight determining unit may be configured to determine the weight of each data through an exponential function based on the cycle number corresponding to each data in the log data; wherein, the period ordinal number is the exponent of the exponential function, and the base number of the exponential function is a constant.
In the present exemplary embodiment, the user representation generating apparatus may further include a frequency determining unit, configured to determine the occurrence frequency of each data in the log data. The weight determination unit may be configured to calculate, for any data $D_i$ in the log data, the weight of data $D_i$ by formula (6), wherein B represents the weight, $S_{i1}$, $S_{i2}$, …, $S_{im}$ are the cycle ordinals corresponding to data $D_i$, $freq(D_i)$ is the occurrence frequency of data $D_i$ in the log data, and k is an exponential constant.
In the exemplary embodiment, the data obtaining module may be configured to obtain log data of the target user in one or more dimensions and within a preset time range.
The specific details of each module/unit have been described in detail in the corresponding method embodiment, and therefore are not described herein again.
Exemplary embodiments of the present application also provide an electronic device capable of implementing the above method.
As will be appreciated by one skilled in the art, aspects of the present application may be embodied as a system, method or program product. Accordingly, various aspects of the present application may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects that may all generally be referred to herein as a "circuit," "module," or "system."
An electronic device 500 according to such an exemplary embodiment of the present application is described below with reference to fig. 5. The electronic device 500 shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 5, the electronic device 500 is embodied in the form of a general purpose computing device. The components of the electronic device 500 may include, but are not limited to: at least one processing unit 510, at least one storage unit 520, a bus 530 connecting various system components (including the storage unit 520 and the processing unit 510), and a display unit 540.
Where the storage unit stores program code, the program code may be executed by the processing unit 510 such that the processing unit 510 performs the steps according to various exemplary embodiments of the present application described in the above-mentioned "exemplary methods" section of the present specification. For example, the processing unit 510 may execute steps S110 to S130 shown in fig. 1, or may execute steps S210 to S220 shown in fig. 2, or the like.
The storage unit 520 may include readable media in the form of volatile storage units, such as a random access memory unit (RAM)521 and/or a cache memory unit 522, and may further include a read only memory unit (ROM) 523.
The storage unit 520 may also include a program/utility 524 having a set (at least one) of program modules 525, such program modules 525 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
The electronic device 500 may also communicate with one or more external devices 700 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 500, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 500 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 550. Also, the electronic device 500 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 560. As shown, the network adapter 560 communicates with the other modules of the electronic device 500 over the bus 530. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 500, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the exemplary embodiments of the present application.
Exemplary embodiments of the present application also provide a computer-readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, various aspects of the present application may also be implemented in the form of a program product comprising program code for causing a terminal device to perform the steps according to various exemplary embodiments of the present application described in the above-mentioned "exemplary methods" section of this specification, when the program product is run on the terminal device.
Referring to fig. 6, a program product 600 for implementing the above method according to an exemplary embodiment of the present application is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present application is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
Furthermore, the above-described figures are merely schematic illustrations of processes involved in methods according to exemplary embodiments of the present application, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to exemplary embodiments of the present application, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by a plurality of modules or units.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.
Claims (9)
1. A user representation generation method, comprising:
acquiring log data of a target user in one or more dimensions;
determining feature labels of the target user in the one or more dimensions according to the time distribution of the log data;
generating a portrait of the target user according to the feature tag;
wherein the determining the feature labels of the target user in the one or more dimensions according to the time distribution of the log data comprises:
under each dimensionality, determining the weight of each data in the log data according to the time distribution of the log data;
and determining the feature labels of the target user under the dimensions according to the weight of each datum in the log data.
2. The method of claim 1, wherein determining feature labels of the target user in the dimensions according to the weight of each data in the log data comprises:
and under each dimension, determining the feature label of the target user under the dimension according to the data with the maximum weight in each data of the log data.
3. The method of claim 1, wherein determining the weight of each data in the log data according to the time distribution of the log data in each dimension comprises:
under each dimension, counting the log data according to a preset period to obtain a period ordinal number corresponding to each data in the log data;
and determining the weight of each data according to the cycle ordinal number corresponding to each data in the log data.
4. The method of claim 3, wherein the determining the weight of the log data according to the ordinal number of the preset period corresponding to each data in the log data comprises:
determining the weight of each data through an exponential function based on the cycle ordinal number corresponding to each data in the log data; wherein, the period ordinal number is the exponent of the exponential function, and the base number of the exponential function is a constant.
5. The method of claim 3, further comprising:
determining the occurrence frequency of each data in the log data;
the determining the weight of each data according to the cycle ordinal number corresponding to each data in the log data comprises:
for any data $D_i$ in the log data, calculating the weight of data $D_i$ by the following formula:
wherein B represents the weight, $S_{i1}$, $S_{i2}$, …, $S_{im}$ are the cycle ordinals corresponding to data $D_i$, $freq(D_i)$ is the occurrence frequency of data $D_i$ in the log data, and k is an exponential constant.
6. The method of claim 1, wherein obtaining log data of the target user in one or more dimensions comprises:
and acquiring log data of the target user in the one or more dimensions and within a preset time range.
7. A user representation generation apparatus, comprising:
the data acquisition module is used for acquiring log data of a target user in one or more dimensions;
the label determining module is used for determining the characteristic labels of the target user under the one or more dimensions according to the time distribution of the log data;
and the portrait generation module is used for generating the portrait of the target user according to the feature tag;
wherein the tag determination module comprises:
the weight determining unit is used for determining the weight of each data in the log data according to the time distribution of the log data under each dimension;
and the label processing unit is used for determining the characteristic labels of the target user under the dimensions according to the weight of each datum in the log data.
8. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the method of any of claims 1-6 via execution of the executable instructions.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of any one of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910940066.1A CN112287208B (en) | 2019-09-30 | 2019-09-30 | User portrait generation method, device, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112287208A (en) | 2021-01-29 |
CN112287208B (en) | 2024-03-01 |
Family
ID=74418878
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910940066.1A Active CN112287208B (en) | 2019-09-30 | 2019-09-30 | User portrait generation method, device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112287208B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105005587A (en) * | 2015-06-26 | 2015-10-28 | 深圳市腾讯计算机系统有限公司 | User portrait updating method, apparatus and system |
CN105608171A (en) * | 2015-12-22 | 2016-05-25 | 青岛海贝易通信息技术有限公司 | User portrait construction method |
CN106446045A (en) * | 2016-08-31 | 2017-02-22 | 上海交通大学 | Method and system for building user portrait based on conversation interaction |
WO2017092444A1 (en) * | 2015-12-02 | 2017-06-08 | 中兴通讯股份有限公司 | Log data mining method and system based on hadoop |
CN108154401A (en) * | 2018-01-15 | 2018-06-12 | 网易无尾熊(杭州)科技有限公司 | User's portrait depicting method, device, medium and computing device |
CN109063059A (en) * | 2018-07-20 | 2018-12-21 | 腾讯科技(深圳)有限公司 | User behaviors log processing method, device and electronic equipment |
CN109767300A (en) * | 2019-01-14 | 2019-05-17 | 博拉网络股份有限公司 | Big data portrait and model building method based on user's habit |
Non-Patent Citations (1)
Title |
---|
李佳慧 等: "基于大数据的电子商务用户画像构建研究", 电子商务 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113035317A (en) * | 2021-03-16 | 2021-06-25 | 北京懿医云科技有限公司 | User portrait generation method and device, storage medium and electronic equipment |
WO2022193676A1 (en) * | 2021-03-16 | 2022-09-22 | 北京懿医云科技有限公司 | Method and apparatus for generating user portrait, and storage medium and electronic device |
CN113051914A (en) * | 2021-04-09 | 2021-06-29 | 淮阴工学院 | Enterprise hidden label extraction method and device based on multi-feature dynamic portrait |
Also Published As
Publication number | Publication date |
---|---|
CN112287208B (en) | 2024-03-01 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||