CN117009654A - User portrait construction method, device, equipment and storage medium - Google Patents

User portrait construction method, device, equipment and storage medium

Info

Publication number
CN117009654A
Authority
CN
China
Prior art keywords
user
behavior
day
date
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310833638.2A
Other languages
Chinese (zh)
Inventor
张军前
刘霄
柳忠松
桂祖宏
彭彪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
MIGU Culture Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
MIGU Culture Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd and MIGU Culture Technology Co Ltd
Priority to CN202310833638.2A
Publication of CN117009654A
Legal status: Pending

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a user portrait construction method, device, equipment and storage medium. In response to a user portrait construction instruction, target behavior data is acquired from a behavior decay table in which user behaviors are spread evenly across N time periods, and a target user portrait is constructed from that data. This keeps the processing of full historical behaviors efficient, shortens the total construction time of the user portrait, and improves the accuracy of the constructed user portrait.

Description

User portrait construction method, device, equipment and storage medium
Technical Field
The invention relates to the technical field of data processing, in particular to a user portrait construction method, a device, equipment and a storage medium.
Background
A user portrait is a virtual representation of a real user: label information built on a series of attribute data. User portraits can be used to drive service and product optimization, personalized service recommendation and the like.
In the prior art, a user portrait is generally calculated from the user's behaviors over a preset number of historical days. If that historical period is too long, calculation is inefficient and occupies many resources, so the calculation takes too long to meet service requirements. In addition, the way such schemes handle returning users can lead to insufficient user coverage, or to historical preferences carrying excessive weight in the user portrait, so that an accurate user portrait cannot be obtained.
Disclosure of Invention
Based on the above, the invention provides a user portrait construction method, device, equipment and storage medium. In response to a user portrait construction instruction, target behavior data is acquired from a behavior decay table in which user behaviors are spread evenly across N time periods, and is used to construct a target user portrait. This keeps the processing of full historical behaviors efficient, shortens the total construction time of the user portrait, and improves the accuracy of the constructed user portrait.
In order to achieve the above object, an embodiment of the present invention provides a user portrait construction method, including:
responding to the user portrait construction instruction, and acquiring target behavior data according to a target user identifier;
constructing a target user portrait according to the target behavior data; wherein the target behavior data is data obtained, according to the target user identifier, from the partitions of the last N days of a behavior decay table; the partition for a set date in the behavior decay table records the full current-day user behavior data of the users active on the set date, together with the decay-processed historical behavior data of the full set of users to be calculated for that day; in the partition for the set date, the access date of each user in the full set of users to be calculated is set to the set date; and the full set of users to be calculated for the day comprises the users active on that day and the users whose original access date is N days before the set date.
As an improvement of the above-described scheme, the behavior decay table is constructed by:
acquiring the users active on the set date and the users inactive within N days, and combining them to obtain the full set of users to be calculated for the day; wherein the users inactive within N days are the users whose access date is N days before the set date;
acquiring the full current-day user behavior data of the users active on the day;
querying the behavior decay table according to the access date of each user in the full set of users to be calculated for the day, acquiring the full historical behaviors of those users, and decay-processing the full historical behaviors to obtain user historical behavior data;
merging the full current-day user behavior data with the user historical behavior data to obtain the full behavior data to be calculated for the day;
setting the access date of each user in the full set of users to be calculated for the day to the set date;
and writing the full behavior data to be calculated for the day into the partition of the behavior decay table for the set date.
As an improvement of the above scheme, the inactive users within N days are obtained by:
querying the partition of the user mapping table for the day before the set date, and selecting the users whose access date is N days before the set date, to obtain the users inactive within N days; wherein the user mapping table records user identification information and the access date associated with the user identification information;
The step of decay-processing the full historical behaviors to obtain user historical behavior data comprises: calculating the decay coefficient of each historical behavior in the full historical behaviors according to the difference between the access date and the current day, to obtain the user historical behavior data;
the method further comprises:
merging the full set of users to be calculated for the day with the data in the partition of the user mapping table for the day before the set date, and writing the result into the partition of the user mapping table for the set date.
As an improvement of the above solution, the obtaining target behavior data according to the target user identifier includes:
determining, according to the target user identifier, the partition among the last N days of partitions of the behavior decay table that records behavior data for the target user identifier and whose date is closest to the current day, and selecting from that partition the behavior data associated with the target user identifier as the target behavior data.
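The lookup just described can be sketched in Python. This is a minimal illustration, assuming the behavior decay table is held in memory as a dict from partition date to a list of row dicts; the function name `find_target_behavior` and the `user_id` key are hypothetical, and the source names no storage engine or query API.

```python
from datetime import date, timedelta

def find_target_behavior(decay_table, user_id, today, n_days):
    """Return the user's rows from the most recent partition, among the
    last n_days partitions, that holds any row for user_id ([] if none)."""
    for offset in range(n_days):  # today, today-1, ..., today-(n_days-1)
        partition = decay_table.get(today - timedelta(days=offset), [])
        rows = [r for r in partition if r["user_id"] == user_id]
        if rows:
            return rows
    return []

# Hypothetical sample data: u1 was last written on 2022-06-28.
decay_table = {
    date(2022, 6, 28): [{"user_id": "u1", "source_id": "s1", "decay_ratio": 1.0}],
    date(2022, 6, 30): [{"user_id": "u2", "source_id": "s2", "decay_ratio": 1.0}],
}
target = find_target_behavior(decay_table, "u1", date(2022, 6, 30), n_days=30)
```

Only N partitions are ever inspected, regardless of how long the table's history is.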
As an improvement of the above-described aspect, the calculating the decay coefficient of each of the total historical behaviors from the date difference of the access date and the current day includes:
calculating the attenuation coefficient of each historical behavior in the full amount of historical behaviors by the following formula:
$$\mathrm{decay\_ratio}_{\text{today}} = f(x) \cdot \mathrm{decay\_ratio}_{\text{last}}$$
$$f(x) = \exp(-\mathrm{decayRatio} \cdot x)$$
wherein $\mathrm{decay\_ratio}_{\text{today}}$ denotes the coefficient for the current day, $\mathrm{decay\_ratio}_{\text{last}}$ denotes the coefficient from the last calculation, $f(x)$ denotes the decay factor, $\mathrm{decayRatio}$ denotes a preset constant, and $x$ denotes the difference in days between the access date associated with the historical behavior and the current day.
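As an illustration of this formula, the following Python sketch applies the decay to a single coefficient. The function name `decayed_ratio` and the value 0.1 for the preset constant are assumptions made for the example; the source does not state a value for decayRatio.

```python
import math

def decayed_ratio(last_ratio, days_since_visit, decay_constant):
    """decay_ratio_today = exp(-decayRatio * x) * decay_ratio_last."""
    return math.exp(-decay_constant * days_since_visit) * last_ratio

# Assumed constant 0.1: a behavior last scored 1.0, revisited 3 days later.
three_days = decayed_ratio(1.0, 3, 0.1)

# Decaying over 2 days and then 3 more days matches one 5-day decay,
# because the exponential factors multiply.
chained = decayed_ratio(decayed_ratio(1.0, 2, 0.1), 3, 0.1)
direct = decayed_ratio(1.0, 5, 0.1)
```

Because the factors compose multiplicatively, storing only the last coefficient and the last access date is enough to continue the decay on the next calculation.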
As an improvement of the above solution, acquiring the full current-day user behavior data of the users active on the day includes:
acquiring the raw current-day user behavior data of the users active on the day from a behavior detail table; wherein each piece of raw behavior data includes at least the user identification information, a behavior event type and a behavior event source identifier;
and classifying and merging the pieces of raw behavior data according to the user identification information, the behavior event type and the behavior event source identifier, to obtain the full current-day user behavior data of the users active on the day.
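A minimal Python sketch of this classify-and-merge step follows. It assumes that merging means collapsing identical (user, event type, source) triples into one row with an occurrence count; the source does not specify what a merged row contains, so the `count` field and the function name `merge_raw_behavior` are hypothetical.

```python
from collections import Counter

def merge_raw_behavior(raw_rows):
    """Collapse raw events sharing (user_id, event_id, source_id) into one row."""
    keys = ((r["user_id"], r["event_id"], r["source_id"]) for r in raw_rows)
    counts = Counter(keys)  # keeps first-seen order of the keys
    return [
        {"user_id": u, "event_id": e, "source_id": s, "count": n}
        for (u, e, s), n in counts.items()
    ]

# Hypothetical raw events for one user on one day.
raw = [
    {"user_id": "u1", "event_id": "play", "source_id": "p1"},
    {"user_id": "u1", "event_id": "play", "source_id": "p1"},
    {"user_id": "u1", "event_id": "like", "source_id": "p1"},
]
merged = merge_raw_behavior(raw)
```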
As an improvement of the above scheme, the user identification information comprises a user mobile phone number and a user equipment number;
classifying and merging the pieces of raw behavior data according to the user identification information, the behavior event type and the behavior event source identifier to obtain the full current-day user behavior data of the users active on the day further includes:
joining the user mapping table and, according to the user equipment number, filling in the information of user behaviors in the full current-day user behavior data that lack a user mobile phone number;
and classifying and merging the rows of the full current-day user behavior data again after the information has been filled in, so as to update the full current-day user behavior data.
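The fill-in and second merge can be sketched as follows. This is an illustration only: the user mapping table is reduced to a hypothetical dict from equipment number to mobile phone number, the rows carry an assumed `count` field, and the phone number shown is a placeholder.

```python
from collections import defaultdict

def backfill_and_merge(rows, device_to_phone):
    """Fill missing phone_num from client_id, then merge rows that now match."""
    totals = defaultdict(int)
    for r in rows:
        phone = r["phone_num"] or device_to_phone.get(r["client_id"])
        totals[(phone, r["client_id"], r["event_id"], r["source_id"])] += r["count"]
    return [
        {"phone_num": p, "client_id": c, "event_id": e, "source_id": s, "count": n}
        for (p, c, e, s), n in totals.items()
    ]

rows = [
    {"phone_num": "13800000000", "client_id": "d1", "event_id": "play", "source_id": "p1", "count": 2},
    {"phone_num": None, "client_id": "d1", "event_id": "play", "source_id": "p1", "count": 1},
    {"phone_num": None, "client_id": "d9", "event_id": "play", "source_id": "p1", "count": 1},
]
merged = backfill_and_merge(rows, {"d1": "13800000000"})
```

The two rows for device `d1` become one once the anonymous row inherits the phone number, while device `d9`, which has no mapping, stays anonymous.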
As an improvement of the above scheme, when the full current-day user behavior data contains behavior data whose decay coefficient exceeds a preset threshold, an exponential function is used to constrain that behavior data so that its decay coefficient falls within the preset threshold.
As an improvement of the above scheme, the user identification information includes a user mobile phone number and a user equipment number, and the user mapping table also records an access type associated with the user identification information; the access type is regular login, anonymous, copy or first login;
when the historical behaviors of a user whose access type is regular login or copy are obtained from the behavior decay table, they are obtained according to the user mobile phone number;
and when the historical behaviors of a user whose access type is anonymous or first login are obtained from the behavior decay table, they are obtained according to the user equipment number.
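The choice of lookup key by access type can be sketched in Python. The enum `AccessType`, the function `history_lookup_key` and the dict layout of a user row are hypothetical names introduced for illustration.

```python
from enum import Enum

class AccessType(Enum):
    REGULAR_LOGIN = "regular_login"
    ANONYMOUS = "anonymous"
    COPY = "copy"
    FIRST_LOGIN = "first_login"

def history_lookup_key(user):
    """Return (field, value) used to query the behavior decay table."""
    if user["access_type"] in (AccessType.REGULAR_LOGIN, AccessType.COPY):
        return ("phone_num", user["phone_num"])
    # Anonymous and first-login users are looked up by equipment number.
    return ("client_id", user["client_id"])

key = history_lookup_key(
    {"access_type": AccessType.FIRST_LOGIN, "phone_num": "13800000000", "client_id": "d1"}
)
```

A first-login user already has a phone number, but the earlier history was recorded anonymously, which is why the equipment number is the key in that case.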
In order to achieve the above object, an embodiment of the present invention further provides a user portrait construction device, including:
the behavior data acquisition module is used for responding to the user portrait construction instruction and acquiring target behavior data according to the target user identification;
the user portrait construction module is used for constructing a target user portrait according to the target behavior data; wherein the target behavior data is data obtained, according to the target user identifier, from the partitions of the last N days of a behavior decay table; the partition for a set date in the behavior decay table records the full current-day user behavior data of the users active on the set date, together with the decay-processed historical behavior data of the full set of users to be calculated for that day; in the partition for the set date, the access date of each user in the full set of users to be calculated is set to the set date; and the full set of users to be calculated for the day comprises the users active on that day and the users whose original access date is N days before the set date.
To achieve the above object, an embodiment of the present invention further provides a user portrait construction device, including a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, where the processor executes the computer program to implement the user portrait construction method according to any one of the embodiments.
To achieve the above object, an embodiment of the present invention further provides a computer readable storage medium, where the computer readable storage medium includes a stored computer program, and when the computer program runs, the device in which the computer readable storage medium is located is controlled to execute the user portrait construction method according to any one of the embodiments.
Compared with the prior art, the user portrait construction method, device, equipment and storage medium disclosed by the embodiments of the invention respond to a user portrait construction instruction by acquiring target behavior data according to the target user identifier, and use that data to construct the target user portrait. The target behavior data is data obtained, according to the target user identifier, from the partitions of the last N days of a behavior decay table. The partition for a set date in the behavior decay table records the full current-day user behavior data of the users active on the set date, together with the decay-processed historical behavior data of the full set of users to be calculated for that day. In the partition for the set date, the access date of each user in the full set of users to be calculated is set to the set date, and the full set of users to be calculated for the day comprises the users active on that day and the users whose original access date is N days before the set date. The embodiments of the invention therefore spread user behaviors evenly over N time periods in a windowed manner and decay historical preferences according to user activity, which keeps the processing of full historical behaviors efficient, shortens the total construction time of the user portrait, and improves the accuracy of the constructed user portrait.
Drawings
In order to more clearly illustrate the technical solutions of the present invention, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a user portrait construction method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a user portrait construction flow according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating updating of a user mapping table according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of user historical behavior processing according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a current day user behavior process according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of the tables related to user portrait calculation according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, a flowchart of a user portrait construction method according to an embodiment of the present invention is shown.
Specifically, the user portrait construction method includes steps S1-S2:
S1, in response to a user portrait construction instruction, acquiring target behavior data according to a target user identifier;
S2, constructing a target user portrait according to the target behavior data; wherein the target behavior data is data obtained, according to the target user identifier, from the partitions of the last N days of a behavior decay table; the partition for a set date in the behavior decay table records the full current-day user behavior data of the users active on the set date, together with the decay-processed historical behavior data of the full set of users to be calculated for that day; in the partition for the set date, the access date of each user in the full set of users to be calculated is set to the set date; and the full set of users to be calculated for the day comprises the users active on that day and the users whose original access date is N days before the set date.
According to the embodiment of the invention, the user behaviors are uniformly spread in N time periods in a window mode, and the historical preference attenuation is carried out according to the user activity, so that the efficiency of processing the historical full-quantity behaviors is ensured, the total construction time of the user portrait is shortened, and the accuracy of the constructed user portrait is improved.
In a preferred embodiment, the behavior decay table is constructed by:
acquiring the users active on the set date and the users inactive within N days, and combining them to obtain the full set of users to be calculated for the day; wherein the users inactive within N days are the users whose access date is N days before the set date;
acquiring the full current-day user behavior data of the users active on the day;
querying the behavior decay table according to the access date of each user in the full set of users to be calculated for the day, acquiring the full historical behaviors of those users, and decay-processing the full historical behaviors to obtain user historical behavior data;
merging the full current-day user behavior data with the user historical behavior data to obtain the full behavior data to be calculated for the day;
setting the access date of each user in the full set of users to be calculated for the day to the set date;
and writing the full behavior data to be calculated for the day into the partition of the behavior decay table for the set date.
By merging, joining, decaying and backtracking user behaviors, the embodiment of the invention spreads user behaviors evenly over N time periods in a windowed manner and decays historical preferences according to user activity. This keeps the processing of full historical behaviors efficient, shortens the total construction time of the user portrait, and improves the accuracy of the constructed user portrait.
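The construction steps of this embodiment can be sketched end to end in Python. This is a simplified illustration under several assumptions that the source does not state: tables are in-memory dicts keyed by date, the user mapping is reduced to `user_id -> last access date`, a fresh behavior starts with coefficient 1.0, and current-day and historical rows are simply concatenated. All function and field names are hypothetical.

```python
import math
from datetime import date, timedelta

def build_partition(decay_table, mapping_prev, active_rows, set_date, n_days, decay_constant):
    """Write the set_date partition of decay_table and return the updated mapping."""
    active_users = {r["user_id"] for r in active_rows}
    cutoff = set_date - timedelta(days=n_days)
    inactive_users = {u for u, visit in mapping_prev.items() if visit == cutoff}
    users = active_users | inactive_users  # full set of users to be calculated

    rows = [dict(r, decay_ratio=1.0) for r in active_rows]  # assumed initial coefficient
    for user in users:
        last_visit = mapping_prev.get(user)
        if last_visit is None:
            continue  # no history to backtrack
        x = (set_date - last_visit).days
        for r in decay_table.get(last_visit, []):
            if r["user_id"] == user:
                ratio = math.exp(-decay_constant * x) * r["decay_ratio"]
                rows.append(dict(r, decay_ratio=ratio))

    decay_table[set_date] = rows
    mapping_today = dict(mapping_prev)
    mapping_today.update({u: set_date for u in users})  # access date becomes set_date
    return mapping_today

table = {
    date(2022, 6, 29): [{"user_id": "u1", "source_id": "s1", "decay_ratio": 1.0}],
    date(2022, 5, 31): [{"user_id": "u7", "source_id": "s7", "decay_ratio": 1.0}],
}
mapping_prev = {"u1": date(2022, 6, 29), "u7": date(2022, 5, 31)}
active = [{"user_id": "u1", "source_id": "s2"}]
mapping_today = build_partition(table, mapping_prev, active, date(2022, 6, 30), 30, 0.05)
```

In this sample, `u1` is active and brings one decayed historical row forward, while `u7` was last seen exactly 30 days earlier and is copied forward as an inactive user, so both end up in the 2022-06-30 partition.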
In a preferred embodiment, the N days inactive users are obtained by:
querying the partition of the user mapping table for the day before the set date, and selecting the users whose access date is N days before the set date, to obtain the users inactive within N days; wherein the user mapping table records user identification information and the access date associated with the user identification information;
the step of decay-processing the full historical behaviors to obtain user historical behavior data comprises: calculating the decay coefficient of each historical behavior in the full historical behaviors according to the difference between the access date and the current day, to obtain the user historical behavior data;
the method further comprises:
merging the full set of users to be calculated for the day with the data in the partition of the user mapping table for the day before the set date, and writing the result into the partition of the user mapping table for the set date.
Specifically, the embodiment of the invention involves data interaction among several data tables, mainly a content portrait table, a behavior detail table, a user mapping table, a behavior decay table and a user portrait table. These tables are briefly introduced below:
Content portrait table: records the tag information of each piece of content. The primary key is the content ID; the other fields hold tag information such as the content's author and style. The tag information is generated by other systems.
Behavior detail table: records the raw user behavior data. The table is partitioned by day, and each partition records all user behaviors of that day, including the date, user information, behavior type, content ID and the like. The behavior data is reported by an event-tracking (buried-point) system.
User mapping table: partitioned by day; each day's partition records the updated full mapping information. Important fields include dayid (the table's partition field, for example 20220629 or 20220630), the user mobile phone number (phone_num), the user equipment number (client_id), the user access date (visit_day) and the like.
Behavior decay table: partitioned by day; each day's partition records all behaviors of that day's active users. Important fields include dayid (the table's partition field, for example 20220629 or 20220630), user mobile phone number, user equipment number, behavior type, content ID, behavior coefficient and the like.
User portrait table: partitioned by day; each day's partition records the portrait of each user for that day. Important fields include dayid (the table's partition field, for example 20220629 or 20220630), user equipment number, user preference information and the like.
For example, referring to a schematic diagram of a user portrait construction flow shown in fig. 2, the flow is as follows:
Step one, query the behavior detail table to obtain the users active on the current day, including information such as each user's mobile phone number, equipment number and access time.
Step two, query yesterday's partition of the user mapping table and select the users whose access date is N days ago, giving the users inactive within N days. If the calculation has been running for fewer than N days, the result of this step is empty.
The main purpose of this step is to spread all users evenly over a period of N days and to establish, for each user, an index to the partition that holds that user's historical behaviors. When a user's historical behaviors are later queried, only the most recent N days of partitions of the behavior decay table need to be searched, so no full table scan is required and query efficiency is greatly improved.
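The bound on how far back a lookup must go can be checked with a small simulation. This is an illustrative sketch only: days are plain integers, and `simulate_visit_day` is a hypothetical helper that tracks one user who visits once and never returns.

```python
def simulate_visit_day(first_visit, n_days, horizon):
    """Return, for each following day, today - visit_day after the daily run."""
    visit_day = first_visit
    gaps = []
    for today in range(first_visit + 1, first_visit + horizon + 1):
        if today - visit_day == n_days:
            # The user is picked up as inactive for N days and copied forward.
            visit_day = today
        gaps.append(today - visit_day)
    return gaps

# With N = 3, a dormant user's latest partition is never more than 2 days old.
gaps = simulate_visit_day(first_visit=0, n_days=3, horizon=7)
```

After each daily run the gap stays at most N - 1, which is why searching the most recent N partitions is sufficient.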
Step three, combine the results of step one and step two to obtain the full set of users to be calculated for the day. When combining, users are flagged by type, which determines how their historical behaviors are queried.
Step four, set the access date of each user in the full set to be calculated to the current day, merge the result with yesterday's full user mapping in the mapping table, and write it into the current-day partition, which completes the update of the user mapping table;
For example, referring to the user mapping table update diagram shown in fig. 3, the main fields of the table are as follows:
phone_num: the user's mobile phone number; it has a value once the user has logged in, and logged-in users are identified by this field;
client_id: the device ID; it has a value whether or not the user is logged in, and anonymous users are identified by this field;
visit_day: the user's last access date, mainly used for querying the user's historical behaviors;
user_ffag: the access type, which flags each type of user each day: 1 = non-first login (regular login), 2 = anonymous, 3 = copy, 4 = first login.
As can be seen from fig. 3, the current day is 20220630, and the full mapping in the 20220629 partition of the user mapping table is merged with the current-day active user mapping obtained from the behavior detail table. In the sample data shown in fig. 3, the current-day data and the historical data are merged at calculation time according to the following basic rule: only one mapping is kept for each combination of phone_num and client_id, and its visit_day takes the maximum value.
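The basic merge rule can be sketched in Python as follows. The rows are hypothetical sample data (the phone numbers are placeholders), dates are integers in the yyyymmdd form used by the table, and `merge_mappings` is an illustrative name.

```python
def merge_mappings(yesterday_rows, today_rows):
    """Keep one row per (phone_num, client_id), with the largest visit_day."""
    merged = {}
    for row in yesterday_rows + today_rows:
        key = (row["phone_num"], row["client_id"])
        if key not in merged or row["visit_day"] > merged[key]["visit_day"]:
            merged[key] = row
    return list(merged.values())

yesterday = [
    {"phone_num": "p2", "client_id": "0002", "visit_day": 20220620},
    {"phone_num": "p5", "client_id": "0005", "visit_day": 20220625},
]
today = [{"phone_num": "p2", "client_id": "0002", "visit_day": 20220630}]
merged = merge_mappings(yesterday, today)
```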
Examples of the refined rules:
the user with client_id 0001 appears in the behavior detail table, i.e. accessed on 20220630, has no historical access information and has a mobile phone number, so the visit_day is 20220630 and the user_ffag is 1;
the user with client_id 0002 appears in the behavior detail table, i.e. accessed on 20220630, and the historical data contains a mobile phone number, so the visit_day takes the maximum value 20220630 and the user_ffag is 1;
the user with client_id 0003 appears in the behavior detail table, i.e. accessed on 20220630, and has no mobile phone number in the historical data, which indicates a first login on 20220630, so the visit_day takes the maximum value 20220630 and the user_ffag is 4;
the user with client_id 0004 appears in the behavior detail table, i.e. accessed on 20220630, and has neither historical data nor a mobile phone number, so the visit_day takes the maximum value 20220630 and the user_ffag is 2;
the users with client_id 0005 and 0006 do not appear in the behavior detail table, i.e. did not access on 20220630, so their historical data is copied directly;
the user with client_id 0007 has not been active for 30 days (in this example the period N is 30 days), so when the data is copied the visit_day is set to 20220630 and the user_ffag is set to 3. When historical behaviors are calculated for the current day (20220630), this user's behaviors are decayed slightly and recorded in the 20220630 partition of the behavior decay table. The visit_day in this table indicates which partition holds the user's latest behaviors, so the next time the user's data is calculated, the historical behaviors can be looked up directly by visit_day. In this way, users inactive for 30 days are copied and decayed again during the daily calculation, which guarantees that the historical behaviors of all users can be found within the last 30 days of partitions of the behavior decay table and avoids a full table scan.
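The example rules can be condensed into a small Python sketch. The decision order below is inferred from the seven sample users and is an assumption, not a rule set stated by the source; `update_mapping_row` and the row layout are hypothetical, while `user_ffag` and `visit_day` are the field names used in the table.

```python
from datetime import date, timedelta

REGULAR, ANONYMOUS, COPY, FIRST_LOGIN = 1, 2, 3, 4

def update_mapping_row(today_row, history_row, today, n_days):
    """Derive the current-day mapping row for one user (inferred rules)."""
    if today_row is not None:  # the user appears in the behavior detail table
        if today_row["phone_num"] is None:
            flag = ANONYMOUS
        elif history_row is not None and history_row["phone_num"] is None:
            flag = FIRST_LOGIN  # a known device that now has a phone number
        else:
            flag = REGULAR
        return {**today_row, "visit_day": today, "user_ffag": flag}
    if history_row["visit_day"] == today - timedelta(days=n_days):
        return {**history_row, "visit_day": today, "user_ffag": COPY}
    return dict(history_row)  # not active today: copied unchanged

today = date(2022, 6, 30)
dormant = {"phone_num": "p7", "client_id": "0007",
           "visit_day": date(2022, 5, 31), "user_ffag": REGULAR}
copied = update_mapping_row(None, dormant, today, n_days=30)
```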
Through the above processing, the following functions are realized:
a mapping relation is established for all users, so that anonymous behaviors without a mobile phone number can be joined against the user mapping table to fill in the mobile phone number;
each day, the last access date of each user up to and including the current day is recorded, for backtracking historical behaviors;
for users who have been inactive for N days, the access date is updated to the current day, which bounds the range of historical data to backtrack;
the user flag allows different logic to be applied when processing historical behaviors and decay; see the user behavior processing for details.
Step five, query the behavior detail table by date to obtain the full user behavior data of the current day and merge it by user, program, behavior type and similar information. Then join the user mapping table to fill in the mobile phone number for behaviors that lack one, which yields the current-day user behavior data.
This step merges the data along the user, program and behavior dimensions, and merges it a second time after the mobile phone numbers have been filled in from the user mapping data, which greatly reduces the volume of user behavior data involved in each day's calculation.
Step six, using the full set of users to be calculated obtained in step three, query the behavior decay table and obtain each user's full historical behaviors according to that user's last access date; then calculate a decay coefficient from the difference between the user's last access date and the current date, giving the user historical behavior data that finally takes part in the calculation;
Because the decay coefficient depends on the access time difference, highly active and less active users are treated differently, and the historical preferences of less active users are retained as far as possible.
Referring to fig. 4, fig. 4 is a schematic diagram illustrating user historical behavior processing according to an embodiment of the present invention. The main fields of the behavior decay table are as follows:
phone_num: the user's mobile phone number, recorded for each behavior event; may be null;
client_id: the user equipment number, recording the device ID of each behavior event; must not be null;
event_id: the behavior event type, recording the type of each behavior event and used to assign different weights;
source_id: the resource ID on which each behavior event was generated; must not be null;
decay_ratio: the decay coefficient, recording the coefficient of each behavior; the older the behavior, the lower the coefficient;
the decay logic is illustrated below:
in the full mapping relation of the 20220630 partition of the user mapping table, the rows whose visit_day is 20220630 are joined with the full mapping relation of the 20220629 partition of the user mapping table to obtain each user's last access date. Each day a user accesses, all of that user's behaviors are written to the current-day partition during calculation, so the historical behaviors can be obtained by querying the partition of the behavior decay table that corresponds to the user's last access date:
first, the users whose access date in the current-day full mapping is the current day (including copy users) are looked up in yesterday's full mapping to obtain their mapping relation and last access time;
then, according to that access time, the history of the current-day users is queried:
for users whose user_flag is 1 or 3, historical behaviors are acquired by phone_num;
for users whose user_flag is 2, historical behaviors are acquired by client_id;
for users whose user_flag is 4, historical behaviors are acquired by client_id and associated with the user's behaviors;
finally, the queried historical behaviors are attenuated to different degrees according to the difference between that date and the current date; the larger the difference, the less active the user, and the smaller the attenuation amplitude.
Attenuation function: f(x) = exp(-1 * decayRatio * x)
where decayRatio is a configurable item that adjusts the attenuation speed, and x is the difference in days;
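For illustration only, the attenuation function may be written in Python as shown below. The default value 0.05 for decayRatio is a hypothetical setting chosen for the sketch; the description only states that the item is configurable.

```python
import math


def decay_factor(days_since_last_visit: int, decay_ratio: float = 0.05) -> float:
    """f(x) = exp(-1 * decayRatio * x), with x the difference in days.

    The default decay_ratio is an illustrative value, not one given in the text.
    """
    return math.exp(-1 * decay_ratio * days_since_last_visit)
```

With x = 0 the factor is 1, and it decreases monotonically as the number of days grows, so a larger decayRatio makes older behaviors fade faster.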
in fig. 4, a user whose user_flag is 3 is a user who has not accessed for 30 days (that is, has not taken part in a calculation within 30 days). The visit_day of this user in the 20220630 partition of the user mapping table is recorded as 20220630, producing a "fake access"; during calculation, the user's behavior data from 30 days earlier (that is, 20220531) is copied, attenuated by a small amount, and written into the 20220630 partition. After this processing, the user's visit_day in the user mapping table is 20220630 and the user's historical behavior data is also in the 20220630 partition of the behavior decay table, so if the user accesses within the next 30 days, a query limited to the 30 daily partitions is certain to find the user's historical behavior; if the user still has not accessed after 30 days, the copy and attenuation are repeated.
This applies to all users, so the daily query of historical behaviors only has to read the partitions of the last N days (30 days in this example) to cover all users in the terminal; no full-table scan is needed, and calculation efficiency is greatly improved.
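A minimal sketch of the single-partition history lookup follows. The nested dictionary standing in for the partitioned behavior decay table, the name load_decayed_history and the default decayRatio are assumptions of the sketch; the point illustrated is only that one partition, selected by the last access date, is read.

```python
import math
from datetime import date


def load_decayed_history(decay_table: dict, user_key: str, last_visit: date,
                         today: date, decay_ratio: float = 0.05) -> dict:
    """Read one partition of the decay table and attenuate what is found.

    decay_table is assumed to look like
        {partition_date: {user_key: {(event_id, source_id): coefficient}}}.
    Only the partition dated `last_visit` is read; there is no full-table scan.
    """
    history = decay_table.get(last_visit, {}).get(user_key, {})
    factor = math.exp(-1 * decay_ratio * (today - last_visit).days)
    return {behavior: coef * factor for behavior, coef in history.items()}
```

For a copy user, the returned records would be written into the current-day partition, so that the next lookup for that user again touches a partition no older than N days.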
Step seven, merge the calculation results of step five and step six to obtain the full behavior data to be calculated for the current day, and suitably constrain merged coefficients that are too large;
in this step, the user's current-day behaviors are merged with the user's historical behaviors to obtain the full behavior data to be calculated for the current day. Where a merged behavior has a coefficient that is too large, an exponential function is used to constrain it so that it stays within a reasonable range. The merged behaviors are then written to the current-day partition of the behavior decay table (the 20220630 partition, for example); the purpose of this write-back is that, in subsequent daily calculations, users whose visit_day is 20220630 can have their historical behaviors queried from that partition. With the user mapping table and the behavior decay table, the behaviors of all users in the terminal are confined to the N daily partitions of the behavior decay table, and behaviors of the same kind by the same user on the same program are merged into one record. This avoids having to query too many partitions and too much data when historical data is looked up, and greatly improves query efficiency.
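The description does not give the exponential function used for the constraint. The sketch below therefore uses one possible saturating form, chosen here only for illustration: coefficients up to a hypothetical knee value are kept unchanged, and larger ones approach a hypothetical limit exponentially. The names constrain and merge_with_history, both parameter values, and the summing of coefficients are assumptions.

```python
import math


def constrain(coef: float, limit: float = 5.0, knee: float = 3.0) -> float:
    """Keep small coefficients as they are; squash large ones below `limit`.

    Above `knee` the value follows knee + span * (1 - exp(-(coef - knee) / span)),
    which is continuous at the knee and never exceeds `limit`.
    """
    if coef <= knee:
        return coef
    span = limit - knee
    return knee + span * (1.0 - math.exp(-(coef - knee) / span))


def merge_with_history(today_rows: dict, history_rows: dict,
                       limit: float = 5.0, knee: float = 3.0) -> dict:
    """Add current-day coefficients to decayed history, then constrain each one."""
    merged = dict(history_rows)
    for behavior, coef in today_rows.items():
        merged[behavior] = merged.get(behavior, 0.0) + coef
    return {b: constrain(c, limit, knee) for b, c in merged.items()}
```

The resulting dictionary corresponds to what would be written back to the current-day partition of the behavior decay table.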
Step eight, write all the behavior data to be calculated from step seven into the current-day partition. If a given user shows no activity in the following N days, then when that user takes part in the calculation after N days, all of the user's historical behaviors can be queried from the partition written on the current day. At this point the query, merging and write-back of the user's current-day behaviors and historical behaviors are complete. It should be noted that "yesterday" and "N days ago" are relative to the "current day"; for example, when the 20220630 partition is constructed, "current day" refers to 20220630 and "yesterday" refers to 20220629;
this step works together with step two and step four so that the behaviors of all users are spread evenly over a period of N days; a user's historical behaviors are certain to be found in the N daily partitions, full-table scans are avoided, and execution efficiency is improved.
Step nine, using the full behavior data to be calculated for the current day from step seven, join the content portrait table and calculate a user portrait for each user from that user's behaviors.
In a preferred embodiment, the acquiring the target behavior data according to the target user identifier includes:
querying the latest N daily partitions of the behavior decay table according to the target user identifier, determining the partition that records behavior data of the target user identifier and whose date is closest to the current date, and screening the behavior data associated with the target user identifier from that partition to serve as the target behavior data.
It can be understood that the daily partition data of the user mapping table is queried every day to determine the users inactive for N days, who are combined with the current-day active users; behavior data is then generated for these users and recorded in the current-day partition of the relevant data tables. As a result, the behaviors of all users are spread evenly over a period of N days, so the total data volume is small while user coverage is high. During each update the historical data is attenuated, so its weight in the user portrait decreases gradually over time; this avoids historical data carrying too much weight, which would prevent an accurate user portrait from being obtained. For these reasons, when the user portrait is constructed, the historical behavior data should be taken from the partition closest to the current day.
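By way of example, selecting the partition closest to the current day may be sketched as follows. The nested dictionary standing in for the behavior decay table, the name latest_partition_rows, and the reading of "latest N daily partitions" as the current day plus the N-1 days before it are assumptions of the sketch.

```python
from datetime import date, timedelta


def latest_partition_rows(decay_table: dict, user_key: str,
                          today: date, n_days: int = 30):
    """Walk back from `today` over at most `n_days` partitions.

    Returns (partition_date, rows) for the newest partition holding data for
    `user_key`, or None when none of the scanned partitions contains the user.
    """
    for offset in range(n_days):
        partition_day = today - timedelta(days=offset)
        rows = decay_table.get(partition_day, {}).get(user_key)
        if rows:
            return partition_day, rows
    return None
```

Because every user is rewritten into some partition at least once every N days, the loop is bounded by N partition reads regardless of how long ago the user first appeared.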
In a preferred embodiment, the calculating the decay factor of each of the full-scale historic behaviors according to the date difference between the access date and the current day includes:
calculating the attenuation coefficient of each historical behavior in the full amount of historical behaviors by the following formula:
decay_ratio_{current day} = f(x) * decay_ratio_{last}
f(x) = exp(-1 * decayRatio * x);
where decay_ratio_{current day} denotes the coefficient for the current day, decay_ratio_{last} denotes the coefficient from the last time, f(x) denotes the decay function, decayRatio denotes a preset constant, and x denotes the date difference between the access date associated with the historical behavior and the current day.
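A short worked derivation may help show why the copy-and-attenuate handling of inactive users is consistent with this formula. The symbols c_0 (the coefficient at the last write), k (standing for decayRatio), and the day gaps a and b are introduced here only for illustration.

```latex
c_1 = c_0 \, e^{-k a}, \qquad
c_2 = c_1 \, e^{-k b} = c_0 \, e^{-k a} \, e^{-k b} = c_0 \, e^{-k (a + b)}
```

Applying the update twice, after a days and then after a further b days, therefore gives the same coefficient as a single update over a + b days, provided no new behavior is merged into the record in between and decayRatio is unchanged. For example, a copy after a = 30 days followed by an access b = 5 days later attenuates the original coefficient by the factor exp(-35 * decayRatio).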
In a preferred embodiment, the acquiring the current day total user behavior data of the current day active user includes:
acquiring current day user behavior original data of a current active user from a behavior detail table; each piece of original behavior data in the current day user behavior original data at least comprises user identification information, behavior event type and behavior event source identification;
and classifying and combining each row of data in the current day user behavior original data according to the user identification information, the behavior event type and the behavior event source identification to obtain the current day total user behavior data of the current day active user.
In a preferred embodiment, the user identification information includes a user mobile phone number and a user equipment number; classifying and combining each piece of original behavior data in the original data of the current day user behavior according to the user identification information, the behavior event type and the behavior event source identification to obtain current day total user behavior data of the current day active user, and further comprising:
Associating the user mapping table, and carrying out information complementation on the user behavior lacking the user mobile phone number in the current total user behavior according to the user equipment number;
and classifying and combining each row of data in the full-time user behavior data of the current day after the information is completed so as to update the full-time user behavior data of the current day.
Referring to fig. 5, fig. 5 is a schematic diagram of current-day user behavior processing according to an embodiment of the present invention. Taking the user client 0002 as an example, all of the currently active user's behaviors are classified by phone_num, client_id, event_id (behavior event type) and source_id (behavior event source identification); the mobile phone number is then associated so that finally only one mobile phone number is retained; and an initial attenuation coefficient is calculated, which differs according to the behavior type:
play behavior: the base coefficient is the play rate, that is, "play time play_time / total duration", where the total duration is a basic attribute of the content and can be queried directly from the content attribute table by source_id;
other behaviors: the base coefficient is 1, such as the user_share behavior in the example;
after the merge, the association and the second merge, the current-day behavior data of client 0002 is finally obtained. It contains the phone_num, client_id, event_id, source_id and decay_ratio information and is held temporarily; in the step that merges a user's current-day behaviors with the user's historical behaviors, it is merged with the attenuated historical behaviors of client 0002 to obtain the full behavior data of client 0002 in the terminal.
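The initial coefficient described above may be sketched as follows, for illustration only. The event names user_play and user_share appear in this description; the function name initial_coefficient and the fallback value of 0.0 when no total duration is available are assumptions of the sketch.

```python
def initial_coefficient(event_id: str, play_time: float = 0.0,
                        total_duration: float = 0.0) -> float:
    """Base coefficient of a behavior before any decay is applied."""
    if event_id == "user_play":
        if total_duration <= 0:
            # Assumption: without a known duration the play rate is taken as 0.
            return 0.0
        return play_time / total_duration  # play rate
    return 1.0  # every other behavior type, e.g. user_share
```

For example, 30 units of play time on content with a total duration of 120 units yields a base coefficient of 0.25, while a share of the same content yields 1.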
In a preferred embodiment, when the attenuation coefficient of any behavior data is greater than a preset threshold, an exponential function is used to constrain it so that the attenuation coefficient of every item of behavior data falls within the preset threshold.
In a preferred embodiment, the user identification information includes a user mobile phone number and a user equipment number, and the user mapping table further records an access type associated with the user identification information; the access type is regular login, anonymity, copy or first login;
when the historical behavior of the user with the access type of the regular login or the copy is obtained from the behavior decay table, historical behavior obtaining is carried out according to the mobile phone number of the user;
and when the historical behavior of the user with the access type being anonymous or first logged-in is obtained from the behavior decay table, historical behavior obtaining is carried out according to the user equipment number.
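The choice of lookup key by access type may be sketched as below. The description states that user_flag values 1 and 3 are looked up by phone_num and values 2 and 4 by client_id, and that value 3 denotes a copy user; the assignment of the names ANONYMOUS to 2 and FIRST_LOGIN to 4 is an inference made for this sketch, as are the identifiers UserFlag and history_lookup_key.

```python
from enum import IntEnum


class UserFlag(IntEnum):
    REGULAR_LOGIN = 1
    ANONYMOUS = 2      # assumed numbering, see the note above
    COPY = 3
    FIRST_LOGIN = 4    # assumed numbering, see the note above


def history_lookup_key(user_flag: int, phone_num, client_id):
    """Return (field name, value) used to query the behavior decay table."""
    if user_flag in (UserFlag.REGULAR_LOGIN, UserFlag.COPY):
        return ("phone_num", phone_num)
    # Anonymous and first-login users have history recorded under the device ID.
    return ("client_id", client_id)
```

Keying first-login users by device lets the behaviors recorded while they were anonymous be carried over once a phone number is known.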
Further, referring to fig. 6, fig. 6 is a schematic diagram of the tables involved in calculating the user portrait according to an embodiment of the present invention. The current-day partition of the behavior decay table is queried to obtain the user behaviors to be calculated for the current day; the content portrait table is then joined, and the user portrait is populated according to the labels corresponding to each content ID in the user's behaviors, finally yielding the user portrait. Here the content ID corresponds to the behavior event source identification. Each content ID in the content portrait table is associated with labels such as major class (first level), type (second level), actor (third level) and keyword (third level), and each level has a corresponding weight: the first-level label weight is 0.25, the second-level label weight is 0.5, and the third-level label weight is 1. Different behavior event types carry different behavior weights; for example, the behavior weight of user_play is 1 and the behavior weight of user_share is 1.5. The score of a single label equals the attenuation coefficient multiplied by the behavior weight multiplied by the label level weight.
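The single-label score can be illustrated with the weights given above. In the sketch below, the level and behavior weights are those stated in this description; the function name score_tags, the data layout, and the summing of scores when the same label is reached through several behaviors are assumptions.

```python
from collections import defaultdict

LEVEL_WEIGHT = {1: 0.25, 2: 0.5, 3: 1.0}                  # label level weights
BEHAVIOR_WEIGHT = {"user_play": 1.0, "user_share": 1.5}   # behavior weights


def score_tags(behaviors, content_tags):
    """Accumulate label scores for one user.

    behaviors:    list of dicts with source_id, event_id and decay_ratio
    content_tags: {source_id: [(label, level), ...]} standing in for the
                  content portrait table
    """
    scores = defaultdict(float)
    for b in behaviors:
        for label, level in content_tags.get(b["source_id"], []):
            # single label score = decay coefficient * behavior weight * level weight
            scores[label] += (b["decay_ratio"]
                              * BEHAVIOR_WEIGHT[b["event_id"]]
                              * LEVEL_WEIGHT[level])
    return dict(scores)
```

For a user_share behavior with an attenuation coefficient of 0.5, a first-level label scores 0.5 * 1.5 * 0.25 = 0.1875 and a third-level label scores 0.5 * 1.5 * 1 = 0.75.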
Compared with the prior art, the embodiment merges, associates, attenuates and backtracks behaviors at the level of user behavior, and spreads user behaviors evenly over a period of N days in a windowed manner. This overcomes the shortcomings of the prior art in handling returning users and the decay of historical preferences, greatly improves user coverage, solves the problem of backtracking historical behaviors after an anonymous user logs in, maintains efficiency when processing the full historical behavior data, and enables accurate calculation of the user portraits of all users in the terminal.
The embodiment of the invention also provides a user portrait construction device, which comprises:
the behavior data acquisition module is used for responding to the user portrait construction instruction and acquiring target behavior data according to the target user identification;
the user portrait construction module is used for constructing a target user portrait according to the target behavior data; the target behavior data are data obtained from a last N-day partition of a behavior attenuation table according to the target user identification, and the current day total user behavior data of current day active users of the set date and the user history behavior data of current day total users to be calculated, which are obtained through attenuation processing, are recorded in the set-date partition of the behavior attenuation table; in a zone of a set date of the behavior decay table, the access date of the full-quantity user to be calculated on the same day is set as the set date; in the zone of the set date of the behavior decay table, the total users to be calculated on the same day comprise the active users on the same day and users with the original access date being N days before the set date.
It should be noted that, the working process of the user portrait construction device may refer to the working process of the user portrait construction method in the foregoing embodiment, which is not described herein.
Compared with the prior art, the user portrait construction device disclosed by the embodiment of the invention acquires target behavior data according to the target user identifier by responding to the user portrait construction instruction so as to be used for constructing the target user portrait; the target behavior data are data obtained from a last N-day partition of a behavior attenuation table according to the target user identification, and the current day total user behavior data of current day active users of the set date and the user history behavior data of current day total users to be calculated, which are obtained through attenuation processing, are recorded in the set-date partition of the behavior attenuation table; in a zone of a set date of the behavior decay table, the access date of the full-quantity user to be calculated on the same day is set as the set date; in the zone of the set date of the behavior decay table, the total number of users to be calculated on the same day includes the active users on the same day and users with original access date being N days before the set date. Therefore, the embodiment of the invention uniformly spreads the user behaviors in N time periods in a window mode, carries out historical preference attenuation according to the user activity, ensures the efficiency when processing the historical full-quantity behaviors, shortens the total construction time of the user portrait, and improves the accuracy of the constructed user portrait.
The embodiment of the invention also provides user portrait construction equipment, which comprises a processor, a memory and a computer program stored in the memory and configured to be executed by the processor, wherein the steps in the embodiment of the user portrait construction method are realized by the processor when the computer program is executed, such as steps S1-S2 in FIG. 1; alternatively, the processor may implement the functions of the modules in the above-described device embodiments when executing the computer program.
For example, the computer program may be divided into one or more modules, which are stored in the memory and executed by the processor to carry out the present invention. The one or more modules may be a series of computer program instruction segments capable of performing particular functions, which describe the execution of the computer program in the user portrait construction device. For example, the computer program may be divided into a behavior data acquisition module and a user portrait construction module, the specific functions of which are as follows:
the behavior data acquisition module is used for responding to the user portrait construction instruction and acquiring target behavior data according to the target user identification;
The user portrait construction module is used for constructing a target user portrait according to the target behavior data; the target behavior data are data obtained from a last N-day partition of a behavior attenuation table according to the target user identification, and the current day total user behavior data of current day active users of the set date and the user history behavior data of current day total users to be calculated, which are obtained through attenuation processing, are recorded in the set-date partition of the behavior attenuation table; in a zone of a set date of the behavior decay table, the access date of the full-quantity user to be calculated on the same day is set as the set date; in the zone of the set date of the behavior decay table, the total number of users to be calculated on the same day includes the active users on the same day and users with original access date being N days before the set date.
The specific working process of each module may refer to the working process of the user portrait construction device described in the foregoing embodiment, which is not described herein.
The user portrait construction equipment can be computing equipment such as a desktop computer, a notebook computer, a palm computer, a cloud server and the like. The user portrayal construction device may include, but is not limited to, a processor, a memory. Those skilled in the art will appreciate that the user portrayal construction device may also include input and output devices, network access devices, buses, etc.
The processor may be a central processing unit (Central Processing Unit, CPU), but may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. The general purpose processor may be a microprocessor or the processor may be any conventional processor or the like that is the control center of the user portrayal construction device and that uses various interfaces and lines to connect the various parts of the overall user portrayal construction device.
The memory may be used to store the computer program and/or modules, and the processor may implement various functions of the user portrait construction device by running or executing the computer program and/or modules stored in the memory and invoking data stored in the memory. The memory may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program (such as an image playing function, etc.) required for at least one function, etc.; the storage data area may store data created according to the use of the cellular phone, etc. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, memory, plug-in hard disk, Smart Media Card (SMC), Secure Digital (SD) card, flash card, at least one disk storage device, flash memory device, or other volatile solid-state storage device.
The modules integrated in the user portrait construction device may be stored in a computer readable storage medium if implemented in the form of software functional units and sold or used as a stand-alone product. Based on such understanding, all or part of the flow of the method of the above embodiments may also be accomplished by a computer program instructing related hardware, where the computer program may be stored in a computer readable storage medium, and when the computer program is executed by a processor, it may implement the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file or some intermediate form, etc. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth.
While the foregoing is directed to the preferred embodiments of the present invention, it will be appreciated by those skilled in the art that changes and modifications may be made without departing from the principles of the invention, such changes and modifications are also intended to be within the scope of the invention.

Claims (12)

1. A user portrait construction method is characterized by comprising the following steps:
responding to the user portrait construction instruction, and acquiring target behavior data according to a target user identifier;
constructing a target user portrait according to the target behavior data; the target behavior data are data obtained from a last N-day partition of a behavior attenuation table according to the target user identification, and the current day total user behavior data of current day active users of the set date and the user history behavior data of current day total users to be calculated, which are obtained through attenuation processing, are recorded in the set-date partition of the behavior attenuation table; in a zone of a set date of the behavior decay table, the access date of the full-quantity user to be calculated on the same day is set as the set date; in the zone of the set date of the behavior decay table, the total users to be calculated on the same day comprise active users on the same day of the set date and users with original access date being N days before the set date.
2. The user portrayal construction method of claim 1, wherein the behavior decay table is constructed by:
acquiring the current-day active users and the N-day inactive users of a set date, and combining them to obtain the total number of users to be calculated on the same day; wherein the N-day inactive users are users whose access date is N days before the set date;
acquiring the current day full-dose user behavior data of the current day active user;
inquiring a behavior attenuation table according to the access date of each user of the total users to be calculated on the same day, acquiring the total historical behavior of the total users to be calculated on the same day, and carrying out attenuation processing on the total historical behavior to obtain user historical behavior data;
combining the current day full-volume user behavior data with the user historical behavior data to obtain current day full-volume behavior data to be calculated;
setting the access date of the total number of users to be calculated on the same day as the set date;
and writing the total behavior data to be calculated on the same day into a current day partition of the behavior decay table on the set date.
3. The user portrayal construction method of claim 2, wherein the N days inactive users are obtained by:
Inquiring the partition data of the previous day of the set date of the user mapping table, and acquiring the users with access dates N days before the set date to obtain inactive users within N days; the user mapping table records user identification information and access date associated with the user identification information;
the step of carrying out attenuation processing on the total historical behaviors to obtain user historical behavior data comprises the following steps: calculating the attenuation coefficient of each history behavior in the total amount of history behaviors according to the date difference value of the access date and the current day to obtain the user history behavior data;
the method further comprises the steps of:
and merging the total number of users to be calculated on the same day with the data of the previous day partition of the set date of the user mapping table, and writing the data into the current day partition of the user mapping table on the set date.
4. The user portrait construction method according to claim 1, wherein said obtaining target behavior data according to a target user identification includes:
and inquiring the latest N-day partition of the behavior decay table according to the target user identifier, determining the partition which records the behavior data of the target user identifier and has the date closest to the current date, and screening the behavior data associated with the target user identifier from the partition to serve as target behavior data.
5. The user portrait construction method according to claim 3 wherein said calculating a decay coefficient of each of said full-scale historical behaviors from a date difference between a date of access and a date of day includes:
calculating the attenuation coefficient of each historical behavior in the full amount of historical behaviors by the following formula:
decay_ratio_{current day} = f(x) * decay_ratio_{last}
f(x) = exp(-1 * decayRatio * x);
where decay_ratio_{current day} denotes the coefficient for the current day, decay_ratio_{last} denotes the coefficient from the last time, f(x) denotes the decay function, decayRatio denotes a preset constant, and x denotes the date difference between the access date associated with the historical behavior and the current day.
6. The user portrait construction method of claim 3 wherein said obtaining current day full volume user behavior data of said current day active user includes:
acquiring current day user behavior original data of a current active user from a behavior detail table; wherein, each behavior original data in the current day user behavior original data at least comprises the user identification information, behavior event type and behavior event source identification;
and classifying and combining each piece of behavior original data in the current day user behavior original data according to the user identification information, the behavior event type and the behavior event source identification to obtain the current day total user behavior data of the current day active user.
7. The user portrait construction method of claim 6 in which the user identification information includes a user handset number and a user equipment number;
classifying and combining each behavior original data in the current day user behavior original data according to the user identification information, the behavior event type and the behavior event source identification to obtain current day total user behavior data of the current day active user, and further comprising:
associating the user mapping table, and carrying out information complementation on the user behavior lacking the user mobile phone number in the current total user behavior according to the user equipment number;
and classifying and combining each row of data in the full-time user behavior data of the current day after the information is completed so as to update the full-time user behavior data of the current day.
8. The user portrait construction method according to claim 6 or 7, wherein when there is behavior data whose value is greater than a preset threshold in attenuation coefficients of the total amount of current day user behavior data, it is constrained by an exponential function so that the attenuation coefficients fall within the preset threshold.
9. The user portrait construction method according to claim 3, wherein said user identification information includes a user mobile phone number and a user equipment number, and said user mapping table further records an access type associated with said user identification information; the access type is regular login, anonymity, copy or first login;
When the historical behavior of the user with the access type of the regular login or the copy is obtained from the behavior decay table, historical behavior obtaining is carried out according to the mobile phone number of the user;
and when the historical behavior of the user with the access type being anonymous or first logged-in is obtained from the behavior decay table, historical behavior obtaining is carried out according to the user equipment number.
10. A user portrayal construction device comprising:
the behavior data acquisition module is used for responding to the user portrait construction instruction and acquiring target behavior data according to the target user identification;
the user portrait construction module is used for constructing a target user portrait according to the target behavior data; the target behavior data are data obtained from a last N-day partition of a behavior attenuation table according to the target user identification, and the current day total user behavior data of current day active users of the set date and the user history behavior data of current day total users to be calculated, which are obtained through attenuation processing, are recorded in the set-date partition of the behavior attenuation table; in a zone of a set date of the behavior decay table, the access date of the full-quantity user to be calculated on the same day is set as the set date; in the zone of the set date of the behavior decay table, the total number of users to be calculated on the same day includes the active users on the same day and users with original access date being N days before the set date.
11. A user portrait construction device comprising a processor, a memory and a computer program stored in the memory and configured to be executed by the processor, the processor implementing the user portrait construction method according to any one of claims 1 to 9 when executing the computer program.
12. A computer readable storage medium, characterized in that the computer readable storage medium comprises a stored computer program, wherein the computer program, when run, controls a device in which the computer readable storage medium is located to perform the user portrait construction method according to any one of claims 1 to 9.
CN202310833638.2A 2023-07-07 2023-07-07 User portrait construction method, device, equipment and storage medium Pending CN117009654A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310833638.2A CN117009654A (en) 2023-07-07 2023-07-07 User portrait construction method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310833638.2A CN117009654A (en) 2023-07-07 2023-07-07 User portrait construction method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117009654A true CN117009654A (en) 2023-11-07

Family

ID=88562801

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310833638.2A Pending CN117009654A (en) 2023-07-07 2023-07-07 User portrait construction method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117009654A (en)

Similar Documents

Publication Publication Date Title
US11526653B1 (en) System and method for optimizing electronic document layouts
CN104834731B (en) A kind of recommended method and device from media information
US7373338B2 (en) Access control to shared resources
US20110252121A1 (en) Recommendation ranking system with distrust
CN109614347B (en) Processing method and device for multi-level cache data, storage medium and server
US20130212115A1 (en) Tag inheritance
CN103646049B (en) The method and system of automatically generated data form
CA3069908A1 (en) Differentially private query budget refunding
CN110402570A (en) Information processing method and system, server, terminal, computer storage medium
CN108021673A (en) A kind of user interest model generation method, position recommend method and computing device
US11606330B2 (en) Domain name determination
CN112434015B (en) Data storage method and device, electronic equipment and medium
CN112686519A (en) Gray scale adjusting method and device, electronic equipment and storage medium
CN107943542A (en) A kind of configuration information management method, device, computer-readable recording medium and storage control
CN105117489B (en) Database management method and device and electronic equipment
CN105989066A (en) Information processing method and device
CN113918149A (en) Interface development method and device, computer equipment and storage medium
US9372930B2 (en) Generating a supplemental description of an entity
KR102140325B1 (en) Method of fact-cheching, searching and managing contents based on blockchain and system thereof
US20170134484A1 (en) Cost-effective reuse of digital assets
CN117009654A (en) User portrait construction method, device, equipment and storage medium
CN116578984A (en) Risk management and control method, system, equipment and medium for business data
CN107295074A (en) It is a kind of to realize the method and apparatus that cloud resource is shown
CN110414813B (en) Index curve construction method, device and equipment
CN111259201B (en) Data maintenance method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination