CN106649681A - Data processing method, device and equipment - Google Patents

Info

Publication number
CN106649681A
Authority
CN
China
Prior art keywords
user
label
weight
interest
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201611163210.8A
Other languages
Chinese (zh)
Other versions
CN106649681B (en)
Inventor
王玉伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingsoft Internet Security Software Co Ltd
Original Assignee
Beijing Kingsoft Internet Security Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kingsoft Internet Security Software Co Ltd
Priority to CN201611163210.8A
Publication of CN106649681A
Application granted
Publication of CN106649681B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the invention disclose a data processing method, a data processing device and data processing equipment. The method comprises the following steps: a data processing system obtains the accumulated weight of a user on each label according to the historical behavior data of the user in a scene and the weight of each label of each piece of information in the historical behavior data; calculates the ratio of the accumulated weight of the user on each label to the total accumulated weight of the user on all labels, and takes the ratio as the accumulated weight distribution of the user on each label; determines the interest weight of the user on each label according to the accumulated weight distribution of the user on each label and the corresponding total accumulated weight distribution of all users in the scene on each label; and generates an interest distribution vector of the user in the scene by using each label and the interest weight of the user on each label. By implementing the embodiments of the invention, the personalized interests of the user can be highlighted and the efficiency of content recommendation is improved.

Description

Data processing method, device and equipment
Technical field
The present invention relates to the field of computer technology, and in particular to a data processing method, device and equipment.
Background
With the explosive growth of content on the network, how to recommend content that interests a user based on the user's interests has become a problem demanding prompt solution. To address it, user behaviors such as feedback and click-to-read can be combined with the label attributes of the content itself to count the distribution of user behavior over each label, and that distribution can serve as the basis for content recommendation. In practice, however, the heavy display and clicking of hot content often concentrates user behavior on a few popular labels, so the user's personalized interests cannot be highlighted and content recommendation becomes less efficient.
Summary of the invention
Embodiments of the present invention provide a data processing method, device and equipment that can highlight the personalized interests of a user and improve the efficiency of content recommendation.
A first aspect of the embodiments of the present invention provides a data processing method, including:
obtaining the accumulated weight of a user on each label according to the historical behavior data of the user in a scene and the weight of each label of each piece of information in the historical behavior data;
calculating the ratio of the accumulated weight of the user on each label to the total accumulated weight of the user on all labels, as the accumulated weight distribution of the user on each label;
determining the interest weight of the user on each label according to the accumulated weight distribution of the user on each label and the corresponding total accumulated weight distribution of all users in the scene on each label;
generating an interest distribution vector of the user in the scene by using each label and the interest weight of the user on each label.
Optionally, for each scene in a scene set, the interest weight proportion of the user on each label in that scene is determined by using the interest weight of the user on each label in that scene, the total accumulated weight of the user on all labels in that scene, and the total accumulated weight of the user over all scenes in the scene set;
for each label, the sum of the interest weight proportions of the user on the label over all scenes is calculated as the total interest weight of the user on the label over all scenes;
a final interest distribution vector of the user over all scenes is generated by using each label and the corresponding total interest weight of the user on each label.
Optionally, each piece of information is quantized into a label vector according to the features of each piece of information in the historical behavior data of the user in the scene, where the label vector includes the labels that the piece of information has and the weight of each label.
Optionally, obtaining the accumulated weight of the user on each label according to the historical behavior data of the user in the scene and the weight of each label of each piece of information in the historical behavior data includes: for each piece of information in the historical behavior data of the user in the scene, calculating the product of the weight of each label of the piece of information and a decay factor determined by the time from the moment the corresponding historical behavior occurred to the current moment, as the overall weight of the piece of information; and calculating the sum of the overall weights of all information corresponding to the historical behavior of the user, as the accumulated weight of the user on each label.
Optionally, the historical behavior data of the user in each scene is obtained at a preset period.
Correspondingly, a second aspect of the embodiments of the present invention provides a data processing device, including:
a first acquisition module, configured to obtain the accumulated weight of a user on each label according to the historical behavior data of the user in a scene and the weight of each label of each piece of information in the historical behavior data;
a computing module, configured to calculate the ratio of the accumulated weight of the user on each label to the total accumulated weight of the user on all labels, as the accumulated weight distribution of the user on each label;
a determining module, configured to determine the interest weight of the user on each label according to the accumulated weight distribution of the user on each label and the corresponding total accumulated weight distribution of all users in the scene on each label;
a generation module, configured to generate an interest distribution vector of the user in the scene by using each label and the interest weight of the user on each label.
Optionally, the determining module is further configured to:
for each scene in a scene set, determine the interest weight proportion of the user on each label in that scene by using the interest weight of the user on each label in that scene, the total accumulated weight of the user on all labels in that scene, and the total accumulated weight of the user over all scenes in the scene set;
the computing module is further configured to, for each label, calculate the sum of the interest weight proportions of the user on the label over all scenes, as the total interest weight of the user on the label over all scenes;
the generation module is further configured to generate a final interest distribution vector of the user over all scenes by using each label and the corresponding total interest weight of the user on each label.
Optionally, the device further includes a quantization module, configured to quantize each piece of information into a label vector according to the features of each piece of information in the historical behavior data of the user in the scene, where the label vector includes the labels that the piece of information has and the weight of each label.
Optionally, the first acquisition module is specifically configured to:
for each piece of information in the historical behavior data of the user in the scene, calculate the product of the weight of each label of the piece of information and a decay factor determined by the time from the moment the corresponding historical behavior occurred to the current moment, as the overall weight of the piece of information;
calculate the sum of the overall weights of all information corresponding to the historical behavior of the user, as the accumulated weight of the user on each label.
Optionally, the device further includes a second acquisition module, configured to obtain the historical behavior data of the user in each scene at a preset period.
A third aspect of the embodiments of the present invention further provides data processing equipment, including a processor, a memory, a communication interface and a communication bus;
the processor, the memory and the communication interface are connected to and communicate with one another through the bus; the memory stores executable program code; the processor reads the executable program code stored in the memory and runs the program corresponding to the executable program code, so as to perform a data processing method; wherein the method includes:
obtaining the accumulated weight of a user on each label according to the historical behavior data of the user in a scene and the weight of each label of each piece of information in the historical behavior data;
calculating the ratio of the accumulated weight of the user on each label to the total accumulated weight of the user on all labels, as the accumulated weight distribution of the user on each label;
determining the interest weight of the user on each label according to the accumulated weight distribution of the user on each label and the corresponding total accumulated weight distribution of all users in the scene on each label;
generating an interest distribution vector of the user in the scene by using each label and the interest weight of the user on each label.
In the embodiments of the present invention, the data processing system obtains the accumulated weight of the user on each label according to the historical behavior data of the user in the scene and the weight of each label of each piece of information in the historical behavior data, and can thereby determine the interest weight of the user on each label. The interest distribution vector of the user in the scene can then be generated, which highlights the user's personalized interests and improves the efficiency of content recommendation.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required in the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a data processing method provided in an embodiment of the present invention;
Fig. 2 is a schematic flowchart of a data processing method provided in an embodiment of the present invention;
Fig. 3 is a schematic flowchart of a data processing method provided in an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of data processing equipment provided in an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of data processing equipment provided in an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of data processing equipment provided in an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings in the embodiments. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Under the current trend of information globalization, a global product faces different user groups at the same time, so a unified modeling mechanism or service is needed to process users' historical behavior data and obtain user interest models. In practice, however, users in different scenes, for example users from different countries, regions and languages, belong to different user groups. Owing to social factors such as cultural background and economic level, their demand for news differs considerably, and so do the distributions of the information they receive and of their interests. For example, some developed countries may pay more attention to finance and fashion, while some developing countries may pay more attention to news about urgent development, society and daily life; likewise, groups in different regions may prefer different sports. As another example, the news sets seen by users of different countries, regions and languages differ greatly, and so do the label distributions produced by the corresponding user behavior. As another example, users in some regions are multilingual, and their interest label sets under different languages need to be integrated to obtain a complete and unified user interest set, which provides a complete, accurate and comprehensive user interest model for subsequent news recommendation or product push. As another example, a user's operations in various applications on terminals such as PCs and mobile phones reflect the user's preferences; collecting and aggregating the user's behavior across applications provides more data for user interest modeling and helps improve its completeness and accuracy. However, for the crowds on different application products, the content they see and the overall label distribution of the feedback obtained also differ markedly. For example, for news or other content delivered in browser, sports and game applications, the feature distributions of the content that receives feedback differ markedly, and each application therefore yields its own overall popularity of the different labels among its users.
In summary, when a user's historical behavior data is processed, the overall popularity of content and labels needs to be used as a bias to assist personalized interest modeling. The data processing method in the embodiments of the present invention therefore takes into account the overall differences between user groups and news in different scenes (including but not limited to country, region, language and product). For the historical behavior data of users in different scenes, the total accumulated weight distribution in the scene is calculated and used as the bias when computing the interest model of the corresponding user. In addition, the embodiments can integrate the user interests derived in each scene and establish a unified user interest distribution model, providing a complete and unified user interest model for subsequent recommendation tasks, products, news and other information. In the embodiments, the user's degree of participation in each scene is taken as the confidence of the user's interest in that scene, and the user's interests in different scenes are merged by linear weighting to obtain the final interest model of the current user.
Further, the embodiments of the present invention may update the user's interest model periodically, refreshing the user's current interest model once every fixed time slice. Because news content, the corresponding label sets and user interests all change over time, this approach better highlights the user's recent behavior and reflects changes in the user's short-term interests in a timely manner. For the news and information the user has read or the applications the user has used, a time decay method sets the importance of each reading or usage behavior in the user's historical behavior data to the user's current interest distribution, according to the time elapsed between the reading or use and the current time.
The data processing method, system and equipment provided in the embodiments of the present invention are described in detail below.
Refer to Fig. 1, which is a schematic flowchart of a data processing method provided in an embodiment of the present invention. The data processing method may be performed by a data processing system, and the data processing system may be deployed in a terminal or a server, which is not limited in the embodiments of the present invention. As shown in Fig. 1, the data processing method may include the following steps:
101. The data processing system quantizes each piece of information into a label vector according to the features of each piece of information in the scene set.
In the embodiments of the present invention, a label vector includes the labels that a piece of information has and the weight of each label in that information. A user's interest is usually described by a set of labeled features, for example the user's preference for, that is, degree of interest in, labels such as "entertainment" and "basketball". Let t_k denote a label, and let C(μ) denote the set of news that user μ has read in the past. For each news item C_i, its features are represented by the labels <(t_1, w_i1), (t_2, w_i2), …, (t_n, w_in)>, where w_ik represents the importance of label t_k in C_i. For example, for news recommendation, applying the data processing described in the embodiments to the labels of the news a user has read yields the interest vector distribution of the user's historical behavior data over each label.
In the embodiments of the present invention, the applications a user uses, such as game, shopping, news and browser applications, as well as countries, regions, languages and the like, are referred to as different scenes; scenes are not limited to the above. The news a user has read, the applications a user has used and the like are referred to as information; information may include but is not limited to the above. Each piece of information can be summarized as including multiple labels according to its features. For example, if the information is the set of news a user has read, its labels may be set as entertainment, society, celebrity, crime, film and television, politics, world, technology, health and so on; that is, each piece of information may correspond to multiple labels. In the embodiments, the importance of a label in a piece of information is taken as the weight of the label.
For example, let S denote the scene set including various scenes and s denote a particular scene in S. The information set corresponding to the historical behavior data of user μ in scene s is denoted C(μ), and each piece of information is denoted C_i. Each piece of information has n labels, t_1, t_2, …, t_k, …, t_n, and w_ik represents the importance of label t_k in information C_i, that is, the weight of label t_k. The label vector into which each piece of information C_i is quantized is therefore <(t_1, w_i1), (t_2, w_i2), …, (t_k, w_ik), …, (t_n, w_in)>.
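As a minimal sketch of the label vector from step 101, one piece of information can be held as a mapping from label to weight. The helper name `quantize`, the label names and the weights below are hypothetical and only illustrate the data shape; the patent does not specify how the weights are derived.

```python
from typing import Dict

# A label vector maps each label t_k to its weight w_ik in one piece of information C_i.
LabelVector = Dict[str, float]


def quantize(label_weights: Dict[str, float]) -> LabelVector:
    """Hypothetical helper: keep only the labels that carry a positive weight."""
    return {label: w for label, w in label_weights.items() if w > 0.0}


# Illustrative news item; the weights are made up for the example.
news_item = quantize({"entertainment": 0.5, "celebrity": 0.25, "health": 0.0})
```

Storing only the labels with a positive weight keeps the vector sparse, which is convenient when the label set is large.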
102. The data processing system obtains the accumulated weight of the user on each label according to the historical behavior data of the user in the scene and the weight of each label of each piece of information in the historical behavior data.
In the embodiments of the present invention, the data processing system may obtain the accumulated weight of the user on each label according to the historical behavior data of the user in the scene and the label vector of each piece of information in that data.
Optionally, the data processing system may perform the following steps to determine the accumulated weight of the user on a label:
for each piece of information in the historical behavior data of the user in the scene, calculate the product of the weight of each label of the piece of information and a decay factor determined by the time from the moment the corresponding historical behavior occurred to the current moment, as the overall weight of the piece of information; then calculate the sum of the overall weights of all information corresponding to the historical behavior of the user, as the accumulated weight of the user on each label.
This implementation may use a time decay method to set the weight of a label according to the time elapsed between the user's reading or use of each piece of information and the current time. The weight of the label multiplied by the decay factor is referred to as the overall weight of the label in the information, so that the user interest model obtained by the data processing system reflects the importance of each reading or usage behavior in the user's historical behavior data to the user's current interest model.
For example, the weight w_ik of label t_k of information C_i is multiplied by the decay factor determined by the time elapsed between the historical behavior corresponding to C_i and the current time, and the product is taken as the overall weight of the information. The overall weights of all information corresponding to the historical behavior of user μ (that is, the information set C(μ)) are then summed to give the accumulated weight of user μ on label t_k.
In the decay factor, α is a preset time-decay parameter (generally 0 < α ≤ 1), and T_i is the duration between the moment the historical behavior corresponding to the information occurred and the current time, that is, the time elapsed since the user read or used the information.
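The accumulation in step 102 can be sketched as follows. The exact form of the decay factor is not recoverable from this text, so the sketch assumes an exponential decay `alpha ** T_i`, which is consistent with 0 < α ≤ 1 but is an assumption; the function name and sample history are likewise hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def accumulated_weights(history: List[Tuple[Dict[str, float], float]],
                        alpha: float = 0.5) -> Dict[str, float]:
    """Sum decayed label weights over one user's history in one scene.

    history holds (label_vector, elapsed_time T_i) pairs, one per piece of information.
    """
    acc: Dict[str, float] = defaultdict(float)
    for label_vector, elapsed in history:
        decay = alpha ** elapsed  # ASSUMED decay form; the source only names alpha and T_i
        for label, weight in label_vector.items():
            acc[label] += weight * decay  # overall weight of this label in this item
    return dict(acc)


# Two illustrative reads: one just now (T_i = 0) and one a time unit ago (T_i = 1).
history = [({"sports": 1.0, "finance": 0.5}, 0.0), ({"sports": 0.5}, 1.0)]
acc = accumulated_weights(history, alpha=0.5)
```

With α = 1 the decay factor is always 1 and the accumulation reduces to a plain sum, so recent and old behavior count equally.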
103. The data processing system calculates the ratio of the accumulated weight of the user on each label to the total accumulated weight of the user on all labels, as the accumulated weight distribution of the user on each label.
In the embodiments of the present invention, through step 103 the data processing system can compute the label distribution of the user in a scene, that is, the proportion of the user's historical behavior data that falls on each label.
Specifically, given the accumulated weight of user μ on label t_k, the total accumulated weight of user μ on all labels t_1, t_2, …, t_k, …, t_n in scene s is the sum of the accumulated weights of user μ over those labels. The accumulated weight distribution of user μ on label t_k is the ratio of the former to the latter.
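Step 103 is a normalization, sketched below under the assumption that accumulated weights are held in a label-to-weight mapping. The function name and the sample values are hypothetical; returning zeros for an empty history is a choice made for this sketch, not something the source specifies.

```python
from typing import Dict


def weight_distribution(acc: Dict[str, float]) -> Dict[str, float]:
    """Divide each label's accumulated weight by the total over all labels."""
    total = sum(acc.values())
    if total == 0:
        # No behavior recorded: return an all-zero distribution (sketch-level choice).
        return {label: 0.0 for label in acc}
    return {label: w / total for label, w in acc.items()}


dist = weight_distribution({"sports": 3.0, "finance": 1.0})
```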
104. The data processing system determines the interest weight of the user on each label according to the accumulated weight distribution of the user on each label and the corresponding total accumulated weight distribution of all users in the scene on each label.
In the embodiments of the present invention, the accumulated weight distribution of a single user on a single label is obtained as in step 103, for example the accumulated weight distribution of user μ on label t_k. Correspondingly, the total accumulated weight distribution of all users in the scene on a label is the ratio of the accumulated weight of all users on that label to the sum of the total accumulated weights of all users on all labels.
For example, the accumulated weight of all users in scene s on label t_k is the sum, over all users, of each user's accumulated weight on label t_k.
The total accumulated weight of all users in scene s on all labels is the sum of these accumulated weights over all labels.
Accordingly, the total accumulated weight distribution of all users in scene s on label t_k is the ratio of the former to the latter.
The total accumulated weight distribution reflects how the user group is distributed over each label in scene s, so the total accumulated weight distribution vector can be used to measure the popularity of each label in scene s: the larger a label's total accumulated weight distribution, the more popular the label; and the larger the label's weight in a piece of information, news or application, the more popular that information, news or application is with the user group.
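The group-level distribution can be sketched as below, assuming each user's accumulated weights in the scene are already available as a label-to-weight mapping. The function name, user ids and values are hypothetical.

```python
from collections import defaultdict
from typing import Dict


def population_distribution(per_user_acc: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Share of each label in the accumulated weight of all users in one scene."""
    label_totals: Dict[str, float] = defaultdict(float)
    for acc in per_user_acc.values():
        for label, w in acc.items():
            label_totals[label] += w  # all users' accumulated weight on this label
    grand_total = sum(label_totals.values())  # all users, all labels
    if grand_total == 0:
        return {}
    return {label: w / grand_total for label, w in label_totals.items()}


users = {
    "u1": {"sports": 3.0, "finance": 1.0},
    "u2": {"sports": 3.0, "health": 1.0},
}
pop = population_distribution(users)
```

In this example "sports" takes three quarters of the group's weight, which marks it as the popular label in the scene.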
Correspondingly, in step 104 the interest weight of the user on each label can be determined from the difference between the accumulated weight distribution of the user on each label and the total accumulated weight distribution of all users on each label; this difference represents the user's degree of interest in the label. Specifically, the difference between the accumulated weight distribution of user μ on label t_k and the total accumulated weight distribution of all users, that is, the user group, on label t_k is computed using a smoothing factor ε.
The size of the smoothing factor ε is chosen by comparing past predicted values with actual values. If the difference between them is large, the smoothing factor should be larger; otherwise it should be smaller. The larger the smoothing factor, the greater the influence of recent trend changes; the smaller the smoothing factor, the smaller that influence and the smoother the result.
This difference reflects the gap between the user's degree of interest in label t_k and the user group's degree of interest in label t_k. The difference can therefore be used as the user's interest weight, which more clearly reflects the user's personalized interest in the label; accordingly, the interest weights on multiple labels constitute the user's personalized interest distribution vector in the scene.
Optionally, when the accumulated weight distribution of the user on a label is smaller than the total accumulated weight distribution of the user group on that label, the difference is less than 0, which means the label is not one the user is interested in. To reflect the user's interest distribution vector more intuitively, interest weights less than 0 can be removed, so that the interest weight of user μ on label t_k retains only differences that are not less than 0.
When the accumulated weight distribution of the user on a label is smaller than the total accumulated weight distribution of the user group on that label, the difference is less than 0, and to some extent such a label is one the user is not interested in. The labels whose interest weights are less than 0 can therefore be used to filter the corresponding content out of what is pushed to the user, reducing the error rate of content push; these negative differences give the disinterest weight of user μ on label t_k.
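The split into interest and disinterest weights can be sketched as follows. This sketch uses a plain difference between the two distributions and leaves out the smoothing factor ε, because how ε enters the formula is not recoverable from this text; the function name and sample distributions are hypothetical.

```python
from typing import Dict, Tuple


def interest_weights(user_dist: Dict[str, float],
                     pop_dist: Dict[str, float]) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (interest, disinterest) weights from user-minus-group differences."""
    labels = set(user_dist) | set(pop_dist)
    # ASSUMPTION: plain difference, no smoothing factor applied.
    diff = {l: user_dist.get(l, 0.0) - pop_dist.get(l, 0.0) for l in labels}
    interest = {l: d for l, d in diff.items() if d > 0}      # labels the user favors
    disinterest = {l: d for l, d in diff.items() if d < 0}   # labels to filter out
    return interest, disinterest


interest, disinterest = interest_weights(
    {"sports": 0.25, "finance": 0.75},   # this user's distribution
    {"sports": 0.75, "finance": 0.25},   # the whole group's distribution
)
```

Here the user reads sports less than the group does, so "sports" lands in the disinterest set even though the user has read some sports content.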
105. The data processing system generates the interest distribution vector of the user in the scene by using each label and the interest weight of the user on each label.
For example, the interest distribution vector of user μ in scene s consists of the interest weights of user μ on labels t_1, t_2, …, t_n in scene s.
It can be seen that the embodiments of the present invention can use the degree of match between the label vector of a piece of information and the user's interest distribution vector to decide whether to push the information to the user. Compared with the traditional method of recommending content using the user's plain accumulated weight distribution as the interest distribution vector, the interest distribution vector constructed in this embodiment better highlights the "personalized" part of the user's interest: determining the interest weight on a label from the difference between the single user's accumulated weight distribution and that of all users, as in step 104, extracts the interests unique to the user. For example, a user clicking to read news about a hot event such as the "Olympic Games" and a user clicking to read news about a less popular event reflect different levels of interest in the labels of that news. The data processing method described in the embodiments can therefore build an interest distribution vector that fits the user's true interests more closely, so that content of interest can be pushed to the user in a given scene and the accuracy of content push is improved.
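The source does not say how the degree of match is measured. One simple possibility, shown here purely as an assumption, is a dot product between the information's label vector and the user's interest distribution vector, compared against a threshold; `match_score`, `should_push` and the threshold value are all hypothetical.

```python
from typing import Dict


def match_score(label_vector: Dict[str, float], interest_vector: Dict[str, float]) -> float:
    """ASSUMED matching measure: dot product over the labels both vectors share."""
    return sum(w * interest_vector.get(label, 0.0) for label, w in label_vector.items())


def should_push(label_vector: Dict[str, float], interest_vector: Dict[str, float],
                threshold: float = 0.1) -> bool:
    """Push the information when the score reaches a hypothetical threshold."""
    return match_score(label_vector, interest_vector) >= threshold


user_interest = {"finance": 0.5}
score = match_score({"finance": 0.5, "sports": 0.5}, user_interest)
```

Other similarity measures, such as cosine similarity, would fit the same description; the dot product is used only because it is the shortest to state.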
Refer to Fig. 2, which is a schematic flowchart of a data processing method provided in an embodiment of the present invention. The data processing method may be performed by a data processing system, and the data processing system may be deployed in a terminal or a server, which is not limited in the embodiments of the present invention. Compared with the data processing method shown in Fig. 1, the method shown in Fig. 2 can integrate the user's interest distribution vectors across multiple scenes to obtain the user's overall interest distribution vector over the different scenes. Specifically, the data processing method shown in Fig. 2 may further include the following steps:
106. For each scene in the scene set, the data processing system determines the interest weight proportion of the user on each label in that scene by using the interest weight of the user on each label in that scene, the total accumulated weight of the user on all labels in that scene, and the total accumulated weight of the user over all scenes.
In the embodiments of the present invention, the data processing system can obtain the interest weight of user μ on each label in scene s through step 105, and the total accumulated weight of user μ on all labels t_1, t_2, …, t_k, …, t_n in scene s through step 103. From the latter, the data processing system can obtain the total accumulated weight N_μ of user μ over all scenes, that is, the sum of the per-scene total accumulated weights over s ∈ S. The interest weight proportion of user μ on label t_k in scene s is determined from these three quantities.
For example, an interest weight proportion of user μ in scene s can be obtained in this way on label t_1, and likewise on label t_2.
107th, data handling system is directed to each label, calculates user interest power under all scenes on to that tag Anharmonic ratio example sum, as total interest weight of the user under all scenes on to that tag.
In the embodiment of the present invention, interest weight proportion of the user under the scene on each tab can pass through step 106 obtaining, and accordingly, total interest weight of the user under all scenes on each tab is:The user is in all fields Interest weight proportion sum under scape on each label is used as the user in total interest weight w on to that tagμk, that is, For the final interest weight of all scenes.
For example, user μ under scene s in label tkOn interest weight proportion beSo can be obtained by the use Family is under all scenes in label tkOn total interest weight wμk
108th, using each label and user in the corresponding total interest weight of each label, generating should for data handling system Final interest distribution vector of the user under all scenes.
For example, the final interest distribution vector of user μ under all scenes can be:
Preference(μ) = <wμ1, wμ2, …, wμn>
The system can represent this vector as a sparse vector and use it to update the user's current interest model. For example, wμ1 is the total interest weight of user μ for label t1, that is, the degree of interest of user μ in label t1; wμ2 is the total interest weight of user μ for label t2, that is, the degree of interest of user μ in label t2.
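Steps 106 to 108 can be sketched as follows. The formula for the interest weight proportion is not reproduced in this text, so the sketch assumes that the proportion is the per-scene interest weight scaled by the share of the user's accumulated weight that falls in that scene; this assumption, the function names and the scene names are hypothetical. The final vector is kept sparse by storing only labels with non-zero weight.

```python
def interest_weight_proportions(interest, scene_total, grand_total):
    """Step 106 (assumed form): scale per-scene interest weights by the
    share of the user's total accumulated weight contributed by the scene."""
    if grand_total == 0:
        return {}
    share = scene_total / grand_total
    return {label: share * weight for label, weight in interest.items()}


def final_interest_vector(interest_by_scene, total_by_scene):
    """Steps 107 and 108: sum the proportions per label over all scenes and
    return the result as a sparse vector (dict: label -> total weight)."""
    # Total accumulated weight of the user under all scenes.
    grand_total = sum(total_by_scene.values())
    totals = {}
    for scene, interest in interest_by_scene.items():
        proportions = interest_weight_proportions(
            interest, total_by_scene.get(scene, 0.0), grand_total)
        for label, proportion in proportions.items():
            totals[label] = totals.get(label, 0.0) + proportion
    # Sparse representation: drop labels whose total weight is zero.
    return {label: w for label, w in totals.items() if w != 0.0}


# Illustrative data: interest weights and accumulated totals in two scenes.
interest_by_scene = {
    "news_feed": {"sports": 0.4, "chess": 0.1},
    "lock_screen": {"sports": 0.2},
}
total_by_scene = {"news_feed": 30.0, "lock_screen": 10.0}
print(final_interest_vector(interest_by_scene, total_by_scene))
```

Under this assumption, a scene in which the user has accumulated more behavior contributes more to the final weight of each label.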
It can be seen that, in the embodiment shown in Fig. 2, the data processing system can not only obtain, through steps 101 to 105, the interest weights of the user and the interest distribution vector of the user under a single scene, but can also integrate, through steps 106 to 108, the interest weights under each scene in the scene set. It applies a linear weighting to the interest weights in the interest distribution vector under each scene to obtain the user's total interest weight for each label under all scenes, and thereby the final interest distribution vector of the user across all scenes. The embodiment of the present invention can thus calculate the user's complete interest distribution more comprehensively across different scenes, make up for user interest features that are missing across scenes, and provide a complete, accurate and comprehensive user interest model for subsequent content recommendation.
Referring to Fig. 3, Fig. 3 is a schematic flowchart of a data processing method provided by an embodiment of the present invention. The data processing method can be performed by a data processing system, and the data processing system can be arranged in a terminal or a server, which is not limited in the embodiment of the present invention. Compared with the data processing method shown in Fig. 2, the data processing method shown in Fig. 3 can periodically obtain the historical behavior data of the user under all scenes and determine the interest distribution vector of the user under all scenes through the steps shown in Fig. 2. Specifically, the data processing method shown in Fig. 3 can include all the steps shown in Fig. 2, and step 102 can include step 102a, step 102b and step 102c, specifically:
102a. The data processing system obtains the historical behavior data of the user under each scene at a preset period.
In the embodiment of the present invention, the data processing system can preset an update period for the user's interest distribution vector under a scene, in order to update the user's interest model. The data processing system can therefore obtain the historical behavior data of the user under each scene at a preset period, where the preset period can be the preset update period.
It should be noted that the historical behavior data obtained at the preset period may be new historical behavior data recorded after the data processing system cleared the previous historical behavior data following each update of the user's interest model. Alternatively, it may be all of the user's historical behavior data under the scene, with no clearing operation performed. The embodiment of the present invention does not limit this.
It should be noted that the data processing system can obtain the accumulated weight of the user on each label according to the user's historical behavior data under each scene in the scene set and the weight of each label of each piece of information in the historical behavior data. The historical behavior data can be recorded in the log information of the message reading operations performed by the user under one or more scenes. The log information of a message reading operation can include the content of the message read by the user, the reading time, message remarks and the like, which is not limited in the embodiment of the present invention. In plain terms, the log information can be the user's reading history or reading footprint. The user can read a message on the current page or on a page jumped to, through touch operations such as tapping or sliding.
In the embodiment of the present invention, step 102b can be performed after step 102a is completed.
102b. For each piece of information in the historical behavior data of the user under the scene, the data processing system calculates the product of the weight of each label of that piece of information and a decay factor for the time elapsed between the moment the corresponding historical behavior occurred and the current moment, as the overall weight of that piece of information.
102c. The data processing system calculates the sum of the overall weights of all pieces of information corresponding to the user's historical behaviors, as the accumulated weight of the user on each label.
In the embodiment of the present invention, for the details of step 102b and step 102c, reference may be made to the description of step 102 in Embodiment 1, which is not repeated here.
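Steps 102b and 102c can be sketched as follows. The form of the decay factor is not given in this part of the text, so an exponential half-life decay is assumed here; `decay_factor`, `accumulated_weights` and the default half-life of seven days are hypothetical choices for illustration only.

```python
def decay_factor(event_time, now, half_life_seconds=7 * 24 * 3600):
    """Assumed exponential decay: the factor halves every half-life.

    Times are seconds since an arbitrary epoch; behaviors recorded at or
    after `now` get a factor of 1.0.
    """
    age = max(0.0, now - event_time)
    return 0.5 ** (age / half_life_seconds)


def accumulated_weights(history, now, half_life_seconds=7 * 24 * 3600):
    """Accumulate decayed label weights over a user's history.

    `history` is an iterable of (label_vector, event_time) pairs, where
    label_vector maps each label of one piece of information to its weight.
    """
    accumulated = {}
    for label_vector, event_time in history:
        factor = decay_factor(event_time, now, half_life_seconds)
        for label, weight in label_vector.items():
            # Step 102b: overall weight = label weight * decay factor.
            # Step 102c: sum the overall weights per label.
            accumulated[label] = accumulated.get(label, 0.0) + weight * factor
    return accumulated


# Illustrative data: two reading events, one just now and one a half-life ago.
history = [
    ({"sports": 1.0}, 1000.0),
    ({"sports": 1.0, "chess": 0.5}, 900.0),
]
print(accumulated_weights(history, now=1000.0, half_life_seconds=100.0))
```

Whether old history is cleared after each update or kept in full, as both options are allowed above, only changes which events are passed in as `history`.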
It can be seen that, in the embodiment of the present invention, the data processing system can periodically obtain the historical behavior data of the user under each scene to update the interest distribution vector of the user under that scene. In combination with Embodiment 2, the data processing system can also update the final interest distribution vector of the user across multiple scenes, so as to update the user's interest model in the data processing system and facilitate subsequent content recommendation work.
Referring to Fig. 4, Fig. 4 is a schematic structural diagram of a data processing device provided by an embodiment of the present invention. The data processing device can be applied in a data processing system, and the data processing system can be arranged in a terminal or a server, which is not limited in the embodiment of the present invention. As shown in Fig. 4, the data processing device can include:
A first acquisition module 401, configured to obtain the accumulated weight of the user on each label according to the historical behavior data of the user under a scene and the weight of each label of each piece of information in the historical behavior data.
In the embodiment of the present invention, for each piece of information in the historical behavior data of the user under the scene, the first acquisition module 401 can calculate the product of the weight of each label of that piece of information and the decay factor for the time elapsed between the moment the corresponding historical behavior occurred and the current moment, as the overall weight of that piece of information; it can then calculate the sum of the overall weights of all pieces of information corresponding to the user's historical behaviors, as the accumulated weight of the user on each label.
A calculation module 402, configured to calculate the ratio between the accumulated weight of the user on each label and the total accumulated weight of the user on all labels, as the accumulated weight distribution of the user on each label.
A determination module 403, configured to determine the interest weight of the user on each label according to the accumulated weight distribution of the user on each label and the corresponding total accumulated weight distribution of all users under the scene on each label.
A generation module 404, configured to generate the interest distribution vector of the user under the scene using each label and the interest weight of the user on each label.
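The chain formed by the calculation, determination and generation modules can be sketched as follows. This is a minimal sketch under stated assumptions: the interest weight is taken to be the plain difference between the user's accumulated weight distribution and the distribution of all users, which is one possible reading of "difference" in the text, and all function names and sample data are hypothetical.

```python
def weight_distribution(accumulated):
    """Calculation module: each label's share of the total accumulated weight."""
    total = sum(accumulated.values())
    if total == 0:
        return {label: 0.0 for label in accumulated}
    return {label: weight / total for label, weight in accumulated.items()}


def population_distribution(all_users_accumulated):
    """Accumulated weight distribution of all users under the scene."""
    totals = {}
    for accumulated in all_users_accumulated.values():
        for label, weight in accumulated.items():
            totals[label] = totals.get(label, 0.0) + weight
    return weight_distribution(totals)


def interest_weights(user_dist, pop_dist):
    """Determination module (assumed form): user share minus population share."""
    labels = set(user_dist) | set(pop_dist)
    return {label: user_dist.get(label, 0.0) - pop_dist.get(label, 0.0)
            for label in labels}


def interest_vector(labels, interest):
    """Generation module: interest weights ordered by a fixed label list."""
    return [interest.get(label, 0.0) for label in labels]


# Illustrative accumulated weights for three users in one scene.
users = {
    "u1": {"olympics": 6.0, "chess": 4.0},
    "u2": {"olympics": 9.0, "chess": 1.0},
    "u3": {"olympics": 10.0},
}
interest = interest_weights(weight_distribution(users["u1"]),
                            population_distribution(users))
print(interest_vector(["olympics", "chess"], interest))
```

In this sample, user u1 reads Olympic news often in absolute terms but less than the population does, so the sketch assigns a negative weight to "olympics" and a positive one to "chess", which matches the hot-event example in the description.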
It can be seen that, in the embodiment shown in Fig. 4, the data processing system can use the matching degree between the label vector of a piece of information and the interest distribution vector of a user to determine whether to push that information to the user. Compared with the traditional approach of simply using the user's accumulated weight distribution as the interest distribution vector for content recommendation, the interest distribution vector constructed in this embodiment better highlights the "personalized" part of the user's interest: the data processing system determines the interest weight of the user on a given label from the difference between the accumulated weight distribution of the single user and the accumulated weight distribution of all users, which extracts the interests unique to that user. For example, a user clicking to read news about a hot event such as the "Olympic Games" reflects a different degree of interest in the corresponding labels than a user clicking to read news about a less popular event. The data processing method described in the embodiment of the present invention can therefore build an interest distribution vector that fits the user's true interests more closely, push content of interest to the user in a given scene, and improve the accuracy of content push.
Referring also to Fig. 5, Fig. 5 is a schematic structural diagram of a data processing device provided by an embodiment of the present invention. The data processing device can be applied in a data processing system, and the data processing system can be arranged in a terminal or a server, which is not limited in the embodiment of the present invention. The device of Fig. 5 is obtained by optimizing the device of Fig. 4. In addition to the first acquisition module 401, the calculation module 402, the determination module 403 and the generation module 404, the data processing device further includes a quantization module 405 and a second acquisition module 406, where:
Optionally, the determination module 403 is further configured to determine, for each scene in the scene set, the interest weight proportion of the user on each label under that scene, using the interest weight of the user on each label under that scene, the total accumulated weight of the user on all labels under that scene, and the total accumulated weight of the user under all scenes in the scene set.
Optionally, the calculation module 402 is further configured to calculate, for each label, the sum of the user's interest weight proportions on that label under all scenes, as the total interest weight of the user on that label under all scenes.
Optionally, the generation module 404 is further configured to generate the final interest distribution vector of the user under all scenes, using each label and the user's total interest weight corresponding to each label.
In the embodiment of the present invention, the determination module 403 can determine, for each scene in the scene set, the interest weight proportion of the user on each label under that scene; the calculation module 402 can obtain the total interest weight of the user on each label under all scenes; and the generation module 404 can then generate the final interest distribution vector of the user under all scenes. This calculates the user's complete interest distribution more comprehensively, makes up for user interest features that are missing across scenes, and provides a complete, accurate and comprehensive user interest model for subsequent content recommendation.
Optionally, the quantization module 405 is configured to quantize each piece of information in the historical behavior data of the user under a scene into a label vector according to the features of that piece of information, where the label vector includes the labels that the piece of information has and the weight of each label.
Optionally, the second acquisition module 406 is configured to obtain the historical behavior data of the user under each scene at a preset period.
It can be seen that, in the embodiment shown in Fig. 5, the data processing system can not only obtain the interest weights of the user and the interest distribution vector of the user under a single scene, but can also integrate the interest weights under each scene in the scene set. It applies a linear weighting to the interest weights in the interest distribution vector under each scene to obtain the user's total interest weight for each label under all scenes, and thereby the final interest distribution vector of the user across all scenes. The embodiment of the present invention can thus calculate the user's complete interest distribution more comprehensively across different scenes, make up for user interest features that are missing across scenes, and provide a complete, accurate and comprehensive user interest model for subsequent content recommendation. In addition, the data processing system can periodically obtain the historical behavior data of the user under each scene to update the interest distribution vector of the user under that scene, and can also update the final interest distribution vector of the user across multiple scenes, so as to update the user's interest model in the data processing system and facilitate subsequent content recommendation work.
Referring to Fig. 6, Fig. 6 is a schematic structural diagram of a data processing equipment provided by an embodiment of the present invention. As shown in the figure, the data processing equipment can include at least one processor 601, such as a CPU (Central Processing Unit), at least one communication interface 603, a memory 604, and at least one communication bus 602. The communication bus 602 is used to implement connection and communication between these components. The communication interface 603 can include a display screen (Display) and a keyboard (Keyboard); optionally, the communication interface 603 can also include a standard wired interface and a standard wireless interface. The memory 604 can be a high-speed RAM (Random Access Memory, a volatile random access memory) or a non-volatile memory, for example at least one disk memory. Optionally, the memory 604 can also be at least one storage device located remotely from the aforementioned processor 601. The processor 601 can be combined with the devices described with reference to Figs. 4 and 5. A set of program code is stored in the memory 604, and the processor 601 calls the program code stored in the memory 604 to perform a data processing method, that is, to perform the following operations:
obtaining the accumulated weight of the user on each label according to the historical behavior data of the user under a scene and the weight of each label of each piece of information in the historical behavior data;
calculating the ratio between the accumulated weight of the user on each label and the total accumulated weight of the user on all labels, as the accumulated weight distribution of the user on each label;
determining the interest weight of the user on each label according to the accumulated weight distribution of the user on each label and the corresponding total accumulated weight distribution of all users under the scene on each label;
generating the interest distribution vector of the user under the scene using each label and the interest weight of the user on each label.
In the embodiment of the present invention, the processor 601 calls the program code in the memory 604 and is further configured to perform the following operations:
for each scene in the scene set, determining the interest weight proportion of the user on each label under that scene, using the interest weight of the user on each label under that scene, the total accumulated weight of the user on all labels under that scene, and the total accumulated weight of the user under all scenes in the scene set;
for each label, calculating the sum of the user's interest weight proportions on that label under all scenes, as the total interest weight of the user on that label under all scenes;
generating the final interest distribution vector of the user under all scenes, using each label and the user's total interest weight corresponding to each label.
In the embodiment of the present invention, the processor 601 calls the program code in the memory 604 and, before obtaining the accumulated weight of the user on each label according to the historical behavior data of the user under the scene and the weight of each label of each piece of information in the historical behavior data, is further configured to perform the following operation:
quantizing each piece of information in the historical behavior data of the user under the scene into a label vector according to the features of that piece of information, where the label vector includes the labels that the piece of information has and the weight of each label.
In the embodiment of the present invention, when the processor 601 calls the program code in the memory 604 to obtain the accumulated weight of the user on each label according to the historical behavior data of the user under the scene and the weight of each label of each piece of behavior information in the historical behavior data, it can perform the following operations:
for each piece of information in the historical behavior data of the user under the scene, calculating the product of the weight of each label of that piece of information and the decay factor for the time elapsed between the moment the corresponding historical behavior occurred and the current moment, as the overall weight of that piece of information;
calculating the sum of the overall weights of all pieces of information corresponding to the user's historical behaviors, as the accumulated weight of the user on each label.
In the embodiment of the present invention, the processor 601 calls the program code in the memory 604 and, before calculating, for each piece of information in the historical behavior data of the user under the scene, the product of the weight of each label of that piece of information and the decay factor for the time elapsed between the moment the corresponding historical behavior occurred and the current moment as the overall weight of that piece of information, is further configured to perform the following operation:
obtaining the historical behavior data of the user under each scene at a preset period.
The communication bus 602 can be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The communication bus 602 can be divided into an address bus, a data bus, a control bus and so on. For ease of representation, only one thick line is used in Fig. 6, but this does not mean that there is only one bus or only one type of bus.
The memory 604 can include a volatile memory, such as a random-access memory (RAM); the memory can also include a non-volatile memory, such as a flash memory, a hard disk drive (HDD) or a solid-state drive (SSD); the memory 604 can also include a combination of the above kinds of memory.
The processor 601 can be a central processing unit (CPU), a network processor (NP), or a combination of a CPU and an NP.
The processor 601 can further include a hardware chip. The hardware chip can be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD can be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL), or any combination thereof.
Optionally, the memory 604 is further used to store program instructions. The processor 601 can call the program instructions to implement the data processing method shown in the embodiments of Fig. 1, Fig. 2 and Fig. 3 of the present application.
A person of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, can include the processes of the embodiments of the above methods. The storage medium can be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
What is disclosed above is merely a preferred embodiment of the present invention and certainly cannot be used to limit the scope of the rights of the present invention. A person of ordinary skill in the art will understand that implementations of all or part of the processes of the above embodiment, and equivalent variations made according to the claims of the present invention, still fall within the scope covered by the invention.

Claims (10)

1. A data processing method, characterized by comprising:
obtaining the accumulated weight of a user on each label according to the historical behavior data of the user under a scene and the weight of each label of each piece of information in the historical behavior data;
calculating the ratio between the accumulated weight of the user on each label and the total accumulated weight of the user on all labels, as the accumulated weight distribution of the user on each label;
determining the interest weight of the user on each label according to the accumulated weight distribution of the user on each label and the corresponding total accumulated weight distribution of all users under the scene on each label;
generating the interest distribution vector of the user under the scene using each label and the interest weight of the user on each label.
2. The method according to claim 1, characterized in that the method further comprises:
for each scene in a scene set, determining the interest weight proportion of the user on each label under that scene, using the interest weight of the user on each label under that scene, the total accumulated weight of the user on all labels under that scene, and the total accumulated weight of the user under all scenes in the scene set;
for each label, calculating the sum of the interest weight proportions of the user on that label under all scenes, as the total interest weight of the user on that label under all scenes;
generating the final interest distribution vector of the user under all scenes, using each label and the total interest weight of the user corresponding to each label.
3. The method according to claim 2, characterized in that, before the obtaining of the accumulated weight of the user on each label according to the historical behavior data of the user under the scene and the weight of each label of each piece of information in the historical behavior data, the method further comprises:
quantizing each piece of information in the historical behavior data of the user under the scene into a label vector according to the features of that piece of information, the label vector including the labels that the piece of information has and the weight of each label.
4. The method according to any one of claims 1 to 3, characterized in that the obtaining of the accumulated weight of the user on each label according to the historical behavior data of the user under the scene and the weight of each label of each piece of behavior information in the historical behavior data comprises:
for each piece of information in the historical behavior data of the user under the scene, calculating the product of the weight of each label of that piece of information and the decay factor for the time elapsed between the moment the corresponding historical behavior occurred and the current moment, as the overall weight of that piece of information;
calculating the sum of the overall weights of all pieces of information corresponding to the historical behaviors of the user, as the accumulated weight of the user on each label.
5. The method according to claim 4, characterized in that, before the calculating, for each piece of information in the historical behavior data of the user under the scene, of the product of the weight of each label of that piece of information and the decay factor for the time elapsed between the moment the corresponding historical behavior occurred and the current moment as the overall weight of that piece of information, the method further comprises:
obtaining the historical behavior data of the user under each scene at a preset period.
6. A data processing device, characterized by comprising:
a first acquisition module, configured to obtain the accumulated weight of a user on each label according to the historical behavior data of the user under a scene and the weight of each label of each piece of information in the historical behavior data;
a calculation module, configured to calculate the ratio between the accumulated weight of the user on each label and the total accumulated weight of the user on all labels, as the accumulated weight distribution of the user on each label;
a determination module, configured to determine the interest weight of the user on each label according to the accumulated weight distribution of the user on each label and the corresponding total accumulated weight distribution of all users under the scene on each label;
a generation module, configured to generate the interest distribution vector of the user under the scene using each label and the interest weight of the user on each label.
7. The device according to claim 6, characterized in that the determination module is further configured to:
for each scene in a scene set, determine the interest weight proportion of the user on each label under that scene, using the interest weight of the user on each label under that scene, the total accumulated weight of the user on all labels under that scene, and the total accumulated weight of the user under all scenes in the scene set;
the calculation module is further configured to calculate, for each label, the sum of the interest weight proportions of the user on that label under all scenes, as the total interest weight of the user on that label under all scenes;
the generation module is further configured to generate the final interest distribution vector of the user under all scenes, using each label and the total interest weight of the user corresponding to each label.
8. The device according to claim 7, characterized in that the device further comprises:
a quantization module, configured to quantize each piece of information in the historical behavior data of the user under the scene into a label vector according to the features of that piece of information, the label vector including the labels that the piece of information has and the weight of each label.
9. The device according to any one of claims 6 to 8, characterized in that the first acquisition module is specifically configured to:
for each piece of information in the historical behavior data of the user under the scene, calculate the product of the weight of each label of that piece of information and the decay factor for the time elapsed between the moment the corresponding historical behavior occurred and the current moment, as the overall weight of that piece of information;
calculate the sum of the overall weights of all pieces of information corresponding to the historical behaviors of the user, as the accumulated weight of the user on each label.
10. The device according to claim 9, characterized in that the device further comprises:
a second acquisition module, configured to obtain the historical behavior data of the user under each scene at a preset period.
CN201611163210.8A 2016-12-15 2016-12-15 Data processing method, device and equipment Active CN106649681B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611163210.8A CN106649681B (en) 2016-12-15 2016-12-15 Data processing method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611163210.8A CN106649681B (en) 2016-12-15 2016-12-15 Data processing method, device and equipment

Publications (2)

Publication Number Publication Date
CN106649681A true CN106649681A (en) 2017-05-10
CN106649681B CN106649681B (en) 2020-06-05

Family

ID=58822817

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611163210.8A Active CN106649681B (en) 2016-12-15 2016-12-15 Data processing method, device and equipment

Country Status (1)

Country Link
CN (1) CN106649681B (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107203602A (en) * 2017-05-15 2017-09-26 竹间智能科技(上海)有限公司 User model belief updating method and device based on chat memory
CN107729937A (en) * 2017-10-12 2018-02-23 北京京东尚科信息技术有限公司 For determining the method and device of user interest label
CN108062410A (en) * 2017-12-29 2018-05-22 北京奇元科技有限公司 A kind of method and device of definite object point of interest
CN108242016A (en) * 2018-01-25 2018-07-03 阿里巴巴集团控股有限公司 A kind of method and apparatus of Products Show
CN108596695A (en) * 2018-05-15 2018-09-28 口口相传(北京)网络技术有限公司 Entity method for pushing and system
CN109859002A (en) * 2019-01-04 2019-06-07 平安科技(深圳)有限公司 Product method for pushing, device, computer equipment and storage medium
CN110517665A (en) * 2019-08-29 2019-11-29 中国银行股份有限公司 Obtain the method and device of test sample
CN111125514A (en) * 2019-11-20 2020-05-08 泰康保险集团股份有限公司 User behavior analysis method and device, electronic equipment and storage medium
CN113497831A (en) * 2021-06-30 2021-10-12 西安交通大学 Content placement method and system based on feedback popularity under mobile edge network
CN114090854A (en) * 2022-01-24 2022-02-25 佰聆数据股份有限公司 Intelligent label weight updating method and system based on information entropy and computer equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014062762A1 (en) * 2012-10-18 2014-04-24 Google Inc. Propagating information through networks
US20140270533A1 (en) * 2013-03-14 2014-09-18 Christopher Serge Benjamin Chedeau Image Cropping According to Points of Interest
CN105573995A (en) * 2014-10-09 2016-05-11 中国银联股份有限公司 Interest identification method, interest identification equipment and data analysis method
CN105740444A (en) * 2016-02-02 2016-07-06 桂林电子科技大学 User score-based project recommendation method

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107203602A (en) * 2017-05-15 2017-09-26 竹间智能科技(上海)有限公司 User model belief updating method and device based on chat memory
CN107729937B (en) * 2017-10-12 2020-11-03 北京京东尚科信息技术有限公司 Method and device for determining user interest tag
CN107729937A (en) * 2017-10-12 2018-02-23 北京京东尚科信息技术有限公司 Method and device for determining user interest tag
WO2019072091A1 (en) * 2017-10-12 2019-04-18 北京京东尚科信息技术有限公司 Method and apparatus for use in determining tags of interest to user
CN108062410A (en) * 2017-12-29 2018-05-22 北京奇元科技有限公司 Method and device for determining object points of interest
CN108242016A (en) * 2018-01-25 2018-07-03 阿里巴巴集团控股有限公司 Product recommendation method and device
CN108242016B (en) * 2018-01-25 2022-01-25 创新先进技术有限公司 Product recommendation method and device
CN108596695A (en) * 2018-05-15 2018-09-28 口口相传(北京)网络技术有限公司 Entity pushing method and system
CN108596695B (en) * 2018-05-15 2021-04-27 口口相传(北京)网络技术有限公司 Entity pushing method and system
CN109859002A (en) * 2019-01-04 2019-06-07 平安科技(深圳)有限公司 Product pushing method, device, computer equipment and storage medium
CN109859002B (en) * 2019-01-04 2024-03-05 平安科技(深圳)有限公司 Product pushing method, device, computer equipment and storage medium
CN110517665A (en) * 2019-08-29 2019-11-29 中国银行股份有限公司 Method and device for obtaining test samples
CN111125514A (en) * 2019-11-20 2020-05-08 泰康保险集团股份有限公司 User behavior analysis method and device, electronic equipment and storage medium
CN111125514B (en) * 2019-11-20 2023-08-22 泰康保险集团股份有限公司 Method, device, electronic equipment and storage medium for analyzing user behaviors
CN113497831A (en) * 2021-06-30 2021-10-12 西安交通大学 Content placement method and system based on feedback popularity under mobile edge network
CN114090854A (en) * 2022-01-24 2022-02-25 佰聆数据股份有限公司 Intelligent label weight updating method and system based on information entropy and computer equipment
CN114090854B (en) * 2022-01-24 2022-04-19 佰聆数据股份有限公司 Intelligent label weight updating method and system based on information entropy and computer equipment

Also Published As

Publication number Publication date
CN106649681B (en) 2020-06-05

Similar Documents

Publication Publication Date Title
CN106649681A (en) Data processing method, device and equipment
KR102122373B1 (en) Method and apparatus for obtaining user portrait
WO2018040944A1 (en) System, method, and device for identifying malicious address/malicious purchase order
CN105069172B (en) Interest tags generation method
US9588648B2 (en) Providing history-based data processing
CN102609474B (en) Visit information supplying method and system
CN107222566A (en) Information-pushing method, device and server
CN106943747B (en) Virtual role name recommendation method and device, electronic equipment and storage medium
CN106557929A (en) Logistics information processing method and processing device
CN111371767B (en) Malicious account identification method, malicious account identification device, medium and electronic device
CN109582772A (en) Contract information extracting method, device, computer equipment and storage medium
CN108460627A (en) Marketing activity scheme pushing method, device, computer equipment and storage medium
CN110910201B (en) Information recommendation control method and device, computer equipment and storage medium
CN102934113A (en) Information provision system, information provision method, information provision device, program, and information recording medium
CN108777701A (en) Method and device for determining receiver
US20230004979A1 (en) Abnormal behavior detection method and apparatus, electronic device, and computer-readable storage medium
CN103544150B (en) Method and system for providing recommendation information for a mobile terminal browser
CN113538070B (en) User life value cycle detection method and device and computer equipment
CN109600724A (en) Method and apparatus for sending short messages
US20230259959A1 (en) Multi-target prediction method and apparatus, device, storage medium and program product
CN111429214B (en) Transaction data-based buyer and seller matching method and device
CN113569129A (en) Click rate prediction model processing method, content recommendation method, device and equipment
CN104142975A (en) Microblog information promotion method, device and system
CN113297486A (en) Click rate prediction method and related device
CN111723294A (en) AI-based RPA robot intelligent recommendation method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant