CN105260414A - User behavior similarity computing method and device - Google Patents


Info

Publication number
CN105260414A
Authority
CN
China
Prior art keywords
behavioural characteristic
user
behavior
characteristic
behavioural
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510618301.5A
Other languages
Chinese (zh)
Other versions
CN105260414B (en)
Inventor
李倚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Enyike (Beijing) Data Technology Co.,Ltd.
Original Assignee
JINGSHUO CENTURY TECHNOLOGY (BEIJING) Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by JINGSHUO CENTURY TECHNOLOGY (BEIJING) Co Ltd filed Critical JINGSHUO CENTURY TECHNOLOGY (BEIJING) Co Ltd
Priority to CN201510618301.5A priority Critical patent/CN105260414B/en
Publication of CN105260414A publication Critical patent/CN105260414A/en
Application granted granted Critical
Publication of CN105260414B publication Critical patent/CN105260414B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the invention provide a user behavior similarity computing method and device. The method comprises the steps of collecting a behavior characteristic value corresponding to each of a plurality of behavior characteristics of first class of users and second class of users; screening the plurality of behavior characteristics to obtain a target behavior characteristic set; building a first generalized linear model according to the target behavior characteristic set, computing a first maximum likelihood estimated value of the first generalized linear model by use of an optimization method, and obtaining an estimated parameter corresponding to the first maximum likelihood estimated value; and computing the behavior similarity of a to-be-tested user and the first class of users by use of the estimated parameter and the behavior characteristic value corresponding to each behavior characteristic in a target behavior characteristic set corresponding to the to-be-tested user. According to the method, the similarity of behaviors of different users is analyzed by full use of behavior characteristics of a lot of users, so that the use ratio of collected behavior characteristics of a lot of users is improved.

Description

User behavior similarity calculation method and device
Technical field
Embodiments of the present invention relate to the field of computer technology, and in particular to a user behavior similarity calculation method and device.
Background
With the development of computer technology, users connect to the Internet and browse online information through personal computers (PCs) or mobile terminals.
While browsing online information, a user usually clicks on the information of interest. For example, a user browses a web page on a PC, and the page contains many items of information. While browsing, the user first sees the title of each item and, if interested in a title, clicks it to read the corresponding content in detail. The prior art can collect behavior characteristics of a large number of users, such as web page browsing operations, web page click operations, the web page information browsed or clicked, the content clicked, the number of clicks on web page information, and the time of each click.
However, the prior art lacks a method that uses the behavior characteristics of a large number of users to analyze the similarity of different users' behavior, so the utilization rate of the collected behavior characteristics is low.
Summary of the invention
Embodiments of the present invention provide a user behavior similarity calculation method and device, so as to improve the utilization rate of the collected behavior characteristics of a large number of users.
One aspect of the embodiments of the present invention provides a user behavior similarity calculation method, comprising:
collecting the behavior characteristic value corresponding to each of a plurality of behavior characteristics of first-class users, and the behavior characteristic value corresponding to each of the plurality of behavior characteristics of second-class users;
screening the plurality of behavior characteristics, according to the behavior characteristic values of the first-class users and the behavior characteristic values of the second-class users, to obtain a target behavior characteristic set;
building a first generalized linear model according to the target behavior characteristic set, computing a first maximum likelihood estimate of the first generalized linear model by an optimization method, and obtaining the estimated parameters corresponding to the first maximum likelihood estimate;
computing the behavior similarity between a to-be-tested user and the first-class users by using the estimated parameters and the behavior characteristic value corresponding to each behavior characteristic in the target behavior characteristic set for the to-be-tested user.
Another aspect of the embodiments of the present invention provides a user behavior similarity calculation device, comprising:
an acquisition module, configured to collect the behavior characteristic value corresponding to each of a plurality of behavior characteristics of first-class users, and the behavior characteristic value corresponding to each of the plurality of behavior characteristics of second-class users;
a screening module, configured to screen the plurality of behavior characteristics, according to the behavior characteristic values of the first-class users and the behavior characteristic values of the second-class users, to obtain a target behavior characteristic set;
a modeling module, configured to build a first generalized linear model according to the target behavior characteristic set, compute a first maximum likelihood estimate of the first generalized linear model by an optimization method, and obtain the estimated parameters corresponding to the first maximum likelihood estimate;
a computing module, configured to compute the behavior similarity between a to-be-tested user and the first-class users by using the estimated parameters and the behavior characteristic value corresponding to each behavior characteristic in the target behavior characteristic set for the to-be-tested user.
In the user behavior similarity calculation method and device provided by the embodiments of the present invention, the plurality of behavior characteristics are screened, according to the behavior characteristic values of different types of users, to obtain a target behavior characteristic set; a generalized linear model is built according to the target behavior characteristic set; the maximum likelihood estimate of the generalized linear model is computed by an optimization method, and the estimated parameters corresponding to the maximum likelihood estimate are obtained; and the behavior similarity between a to-be-tested user and users of a particular type is computed from the estimated parameters and the behavior characteristic values of the to-be-tested user. The behavior characteristics of a large number of users are thereby fully used to analyze the similarity of different users' behavior, which improves the utilization rate of the collected behavior characteristics.
Brief description of the drawings
Fig. 1 is a flowchart of the user behavior similarity calculation method provided by an embodiment of the present invention;
Fig. 2 is a structural diagram of the user behavior similarity calculation device provided by an embodiment of the present invention;
Fig. 3 is a structural diagram of the user behavior similarity calculation device provided by another embodiment of the present invention.
Detailed description
Fig. 1 is a flowchart of the user behavior similarity calculation method provided by an embodiment of the present invention. The prior art lacks a method that uses the behavior characteristics of a large number of users to analyze the similarity of different users' behavior, so the utilization rate of the collected behavior characteristics is low. To address this, the embodiment provides a user behavior similarity calculation method with the following specific steps:
Step S101: collect the behavior characteristic value corresponding to each of a plurality of behavior characteristics of first-class users, and the behavior characteristic value corresponding to each of the plurality of behavior characteristics of second-class users;
In the embodiment of the present invention, the behavior characteristics of a user are collected, according to a plurality of preset behavior characteristics, while the user browses and clicks web page information. For example, the plurality of behavior characteristics specifically include: whether a certain web page was browsed, whether a title in the web page was clicked, the time at which the user browsed the web page, the time at which a title was clicked, the content of the clicked title, and the number of times a title was clicked in one day. The embodiment does not limit the number of preset behavior characteristics to 6; it can be any number. In addition, each behavior characteristic is assigned numeric codes in advance. For example, a user who browsed the web page is recorded as 1 and a user who did not as 0; a user who clicked a title in the web page is recorded as 1 and a user who did not as 0; the time of browsing the web page is recorded as 1 for morning, 2 for noon, 3 for afternoon, and 4 for evening; the time of clicking a title is recorded as 1 for morning, 2 for noon, 3 for afternoon, and 4 for evening; the content of the clicked title is recorded as 1 for healthy diet, 2 for entertainment and leisure, 3 for finance and investment, 4 for science and technology news, and so on; the number of times a title was clicked in one day is the actual number of clicks. For example, if a user browsed the web page, clicked a title in it, browsed the page in the morning, clicked the title at noon, the content of the clicked title belongs to healthy diet, and the title was clicked 3 times in one day, then the browser on the user side collects the behavior characteristic values 1, 1, 1, 2, 1, 3 for this user, and these values can form a behavior characteristic vector [1,1,1,2,1,3].
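The encoding described above can be sketched in a few lines of Python. This is only an illustration: the function name `encode_behavior`, the dictionary names, and the English category labels are assumptions introduced here, while the numeric codes follow the example in the text.

```python
# Hypothetical lookup tables; the numeric codes follow the example in the text.
TIME_OF_DAY = {"morning": 1, "noon": 2, "afternoon": 3, "evening": 4}
TOPIC = {"healthy diet": 1, "entertainment": 2, "finance": 3, "technology": 4}

def encode_behavior(browsed, clicked, browse_time, click_time, topic, clicks_per_day):
    """Return the six-element behavior characteristic vector for one user."""
    return [
        1 if browsed else 0,       # whether the web page was browsed
        1 if clicked else 0,       # whether a title in the page was clicked
        TIME_OF_DAY[browse_time],  # time at which the page was browsed
        TIME_OF_DAY[click_time],   # time at which the title was clicked
        TOPIC[topic],              # content category of the clicked title
        clicks_per_day,            # number of clicks on the title in one day
    ]

vector = encode_behavior(True, True, "morning", "noon", "healthy diet", 3)
print(vector)  # [1, 1, 1, 2, 1, 3]
```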
The first-class users are users that satisfy a first target behavior characteristic, the second-class users are users that satisfy a second target behavior characteristic, and the first target behavior characteristic and the second target behavior characteristic share some behavior characteristics.
The embodiment of the present invention collects the behavior characteristic value corresponding to each of the plurality of behavior characteristics of the first-class users and of the second-class users respectively. The first-class users are specifically seed users and the second-class users are specifically contrast users; seed users and contrast users share some behavior characteristics and differ in others. For example, a seed user is a user who browsed an advertisement for a certain brand of milk and clicked it, and a contrast user is a user who browsed the same advertisement but did not click it. The identification number of first-class users is 1 and the identification number of second-class users is 0. For example, the embodiment collects the behavior characteristic values of 100 first-class users and 100 second-class users, so that the 100 first-class users correspond to 100 behavior characteristic vectors and the 100 second-class users correspond to 100 behavior characteristic vectors.
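As a minimal sketch of how the two groups of vectors might be combined into one labeled data set, assuming the identification numbers 1 and 0 are used directly as class labels (the function name `build_training_set` and the sample vectors are hypothetical):

```python
def build_training_set(seed_vectors, contrast_vectors):
    """Stack seed-user and contrast-user vectors and attach labels 1 and 0."""
    rows = [list(v) for v in seed_vectors] + [list(v) for v in contrast_vectors]
    labels = [1] * len(seed_vectors) + [0] * len(contrast_vectors)
    return rows, labels

# Invented vectors: two seed users (clicked the ad) and two contrast users.
seed = [[1, 1, 1, 2, 1, 3], [1, 1, 3, 3, 2, 1]]
contrast = [[1, 0, 2, 2, 1, 0], [1, 0, 4, 4, 3, 0]]
rows, labels = build_training_set(seed, contrast)
print(len(rows), labels)  # 4 [1, 1, 0, 0]
```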
Step S102: screen the plurality of behavior characteristics, according to the behavior characteristic values of the first-class users and the behavior characteristic values of the second-class users, to obtain a target behavior characteristic set;
The number of preset behavior characteristics can be arbitrary, and some of them are redundant for the user behavior similarity calculation method provided by the embodiment, so the plurality of behavior characteristics must be screened to obtain the target behavior characteristic set.
Step S103: build a first generalized linear model according to the target behavior characteristic set, compute a first maximum likelihood estimate of the first generalized linear model by an optimization method, and obtain the estimated parameters corresponding to the first maximum likelihood estimate;
The first generalized linear model is built according to the target behavior characteristic set, and any prior-art method of building a generalized linear model can be used. The first maximum likelihood estimate of the first generalized linear model is computed by an optimization method, and the corresponding estimated parameters are obtained from this estimate. The number of estimated parameters equals the number of behavior characteristics in the target behavior characteristic set.
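The text leaves both the generalized linear model and the optimization method open. As one possible instance, the sketch below assumes a logistic regression (binomial family, logit link) without an intercept, so that the number of parameters equals the number of characteristics, and maximizes the log-likelihood by plain gradient ascent. The function names, step size, iteration count, and toy data are all assumptions made for illustration.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log_likelihood(beta, rows, labels):
    """Bernoulli log-likelihood under a logit link, with no intercept term."""
    total = 0.0
    for x, y in zip(rows, labels):
        z = sum(b * v for b, v in zip(beta, x))
        # log p(y | x) = y*z - log(1 + exp(z)), written in an overflow-safe form
        total += y * z - (max(z, 0.0) + math.log1p(math.exp(-abs(z))))
    return total

def fit_logistic(rows, labels, step=0.5, iterations=2000):
    """Maximize the log-likelihood by gradient ascent; returns one parameter per characteristic."""
    beta = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(iterations):
        grad = [0.0] * len(beta)
        for x, y in zip(rows, labels):
            error = y - sigmoid(sum(b * v for b, v in zip(beta, x)))
            for j, v in enumerate(x):
                grad[j] += error * v
        beta = [b + step * g / n for b, g in zip(beta, grad)]
    return beta

# Toy data (invented): two binary characteristics; label 1 = first class, 0 = second class.
rows = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1], [1, 1], [1, 1]]
labels = [1, 1, 0, 0, 0, 1, 1, 0]
beta = fit_logistic(rows, labels)
print(beta, log_likelihood(beta, rows, labels))
```

For this toy data the likelihood is maximized near (ln 2, -ln 2), which the gradient ascent approaches. A production system would more likely use a Newton-type or quasi-Newton solver; gradient ascent is used here only because it is short.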
Step S104: compute the behavior similarity between a to-be-tested user and the first-class users by using the estimated parameters and the behavior characteristic value corresponding to each behavior characteristic in the target behavior characteristic set for the to-be-tested user.
Computing the behavior similarity between the to-be-tested user and the first-class users by using the estimated parameters and the behavior characteristic values of the to-be-tested user comprises: forming the estimated parameters into a first vector; forming the behavior characteristic values corresponding to each behavior characteristic in the target behavior characteristic set for the to-be-tested user into a second vector; and computing the inner product of the first vector and the second vector to obtain the behavior similarity.
Specifically, the behavior characteristic value corresponding to each behavior characteristic in the target behavior characteristic set is collected for the to-be-tested user, and these values form a behavior characteristic value vector. The inner product of this vector and the one-dimensional vector formed by the estimated parameters is computed, and the inner product value is the behavior similarity between the to-be-tested user and the first-class users.
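The inner-product step can be sketched as follows. The function name `behavior_similarity` and the numeric values are illustrative assumptions; in practice the parameters would come from the fitted generalized linear model.

```python
def behavior_similarity(estimated_parameters, feature_values):
    """Inner product of the parameter vector and the user's characteristic value vector."""
    if len(estimated_parameters) != len(feature_values):
        raise ValueError("both vectors must have one entry per target behavior characteristic")
    return sum(p * v for p, v in zip(estimated_parameters, feature_values))

# Invented parameters for three target characteristics and one to-be-tested user.
parameters = [0.5, 1.5, -0.25]
user_values = [1, 1, 2]
print(behavior_similarity(parameters, user_values))  # 1.5
```

Note that the inner product is the linear predictor of the model, so it is unbounded; a larger value indicates behavior closer to that of the first-class users.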
In this embodiment, the plurality of behavior characteristics are screened, according to the behavior characteristic values of different types of users, to obtain a target behavior characteristic set; a generalized linear model is built according to the target behavior characteristic set; the maximum likelihood estimate of the generalized linear model is computed by an optimization method, and the estimated parameters corresponding to the maximum likelihood estimate are obtained; and the behavior similarity between a to-be-tested user and users of a particular type is computed from the estimated parameters and the behavior characteristic values of the to-be-tested user. The behavior characteristics of a large number of users are thereby fully used to analyze the similarity of different users' behavior, which improves the utilization rate of the collected behavior characteristics.
On the basis of the above embodiment, screening the plurality of behavior characteristics, according to the behavior characteristic values of the first-class users and the behavior characteristic values of the second-class users, to obtain the target behavior characteristic set comprises:
computing the coverage rate, chi-square statistic, and information entropy of each of the plurality of behavior characteristics according to the behavior characteristic values of the first-class users and the behavior characteristic values of the second-class users;
deleting, from the plurality of behavior characteristics, the behavior characteristics whose coverage rate is less than a first threshold, the behavior characteristics whose chi-square statistic is less than a second threshold, and the behavior characteristics whose information entropy is less than a third threshold, to obtain a first behavior characteristic set;
for any two behavior characteristics in the first behavior characteristic set whose degree of association is greater than a fourth threshold, deleting either one of the two, to obtain a second behavior characteristic set;
building a second generalized linear model according to the second behavior characteristic set, computing a second maximum likelihood estimate of the second generalized linear model by an optimization method, and deleting from the second behavior characteristic set the behavior characteristics that have no influence on the second maximum likelihood estimate, to obtain the target behavior characteristic set.
On the basis of the above embodiment, the 100 first-class users correspond to 100 behavior characteristic vectors and the 100 second-class users correspond to 100 behavior characteristic vectors. From these 200 behavior characteristic vectors, the coverage rate, chi-square statistic, and information entropy are computed for each behavior characteristic, namely "whether a certain web page was browsed", "whether a title in the web page was clicked", "time at which the user browsed the web page", "time at which a title was clicked", "content of the clicked title", and "number of times a title was clicked in one day".
The 6 behavior characteristics are sorted by coverage rate in descending order; for example, the last behavior characteristic after sorting is "content of the clicked title". The 6 behavior characteristics are sorted by chi-square statistic in descending order; for example, the last behavior characteristic after sorting is "number of times a title was clicked in one day". The 6 behavior characteristics are sorted by information entropy in descending order; for example, the last behavior characteristic after sorting is "number of times a title was clicked in one day". The behavior characteristics whose coverage rate is less than the first threshold, whose chi-square statistic is less than the second threshold, or whose information entropy is less than the third threshold are deleted from the plurality of behavior characteristics to obtain the first behavior characteristic set. Specifically, the last behavior characteristic of each of the three sorted lists can be deleted: "content of the clicked title" and "number of times a title was clicked in one day" are deleted, and "whether a certain web page was browsed", "whether a title in the web page was clicked", "time at which the user browsed the web page", and "time at which a title was clicked" are retained to form the first behavior characteristic set.
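The text does not define the three statistics, so the sketch below rests on stated assumptions: coverage rate is taken as the fraction of users for whom a value was recorded (missing values are `None`), the chi-square statistic is Pearson's statistic on the contingency table of characteristic value against class label, and information entropy is the Shannon entropy (in bits) of the observed values. All function names, thresholds, and data are invented, and the toy result is not meant to reproduce the example in the text.

```python
import math
from collections import Counter

def coverage(values):
    """Fraction of users for whom the characteristic was recorded."""
    return sum(v is not None for v in values) / len(values)

def entropy(values):
    """Shannon entropy, in bits, of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        return 0.0
    n = len(observed)
    return -sum(c / n * math.log2(c / n) for c in Counter(observed).values())

def chi_square(values, labels):
    """Pearson chi-square statistic between characteristic value and class label."""
    pairs = [(v, y) for v, y in zip(values, labels) if v is not None]
    if not pairs:
        return 0.0
    n = len(pairs)
    joint = Counter(pairs)
    value_totals = Counter(v for v, _ in pairs)
    label_totals = Counter(y for _, y in pairs)
    stat = 0.0
    for v in value_totals:
        for y in label_totals:
            expected = value_totals[v] * label_totals[y] / n
            stat += (joint[(v, y)] - expected) ** 2 / expected
    return stat

def first_screen(columns, labels, min_coverage, min_chi2, min_entropy):
    """Keep the characteristics that reach all three thresholds."""
    return [
        name for name, values in columns.items()
        if coverage(values) >= min_coverage
        and chi_square(values, labels) >= min_chi2
        and entropy(values) >= min_entropy
    ]

# Invented data: 4 first-class users (label 1) followed by 4 second-class users (label 0).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "clicked_title": [1, 1, 1, 0, 0, 0, 0, 1],
    "constant_flag": [1, 1, 1, 1, 1, 1, 1, 1],               # zero entropy, zero chi-square
    "title_topic": [1, None, None, None, 2, None, None, None],  # coverage rate 0.25
}
print(first_screen(columns, labels, min_coverage=0.5, min_chi2=1.0, min_entropy=0.5))
# ['clicked_title']
```

The text's own example removes the lowest-ranked characteristic of each sorted list instead of applying fixed thresholds; the two are equivalent when each threshold sits just above the lowest score.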
The first behavior characteristic set comprises 4 behavior characteristics: "whether a certain web page was browsed", "whether a title in the web page was clicked", "time at which the user browsed the web page", and "time at which a title was clicked". Of these, "time at which the user browsed the web page" and "time at which a title was clicked" both concern time, so their degree of association is high, and either one of the two is deleted. For example, "time at which the user browsed the web page" is deleted and "time at which a title was clicked" is retained, which yields the second behavior characteristic set.
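The text does not say how the degree of association is measured. The sketch below assumes the absolute Pearson correlation coefficient and drops the later member of any pair that exceeds the threshold; the function names, the threshold of 0.9, and the data are illustrative assumptions.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either column is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def drop_correlated(columns, threshold):
    """Keep a characteristic only if it is not strongly associated with one already kept."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Invented values for six users; the two time characteristics move together.
columns = {
    "clicked_title": [1, 0, 1, 0, 0, 1],
    "click_time": [1, 2, 3, 4, 2, 3],
    "browse_time": [1, 2, 3, 4, 2, 4],
}
print(drop_correlated(columns, threshold=0.9))  # ['clicked_title', 'click_time']
```

Because the text allows either member of a highly associated pair to be deleted, the order of the columns decides which one survives in this sketch.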
The second behavior characteristic set comprises "whether a certain web page was browsed", "whether a title in the web page was clicked", and "time at which a title was clicked". A generalized linear model is built again from the values of these three behavior characteristics for the 100 first-class users and for the 100 second-class users, the maximum likelihood estimate of this generalized linear model is computed by the optimization method, and the estimate is recorded. Then any one behavior characteristic is removed from the second behavior characteristic set and the maximum likelihood estimate of the generalized linear model is computed again by the optimization method. If the maximum likelihood estimate does not change, the removed behavior characteristic has no influence on it; if the maximum likelihood estimate changes, the removed behavior characteristic influences it. The behavior characteristics in the second behavior characteristic set that influence the maximum likelihood estimate are retained, and those that have no influence are removed, which screens the behavior characteristics further. Assuming that every behavior characteristic in the second behavior characteristic set influences the maximum likelihood estimate, the second behavior characteristic set is the target behavior characteristic set.
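The remove-one-and-refit check described above can be sketched as follows, again assuming a logistic model without intercept fitted by gradient ascent, and treating "does not change" as a change in the maximized log-likelihood below a small tolerance. The function names, the tolerance, and the data are assumptions; the third column is constant zero so that removing it provably leaves the likelihood unchanged.

```python
import math

def _sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def max_log_likelihood(rows, labels, step=0.5, iterations=2000):
    """Fit a no-intercept logistic model by gradient ascent and return its log-likelihood."""
    beta = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(iterations):
        grad = [0.0] * len(beta)
        for x, y in zip(rows, labels):
            error = y - _sigmoid(sum(b * v for b, v in zip(beta, x)))
            for j, v in enumerate(x):
                grad[j] += error * v
        beta = [b + step * g / n for b, g in zip(beta, grad)]
    total = 0.0
    for x, y in zip(rows, labels):
        z = sum(b * v for b, v in zip(beta, x))
        total += y * z - (max(z, 0.0) + math.log1p(math.exp(-abs(z))))
    return total

def drop_uninfluential(rows, labels, names, tolerance=1e-6):
    """Keep a characteristic only if removing it changes the maximized log-likelihood."""
    full = max_log_likelihood(rows, labels)
    kept = []
    for j, name in enumerate(names):
        reduced = [[v for k, v in enumerate(x) if k != j] for x in rows]
        if abs(full - max_log_likelihood(reduced, labels)) > tolerance:
            kept.append(name)
    return kept

# Invented data; the third characteristic is always 0 and cannot affect the fit.
names = ["browsed_page", "clicked_title", "unused_flag"]
rows = [[1, 0, 0], [1, 0, 0], [1, 0, 0], [0, 1, 0],
        [0, 1, 0], [0, 1, 0], [1, 1, 0], [1, 1, 0]]
labels = [1, 1, 0, 0, 0, 1, 1, 0]
print(drop_uninfluential(rows, labels, names))  # ['browsed_page', 'clicked_title']
```

With real data the likelihood almost never stays exactly the same, so a practical variant would compare the change against a likelihood-ratio test threshold rather than a near-zero tolerance.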
In this embodiment, the plurality of behavior characteristics are screened to delete redundant behavior characteristics, the generalized linear model is built from the behavior characteristics that remain, and the maximum likelihood estimate of the generalized linear model is computed by the optimization method, which improves computational efficiency.
On the basis of the above embodiment, after computing the behavior similarity between the to-be-tested user and the first-class users by using the estimated parameters and the behavior characteristic values of the to-be-tested user, the method further comprises: determining whether the behavior similarity is greater than a fifth threshold; if the behavior similarity is greater than the fifth threshold, determining that the behavior of the to-be-tested user is similar to that of the first-class users; and computing the proportion, among all to-be-tested users, of to-be-tested users whose behavior is similar to that of the first-class users.
The embodiment of the present invention performs behavior analysis on a large number of to-be-tested users. The behavior characteristic values of each to-be-tested user are collected according to the behavior characteristics in the target behavior characteristic set obtained in the above embodiment, namely the values of "whether a certain web page was browsed", "whether a title in the web page was clicked", and "time at which a title was clicked" for each to-be-tested user. The behavior similarity between each to-be-tested user and the first-class users is computed by the method of the above embodiment, and it is determined whether the behavior similarity is greater than a preset threshold. If it is, the behavior of this to-be-tested user is similar to that of the first-class users. The proportion of to-be-tested users whose behavior is similar to that of the first-class users among all to-be-tested users can also be computed.
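The thresholding and proportion steps amount to a few lines. In this sketch the function name `similar_ratio`, the similarity values, and the threshold of 0.5 are invented for illustration.

```python
def similar_ratio(similarities, threshold):
    """Flag each to-be-tested user whose similarity exceeds the threshold and return the share flagged."""
    if not similarities:
        raise ValueError("at least one to-be-tested user is required")
    flags = [s > threshold for s in similarities]
    return flags, sum(flags) / len(flags)

# Invented similarity scores for four to-be-tested users.
flags, ratio = similar_ratio([1.5, 0.2, 0.9, -0.4], threshold=0.5)
print(flags, ratio)  # [True, False, True, False] 0.5
```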
In this embodiment, the behavior of a to-be-tested user is determined to be similar to that of the first-class users when the behavior similarity between them is greater than a given threshold, and the proportion of such similar users among all to-be-tested users can also be obtained.
Fig. 2 is a structural diagram of the user behavior similarity calculation device provided by an embodiment of the present invention. The device can execute the processing flow provided by the method embodiment. As shown in Fig. 2, the user behavior similarity calculation device 20 comprises an acquisition module 21, a screening module 22, a modeling module 23, and a computing module 24. The acquisition module 21 is configured to collect the behavior characteristic value corresponding to each of a plurality of behavior characteristics of first-class users, and the behavior characteristic value corresponding to each of the plurality of behavior characteristics of second-class users. The screening module 22 is configured to screen the plurality of behavior characteristics, according to the behavior characteristic values of the first-class users and the behavior characteristic values of the second-class users, to obtain a target behavior characteristic set. The modeling module 23 is configured to build a first generalized linear model according to the target behavior characteristic set, compute a first maximum likelihood estimate of the first generalized linear model by an optimization method, and obtain the estimated parameters corresponding to the first maximum likelihood estimate. The computing module 24 is configured to compute the behavior similarity between a to-be-tested user and the first-class users by using the estimated parameters and the behavior characteristic value corresponding to each behavior characteristic in the target behavior characteristic set for the to-be-tested user.
In this embodiment, the plurality of behavior characteristics are screened, according to the behavior characteristic values of different types of users, to obtain a target behavior characteristic set; a generalized linear model is built according to the target behavior characteristic set; the maximum likelihood estimate of the generalized linear model is computed by an optimization method, and the estimated parameters corresponding to the maximum likelihood estimate are obtained; and the behavior similarity between a to-be-tested user and users of a particular type is computed from the estimated parameters and the behavior characteristic values of the to-be-tested user. The behavior characteristics of a large number of users are thereby fully used to analyze the similarity of different users' behavior, which improves the utilization rate of the collected behavior characteristics.
The structural drawing of the user behavior Similarity measures device that Fig. 3 provides for another embodiment of the present invention.On the basis of above-described embodiment, screening module 22 is specifically for calculating according to multiple behavioural characteristic value corresponding to described first kind user and multiple behavioural characteristic values corresponding to described Equations of The Second Kind user coverage rate, chi amount and the information entropy that in described multiple behavioural characteristic, each behavioural characteristic is corresponding respectively; From described multiple behavioural characteristic, delete the behavioural characteristic that coverage rate is less than the behavioural characteristic of first threshold, chi amount is less than Second Threshold behavioural characteristic and information entropy be less than the 3rd threshold value obtain the first behavior characteristic set; Delete the degree of association any one behavioural characteristic be greater than in two behavioural characteristics of the 4th threshold value in described first behavior characteristic set and obtain the second behavioural characteristic set; The second generalized linear model is set up according to described second behavioural characteristic set, utilize optimization method to calculate the maximum likelihood estimation of described second generalized linear model, delete in described second behavioural characteristic set and do not have influential behavioural characteristic to obtain described goal behavior characteristic set to described second maximum likelihood estimation.
The first-type users are users who satisfy a first target behavioral feature, the second-type users are users who satisfy a second target behavioral feature, and the first target behavioral feature and the second target behavioral feature have some behavioral features in common.
The calculation module 24 is specifically configured to form a first vector from the estimated parameters, form a second vector from the behavioral feature values of the user under test for each behavioral feature in the target behavioral feature set, and compute the inner product of the first vector and the second vector to obtain the behavioral similarity.
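The inner-product step can be written in a few lines. The function name `behavior_similarity` is hypothetical, and the sketch assumes both vectors list the target features in the same order; whether an intercept term is included in the first vector is not stated in this passage.

```python
def behavior_similarity(estimated_params, feature_values):
    # Inner product of the estimated-parameter vector (first vector) and the
    # feature-value vector of the user under test (second vector).
    if len(estimated_params) != len(feature_values):
        raise ValueError("both vectors must cover the same target features")
    return sum(w * x for w, x in zip(estimated_params, feature_values))
```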
The user behavior similarity calculation device 20 further comprises a judging module 25 and a statistics module 26. The judging module 25 is configured to judge whether the behavioral similarity is greater than a fifth threshold and, if it is, to determine that the behavior of the user under test is similar to that of the first-type users. The statistics module 26 is configured to compute the proportion, among all users under test, of users under test whose behavior is similar to that of the first-type users.
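A minimal sketch of the judging and statistics steps, assuming the similarity scores of all users under test have already been computed. The function names and the choice to return 0.0 for an empty input are assumptions of this example, not details from the patent.

```python
def is_similar(similarity, threshold):
    # A user under test counts as similar only when the score is strictly
    # greater than the (fifth) threshold.
    return similarity > threshold


def similar_ratio(similarities, threshold):
    # Proportion of users under test judged similar to the first-type users.
    if not similarities:
        return 0.0
    hits = sum(1 for s in similarities if is_similar(s, threshold))
    return hits / len(similarities)
```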
The user behavior similarity calculation device provided by this embodiment may specifically be used to carry out the method embodiment shown in Fig. 1 above; its specific functions are not described again here.
By screening the plurality of behavioral features to delete redundant ones, building the generalized linear model from the features that remain, and using an optimization method to compute the maximum likelihood estimate of that model, this embodiment improves computational efficiency. By judging whether the behavioral similarity between the user under test and the first-type users is greater than a given threshold, it determines whether the user under test behaves similarly to the first-type users, and it can also obtain the proportion, among all users under test, of those whose behavior is similar to that of the first-type users.
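One way to delete redundant features is to drop one feature from every highly associated pair. The sketch below uses the absolute Pearson correlation as the degree of association and keeps whichever feature of a pair appears first; both choices, and the names `pearson` and `drop_correlated`, are assumptions, since this passage does not define the measure or say which feature of a pair is kept.

```python
import math


def pearson(xs, ys):
    # Pearson correlation coefficient; 0.0 when either input is constant.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def drop_correlated(features, max_corr):
    # Greedy pass: keep a feature only if its association with every feature
    # kept so far does not exceed the (fourth) threshold.
    kept = []
    for name, vals in features.items():
        if all(abs(pearson(vals, features[k])) <= max_corr for k in kept):
            kept.append(name)
    return kept
```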
In summary, this embodiment of the present invention screens a plurality of behavioral features, using the behavioral feature value of each of those features for different types of users, to obtain a target behavioral feature set; builds a generalized linear model from the target behavioral feature set; uses an optimization method to compute the maximum likelihood estimate of the generalized linear model and obtains the estimated parameters corresponding to that estimate; and computes the behavioral similarity between a user under test and a particular type of user from the estimated parameters and the behavioral feature values of the user under test. This makes full use of the behavioral features of a large number of users to analyze the similarity of different users' behavior, and improves the utilization of the behavioral features collected from those users. Screening the plurality of behavioral features to delete redundant ones, building the generalized linear model from the features that remain, and computing its maximum likelihood estimate with an optimization method improves computational efficiency. By judging whether the behavioral similarity between the user under test and the first-type users is greater than a given threshold, the embodiment determines whether the user under test behaves similarly to the first-type users, and it can also obtain the proportion, among all users under test, of those whose behavior is similar to that of the first-type users.
In the several embodiments provided by the present invention, it should be understood that the disclosed devices and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated in one processing unit, each unit may exist physically on its own, or two or more units may be integrated in one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
An integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) or a processor to perform some of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a portable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk or an optical disc.
Those skilled in the art will clearly understand that, for convenience and brevity of description, only the division into the functional modules above is used as an example. In practical applications, the functions above may be assigned to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to perform all or part of the functions described above. For the specific working process of the device described above, reference may be made to the corresponding process in the foregoing method embodiments, which is not repeated here.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or make equivalent replacements for some or all of the technical features; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A user behavior similarity calculation method, characterized by comprising:
Collecting the behavioral feature value corresponding to each of a plurality of behavioral features of first-type users, and the behavioral feature value corresponding to each of the plurality of behavioral features of second-type users;
Screening the plurality of behavioral features according to the plurality of behavioral feature values corresponding to the first-type users and the plurality of behavioral feature values corresponding to the second-type users, to obtain a target behavioral feature set;
Building a first generalized linear model according to the target behavioral feature set, using an optimization method to compute a first maximum likelihood estimate of the first generalized linear model, and obtaining the estimated parameters corresponding to the first maximum likelihood estimate;
Computing the behavioral similarity between a user under test and the first-type users by using the estimated parameters and the behavioral feature value, corresponding to the user under test, of each behavioral feature in the target behavioral feature set.
2. The method according to claim 1, characterized in that screening the plurality of behavioral features according to the plurality of behavioral feature values corresponding to the first-type users and the plurality of behavioral feature values corresponding to the second-type users, to obtain a target behavioral feature set, comprises:
Computing, according to the plurality of behavioral feature values corresponding to the first-type users and the plurality of behavioral feature values corresponding to the second-type users, the coverage rate, chi-square statistic and information entropy of each of the plurality of behavioral features;
Deleting from the plurality of behavioral features those whose coverage rate is less than a first threshold, those whose chi-square statistic is less than a second threshold, and those whose information entropy is less than a third threshold, to obtain a first behavioral feature set;
For any two behavioral features in the first behavioral feature set whose degree of association is greater than a fourth threshold, deleting either one of the two, to obtain a second behavioral feature set;
Building a second generalized linear model according to the second behavioral feature set, using an optimization method to compute a second maximum likelihood estimate of the second generalized linear model, and deleting from the second behavioral feature set the behavioral features that have no influence on the second maximum likelihood estimate, to obtain the target behavioral feature set.
3. method according to claim 1 and 2, it is characterized in that, described first kind user is the user meeting first object behavioural characteristic, described Equations of The Second Kind user is the user meeting the second goal behavior feature, and described first object behavioural characteristic has the identical behavioural characteristic of part with described second goal behavior feature.
4. The method according to claim 3, characterized in that computing the behavioral similarity between the user under test and the first-type users by using the estimated parameters and the behavioral feature value, corresponding to the user under test, of each behavioral feature in the target behavioral feature set, comprises:
Forming a first vector from the estimated parameters, and forming a second vector from the behavioral feature values, corresponding to the user under test, of each behavioral feature in the target behavioral feature set;
Computing the inner product of the first vector and the second vector to obtain the behavioral similarity.
5. The method according to claim 4, characterized in that, after computing the behavioral similarity between the user under test and the first-type users by using the estimated parameters and the behavioral feature value, corresponding to the user under test, of each behavioral feature in the target behavioral feature set, the method further comprises:
Judging whether the behavioral similarity is greater than a fifth threshold;
If the behavioral similarity is greater than the fifth threshold, determining that the behavior of the user under test is similar to that of the first-type users;
Computing the proportion, among all users under test, of users under test whose behavior is similar to that of the first-type users.
6. A user behavior similarity calculation device, characterized by comprising:
An acquisition module, configured to collect the behavioral feature value corresponding to each of a plurality of behavioral features of first-type users, and the behavioral feature value corresponding to each of the plurality of behavioral features of second-type users;
A screening module, configured to screen the plurality of behavioral features according to the plurality of behavioral feature values corresponding to the first-type users and the plurality of behavioral feature values corresponding to the second-type users, to obtain a target behavioral feature set;
A modeling module, configured to build a first generalized linear model according to the target behavioral feature set, use an optimization method to compute a first maximum likelihood estimate of the first generalized linear model, and obtain the estimated parameters corresponding to the first maximum likelihood estimate;
A calculation module, configured to compute the behavioral similarity between a user under test and the first-type users by using the estimated parameters and the behavioral feature value, corresponding to the user under test, of each behavioral feature in the target behavioral feature set.
7. The user behavior similarity calculation device according to claim 6, characterized in that the screening module is specifically configured to: compute, according to the plurality of behavioral feature values corresponding to the first-type users and the plurality of behavioral feature values corresponding to the second-type users, the coverage rate, chi-square statistic and information entropy of each of the plurality of behavioral features; delete from the plurality of behavioral features those whose coverage rate is less than a first threshold, those whose chi-square statistic is less than a second threshold, and those whose information entropy is less than a third threshold, to obtain a first behavioral feature set; for any two behavioral features in the first behavioral feature set whose degree of association is greater than a fourth threshold, delete either one of the two, to obtain a second behavioral feature set; and build a second generalized linear model according to the second behavioral feature set, use an optimization method to compute a second maximum likelihood estimate of the second generalized linear model, and delete from the second behavioral feature set the behavioral features that have no influence on the second maximum likelihood estimate, to obtain the target behavioral feature set.
8. The user behavior similarity calculation device according to claim 6 or 7, characterized in that the first-type users are users who satisfy a first target behavioral feature, the second-type users are users who satisfy a second target behavioral feature, and the first target behavioral feature and the second target behavioral feature have some behavioral features in common.
9. The user behavior similarity calculation device according to claim 8, characterized in that the calculation module is specifically configured to form a first vector from the estimated parameters, form a second vector from the behavioral feature values, corresponding to the user under test, of each behavioral feature in the target behavioral feature set, and compute the inner product of the first vector and the second vector to obtain the behavioral similarity.
10. The user behavior similarity calculation device according to claim 9, characterized by further comprising:
A judging module, configured to judge whether the behavioral similarity is greater than a fifth threshold and, if the behavioral similarity is greater than the fifth threshold, to determine that the behavior of the user under test is similar to that of the first-type users;
A statistics module, configured to compute the proportion, among all users under test, of users under test whose behavior is similar to that of the first-type users.
CN201510618301.5A 2015-09-24 2015-09-24 User behavior similarity calculation method and device Active CN105260414B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510618301.5A CN105260414B (en) 2015-09-24 2015-09-24 User behavior similarity calculation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510618301.5A CN105260414B (en) 2015-09-24 2015-09-24 User behavior similarity calculation method and device

Publications (2)

Publication Number Publication Date
CN105260414A (en) 2016-01-20
CN105260414B (en) 2018-10-19

Family

ID=55100106

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510618301.5A Active CN105260414B (en) 2015-09-24 2015-09-24 User behavior similarity calculation method and device

Country Status (1)

Country Link
CN (1) CN105260414B (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107180044A (en) * 2016-03-09 2017-09-19 精硕科技(北京)股份有限公司 Recognize Internet user's sex method and system
CN107402984A (en) * 2017-07-11 2017-11-28 北京金堤科技有限公司 A kind of sorting technique and device based on theme
WO2018099177A1 (en) * 2016-11-29 2018-06-07 华为技术有限公司 Potential user expansion method and device
WO2018113370A1 (en) * 2016-12-21 2018-06-28 华为技术有限公司 Method, device, and system for increasing users
CN108600792A (en) * 2018-04-02 2018-09-28 武汉斗鱼网络科技有限公司 A kind of method for measuring similarity, device, equipment and storage medium
CN110348581A (en) * 2019-06-19 2019-10-18 平安科技(深圳)有限公司 User characteristics optimization method, device, medium and electronic equipment in user characteristics group
CN110491488A (en) * 2019-06-28 2019-11-22 上海明品医学数据科技有限公司 A kind of control method and system of determining medical data mark terminal
CN110557447A (en) * 2019-08-26 2019-12-10 腾讯科技(武汉)有限公司 user behavior identification method and device, storage medium and server
CN111210253A (en) * 2019-11-26 2020-05-29 恒大智慧科技有限公司 Method, device and storage medium for matching consumer with sales consultant

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20010011211A1 (en) * 1998-06-03 2001-08-02 Sbc Technology Resources, Inc. A method for categorizing, describing and modeling types of system users
US7676467B1 (en) * 2005-04-14 2010-03-09 AudienceScience Inc. User segment population techniques
CN102521248A (en) * 2011-11-14 2012-06-27 北京亿赞普网络技术有限公司 Network user classification method and device

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107180044A (en) * 2016-03-09 2017-09-19 精硕科技(北京)股份有限公司 Recognize Internet user's sex method and system
WO2018099177A1 (en) * 2016-11-29 2018-06-07 华为技术有限公司 Potential user expansion method and device
WO2018113370A1 (en) * 2016-12-21 2018-06-28 华为技术有限公司 Method, device, and system for increasing users
CN108230001A (en) * 2016-12-21 2018-06-29 华为技术有限公司 The method, apparatus and system of extending user
CN107402984A (en) * 2017-07-11 2017-11-28 北京金堤科技有限公司 A kind of sorting technique and device based on theme
CN107402984B (en) * 2017-07-11 2018-11-20 北京金堤科技有限公司 A kind of classification method and device based on theme
CN108600792B (en) * 2018-04-02 2020-08-04 武汉斗鱼网络科技有限公司 Similarity measurement method, device, equipment and storage medium
CN108600792A (en) * 2018-04-02 2018-09-28 武汉斗鱼网络科技有限公司 A kind of method for measuring similarity, device, equipment and storage medium
WO2020252925A1 (en) * 2019-06-19 2020-12-24 平安科技(深圳)有限公司 Method and apparatus for searching user feature group for optimized user feature, electronic device, and computer nonvolatile readable storage medium
CN110348581A (en) * 2019-06-19 2019-10-18 平安科技(深圳)有限公司 User characteristics optimization method, device, medium and electronic equipment in user characteristics group
CN110348581B (en) * 2019-06-19 2023-08-18 平安科技(深圳)有限公司 User feature optimizing method, device, medium and electronic equipment in user feature group
CN110491488A (en) * 2019-06-28 2019-11-22 上海明品医学数据科技有限公司 A kind of control method and system of determining medical data mark terminal
CN110491488B (en) * 2019-06-28 2023-10-27 上海明品医学数据科技有限公司 Control method and system for determining medical data labeling terminal
CN110557447A (en) * 2019-08-26 2019-12-10 腾讯科技(武汉)有限公司 user behavior identification method and device, storage medium and server
CN110557447B (en) * 2019-08-26 2022-06-10 腾讯科技(武汉)有限公司 User behavior identification method and device, storage medium and server
CN111210253A (en) * 2019-11-26 2020-05-29 恒大智慧科技有限公司 Method, device and storage medium for matching consumer with sales consultant

Also Published As

Publication number Publication date
CN105260414B (en) 2018-10-19

Similar Documents

Publication Publication Date Title
CN105260414A (en) User behavior similarity computing method and device
CN103886068B (en) Data processing method and device for Internet user's behavioural analysis
CN107797894B (en) APP user behavior analysis method and device
CN107800591A (en) A kind of analysis method of unified daily record data
CN102760138B (en) Classification method and device for user network behaviors and search method and device for user network behaviors
CN103729383A (en) Push method and device for commodity information
CN103942712A (en) Product similarity based e-commerce recommendation system and method thereof
CN104537115A (en) Method and device for exploring user interests
CN102831193A (en) Topic detecting device and topic detecting method based on distributed multistage cluster
CN104504086B (en) The clustering method and device of Webpage
CN103366020A (en) System and method for analyzing user behaviors
CN104182506A (en) Log management method
CN104835057A (en) Method and device for obtaining consumption feature information of network user
CN103530796B (en) The active period detection method of application program and active period detection system
CN106651416A (en) Analyzing method and analyzing device of application popularization information
CN103729446A (en) Processing method and device for user operation data and server
CN106547793A (en) The method and apparatus for obtaining proxy server address
US10467255B2 (en) Methods and systems for analyzing reading logs and documents thereof
CN107292751B (en) Method and device for mining node importance in time sequence network
CN104361092A (en) Searching method and device
CN106339891A (en) Intelligent analysis method and system based on large data acquisition
CN103810162A (en) Method and system for recommending network information
CN104778237A (en) Individual recommending method and system based on key users
CN103761228A (en) Ranking threshold determination method and ranking threshold determination system for application program
CN105930507A (en) Method and apparatus for obtaining Web browsing interest of user

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100041 Beijing, Shijingshan District Xing Xing street, building 30, room 3, building 9, room 9014

Applicant after: Jing Shuo Technology (Beijing) Limited by Share Ltd

Address before: 100010 Beijing city Dongcheng District bamboo rod alley No. 1 9 1007

Applicant before: JINGSHUO CENTURY TECHNOLOGY (BEIJING) CO., LTD.

GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20200717

Address after: 136a, 1f, d-1f, Dongsheng Science Park, No. 66, xixiaokou Road, Haidian District, Beijing

Patentee after: Enyike (Beijing) Data Technology Co.,Ltd.

Address before: 100041 Beijing, Shijingshan District Xing Xing street, building 30, room 3, building 9, room 9014

Patentee before: ADMASTER TECHNOLOGY (BEIJING) Co.,Ltd.