CN116167829A - Multidimensional and multi-granularity user behavior analysis method - Google Patents


Info

Publication number
CN116167829A
Authority
CN
China
Prior art keywords
user
webpage
software
behavior
click
Prior art date
Legal status
Granted
Application number
CN202310461608.3A
Other languages
Chinese (zh)
Other versions
CN116167829B (en)
Inventor
王琨
刘滔
Current Assignee
Hunan Weike Technology Group Co ltd
Original Assignee
Hunan Weike Technology Group Co ltd
Priority date
Filing date
Publication date
Application filed by Hunan Weike Technology Group Co ltd
Priority to CN202310461608.3A
Publication of CN116167829A
Application granted
Publication of CN116167829B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0631 Item recommendations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3438 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment monitoring of user actions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G06F16/24578 Query processing with adaptation to user needs using ranking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474 Sequence data queries, e.g. querying versioned data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Business, Economics & Management (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Fuzzy Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Development Economics (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • General Business, Economics & Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of user behavior analysis and discloses a multidimensional, multi-granularity user behavior analysis method comprising the following steps: collecting online user behavior data and constructing it into a user multidimensional behavior track; inputting the user multidimensional behavior track into a user group identification model to identify the user's group category; constructing a multi-granularity user click intention model to obtain the intention degrees of users of each category toward different pages; and, for each category, ranking the pages by intention degree from high to low and recommending them to the user in that order. The invention builds the user multidimensional behavior track from the click-frequency and time-series features of the user's webpage and software clicks, identifies user groups from this behavior, and obtains users' intention degrees toward different webpages through recursive message propagation over, and aggregation of, the webpage click time-series matrices of users in the same category; the webpages are then recommended on this basis.

Description

Multidimensional and multi-granularity user behavior analysis method
Technical Field
The invention relates to the technical field of user behavior analysis, in particular to a multidimensional and multi-granularity user behavior analysis method.
Background
With the maturing of related industries such as smart devices and 5G, the volume of users' online behavior data is expanding rapidly, and problems such as personal information leakage and the difficulty of identifying anonymous users are becoming increasingly prominent. Traditional online user behavior analysis models, however, are usually built for one specific kind of data and cannot analyze multidimensional data jointly. Their features must be extracted at a large cost in labor and resources, and the extracted features are highly subjective and noisy. As a result, such models cannot provide good commodity recommendations, and users' online shopping experience suffers. To address this problem, the invention provides a multidimensional, multi-granularity user behavior analysis method that analyzes user behavior by combining several types of online data, providing strong technical support for commodity recommendation.
Disclosure of Invention
In view of this, the present invention provides a multidimensional, multi-granularity user behavior analysis method with the following aims: 1) Candidate webpage behavior track points and candidate software behavior track points are screened using the click frequency and total click count of each webpage or software as screening conditions: the higher the total click count and the click frequency, the more likely an item is to be judged a track point. The confidence of a candidate behavior track is determined by computing the probability that any two of its track points are clicked in the same period; a higher confidence indicates a stronger association between the track points of the candidate track, while a low confidence indicates incidental track points. From the click-frequency and time features of the webpages or software, a user multidimensional behavior track comprising the user's webpage behavior track and software behavior track is then constructed. 2) A multi-granularity user click intention model is constructed: the user parameters and webpage parameters of users in the same category are initialized, a message propagation system is built, and the user and webpage parameters are recursively propagated as embeddings. The users' webpage click time-series matrices and the encoded click frequencies are then aggregated, and the inner product between the category's users and each webpage is computed; that is, after multiple rounds of message passing, the inner product measures how closely the time-series encoding of the category's click frequencies aligns in angle with the webpage's encoding in the vector space. This inner product is used as the intention degree of users in that category toward each webpage; each webpage is mapped to a commodity page, and the webpage's intention degree is taken as the intention degree of the mapped commodity page, providing strong technical support for commodity recommendation.
The invention provides a multidimensional, multi-granularity user behavior analysis method comprising the following steps:
S1: collecting online user behavior data, comprising webpage click data and software click data, and constructing the collected data into a user multidimensional behavior track;
S2: constructing and training a user group identification model, and inputting the user multidimensional behavior track into it to identify the user's group category, wherein the model takes the user multidimensional behavior track as input and uses maximization of the user group distribution as its training objective;
S3: constructing a multi-granularity user click intention model, which takes the set of user multidimensional behavior tracks of one category as input and outputs the intention degrees of users in that category toward different pages;
S4: according to the identified user group category, ranking the category's intention degrees toward different pages from high to low and recommending the pages to the user in that order.
As a further improvement of the present invention:
Optionally, collecting online user behavior data in step S1 comprises:
collecting online user behavior data, comprising webpage click data and software click data, according to the following flow:
constructing a webpage click statistics table containing the common webpages [Figure SMS_1];
acquiring the user's click time-series data on each common webpage in the webpage click statistics table:
[Figure SMS_2]
wherein:
[Figure SMS_3] denotes the time-series data of the user's clicks on the i-th common webpage in the webpage click statistics table, [Figure SMS_4];
[Figure SMS_5] indicates that the user did not click the i-th common webpage in time period [Figure SMS_6];
[Figure SMS_7] indicates that the user did not click the i-th common webpage in time period [Figure SMS_8];
[Figure SMS_9] denotes the acquisition time range of the online user behavior data;
counting the user's total number of clicks on each common webpage in the webpage click statistics table:
[Figure SMS_10]
wherein:
[Figure SMS_11] denotes the total number of clicks on the i-th common webpage in the webpage click statistics table within the acquisition time range;
taking the total click count and the click time-series data of each common webpage in the webpage click statistics table as the webpage click data;
constructing a software click statistics table containing the common software [Figure SMS_12], and acquiring the user's click time-series data on each common software in the software click statistics table:
[Figure SMS_13]
wherein:
[Figure SMS_14] denotes the user's click time-series data on the j-th common software in the software click statistics table, [Figure SMS_15];
[Figure SMS_16] indicates that the user did not click the j-th common software in time period [Figure SMS_17];
[Figure SMS_18] indicates that the user did not click the j-th common software in time period [Figure SMS_19];
counting the user's total number of clicks on each common software in the software click statistics table:
[Figure SMS_20]
wherein:
[Figure SMS_21] denotes the total number of clicks on the j-th common software in the software click statistics table within the acquisition time range;
taking the total click count and the click time-series data of each common software in the software click statistics table as the software click data;
and combining the webpage click data and the software click data into the online user behavior data. In the embodiment of the invention, if the user operates on and clicks the interface of the j-th common software or common webpage during period [Figure SMS_22], the user is marked as having clicked the j-th common software or common webpage in that period; the total number of times the user operates on and clicks the interface of a common software or common webpage within the acquisition time range is the user's total click count.
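The collection flow above amounts to turning raw click logs into one binary series per item (1 if the item was clicked at least once in a period, 0 otherwise) plus a total click count. A minimal Python sketch follows, assuming clicks arrive as hypothetical `(item_id, timestamp)` pairs and that the acquisition range is split into equal-length periods; the patent's own symbols from the missing formula images are not reproduced, and `build_click_data` is an illustrative name.

```python
from collections import defaultdict


def build_click_data(events, start, end, period):
    """Return {item_id: (binary_series, total_clicks)} for one user.

    events: iterable of (item_id, timestamp) clicks (a hypothetical log format).
    [start, end) is the acquisition range, split into periods of length `period`;
    series[t] is 1 if the item was clicked at least once in period t, else 0.
    """
    n_periods = (end - start) // period
    series = defaultdict(lambda: [0] * n_periods)
    totals = defaultdict(int)
    for item, ts in events:
        if not start <= ts < end:
            continue  # click lies outside the acquisition range
        t = (ts - start) // period
        if t >= n_periods:
            continue  # a trailing partial period is dropped
        series[item][t] = 1   # mark the item as clicked in period t
        totals[item] += 1     # every click counts toward the total
    return {item: (series[item], totals[item]) for item in totals}
```

The same function serves both the webpage and the software click statistics tables; only the item IDs differ.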
Optionally, constructing the collected online user behavior data into a user multidimensional behavior track in step S1 comprises:
constructing the user multidimensional behavior track from the collected online user behavior data according to the following flow:
S11: computing the user's click frequency on each webpage and each software within the acquisition time range:
[Figure SMS_23]
wherein:
[Figure SMS_24] denotes the user's click frequency on the i-th common webpage, and [Figure SMS_25] denotes the user's click frequency on the j-th common software;
S12: setting minimum click frequency values [Figure SMS_26] for webpages and software respectively, wherein [Figure SMS_27] denotes the minimum click frequency for webpages and [Figure SMS_28] denotes the minimum click frequency for software;
S13: keeping the webpages whose click frequency is greater than [Figure SMS_29] and the software whose click frequency is greater than [Figure SMS_30];
S14: combining the webpages kept in step S13 in pairs, computing the frequency with which the user clicks each webpage combination within the acquisition time range, and returning to step S13 until no new webpage combination can be kept;
combining the software kept in step S13 in pairs, computing the frequency with which the user clicks each software combination within the acquisition time range, and returning to step S13 until no new software combination can be kept;
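Steps S12 to S14 grow item combinations level by level and keep only those clicked often enough, which resembles Apriori-style frequent-itemset mining. The sketch below is written under that reading and assumes a combination's click frequency is the fraction of periods in which all of its members were clicked; the patent's actual frequency formula (Figure SMS_23) is not recoverable from the text, and the function names are hypothetical.

```python
from itertools import combinations


def click_frequency(series, itemset):
    """Fraction of periods in which every item of `itemset` was clicked (assumed definition)."""
    n_periods = len(next(iter(series.values())))
    hits = sum(all(series[i][t] for i in itemset) for t in range(n_periods))
    return hits / n_periods


def frequent_combinations(series, min_freq):
    """Keep single items, then merged combinations, whose frequency exceeds min_freq."""
    level = [frozenset([i]) for i in series if click_frequency(series, [i]) > min_freq]
    kept = list(level)
    while level:
        # merge two kept combinations that differ by exactly one item
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == len(a) + 1}
        level = [c for c in candidates if click_frequency(series, c) > min_freq]
        kept.extend(level)  # the loop stops once no new combination survives
    return kept
```

Here `series` maps each webpage (or software) ID to its binary click series, and `min_freq` plays the role of the minimum click frequency set in S12.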
S15: constructing the user's webpage click time-series matrix A and software click time-series matrix B respectively:
[Figure SMS_31]
[Figure SMS_32]
computing the probability that the user also clicks the (i+1)-th webpage during the periods in which the user clicks the i-th webpage, and using it as the confidence between the i-th and (i+1)-th webpages;
computing the probability that the user also clicks the (j+1)-th software during the periods in which the user clicks the j-th software, and using it as the confidence between the j-th and (j+1)-th software;
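The confidence in S15 is a conditional probability that can be estimated from two rows of matrix A (or B). A minimal sketch, assuming each row is a 0/1 click indicator per period, as in the collection flow; returning 0.0 when item i is never clicked is an assumption, since the patent does not cover that case.

```python
def confidence(clicks_i, clicks_next):
    """Estimate P(next item clicked in a period | item i clicked in that period).

    Both arguments are binary rows of a click time-series matrix (A or B).
    """
    periods_i = sum(clicks_i)
    if periods_i == 0:
        return 0.0  # item i never clicked: no evidence of co-clicking
    both = sum(1 for x, y in zip(clicks_i, clicks_next) if x and y)
    return both / periods_i
```

Note that the measure is directional: `confidence(a, b)` and `confidence(b, a)` generally differ.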
S16: setting the confidence threshold to [Figure SMS_33]; computing the confidence of every two webpages in each webpage combination kept in step S14, taking the mean confidence as the confidence of that combination, selecting the webpage combinations whose confidence is greater than or equal to the threshold [Figure SMS_34] as candidate webpage behavior tracks, and selecting the candidate with the most webpages as the user webpage behavior track;
computing the confidence of every two software in each software combination kept in step S14, taking the mean confidence as the confidence of that combination, selecting the software combinations whose confidence is greater than or equal to the threshold [Figure SMS_35] as candidate software behavior tracks, and selecting the candidate with the most software as the user software behavior track;
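Step S16 can be sketched as follows, assuming "the confidence of every two items" means the directional confidence over every ordered pair, averaged; the patent does not say whether pairs are ordered, single-item combinations are skipped by assumption, and `select_track` is a hypothetical name. Ties in length go to the first qualifying candidate.

```python
from itertools import permutations
from statistics import mean


def confidence(a, b):
    """P(b clicked in a period | a clicked in that period), from binary series."""
    clicked = sum(a)
    return sum(1 for x, y in zip(a, b) if x and y) / clicked if clicked else 0.0


def select_track(series, combos, threshold):
    """Keep combinations whose mean pairwise confidence >= threshold; return the largest."""
    candidates = []
    for combo in combos:
        items = sorted(combo)
        if len(items) < 2:
            continue  # assumption: a single item has no pairwise confidence
        score = mean(confidence(series[i], series[j]) for i, j in permutations(items, 2))
        if score >= threshold:
            candidates.append(items)
    return max(candidates, key=len, default=[])  # empty track if nothing qualifies
```

Raising the threshold trades track length for stronger co-click association.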
S17: filtering repeated webpages out of the user webpage behavior track and repeated software out of the user software behavior track to obtain the user multidimensional behavior track [Figure SMS_36], wherein [Figure SMS_37] denotes the filtered user webpage behavior track and [Figure SMS_38] denotes the filtered user software behavior track.
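S17 removes duplicates and pairs the two tracks. A small sketch, assuming tracks are plain lists of IDs and that first-occurrence order should be preserved (the patent does not specify ordering); `build_multidimensional_track` is an illustrative name.

```python
def build_multidimensional_track(web_track, software_track):
    """Drop repeated entries (keeping first occurrences, in order) and pair the tracks.

    The returned (web, software) tuple stands in for the multidimensional track X.
    """
    web = list(dict.fromkeys(web_track))            # dicts preserve insertion order
    software = list(dict.fromkeys(software_track))
    return web, software
```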
Optionally, constructing the user group identification model in step S2 comprises: constructing a user group identification model whose input is the user multidimensional behavior track of the user to be identified and whose output is the user's group category; the group category identification formula of the model is:
[Figure SMS_39]
wherein:
[Figure SMS_40] denotes the probability that the user multidimensional behavior track X belongs to the y-th group category;
[Figure SMS_41] indicates that s is a track point of the user webpage behavior track [Figure SMS_42], each track point of [Figure SMS_43] being a webpage; [Figure SMS_44] denotes the weight of track point s, and [Figure SMS_45] denotes the number of times track point s appears in group category y;
[Figure SMS_46] indicates that k is a track point of the user software behavior track [Figure SMS_47], each track point of [Figure SMS_48] being a software; [Figure SMS_49] denotes the weight of track point k, and [Figure SMS_50] denotes the number of times track point k appears in group category y;
[Figure SMS_51] is the total number of group categories;
[Figure SMS_52] and [Figure SMS_53] are the parameters to be solved by training.
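The exact formula (Figure SMS_39) is not recoverable, but its listed ingredients, a weight per track point and the count of each track point within group y, match a weighted naive Bayes classifier. The sketch below is one such classifier, offered as an assumption rather than the patent's formula: it uses add-one smoothing and log-probabilities, treats webpage and software track points as one list, and includes the highest-score selection that step S2 uses to pick the group category. All names are hypothetical.

```python
import math


def class_scores(track, weights, counts, priors):
    """Weighted naive-Bayes log-scores per group (assumed form, not the patent's formula).

    track:   list of track points (webpages and software together)
    weights: {point: w}, per-point weights learned in training (default 1.0)
    counts:  {y: {point: n}}, how often each point appears in group y
    priors:  {y: p(y)}
    """
    vocab = {p for c in counts.values() for p in c}
    scores = {}
    for y, c in counts.items():
        total = sum(c.values())
        s = math.log(priors[y])
        for point in track:
            # add-one (Laplace) smoothing keeps unseen points from zeroing the product
            likelihood = (c.get(point, 0) + 1) / (total + len(vocab))
            s += weights.get(point, 1.0) * math.log(likelihood)
        scores[y] = s
    return scores


def identify_group(track, weights, counts, priors):
    """Return the group with the highest score."""
    scores = class_scores(track, weights, counts, priors)
    return max(scores, key=scores.get)
```

Working in log space avoids underflow when a track contains many points.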
Optionally, training the constructed user group identification model in step S2 comprises:
training the constructed user group identification model according to the following flow:
S21: acquiring the online user behavior data of M users and extracting their user multidimensional behavior tracks, no two of which are repeated, following the flow of step S1; in the embodiment of the invention, the acquisition time ranges of the M user multidimensional behavior tracks have the same length;
S22: labeling the user group category of each of the M user multidimensional behavior tracks; in the embodiment of the invention, the user group categories include game users, news users, movie users, variety-show users, and so on;
S23: building the training objective function of the user group identification model:
[Figure SMS_54]
[Figure SMS_55]
wherein:
[Figure SMS_56] denotes the probability of the y-th group category occurring among the M collected user multidimensional behavior tracks;
r denotes any track point in the M user multidimensional behavior tracks, [Figure SMS_57] denotes the track point weight, [Figure SMS_58];
[Figure SMS_59] denotes the frequency with which track point r occurs in the M user multidimensional behavior tracks, and [Figure SMS_60] denotes the probability that the y-th group category and track point r occur together;
[Figure SMS_61] and [Figure SMS_62] are obtained from the M user multidimensional behavior tracks, and the weight of each track point is obtained by minimizing the training objective function.
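The quantities defined in S23 (the probability of each group category, the frequency of each track point, and the joint probability of a category and a track point) can be estimated by counting over the M labeled tracks, as sketched below. Because the objective itself (Figures SMS_54 and SMS_55) is not recoverable, the weight function here is a swapped-in, hypothetical choice, a presence-only mutual-information term, and not the patent's minimization.

```python
import math
from collections import Counter


def empirical_stats(tracks, labels):
    """Estimate p(y), p(r) and p(y, r) from M labeled tracks.

    A track point is counted at most once per track.
    """
    m = len(tracks)
    p_y = {y: n / m for y, n in Counter(labels).items()}
    r_count, yr_count = Counter(), Counter()
    for track, y in zip(tracks, labels):
        for r in set(track):
            r_count[r] += 1
            yr_count[(y, r)] += 1
    p_r = {r: n / m for r, n in r_count.items()}
    p_yr = {key: n / m for key, n in yr_count.items()}
    return p_y, p_r, p_yr


def mi_weights(p_y, p_r, p_yr):
    """Hypothetical weight: sum over y of p(y, r) * log(p(y, r) / (p(y) * p(r)))."""
    weights = {r: 0.0 for r in p_r}
    for (y, r), joint in p_yr.items():
        weights[r] += joint * math.log(joint / (p_y[y] * p_r[r]))
    return weights
```

Under this choice, track points concentrated in one group (such as `n1` in the test data) receive larger weights than points spread across groups.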
Optionally, inputting the user multidimensional behavior track into the user group identification model in step S2 to identify the user's group category comprises:
inputting the user multidimensional behavior track X into the user group identification model to obtain the probability that X belongs to each group category, and selecting the group category with the highest probability as the model's output, which is the identified group category of the user [Figure SMS_63].
Optionally, constructing the multi-granularity user click intention model in step S3 comprises:
constructing a multi-granularity user click intention model that takes the set of user multidimensional behavior tracks of one category as input and outputs the intention degrees of users in that category toward different pages;
the constructed model comprises an embedding layer, a propagation layer, and an output layer: the embedding layer initializes the user parameters and webpage parameters of users in the same category; the propagation layer builds a message propagation system that propagates the embedded user and webpage parameters of that category; and the output layer evaluates the intention degree of users in that category toward different webpages, maps each webpage to a commodity page, and takes the webpage's intention degree as the intention degree of the mapped commodity page. In the embodiment of the invention, the content described by a webpage is the same commodity displayed on the commodity page to which it is mapped;
extracting the webpage click time-series matrices of the M users from step S2, grouping the matrices of users in the same category, and constructing a multi-granularity user click intention model for each group; the model for the y-th group is constructed as follows:
S31: the embedding layer initializes the webpage click time-series matrices of the users in the y-th group as user parameters and the webpage ID names appearing in those user parameters as webpage parameters, wherein the number of webpage click time-series matrices of the users in the y-th group is [Figure SMS_64] and the number of webpage ID names is [Figure SMS_65];
S32: the embedding layer constructs the initial encodings of the user parameters and webpage parameters:
wherein:
[Figure SMS_66] denotes the initial encodings of the u-th webpage click time-series matrix [Figure SMS_67] and of the c-th webpage parameter [Figure SMS_68], [Figure SMS_69], [Figure SMS_70];
S33: the propagation layer recursively propagates messages based on the user parameters, the webpage parameters, and the initial encodings from the embedding layer:
[Figure SMS_71]
[Figure SMS_72]
wherein:
[Figure SMS_73] denotes the webpage click time-series matrix [Figure SMS_74] after D rounds of propagation, [Figure SMS_75];
[Figure SMS_76] denotes the encoded representation after D rounds of propagation;
[Figure SMS_77] denotes the activation function;
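The embedding and propagation layers (S31 to S33) follow the general pattern of graph-based collaborative filtering models such as NGCF and LightGCN, in which user and item embeddings exchange messages over the user-item click graph. The sketch below is written under that assumption: each click time-series matrix is treated as a user node linked to the webpages it clicked, initial encodings are small random Gaussians, messages use symmetric degree normalization, and `tanh` stands in for the unspecified activation function. None of these choices is confirmed by the patent, and all names are hypothetical.

```python
import math
import random


def init_embeddings(n, dim, seed=0):
    """Small random initial encodings (assumed Gaussian initialisation)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n)]


def propagate(user_emb, item_emb, interactions, layers, act=math.tanh):
    """Symmetric-normalised message passing on the user-webpage bipartite graph.

    interactions: list of (u, c) pairs meaning user node u clicked webpage c.
    Returns [(user_emb, item_emb)] for rounds 0..layers.
    """
    deg_u, deg_c = {}, {}
    for u, c in interactions:
        deg_u[u] = deg_u.get(u, 0) + 1
        deg_c[c] = deg_c.get(c, 0) + 1
    history = [(user_emb, item_emb)]
    dim = len(user_emb[0])
    for _ in range(layers):
        new_u = [[0.0] * dim for _ in user_emb]
        new_c = [[0.0] * dim for _ in item_emb]
        for u, c in interactions:
            norm = 1.0 / math.sqrt(deg_u[u] * deg_c[c])
            for k in range(dim):
                new_u[u][k] += norm * item_emb[c][k]  # webpage -> user message
                new_c[c][k] += norm * user_emb[u][k]  # user -> webpage message
        user_emb = [[act(x) for x in row] for row in new_u]
        item_emb = [[act(x) for x in row] for row in new_c]
        history.append((user_emb, item_emb))
    return history
```

Keeping every round in `history` is what allows a later aggregation step to combine representations from rounds 0 to D.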
S34: the output layer aggregates the propagation results of the propagation layer:
[Figure SMS_78]
[Figure SMS_79]
wherein:
[Figure SMS_80] denotes the aggregation result of the user parameter propagation results;
[Figure SMS_81] denotes the aggregation result of the c-th webpage parameter [Figure SMS_82];
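For S34, one common aggregation in graph recommenders is to average each node's representations over rounds 0 to D (LightGCN's layer combination); concatenation or a weighted sum are alternatives. The patent's formulas (Figures SMS_78 and SMS_79) are missing, so the mean below is an assumption, and `aggregate` is an illustrative name.

```python
def aggregate(layers):
    """Average each node's vector over all propagation rounds.

    layers: list over rounds 0..D; each round is a list of node vectors.
    """
    n_rounds = len(layers)
    n_nodes, dim = len(layers[0]), len(layers[0][0])
    return [
        [sum(layer[i][k] for layer in layers) / n_rounds for k in range(dim)]
        for i in range(n_nodes)
    ]
```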
S35: computing the inner product of [Figure SMS_83] and [Figure SMS_84], obtaining the inner products of [Figure SMS_85] with the [Figure SMS_86] webpage parameters, and normalizing these inner products to [Figure SMS_87]; the normalized result is the intention degree of the corresponding webpage parameter. Each webpage is then mapped to a commodity page, and the webpage's intention degree is taken as the intention degree of the mapped commodity page.
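S35 scores each webpage by an inner product and normalizes the scores. The sketch below assumes that the group's representation is the mean of its aggregated user vectors and that the normalization is a softmax, so the intention degrees are positive and sum to 1; the patent's normalization (Figure SMS_87) may differ, and `intention_degrees` is a hypothetical name.

```python
import math


def intention_degrees(user_vecs, page_vecs):
    """Inner-product score of the group against each webpage, softmax-normalised (assumed)."""
    dim = len(page_vecs[0])
    group = [sum(u[k] for u in user_vecs) / len(user_vecs) for k in range(dim)]
    logits = [sum(g * p for g, p in zip(group, page)) for page in page_vecs]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

Softmax preserves the order of the inner products, so ranking by the normalized degrees gives the same page order as ranking by the raw scores.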
Optionally, ranking the intention degrees of users of the identified category toward different pages from high to low and recommending pages to the user in that order in step S4 comprises:
for the user group category identified in step S2, ranking the pages from high to low by the intention degrees of users in that category output by the multi-granularity user click intention model, and recommending the commodity pages to the user in that order.
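Step S4 is a lookup followed by a sort. A minimal sketch, assuming intention degrees are stored per group as hypothetical `{webpage_id: degree}` dictionaries and that the webpage-to-commodity-page mapping from S3 is available as a dictionary; `recommend` and its `top_k` parameter are illustrative additions.

```python
def recommend(group, intentions_by_group, page_to_product, top_k=None):
    """Return commodity pages for `group`, ordered by intention degree (highest first)."""
    intentions = intentions_by_group[group]
    ranked = sorted(intentions.items(), key=lambda kv: kv[1], reverse=True)
    # map each webpage to its commodity page, skipping unmapped webpages
    pages = [page_to_product[w] for w, _ in ranked if w in page_to_product]
    return pages if top_k is None else pages[:top_k]
```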
To solve the above problems, the present invention provides an electronic device comprising:
a memory storing at least one instruction;
a communication interface for implementing communication of the electronic device; and a processor executing the instructions stored in the memory to implement the above multidimensional, multi-granularity user behavior analysis method.
To solve the above problems, the present invention also provides a computer-readable storage medium storing at least one instruction, which is executed by a processor in an electronic device to implement the above multidimensional, multi-granularity user behavior analysis method.
Compared with the prior art, the multidimensional, multi-granularity user behavior analysis method provided by the invention has the following advantages. First, the scheme provides a flow for constructing the user multidimensional behavior track. The user's click frequency on each webpage and each software within the acquisition time range is computed:
[Figure SMS_88]
wherein:
[Figure SMS_89] denotes the user's click frequency on the i-th common webpage, and [Figure SMS_90] denotes the user's click frequency on the j-th common software. Minimum click frequency values [Figure SMS_91] are set for webpages and software respectively, wherein [Figure SMS_92] denotes the minimum click frequency for webpages and [Figure SMS_93] denotes the minimum click frequency for software. Webpages whose click frequency is greater than [Figure SMS_94] and software whose click frequency is greater than [Figure SMS_95] are kept. The kept webpages are combined in pairs and the frequency with which the user clicks each webpage combination within the acquisition time range is computed, until no new webpage combination can be kept; the kept software is combined in pairs and the frequency with which the user clicks each software combination within the acquisition time range is computed, until no new software combination can be kept. The user's webpage click time-series matrix A and software click time-series matrix B are then constructed:
[Figure SMS_96]
[Figure SMS_97]
The probability that the user also clicks the (i+1)-th webpage during the periods in which the user clicks the i-th webpage is computed as the confidence between the i-th and (i+1)-th webpages, and the probability that the user also clicks the (j+1)-th software during the periods in which the user clicks the j-th software is computed as the confidence between the j-th and (j+1)-th software. The confidence threshold is set to [Figure SMS_98]. The confidence of every two webpages in each kept webpage combination is computed and the mean is taken as the confidence of that combination; combinations whose confidence is greater than or equal to the threshold [Figure SMS_99] become candidate webpage behavior tracks, and the candidate with the most webpages is selected as the user webpage behavior track. Likewise, the confidence of every two software in each kept software combination is computed and the mean is taken as the confidence of that combination; combinations whose confidence is greater than or equal to the threshold [Figure SMS_100] become candidate software behavior tracks, and the candidate with the most software is selected as the user software behavior track. Repeated webpages are filtered out of the user webpage behavior track and repeated software out of the user software behavior track, giving the user multidimensional behavior track [Figure SMS_101], wherein [Figure SMS_102] denotes the filtered user webpage behavior track and [Figure SMS_103] denotes the filtered user software behavior track. Using the click frequency and total click count of each webpage or software as screening conditions, the scheme screens candidate webpage and software behavior track points: the higher the total click count and click frequency, the more likely an item is to be judged a track point. The confidence of a candidate behavior track is determined by the probability that any two of its track points are clicked in the same period; a higher confidence indicates a stronger association between the track points of the candidate track, while a low confidence indicates incidental track points. The user multidimensional behavior track, comprising the user's webpage behavior track and software behavior track, is thus constructed from the click-frequency and time-series features of the webpages and software.
The scheme further provides a page recommendation method in which a multi-granularity user click intention model is constructed; the model takes the set of user multidimensional behavior tracks of one category as input and outputs the intention degree of users of that category toward different pages. The embedding layer initializes the webpage click timing matrices of the y-th group of users as user parameters and the webpage ID names appearing in those user parameters as webpage parameters, where the number of webpage click timing matrices of the y-th group is [Figure SMS_104] and the number of webpage ID names is [Figure SMS_105]. The embedding layer then constructs the initial encodings of the user parameters and webpage parameters:
[Figure SMS_106]
where [Figure SMS_107] denotes the initial encoding of the u-th webpage click timing matrix [Figure SMS_108] and the c-th webpage parameter [Figure SMS_109], with [Figure SMS_110] and [Figure SMS_111]. The propagation layer recursively propagates messages based on the user parameters, the webpage parameters and the initial encodings from the embedding layer:
[Figure SMS_112]
[Figure SMS_113]
where [Figure SMS_114] denotes the webpage click timing matrix [Figure SMS_115] after D rounds of propagation, with [Figure SMS_116]; [Figure SMS_117] denotes the encoded representation after D rounds of propagation; and [Figure SMS_118] denotes an activation function. The output layer aggregates the propagation results of the propagation layer:
[Figure SMS_119]
[Figure SMS_120]
where [Figure SMS_122] denotes the aggregation result of the user-parameter propagation results and [Figure SMS_125] denotes the aggregation result for the c-th webpage parameter [Figure SMS_126]. The inner product of [Figure SMS_123] and [Figure SMS_124] is calculated, giving the inner-product representation of [Figure SMS_127] with each of the [Figure SMS_128] webpage parameters; these inner-product representations are normalized with [Figure SMS_121], and the normalized result is the intention degree of the corresponding webpage parameter. Each webpage is mapped to a commodity page, and the intention degree of the webpage is taken as the intention degree of the mapped commodity page. The scheme initializes the user parameters and webpage parameters of users of the same category, builds a message propagation system over them, performs embedding-based recursive propagation, and aggregates the encoded representations of the users' webpage click timing matrices and click frequencies. The resulting inner products represent, after multiple rounds of message passing, the angle in vector space between the timing encoding of that category's click behavior and each webpage, and are used as the intention degrees of users of that category toward the different webpages.
Drawings
FIG. 1 is a flow chart of a multidimensional, multi-granularity user behavior analysis method according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of an electronic device for implementing the multidimensional, multi-granularity user behavior analysis method according to an embodiment of the present invention.
The objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
An embodiment of the application provides a multidimensional, multi-granularity user behavior analysis method. The method may be executed by, without limitation, at least one of a server, a terminal or another device that can be configured to execute the method provided by the embodiments of the application. In other words, the method may be performed by software or hardware installed in a terminal device or a server device, and the software may be a blockchain platform. The server side includes, but is not limited to, a single server, a server cluster, a cloud server or a cloud server cluster.
Example 1
S1: Collect online user behavior data and construct the collected data into a user multidimensional behavior track, where the online user behavior data comprise webpage click data and software click data.
In step S1, collecting the online user behavior data comprises:
collecting online user behavior data comprising webpage click data and software click data, according to the following flow:
constructing a webpage click statistics table containing [Figure SMS_129] common webpages, and acquiring the click timing data of the user on each common webpage in the webpage click statistics table:
[Figure SMS_130]
where:
[Figure SMS_131] denotes the timing data of the user's clicks on the i-th common webpage in the webpage click statistics table, with [Figure SMS_132]; [Figure SMS_133] indicates that the user clicked the i-th common webpage in time period [Figure SMS_134]; [Figure SMS_135] indicates that the user did not click the i-th common webpage in time period [Figure SMS_136]; and [Figure SMS_137] denotes the acquisition time range of the online user behavior data;
counting the total number of clicks by the user on each common webpage in the webpage click statistics table:
[Figure SMS_138]
where:
[Figure SMS_139] denotes the total number of clicks on the i-th common webpage in the webpage click statistics table within the acquisition time range;
taking the total click count and click timing data of each common webpage in the webpage click statistics table as the webpage click data;
constructing a software click statistics table containing [Figure SMS_140] common software items, and acquiring the click timing data of the user on each common software item in the software click statistics table:
[Figure SMS_141]
where:
[Figure SMS_142] denotes the timing data of the user's clicks on the j-th common software item in the software click statistics table, with [Figure SMS_143]; [Figure SMS_144] indicates that the user clicked the j-th common software item in time period [Figure SMS_145]; and [Figure SMS_146] indicates that the user did not click the j-th common software item in time period [Figure SMS_147];
counting the total number of clicks by the user on each common software item in the software click statistics table:
[Figure SMS_148]
where:
[Figure SMS_149] denotes the total number of clicks on the j-th common software item in the software click statistics table within the acquisition time range;
taking the total click count and click timing data of each common software item in the software click statistics table as the software click data;
and combining the webpage click data and the software click data into the online user behavior data. In this embodiment of the invention, if the user operates on and clicks the interface of the j-th common software item or a common webpage during period [Figure SMS_150], the user is marked as having clicked that software item or webpage in that period; the total number of times the user operates on and clicks the interface of a common software item or webpage within the acquisition time range is the user's total click count for it.
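The collection step above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: it assumes raw clicks arrive as hypothetical `(item_id, timestamp)` pairs, that the acquisition time range is split into equal periods of length `period_len`, and that a period is marked 1 if the item was clicked at least once in it. That is our reading of the binary timing data, whose exact formula (Figure SMS_130) is not recoverable.

```python
import math
from collections import defaultdict


def build_click_data(events, t_start, t_end, period_len):
    """Build per-item 0/1 timing sequences and total click counts.

    events: iterable of (item_id, timestamp) pairs (hypothetical input format).
    Only clicks with t_start <= timestamp < t_end are counted.
    """
    n_periods = math.ceil((t_end - t_start) / period_len)
    timing = defaultdict(lambda: [0] * n_periods)
    totals = defaultdict(int)
    for item, ts in events:
        if not t_start <= ts < t_end:
            continue  # outside the acquisition time range
        k = int((ts - t_start) // period_len)  # index of the period containing this click
        timing[item][k] = 1  # clicked at least once in period k
        totals[item] += 1  # every click counts toward the total
    return dict(timing), dict(totals)
```

For example, `build_click_data([("news", 0), ("news", 5), ("game", 12), ("news", 25)], 0, 30, 10)` yields the timing sequences `{"news": [1, 0, 1], "game": [0, 1, 0]}` and the totals `{"news": 3, "game": 1}`. The same function serves both the webpage and the software click statistics tables.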
In step S1, constructing the collected online user behavior data into a user multidimensional behavior track comprises:
constructing the user multidimensional behavior track from the collected online user behavior data according to the following flow:
S11: calculate the click frequency of the user on each webpage and software item within the acquisition time range:
[Figure SMS_151]
where:
[Figure SMS_152] denotes the click frequency of the user on the i-th common webpage, and [Figure SMS_153] denotes the click frequency of the user on the j-th common software item;
S12: set minimum click frequency values [Figure SMS_154] for webpages and software respectively, where [Figure SMS_155] denotes the minimum click frequency value for webpages and [Figure SMS_156] denotes the minimum click frequency value for software;
S13: retain the webpages whose click frequency is greater than [Figure SMS_157], and retain the software items whose click frequency is greater than [Figure SMS_158];
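Steps S11 to S13 reduce to a frequency filter. In the sketch below, the click frequency is assumed to be the total click count divided by the length of the acquisition time range, because the actual formula (Figure SMS_151) is not recoverable; `filter_by_frequency` is a hypothetical helper name.

```python
def filter_by_frequency(totals, duration, min_freq):
    """Return {item: click frequency} for items whose frequency exceeds min_freq.

    Click frequency is assumed to be total clicks / length of the acquisition range.
    """
    freq = {item: n / duration for item, n in totals.items()}
    return {item: f for item, f in freq.items() if f > min_freq}  # S13 keeps strictly greater
```

The same helper would be called once with the webpage totals and the webpage minimum, and once with the software totals and the software minimum.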
S14: combine the webpages retained in step S13 in pairs, calculate the frequency with which the user clicks each webpage combination within the acquisition time range, and return to step S13, repeating until no new webpage combination can be retained;
combine the software items retained in step S13 in pairs, calculate the frequency with which the user clicks each software combination within the acquisition time range, and return to step S13, repeating until no new software combination can be retained;
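The pairwise combine-and-retain loop of S14 closely resembles level-wise frequent-itemset mining as in the Apriori algorithm. The sketch below follows that reading under one explicit assumption: a combination counts as clicked in a period only when all of its members were clicked in that period, and its frequency is the share of such periods. The names `support` and `grow_combinations` are hypothetical.

```python
from itertools import combinations


def support(itemset, timing):
    """Share of periods in which every item in itemset was clicked (assumed definition)."""
    n_periods = len(next(iter(timing.values())))
    hits = sum(all(timing[i][k] for i in itemset) for k in range(n_periods))
    return hits / n_periods


def grow_combinations(timing, min_freq):
    """Apriori-style growth: join retained sets pairwise until no new set is retained.

    timing: {item: [0/1 per period]}; returns every retained itemset as a frozenset.
    """
    level = [frozenset([i]) for i in timing if support([i], timing) > min_freq]
    retained = list(level)
    while level:
        # join two sets of size n only when their union has size n + 1
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == len(a) + 1}
        level = [c for c in candidates if support(c, timing) > min_freq]
        retained.extend(level)
    return retained
```

The loop stops exactly when a round produces no retained combination, which matches the "until a new combination cannot be retained" condition.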
S15: construct the webpage click timing matrix A and the software click timing matrix B of the user respectively:
[Figure SMS_159]
[Figure SMS_160]
calculate the probability that the user also clicks the (i+1)-th webpage within the time periods in which the user clicks any i-th webpage, and use it as the confidence between the i-th and (i+1)-th webpages;
calculate the probability that the user also clicks the (j+1)-th software item within the time periods in which the user clicks any j-th software item, and use it as the confidence between the j-th and (j+1)-th software items;
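A sketch of the S15 confidence, assuming each row of the timing matrix A (or B) is one item's 0/1 sequence over the collection periods. The confidence from item i to item i+1 is then the fraction of periods containing an i click that also contain an i+1 click, i.e. an empirical conditional probability. The row layout and the zero-click convention are assumptions.

```python
def confidence(row_i, row_next):
    """Empirical P(next item clicked | item i clicked) over the collection periods."""
    clicked_i = sum(row_i)
    if clicked_i == 0:
        return 0.0  # item i never clicked: treat confidence as 0 (assumption)
    both = sum(1 for x, y in zip(row_i, row_next) if x and y)
    return both / clicked_i


def adjacent_confidences(matrix):
    """Confidence from row i to row i + 1 for every adjacent pair of rows of A or B."""
    return [confidence(matrix[i], matrix[i + 1]) for i in range(len(matrix) - 1)]
```

Note that this confidence is asymmetric: the share of i-periods that contain i+1 generally differs from the share of (i+1)-periods that contain i.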
S16: set the confidence threshold to [Figure SMS_161]; for each webpage combination retained in step S14, calculate the confidence between every two webpages in it and take the mean confidence as the confidence of that combination; take the combinations whose confidence is greater than or equal to the threshold [Figure SMS_162] as candidate webpage behavior tracks, and select the candidate containing the most webpages as the user webpage behavior track;
for each software combination retained in step S14, calculate the confidence between every two software items in it and take the mean confidence as the confidence of that combination; take the combinations whose confidence is greater than or equal to the threshold [Figure SMS_163] as candidate software behavior tracks, and select the candidate containing the most software items as the user software behavior track;
S17: filter repeated webpages out of the user webpage behavior track and repeated software out of the user software behavior track to obtain the user multidimensional behavior track [Figure SMS_164], where [Figure SMS_165] denotes the filtered user webpage behavior track and [Figure SMS_166] denotes the filtered user software behavior track.
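Steps S16 and S17 can be sketched as below. The patent says only that the confidences of "any two" items are averaged, so this sketch assumes the mean is taken over both orderings of every pair, reusing the conditional-probability confidence assumed for S15; `conf` and `select_track` are hypothetical names.

```python
from itertools import permutations
from statistics import mean


def conf(timing, a, b):
    """Empirical P(b clicked | a clicked) over the periods (same assumption as for S15)."""
    n_a = sum(timing[a])
    if n_a == 0:
        return 0.0
    return sum(1 for x, y in zip(timing[a], timing[b]) if x and y) / n_a


def select_track(combos, timing, threshold):
    """S16: keep combinations whose mean pairwise confidence >= threshold, return the largest."""
    candidates = []
    for combo in combos:
        items = sorted(combo)  # deterministic order for the returned track
        if len(items) < 2:
            continue  # a confidence needs at least one pair of items
        score = mean(conf(timing, a, b) for a, b in permutations(items, 2))
        if score >= threshold:
            candidates.append(items)
    # max() keeps the first of several equally long candidates
    return max(candidates, key=len) if candidates else []
```

Because the combinations are sets, they already contain no repeated items, so S17's de-duplication is implicit in this sketch; the multidimensional track would then be the pair formed by the selected webpage track and the selected software track.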
S2: Construct and train a user group identification model, input the user multidimensional behavior track into the model, and identify the group category of the user, where the model takes the user multidimensional behavior track as input and uses maximization of the user group distribution as its training objective function.
In step S2, constructing the user group identification model comprises: constructing a user group identification model whose input is the user multidimensional behavior track of the user to be identified and whose output is the group category of that user; the group category identification formula of the model is:
[Figure SMS_167]
where:
[Figure SMS_168] denotes the probability that the user multidimensional behavior track X belongs to the y-th group category;
[Figure SMS_169] denotes a track point s in the user webpage behavior track [Figure SMS_170], a track point being one webpage in the user webpage behavior track [Figure SMS_171]; [Figure SMS_172] denotes the weight of track point s, and [Figure SMS_173] denotes the number of times track point s occurs in group category y;
[Figure SMS_174] denotes a track point k in the user software behavior track [Figure SMS_175], a track point being one software item in the user software behavior track [Figure SMS_176]; [Figure SMS_177] denotes the weight of track point k, and [Figure SMS_178] denotes the number of times track point k occurs in group category y;
[Figure SMS_179] is the total number of group categories;
[Figure SMS_180] and [Figure SMS_181] are the parameters to be solved by training.
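The identification formula (Figure SMS_167) is not recoverable, but its ingredients (per-point weights, per-group occurrence counts and the number of group categories) suggest a weighted naive Bayes classifier. The sketch below implements that reading with Laplace smoothing; the smoothing constant `alpha`, the default weight of 1.0 and all names are assumptions. `identify_group` applies the highest-probability rule described for the identification step.

```python
import math


def group_scores(track_points, weights, counts, priors, alpha=1.0):
    """Weighted naive-Bayes log-score of a track for every group (assumed model form).

    track_points: webpage and software track points of X, combined
    weights: {point: weight}, default 1.0; counts: {group: {point: occurrences}}
    priors: {group: P(y)}; alpha: Laplace smoothing constant (assumption)
    """
    vocab = {p for c in counts.values() for p in c} | set(track_points)
    scores = {}
    for y, c in counts.items():
        total = sum(c.values())
        score = math.log(priors[y])
        for p in track_points:
            # smoothed likelihood of point p under group y, scaled by the point's weight
            likelihood = (c.get(p, 0) + alpha) / (total + alpha * len(vocab))
            score += weights.get(p, 1.0) * math.log(likelihood)
        scores[y] = score
    return scores


def identify_group(track_points, weights, counts, priors):
    """Return the group category with the highest score."""
    scores = group_scores(track_points, weights, counts, priors)
    return max(scores, key=scores.get)
```

Working in log space avoids underflow on long tracks and does not change which group has the highest probability.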
In step S2, training the constructed user group identification model comprises:
training the constructed model according to the following flow:
S21: collect the online user behavior data of M users and extract their user multidimensional behavior tracks, the collected tracks being non-repeating; the tracks are obtained as described in step S1, and in this embodiment of the invention the acquisition time ranges of the M user multidimensional behavior tracks are of equal length;
S22: label the M user multidimensional behavior tracks with user group categories; in this embodiment of the invention, the user group categories include game users, news users, movie users, variety-show users, and so on;
S23: construct the training objective function of the user group identification model:
[Figure SMS_182]
[Figure SMS_183]
where:
[Figure SMS_184] denotes the probability of the y-th group category occurring among the M collected user multidimensional behavior tracks;
r denotes any track point in the M user multidimensional behavior tracks, and [Figure SMS_185] denotes the weight of track point r, [Figure SMS_186]; [Figure SMS_187] denotes the frequency of occurrence of track point r in the M user multidimensional behavior tracks, and [Figure SMS_188] denotes the probability that the y-th group category and track point r occur together;
[Figure SMS_189] and [Figure SMS_190] are obtained from the M collected user multidimensional behavior tracks, and the weight of each track point is obtained by minimizing the training objective function.
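The empirical quantities that S23 draws from the labelled tracks can be computed directly. This sketch assumes the frequency of a track point r is the share of the M tracks that contain it, and it deliberately stops short of the weight optimization, whose objective (Figures SMS_182 and SMS_183) is not recoverable. `empirical_stats` is a hypothetical name.

```python
from collections import Counter


def empirical_stats(tracks, labels):
    """Estimate P(y), the frequency of each track point r, and P(y, r) from M labelled tracks.

    tracks: list of M track-point lists (webpage and software points combined)
    labels: list of M group labels
    """
    m = len(tracks)
    p_y = {y: n / m for y, n in Counter(labels).items()}
    # share of tracks containing r; set() makes a repeated point count once per track
    point_counts = Counter(r for t in tracks for r in set(t))
    freq_r = {r: n / m for r, n in point_counts.items()}
    joint = Counter((y, r) for t, y in zip(tracks, labels) for r in set(t))
    p_yr = {key: n / m for key, n in joint.items()}
    return p_y, freq_r, p_yr
```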
In step S2, inputting the user multidimensional behavior track into the user group identification model and identifying the group category of the user comprises:
inputting the user multidimensional behavior track X into the model to obtain the probabilities that X belongs to each group category, selecting the group category with the highest probability as the output of the model, and thereby identifying the group category of the user [Figure SMS_191].
S3: Construct a multi-granularity user click intention model, which takes the set of user multidimensional behavior tracks of one category as input and outputs the intention degree of users of that category toward different pages.
In step S3, constructing the multi-granularity user click intention model comprises:
constructing a multi-granularity user click intention model that takes the set of user multidimensional behavior tracks of one category as input and outputs the intention degree of users of that category toward different pages;
the constructed model comprises an embedding layer, a propagation layer and an output layer. The embedding layer initializes the user parameters and webpage parameters of users of the same category; the propagation layer builds a message propagation system that performs embedding propagation over those user and webpage parameters; and the output layer evaluates the intention degree of users of that category toward different webpages, maps each webpage to a commodity page, and takes the intention degree of the webpage as the intention degree of the mapped commodity page. In this embodiment of the invention, the content described on a webpage is the same as the commodity displayed on its mapped commodity page;
extracting the webpage click timing matrices of the M users from step S2, grouping the matrices of users of the same category, and constructing one multi-granularity user click intention model per group; the construction flow of the model for the y-th group is:
S31: the embedding layer initializes the webpage click timing matrices of the y-th group of users as user parameters and the webpage ID names appearing in those user parameters as webpage parameters, where the number of webpage click timing matrices of the y-th group is [Figure SMS_192] and the number of webpage ID names is [Figure SMS_193];
S32: the embedding layer constructs the initial encodings of the user parameters and webpage parameters:
[Figure SMS_194]
where:
[Figure SMS_195] denotes the initial encoding of the u-th webpage click timing matrix [Figure SMS_196] and the c-th webpage parameter [Figure SMS_197], with [Figure SMS_198] and [Figure SMS_199];
S33: the propagation layer recursively propagates messages based on the user parameters, the webpage parameters and the initial encodings from the embedding layer:
[Figure SMS_200]
[Figure SMS_201]
where:
[Figure SMS_202] denotes the webpage click timing matrix [Figure SMS_203] after D rounds of propagation, with [Figure SMS_204];
[Figure SMS_205] denotes the encoded representation after D rounds of propagation;
[Figure SMS_206] denotes an activation function;
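The embedding and propagation layers (S31 to S33) match the general pattern of graph-based collaborative filtering such as NGCF or LightGCN, in which user and item nodes on a bipartite graph exchange degree-normalized messages for several rounds. The sketch below shows that generic pattern, not the patent's exact formulas (Figures SMS_194, SMS_200 and SMS_201): it assumes random Gaussian initial encodings, symmetric normalization by 1/sqrt(deg(u)·deg(c)), and `tanh` as the activation. Each user node stands for one webpage click timing matrix of the group; `propagate` is a hypothetical name.

```python
import math
import random


def propagate(interactions, n_users, n_pages, dim=4, rounds=2, seed=0):
    """Bipartite message passing between user parameters and webpage parameters.

    interactions: set of (u, c) pairs, meaning webpage c appears in user matrix u.
    Returns (user_layers, page_layers): lists of rounds + 1 embedding matrices,
    where index 0 holds the random initial encodings (S32).
    """
    rng = random.Random(seed)

    def init(n):
        # small random initial encodings, one dim-length vector per node
        return [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n)]

    user_layers, page_layers = [init(n_users)], [init(n_pages)]
    deg_u, deg_c = [0] * n_users, [0] * n_pages
    for u, c in interactions:
        deg_u[u] += 1
        deg_c[c] += 1
    for _ in range(rounds):
        eu, ec = user_layers[-1], page_layers[-1]
        new_u = [[0.0] * dim for _ in range(n_users)]
        new_c = [[0.0] * dim for _ in range(n_pages)]
        for u, c in interactions:
            norm = 1.0 / math.sqrt(deg_u[u] * deg_c[c])  # symmetric degree normalization
            for k in range(dim):
                new_u[u][k] += norm * ec[c][k]  # message: webpage c -> user u
                new_c[c][k] += norm * eu[u][k]  # message: user u -> webpage c
        # tanh stands in for the unspecified activation function
        user_layers.append([[math.tanh(x) for x in row] for row in new_u])
        page_layers.append([[math.tanh(x) for x in row] for row in new_c])
    return user_layers, page_layers
```

Keeping every round's embeddings, rather than only the last, lets the output layer aggregate across rounds.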
S34: the output layer aggregates the propagation results of the propagation layer:
[Figure SMS_207]
[Figure SMS_208]
where:
[Figure SMS_209] denotes the aggregation result of the user-parameter propagation results;
[Figure SMS_210] denotes the aggregation result for the c-th webpage parameter [Figure SMS_211];
S35: calculate the inner product of [Figure SMS_212] and [Figure SMS_213], obtaining the inner-product representation of [Figure SMS_214] with each of the [Figure SMS_215] webpage parameters; normalize these inner-product representations with [Figure SMS_216], the normalized result being the intention degree of the corresponding webpage parameter; map each webpage to its commodity page, and take the intention degree of the webpage as the intention degree of the mapped commodity page.
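S34 and S35 can be sketched as follows, assuming that aggregation averages each node's embeddings over all propagation rounds (as LightGCN does), that the group-level user vector is the mean over the group's users, and that the normalization function (Figure SMS_216) is a softmax. All three choices are assumptions; the result is one intention degree per webpage, and the degrees sum to 1.

```python
import math


def mean_vec(vectors):
    """Element-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def intention_degrees(user_layers, page_layers):
    """Aggregate propagation rounds, score each webpage by inner product, softmax-normalize.

    user_layers / page_layers: list over rounds of per-node embedding matrices.
    """
    n_users, n_pages = len(user_layers[0]), len(page_layers[0])
    users = [mean_vec([layer[u] for layer in user_layers]) for u in range(n_users)]
    pages = [mean_vec([layer[c] for layer in page_layers]) for c in range(n_pages)]
    group = mean_vec(users)  # one vector for the whole user category
    scores = [sum(g * p for g, p in zip(group, page)) for page in pages]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

With two webpages whose aggregated embeddings are [1, 0] and [0, 1] and a group vector of [1, 0], the intention degrees are e/(e+1) and 1/(e+1), so the first webpage is preferred.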
S4: According to the identified user group category, sort the intention degrees of users of that category toward different pages from high to low, and recommend the pages to the user in that order.
In step S4, sorting the intention degrees by category and recommending the pages in order comprises:
based on the user group category identified in step S2 and the intention degrees of users of that category toward different pages output by the multi-granularity user click intention model, sorting the pages by intention degree from high to low and recommending the commodity pages to the user in that order.
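The ranking in S4 is a sort by intention degree followed by the webpage-to-commodity-page mapping from S3. Below is a minimal sketch with hypothetical dictionary inputs; webpages without a mapped commodity page are simply skipped, which is our assumption.

```python
def recommend(user_group, intention_by_group, page_to_commodity, top_k=None):
    """Rank commodity pages for a user by the intention degrees of the user's group.

    intention_by_group: {group: {webpage_id: intention degree}} (output of S35)
    page_to_commodity: {webpage_id: commodity_page_id}; unmapped webpages are skipped.
    """
    degrees = intention_by_group[user_group]
    ranked = sorted(degrees, key=degrees.get, reverse=True)  # high to low
    pages = [page_to_commodity[w] for w in ranked if w in page_to_commodity]
    return pages if top_k is None else pages[:top_k]
```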
Example 2
Fig. 2 is a schematic structural diagram of an electronic device for implementing a multidimensional and multi-granularity user behavior analysis method according to an embodiment of the present invention.
The electronic device 1 may comprise a processor 10, a memory 11, a communication interface 13 and a bus, and may further comprise a computer program, such as program 12, stored in the memory 11 and executable on the processor 10.
The memory 11 includes at least one type of readable storage medium, such as flash memory, a removable hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a magnetic memory, a magnetic disk or an optical disk. In some embodiments, the memory 11 may be an internal storage unit of the electronic device 1, such as a hard disk of the electronic device 1. In other embodiments, the memory 11 may be an external storage device of the electronic device 1, such as a plug-in removable hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a flash card provided on the electronic device 1. Further, the memory 11 may include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only to store application software installed on the electronic device 1 and various types of data, such as the code of the program 12, but also to temporarily store data that has been or will be output.
In some embodiments, the processor 10 may consist of integrated circuits, for example a single packaged integrated circuit, or multiple packaged integrated circuits with the same or different functions, including one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors and combinations of various control chips. The processor 10 is the control unit of the electronic device; it connects the components of the entire electronic device through various interfaces and lines, runs or executes the programs or modules stored in the memory 11 (such as the program 12 for user behavior analysis), and invokes data stored in the memory 11 to perform the functions of the electronic device 1 and process data.
The communication interface 13 may include a wired interface and/or a wireless interface (e.g., a Wi-Fi interface or a Bluetooth interface), and is typically used to establish a communication connection between the electronic device 1 and other electronic devices and to enable communication between the internal components of the electronic device.
The bus may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like, and may be divided into an address bus, a data bus, a control bus and so on. The bus is arranged to enable communication between the memory 11, the at least one processor 10 and other components.
FIG. 2 shows only an electronic device with some of its components. Those skilled in the art will understand that the structure shown in FIG. 2 does not limit the electronic device 1, which may include fewer or more components than shown, combine certain components, or use a different arrangement of components.
For example, although not shown, the electronic device 1 may further include a power supply (such as a battery) that powers each component. Preferably, the power supply may be logically connected to the at least one processor 10 through a power management device, so that charge management, discharge management, power consumption management and similar functions are implemented through the power management device. The power supply may also include one or more DC or AC power sources, recharging devices, power failure detection circuits, power converters or inverters, power status indicators and similar components. The electronic device 1 may further include various sensors, a Bluetooth module, a Wi-Fi module and so on, which are not described here.
The electronic device 1 may optionally further include a user interface, which may be a display or an input unit such as a keyboard; optionally, the user interface may also be a standard wired interface or a wireless interface. In some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (organic light-emitting diode) touch display, or the like. The display may also be referred to as a display screen or display unit, and is used to display information processed in the electronic device 1 and to present a visual user interface.
It should be understood that the described embodiments are for illustration only, and the scope of the patent application is not limited to this configuration.
The program 12 stored in the memory 11 of the electronic device 1 is a combination of instructions that, when executed in the processor 10, may implement:
collecting online user behavior data, and constructing the collected online user behavior data into a user multidimensional behavior track;
constructing and training to obtain a user group identification model, inputting a user multidimensional behavior track into the user group identification model, and identifying to obtain a group category of a user;
constructing a multi-granularity user click intention model;
sorting, according to the identified user group category, the intention degrees of users of that category toward different pages from high to low, and recommending the pages to the user in that order.
For the specific implementation of the above instructions by the processor 10, refer to the description of the related steps in the embodiments corresponding to FIG. 1 and FIG. 2, which is not repeated here.
It should be noted that the serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments. The terms "comprises," "comprising," and any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, apparatus, article or method that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, apparatus, article or method. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, apparatus, article or method that comprises the element.
From the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the preferred implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, may be embodied as a software product stored in a storage medium (e.g., ROM/RAM, a magnetic disk or an optical disk) as described above, including instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, etc.) to perform the methods described in the embodiments of the present invention.
The foregoing describes only preferred embodiments of the present invention and does not thereby limit the scope of the patent. Any equivalent structure or equivalent process transformation made using the contents of this specification and drawings, or any direct or indirect application in other related technical fields, likewise falls within the scope of patent protection of the present invention.

Claims (8)

1. A method of multidimensional, multi-granularity user behavior analysis, the method comprising:
S1: collecting online user behavior data and constructing the collected data into a user multidimensional behavior track, wherein the online user behavior data comprise webpage click data and software click data;
S2: constructing and training a user group identification model, inputting the user multidimensional behavior track into the model, and identifying the group category of the user, wherein the model takes the user multidimensional behavior track as input and uses maximization of the user group distribution as its training objective function;
S3: constructing a multi-granularity user click intention model, wherein the model takes the set of user multidimensional behavior tracks of one category as input and outputs the intention degree of users of that category toward different pages;
S4: according to the identified user group category, sorting the intention degrees of users of that category toward different pages from high to low, and recommending the pages to the user in that order.
2. The multidimensional, multi-granularity user behavior analysis method according to claim 1, wherein collecting the online user behavior data in step S1 comprises:
collecting online user behavior data comprising webpage click data and software click data according to the following flow:
constructing a webpage click statistics table containing the common webpages [formula not reproduced];
acquiring the user's click time-series data for each common webpage in the webpage click statistics table:
[formula not reproduced]
wherein the time-series data of the user's clicks on the i-th common webpage in the webpage click statistics table consists of terms [formula not reproduced], each indicating a time period in which the user does not click the i-th common webpage, taken over the acquisition time range of the online user behavior data;
counting the total number of times the user clicks each common webpage in the webpage click statistics table within the acquisition time range [formula not reproduced];
taking the total number of clicks and the click time-series data of each common webpage in the webpage click statistics table as the webpage click data;
constructing a software click statistics table containing the common software [formula not reproduced];
acquiring the user's click time-series data for each piece of common software in the software click statistics table:
[formula not reproduced]
wherein the time-series data of the user's clicks on the j-th common software in the software click statistics table consists of terms [formula not reproduced], each indicating a time period in which the user does not click the j-th common software;
counting the total number of times the user clicks each piece of common software in the software click statistics table within the acquisition time range [formula not reproduced];
taking the total number of clicks and the click time-series data of each piece of common software in the software click statistics table as the software click data;
constructing the webpage click data and the software click data into the online user behavior data.
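The collection flow of claim 2 can be illustrated with a short Python sketch. Because the formula images are not available, it assumes, hypothetically, that each item's click time series is a list of 0/1 indicators with one entry per time period of the acquisition range; `click_data` and `online_behavior_data` are illustrative names, not taken from the patent.

```python
from typing import Dict, List

# Assumption: item ID -> one 0/1 click indicator per time period of the acquisition range.
ClickSeries = Dict[str, List[int]]


def click_data(series: ClickSeries) -> Dict[str, dict]:
    """Pair each common webpage/software item's click time series with its total clicks."""
    return {item: {"series": list(s), "total": sum(s)} for item, s in series.items()}


def online_behavior_data(web: ClickSeries, software: ClickSeries) -> Dict[str, Dict[str, dict]]:
    """Combine webpage click data and software click data into online user behavior data."""
    return {"webpage": click_data(web), "software": click_data(software)}


if __name__ == "__main__":
    data = online_behavior_data({"news": [1, 0, 1, 1]}, {"editor": [0, 1, 1, 0]})
    print(data["webpage"]["news"]["total"], data["software"]["editor"]["total"])
```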
3. The multidimensional, multi-granularity user behavior analysis method according to claim 2, wherein constructing the collected online user behavior data into a user multidimensional behavior track in step S1 comprises:
constructing the collected online user behavior data into a user multidimensional behavior track according to the following flow:
S11: calculating the user's click frequency on each common webpage and each piece of common software within the acquisition time range [formula not reproduced], giving the user's click frequency on the i-th common webpage and on the j-th common software;
S12: setting a minimum click frequency value for webpages and a minimum click frequency value for software [formula not reproduced];
S13: retaining the webpages whose click frequency is greater than the minimum click frequency value for webpages, and the software whose click frequency is greater than the minimum click frequency value for software;
S14: combining the webpages retained in step S13 in pairs, calculating the frequency with which the user clicks each resulting webpage combination within the acquisition time range, and returning to step S13, until no new webpage combination can be retained;
combining the software retained in step S13 in pairs, calculating the frequency with which the user clicks each resulting software combination within the acquisition time range, and returning to step S13, until no new software combination can be retained;
S15: constructing the user's webpage click time-series matrix A and software click time-series matrix B, respectively [formula not reproduced];
calculating, for any i-th webpage, the probability that the user also clicks the (i+1)-th webpage during the time periods in which the user clicks the i-th webpage, and taking this probability as the confidence between the i-th and (i+1)-th webpages;
calculating, for any j-th software, the probability that the user also clicks the (j+1)-th software during the time periods in which the user clicks the j-th software, and taking this probability as the confidence between the j-th and (j+1)-th software;
S16: setting a confidence threshold [formula not reproduced]; for each webpage combination retained in step S14, calculating the confidence between every two webpages in the combination and taking the mean confidence as the confidence of that combination; selecting the webpage combinations whose confidence is greater than or equal to the confidence threshold as candidate webpage behavior tracks, and selecting the candidate containing the largest number of webpages as the user webpage behavior track;
for each software combination retained in step S14, calculating the confidence between every two pieces of software in the combination and taking the mean confidence as the confidence of that combination; selecting the software combinations whose confidence is greater than or equal to the confidence threshold as candidate software behavior tracks, and selecting the candidate containing the largest number of pieces of software as the user software behavior track;
S17: filtering out repeated webpages from the user webpage behavior track and repeated software from the user software behavior track, to obtain the user multidimensional behavior track X, which consists of the filtered user webpage behavior track and the filtered user software behavior track.
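A minimal Python sketch of steps S11 to S17 for one kind of item (webpages or software). Several details are assumptions rather than the claim's formulas: click frequency is the share of time periods with a click, a combination's frequency is the share of periods in which all of its items are clicked, pairwise combination is done Apriori-style (two retained combinations are merged when their union has exactly one extra item), and the mean confidence is taken over all ordered pairs in a combination. All function names are hypothetical; `min_freq` and `min_conf` stand for the minimum click frequency value and the confidence threshold.

```python
from itertools import combinations
from statistics import mean
from typing import Dict, FrozenSet, List, Optional, Set

Series = Dict[str, List[int]]  # item ID -> 0/1 click indicator per time period (assumed)


def frequency(series: Series, items: FrozenSet[str]) -> float:
    """Share of time periods in which every item in `items` was clicked (assumed definition)."""
    periods = len(next(iter(series.values())))
    hits = sum(all(series[i][t] for i in items) for t in range(periods))
    return hits / periods


def frequent_combinations(series: Series, min_freq: float) -> List[FrozenSet[str]]:
    """S12-S14: keep items above min_freq, then grow combinations until none survives."""
    level = [frozenset([i]) for i in series if frequency(series, frozenset([i])) > min_freq]
    kept = list(level)
    while level:
        candidates: Set[FrozenSet[str]] = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == len(a) + 1:  # Apriori-style join (assumption)
                candidates.add(union)
        # Sort for a deterministic order; filtering again is the "return to S13" step.
        level = sorted((c for c in candidates if frequency(series, c) > min_freq), key=sorted)
        kept.extend(level)
    return kept


def confidence(series: Series, a: str, b: str) -> float:
    """S15: probability that b is clicked in the periods in which a is clicked."""
    a_periods = [t for t, clicked in enumerate(series[a]) if clicked]
    if not a_periods:
        return 0.0
    return sum(series[b][t] for t in a_periods) / len(a_periods)


def behavior_track(series: Series, min_freq: float, min_conf: float) -> Optional[List[str]]:
    """S16-S17: largest combination whose mean pairwise confidence reaches min_conf."""
    best: Optional[FrozenSet[str]] = None
    for combo in frequent_combinations(series, min_freq):
        if len(combo) < 2:
            continue
        conf = mean(confidence(series, a, b) for a in combo for b in combo if a != b)
        if conf >= min_conf and (best is None or len(combo) > len(best)):
            best = combo
    # A set holds each item once, so repeated items are already filtered (S17).
    return sorted(best) if best is not None else None
```

Calling `behavior_track(series, min_freq, min_conf)` separately on the webpage series and on the software series yields the two halves of the track X, or `None` when no combination qualifies.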
4. The multidimensional, multi-granularity user behavior analysis method according to claim 1, wherein constructing the user group identification model in step S2 comprises:
constructing a user group identification model whose input is the user multidimensional behavior track of the user to be identified and whose output is the group category of that user, the group category being identified with the model by the following formula:
[formula not reproduced]
wherein the formula gives the probability that the user multidimensional behavior track X belongs to the y-th group category, and is built from: each track point s of the user webpage behavior track, i.e., each webpage in that track, together with the weight of s and the number of times s appears in group category y; each track point k of the user software behavior track, i.e., each piece of software in that track, together with the weight of k and the number of times k appears in group category y; and the total number of group categories; the track point weights are the parameters to be solved by training.
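Since the identification formula itself is not reproduced, the following Python sketch assumes a log-linear (softmax) form suggested by the listed terms: the score of group y is the sum, over the track points of X, of each point's weight times its count in y, and the scores are normalized over all group categories. This form, the use of one namespace for webpage and software track points, and the function names are assumptions. `identify_group` applies the highest-probability rule stated in claim 6.

```python
import math
from typing import Dict, Iterable


def group_probabilities(track: Iterable[str], weights: Dict[str, float],
                        counts: Dict[str, Dict[str, int]]) -> Dict[str, float]:
    """P(y | X) for every group y under an assumed log-linear form.

    track:   track points of X (webpages and software from both behavior tracks)
    weights: weight of each track point (the trainable parameters)
    counts:  counts[y][s] = number of times track point s appears in group y
    """
    points = set(track)
    scores = {y: sum(weights.get(s, 0.0) * per_group.get(s, 0) for s in points)
              for y, per_group in counts.items()}
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {y: math.exp(v - top) for y, v in scores.items()}
    z = sum(exps.values())  # normalize over all group categories
    return {y: e / z for y, e in exps.items()}


def identify_group(probs: Dict[str, float]) -> str:
    """Return the group category with the highest probability."""
    return max(probs, key=probs.get)
```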
5. The multidimensional, multi-granularity user behavior analysis method according to claim 4, wherein training the constructed user group identification model in step S2 comprises:
training the constructed user group identification model according to the following flow:
S21: acquiring the online user behavior data of M users and extracting their user multidimensional behavior tracks by the flow of step S1, the acquired user multidimensional behavior tracks being non-repeating;
S22: labeling each of the M user multidimensional behavior tracks with a user group category;
S23: constructing the training objective function of the user group identification model:
[formula not reproduced]
wherein the function is expressed in terms of: the probability that the y-th group category occurs among the M collected user multidimensional behavior tracks; r, any track point in the M user multidimensional behavior tracks, and the weight of r; the frequency with which r occurs in the M user multidimensional behavior tracks; and the probability that the y-th group category and track point r occur together;
obtaining the corresponding probabilities and frequencies from the M acquired user multidimensional behavior tracks, and obtaining the weight of each track point by minimizing the training objective function.
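The training objective of step S23 is likewise not reproduced. As a stand-in, not the patent's objective, this Python sketch fits one weight per track point by gradient descent on the mean negative log-likelihood of the labelled tracks under the same assumed log-linear form. It uses the quantities the claim lists: group labels, per-group occurrence counts of each track point, and track point weights. The function names and the hyperparameters `epochs` and `lr` are illustrative.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Example = Tuple[Sequence[str], str]  # (track points of one user, labelled group category)


def _probs(points, w, counts, cats) -> List[float]:
    # Assumed form: softmax over groups y of sum_r w[r] * n(r, y).
    scores = [sum(w.get(r, 0.0) * counts[y][r] for r in points) for y in cats]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def train_weights(data: List[Example], epochs: int = 200, lr: float = 0.05):
    """Gradient descent on mean negative log-likelihood (a substitute objective)."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for track, y in data:              # S21/S22: labelled, de-duplicated tracks
        counts[y].update(set(track))   # n(r, y): occurrences of r in group y
    cats = sorted(counts)
    w: Dict[str, float] = {}
    for _ in range(epochs):
        grad: Dict[str, float] = defaultdict(float)
        for track, y in data:
            points = set(track)
            p = _probs(points, w, counts, cats)
            for r in points:
                expected = sum(pk * counts[c][r] for pk, c in zip(p, cats))
                grad[r] += expected - counts[y][r]  # d(-log P(y|X)) / d w[r]
        for r, g in grad.items():
            w[r] = w.get(r, 0.0) - lr * g / len(data)
    return w, counts, cats


def predict(track: Sequence[str], w, counts, cats) -> str:
    """Group category with the highest probability for one track."""
    p = _probs(set(track), w, counts, cats)
    return cats[p.index(max(p))]
```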
6. The multidimensional, multi-granularity user behavior analysis method according to claim 5, wherein inputting the user multidimensional behavior track into the user group identification model and identifying the group category of the user in step S2 comprises:
inputting the user multidimensional behavior track X into the user group identification model to obtain the probabilities that X belongs to each group category, and selecting the group category with the highest probability as the output of the model, thereby identifying the group category of the user.
7. The multidimensional, multi-granularity user behavior analysis method according to claim 1, wherein constructing the multi-granularity user click intention model in step S3 comprises:
constructing a multi-granularity user click intention model that takes the set of user multidimensional behavior tracks of one category as input and outputs the degree of intention of users in that category toward different pages;
the constructed model comprises an embedding layer, a propagation layer and an output layer, wherein the embedding layer initializes the user parameters and webpage parameters of users in the same category; the propagation layer constructs a message propagation mechanism through which the embedded user parameters and webpage parameters of that category are propagated; and the output layer evaluates the degree of intention of users in that category toward different webpages, maps the webpages to commodity pages, and takes the degree of intention for each webpage as the degree of intention for the mapped commodity page;
extracting the webpage click time-series matrices of the M users in step S2, grouping the matrices of users in the same category, and constructing a multi-granularity user click intention model for each group, wherein the model corresponding to the y-th group is constructed as follows:
S31: the embedding layer initializes the webpage click time-series matrices of the users in the y-th group as user parameters, and initializes the webpage ID names appearing in those user parameters as webpage parameters; the number of webpage click time-series matrices of the users in the y-th group is [formula not reproduced], and the number of webpage ID names is [formula not reproduced];
S32: the embedding layer constructs the initial encodings of the user parameters and the webpage parameters:
[formula not reproduced]
wherein the formula gives the initial encodings of the u-th webpage click time-series matrix and of the c-th webpage parameter, with u and c indexing the matrices and the webpage parameters [formula not reproduced];
S33: the propagation layer recursively represents messages based on the user parameters, the webpage parameters and the initial encodings of the embedding layer:
[formula not reproduced]
wherein the terms denote the webpage click time-series matrix after D propagation passes, the encoded representations after D passes, and an activation function;
S34: the output layer aggregates the propagation results of the propagation layer:
[formula not reproduced]
wherein the first result is the aggregated propagation result of the user parameters and the second is the aggregated propagation result of the c-th webpage parameter;
S35: calculating the inner product between the aggregated user representation and the aggregated representation of each webpage parameter, and normalizing the inner product of each webpage parameter [formula not reproduced], the normalized result being the degree of intention for the corresponding webpage parameter; mapping the webpages to commodity pages, and taking the degree of intention for each webpage as the degree of intention for the mapped commodity page.
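Steps S31 to S35 resemble graph-based collaborative filtering on a user-webpage bipartite graph. The Python sketch below is one hypothetical reading for a single group, not the patent's exact formulas, which are not reproduced: each user's click time-series matrix is summarized as per-webpage click counts, user initial encodings are click shares and webpage initial encodings are one-hot IDs, propagation uses symmetric degree normalization followed by tanh as the activation function, aggregation is the mean over layers, the group representation is the mean of its users, and normalization is a softmax. No training step is shown, and `group_intention` is an illustrative name.

```python
import math
from typing import Dict, List

Vec = List[float]


def _weighted_sum(vectors: List[Vec], weights: List[float], dim: int) -> Vec:
    out = [0.0] * dim
    for vec, w in zip(vectors, weights):
        for i, x in enumerate(vec):
            out[i] += w * x
    return out


def _mean(vectors: List[Vec]) -> Vec:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def group_intention(clicks: Dict[str, Dict[str, int]], layers: int = 2) -> Dict[str, float]:
    """clicks: user -> {webpage ID: click count}; returns intention degree per webpage."""
    pages = sorted({p for per_user in clicks.values() for p in per_user})
    users = sorted(clicks)
    dim = len(pages)
    user_pages = {u: [p for p in pages if clicks[u].get(p, 0) > 0] for u in users}
    page_users = {p: [u for u in users if p in user_pages[u]] for p in pages}

    # Embedding layer (S31/S32, assumed): user = click share per page, page = one-hot ID.
    u_hist = {}
    for u in users:
        total = sum(clicks[u].values()) or 1
        u_hist[u] = [[clicks[u].get(p, 0) / total for p in pages]]
    p_hist = {p: [[1.0 if q == p else 0.0 for q in pages]] for p in pages}

    # Propagation layer (S33, assumed): normalized neighbour sums followed by tanh.
    for _ in range(layers):
        new_u = {}
        for u in users:
            nb = user_pages[u]
            w = [1.0 / math.sqrt(len(nb) * len(page_users[p])) for p in nb]
            new_u[u] = [math.tanh(x) for x in _weighted_sum([p_hist[p][-1] for p in nb], w, dim)]
        new_p = {}
        for p in pages:
            nb = page_users[p]
            w = [1.0 / math.sqrt(len(nb) * len(user_pages[u])) for u in nb]
            new_p[p] = [math.tanh(x) for x in _weighted_sum([u_hist[u][-1] for u in nb], w, dim)]
        for u in users:
            u_hist[u].append(new_u[u])
        for p in pages:
            p_hist[p].append(new_p[p])

    # Output layer (S34/S35, assumed): mean over layers, inner product, softmax.
    group = _mean([_mean(h) for h in u_hist.values()])
    scores = {p: sum(a * b for a, b in zip(group, _mean(p_hist[p]))) for p in pages}
    top = max(scores.values())
    exps = {p: math.exp(s - top) for p, s in scores.items()}
    z = sum(exps.values())
    return {p: e / z for p, e in exps.items()}
```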
8. The multidimensional, multi-granularity user behavior analysis method according to claim 7, wherein ranking, in step S4, the degrees of intention of users toward different pages from high to low according to the identified user group category and recommending pages to the user in order comprises:
for the user group category identified in step S2, ranking the degrees of intention of users in that category toward different pages, as output by the multi-granularity user click intention model, from high to low, and recommending commodity pages to the user in that order.
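Step S4 then reduces to sorting. A minimal Python sketch, assuming `intentions` maps each group category to per-webpage intention degrees and a hypothetical `page_to_commodity` dictionary implements the webpage-to-commodity-page mapping; when several webpages map to the same commodity page, the sketch keeps the highest degree, which is an assumption of this example.

```python
from typing import Dict, List, Optional


def recommend_commodities(group: str, intentions: Dict[str, Dict[str, float]],
                          page_to_commodity: Dict[str, str],
                          top_k: Optional[int] = None) -> List[str]:
    """Rank commodity pages for one group by intention degree, high to low."""
    best: Dict[str, float] = {}
    for page, degree in intentions[group].items():
        item = page_to_commodity.get(page, page)  # unmapped webpages stand for themselves
        best[item] = max(degree, best.get(item, float("-inf")))  # assumed many-to-one rule
    ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))  # ties broken by name
    items = [item for item, _ in ranked]
    return items if top_k is None else items[:top_k]
```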
CN202310461608.3A 2023-04-26 2023-04-26 Multidimensional and multi-granularity user behavior analysis method Active CN116167829B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310461608.3A CN116167829B (en) 2023-04-26 2023-04-26 Multidimensional and multi-granularity user behavior analysis method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310461608.3A CN116167829B (en) 2023-04-26 2023-04-26 Multidimensional and multi-granularity user behavior analysis method

Publications (2)

Publication Number Publication Date
CN116167829A true CN116167829A (en) 2023-05-26
CN116167829B CN116167829B (en) 2023-08-29

Family

ID=86416810

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310461608.3A Active CN116167829B (en) 2023-04-26 2023-04-26 Multidimensional and multi-granularity user behavior analysis method

Country Status (1)

Country Link
CN (1) CN116167829B (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102262661A (en) * 2011-07-18 2011-11-30 南京大学 Web page access forecasting method based on k-order hybrid Markov model
CN104462156A (en) * 2013-09-25 2015-03-25 阿里巴巴集团控股有限公司 Feature extraction and individuation recommendation method and system based on user behaviors
CN106383895A (en) * 2016-09-27 2017-02-08 北京金山安全软件有限公司 Information recommendation method and device and terminal equipment
US20190362408A1 (en) * 2018-05-25 2019-11-28 Target Brands, Inc. Personalized recommendations for unidentified users based on web browsing context
CN110825956A (en) * 2019-09-17 2020-02-21 中国平安人寿保险股份有限公司 Information flow recommendation method and device, computer equipment and storage medium
CN111294620A (en) * 2020-01-22 2020-06-16 北京达佳互联信息技术有限公司 Video recommendation method and device
WO2021139638A1 (en) * 2020-01-06 2021-07-15 阿里巴巴集团控股有限公司 Method and system for processing behavioral data, storage medium, and processor
CN113886204A (en) * 2021-09-29 2022-01-04 平安普惠企业管理有限公司 User behavior data collection method and device, electronic equipment and readable storage medium
CN114266625A (en) * 2021-12-21 2022-04-01 中国平安财产保险股份有限公司 Recommendation method, device and equipment based on new user and storage medium
CN114637917A (en) * 2022-03-28 2022-06-17 中国银行股份有限公司 Information head bar recommendation method and device based on artificial intelligence
CN115098789A (en) * 2022-08-05 2022-09-23 湖南工商大学 Neural network-based multi-dimensional interest fusion recommendation method and device and related equipment
US20220405641A1 (en) * 2019-10-31 2022-12-22 Bigo Technology Pte. Ltd. Method for recommending information, recommendation server, and storage medium
US20230009814A1 (en) * 2020-08-28 2023-01-12 Tencent Technology (Shenzhen) Company Limited Method for training information recommendation model and related apparatus

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
QIWEI CHEN: "End-to-End user behavior retrieval in click-through rate prediction model", 《COMPUTING MACHINERY》, pages 1 - 11 *
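YUNPENG XIAO et al.: "A click-through rate model of e-commerce based on user interest and temporal behavior", 《EXPERT SYSTEMS WITH APPLICATIONS》, pages 1 - 13 *
陈冬林: "Point-of-interest prediction method based on customers' web spatio-temporal behavior trajectories", 《科技导报》 (Science & Technology Review), pages 74 - 79 *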

Also Published As

Publication number Publication date
CN116167829B (en) 2023-08-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant