CN116362811A - Automatic advertisement delivery management system based on big data - Google Patents

Publication number
CN116362811A
Authority
CN
China
Prior art keywords
user
advertisement
behavior
interest
cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310327938.3A
Other languages
Chinese (zh)
Inventor
段松涛
冯深皇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Chuangyuan Interactive Technology Co ltd
Original Assignee
Shenzhen Chuangyuan Interactive Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Chuangyuan Interactive Technology Co ltd
Priority to CN202310327938.3A
Publication of CN116362811A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0251 Targeted advertisements
    • G06Q30/0255 Targeted advertisements based on user history
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a big-data-based automatic advertisement delivery management system. Stored log data of a mobile terminal is acquired and preprocessed to obtain a data set. A click rate model is constructed from the users' advertisement click behavior logs, the features of the advertisements, and the features of the users, and advertiser resources are matched with users based on the users' shopping behavior and the click rate model to obtain advertisement recommendation information. Content of interest to a user is analyzed and mined from the user's web browsing behavior to construct a user interest model, the user's interests are divided into short-term and long-term interests according to the user's features and shopping behavior, and the user interest model is updated using a sliding time window. Advertisements are automatically delivered to mobile terminal users according to the click rate model and the user interest model, which improves the accuracy of advertisement delivery, the identification of user interests, and the degree of matching between users and advertisements.

Description

Automatic advertisement delivery management system based on big data
Technical Field
The invention belongs to the technical field of advertisement delivery, and particularly relates to an advertisement automatic delivery management system based on big data.
Background
Advertising is a core vehicle for conveying product information, and its form has changed with the times. Before the Internet era, most advertising was carried by paper media, offline physical billboards, and the like; television advertising then gradually entered the public view, but its intrusive and time-bound nature is a relative drawback. With the development of the Internet, online advertising has gradually displaced the traditional advertising industry and become the main carrier of advertisements. The fundamental difference between online computational advertising and traditional advertising is that it solves the problem that the value generated by an advertisement is difficult to track; that is, its results are quantifiable. Building on advances in technologies such as deep learning, it can provide intelligent targeted recommendation and achieve personalization, overcoming the limitations of traditional advertising, and its intermediate processes can be optimized using the relevant data. Because an advertising platform's click conversion revenue improves as the match between advertisements and users improves, how to improve that match is an important problem that urgently needs to be solved.
Disclosure of Invention
In view of the above, the invention provides a big-data-based automatic advertisement delivery management system that improves delivery accuracy, the system's data processing, and the effective management of advertisement delivery. To solve the above technical problems, the invention adopts the following technical solution.
The invention provides an advertisement automatic delivery management system based on big data, which comprises:
a data acquisition unit, configured to acquire stored log data of the mobile terminal and preprocess the log data to obtain a data set, wherein the data set comprises the users' advertisement click behavior logs, the advertisements' own features, the users' own features, and the users' shopping behavior;
a model building unit, configured to construct a click rate model from the users' advertisement click behavior logs, the advertisements' own features, and the users' own features, and to match advertiser resources with users based on the users' shopping behavior and the click rate model to obtain advertisement recommendation information;
an interest mining unit, configured to analyze and mine content of interest to a user from the user's web browsing behavior to construct a user interest model, to divide the user's interests into short-term and long-term interests according to the user's features and shopping behavior, and to update the user interest model using a sliding time window;
and an advertisement delivery unit, configured to automatically deliver advertisements to mobile terminal users according to the click rate model and the user interest model.
As a further improvement of the technical solution, the specific implementation process of the advertisement delivery unit comprises the following steps:
the advertiser uploads advertisements through a subscription service provided on the basis of the mobile terminal's click rate model and user interest model, and sets advertisement keywords, advertisement categories, and delivery dimensions;
when a user accesses a web page that has an advertisement slot, the advertisement delivery platform sends information about the slot to the mobile terminals that meet the conditions;
and the advertisement set of the category corresponding to the interests of the mobile terminal's user is selected from the advertisement delivery platform, a score is calculated for each advertisement with an advertisement matching algorithm, and the highest-scoring advertisement, i.e. the one that best matches the user's interests, is pushed to the user.
As a further improvement of the technical scheme, the method for constructing the user interest model by analyzing and mining the content of interest of the user from the webpage browsing behavior of the user comprises the following steps:
the user's browsing behavior is reflected by mouse movement, mouse clicks, wheel scrolling, and key presses; this browsing behavior information is collected by a front-end script embedded in the web page, and the user's browsing speed over different paragraphs of the web page is used to determine the user's degree of interest in the browsed content;
the user's relative browsing speed for a web page is the ratio of the speed at which the user browses the target web page to the average web page browsing speed, $S_i / S_{over}$, where $S_i$ is the browsing speed of web page $i$:
$$S_i = \begin{cases} size_i / t_i, & t_1 \le t_i \le t_2 \\ size_i / t_2, & t_i > t_2 \end{cases}$$
Here $size_i$ is the size of the text of web page $i$, $t_i$ is the time the user stays on the page, $t_1$ is the minimum dwell time, and $t_2$ is the maximum dwell time. When $t_i$ is less than the minimum dwell time, the user is determined not to have browsed the page and the page is filtered out; when $t_i$ is greater than the maximum dwell time, the user is determined to have left the page open for a long time, and $t_i$ is set to the maximum dwell time. $S_{over}$ is the average browsing speed:
$$S_{over} = \frac{size_{all}}{time_{all}}$$
where $size_{all}$ is the sum of the sizes of all pages browsed by the user and $time_{all}$ is the sum of the times the user spent browsing all pages;
let $\varepsilon$ be the reference weight of effective browsing content at the preset average browsing speed; the weight of effective browsing content at browsing speed $s_t$ is given by
[formula image: BDA0004153982190000033]
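The browsing-speed definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the function names are hypothetical, and the choice to compute the average speed from the filtered and capped dwell times is an assumption, since the text does not say which pages enter the sums. The speed-dependent content weight is omitted because its formula is not available in the text.

```python
from typing import List, Optional, Tuple


def page_speed(size: float, dwell: float, t_min: float, t_max: float) -> Optional[float]:
    """Browsing speed S_i = size_i / t_i, with the dwell time capped at t_max.

    Returns None when dwell < t_min (the page counts as not browsed).
    Assumes t_min > 0 so that the division is always defined.
    """
    if dwell < t_min:
        return None
    return size / min(dwell, t_max)


def relative_speeds(pages: List[Tuple[float, float]], t_min: float, t_max: float) -> List[float]:
    """Relative speed S_i / S_over for every page that passes the t_min filter.

    pages holds (text size, dwell time) pairs. S_over is computed here from
    the filtered, capped pages only, which is an assumption.
    """
    kept = [(size, min(dwell, t_max)) for size, dwell in pages if dwell >= t_min]
    if not kept:
        return []
    s_over = sum(size for size, _ in kept) / sum(dwell for _, dwell in kept)
    return [(size / dwell) / s_over for size, dwell in kept]
```

A relative speed below 1 means the page was read more slowly than the user's average, which the text treats as a sign of stronger interest.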
As a further improvement of the above technical solution, determining the user's degree of interest in the browsed content from the user's browsing speed over different paragraphs of the web page comprises:
assuming the user has $n$ different browsing behaviors, let $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$ denote the weights of keyword $w_i$ in the effective browsed documents generated by the $n$ different browsing behaviors, and let $f_{i1}, f_{i2}, \ldots, f_{in}$ denote the term frequencies of the feature keyword in the $n$ effective browsed documents; the expression for $tf_i$ in the vector space model is
[formula image: BDA0004153982190000034]
In the vector space model combined with user browsing behavior, the new $tf_i'$ is calculated as
[formula image: BDA0004153982190000035]
The weight is $w'(w_i) = tf_i' \cdot idf_i$, and the similarity between web page $d_i$ and advertisement $d_j$ is calculated as
[formula image: BDA0004153982190000036]
where $w'(w_{ki})$ is the weight of keyword $w_k$ in web page $d_i$ and $w'(w_{kj})$ is the weight of keyword $w_k$ in advertisement $d_j$.
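A short Python sketch may help make the behavior-weighted keyword weighting concrete. Only the product of tf_i' and idf_i is stated in the text. The form of tf_i' used here (a sum of per-behavior term frequencies scaled by the behavior weights) and the use of cosine similarity between the page and advertisement weight vectors are assumptions, since the corresponding formulas are not available in the text; all function names are hypothetical.

```python
import math
from typing import Dict, List


def behavior_weighted_tf(freqs: List[float], eps: List[float]) -> float:
    """Assumed form of tf_i': the sum over behaviors j of eps_j * f_ij."""
    if len(freqs) != len(eps):
        raise ValueError("freqs and eps must have the same length")
    return sum(e * f for e, f in zip(eps, freqs))


def keyword_weight(tf_prime: float, idf: float) -> float:
    """w'(w_i) = tf_i' * idf_i, as stated in the text."""
    return tf_prime * idf


def cosine_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two sparse keyword-weight vectors (assumed similarity form)."""
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

For example, a keyword typed into a search box could be given a larger behavior weight than the same keyword appearing in a clicked link, so that it contributes more to the page's weight vector.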
As a further improvement of the technical scheme, the method for constructing the user interest model by analyzing and mining the content of interest of the user from the webpage browsing behavior of the user comprises the following steps:
the user's historical behaviors are processed by the user interest model to obtain an abstract representation of the interest states, where the interest state is the hidden-layer state output $h_t$ of each time step. Because the click on the target item in the model is triggered only by the final interest, the intermediate hidden states $h_t$ cannot be properly supervised by it; the input $b_{t+1}$ of the next time step is therefore used as a label to supervise the learning of the hidden state $h_t$ of the current time step. The loss function is calculated as
[formula image: BDA0004153982190000041]
where $N$ is the number of behavior sequence pairs, $t$ is the number of time steps, i.e. the length of the user behavior list, $h_t$ is the hidden state of the current time step, [formula image: BDA0004153982190000042] denotes the input of the next time step, and $\sigma$ is the sigmoid function.
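The supervision idea described above, in which the next input b_{t+1} acts as the label for the current hidden state h_t, can be sketched as follows. Because the loss formula itself is not available in the text, this is only one plausible form: the negative log-sigmoid of the inner product between h_t and b_{t+1}, averaged over the N sequences. It assumes h_t and b_{t+1} have the same dimension and omits any negative-sample term, which the text does not describe. The function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def neg_log_sigmoid(x: float) -> float:
    """-log(sigmoid(x)), computed as softplus(-x) in a numerically stable way."""
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


def next_step_loss(hidden: List[List[Vector]], inputs: List[List[Vector]]) -> float:
    """Average over N sequences of the sum over t of -log sigmoid(<h_t, b_{t+1}>).

    hidden[i][t] is h_t for sequence i and inputs[i][t] is b_t.
    Only time steps that have a following input contribute.
    """
    n = len(hidden)
    if n == 0:
        return 0.0
    total = 0.0
    for h_seq, b_seq in zip(hidden, inputs):
        for t in range(min(len(h_seq), len(b_seq) - 1)):
            score = sum(h * b for h, b in zip(h_seq[t], b_seq[t + 1]))
            total += neg_log_sigmoid(score)
    return total / n
```

Minimizing this quantity pushes each intermediate hidden state toward the behavior that actually follows it, which is the stated purpose of the extra supervision.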
As a further improvement of the above technical solution, the classification of the interests of the user into short-term interests and long-term interests according to the characteristics of the user and shopping behavior includes:
an attention mechanism is introduced on top of the model's interest extraction layer; the attention score is derived from the relevance between the output $h_t$ of the interest extraction layer at each time step and the current candidate advertisement, and is calculated as
[formula image: BDA0004153982190000043]
where $e_a$ is the input vector of the candidate advertisement, with dimension $n_A \times 1$; $n_A$ is the input dimension of the candidate advertisement features; $W$ is a parameter matrix of dimension $n_H \times n_A$; $n_H$ is the dimension of the hidden state of the interest extraction layer; and $h_t$ is the hidden state of the interest extraction layer at time step $t$, with dimension $n_H \times 1$.
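The stated dimensions (h_t is n_H × 1, W is n_H × n_A, e_a is n_A × 1) are consistent with a bilinear relevance term h_t^T W e_a. The sketch below computes that term for every time step and normalizes it with a softmax. The softmax normalization is an assumption, since the attention formula is not available in the text, and the function name is hypothetical.

```python
import math
from typing import List


def attention_scores(hidden: List[List[float]], W: List[List[float]], e_a: List[float]) -> List[float]:
    """Softmax over time steps of the bilinear relevance h_t^T W e_a.

    hidden[t] has n_H entries, W has n_H rows of n_A entries, e_a has n_A entries.
    """
    # W e_a is shared by all time steps, so compute it once (n_H entries).
    w_e = [sum(w * x for w, x in zip(row, e_a)) for row in W]
    logits = [sum(h * v for h, v in zip(h_t, w_e)) for h_t in hidden]
    if not logits:
        return []
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [x / total for x in exps]
```

Time steps whose hidden state is more relevant to the candidate advertisement receive a larger share of the attention mass.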
As a further improvement of the technical scheme, the updating of the user interest model is completed by adopting a sliding time window, and the method comprises the following steps:
the sliding time window is the set of documents, taken from the user's web access history, that is used to analyze the user's behavior characteristics; the $m$ documents most recently accessed by the user, $d_1, d_2, d_3, \ldots, d_{m-1}, d_m$, are selected as the sliding time window, with access times $t_1, t_2, t_3, \ldots, t_{m-1}, t_m$ respectively, and the documents are arranged from left to right in order of increasing access time;
the word vector $v_i$ of each document is computed with the semantically expanded vector space model; the clustering algorithm computes the similarity between two word vectors and, if the similarity is greater than a threshold $\alpha$, places the two word vectors in the same word vector cluster. The similarity is expressed as
[formula image: BDA0004153982190000044]
where $d_c$ is the centroid of a word vector cluster, given by
[formula image: BDA0004153982190000051]
As a further improvement of the technical solution, let $C$ denote the set of word vector clusters. At initialization, the leftmost word vector $d_1$ is taken as the centroid $d_{QE1}$ of a cluster, so that $C = \{d_{QE1}\}$. Documents $d_j$, $1 < j < m$, are then taken from the sliding time window from left to right, and the word vector $v_j$ of each is computed with the semantically expanded vector space model. For every $d_{QEi} \in C$, the similarity $S_{ij} = sim(v_j, d_{QEi})$ is computed with the vector space model, and the maximum over all results is recorded as $S_{max} = \max(S_{ij})$. If $S_{max} > \alpha$, $v_j$ is added to the cluster whose centroid is $d_{QEi}$ and the centroid of that cluster is recomputed as $d'_{QEi}$; if $S_{max} < \alpha$, $v_j$ becomes the centroid $d_{QEj}$ of a new cluster and $C = C \cup \{d_{QEj}\}$.
All browsed documents in the sliding window are processed according to the above steps. When the sliding time window slides to the right, the new browsed document $d_{m+1}$ is added to it and, following the above steps, $d_{m+1}$ is either added to an existing cluster or used to start a new cluster; the leftmost document $d_1$ is removed from the window, and the centroid of the cluster that contained $d_1$ is recomputed.
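The incremental clustering over the sliding window can be sketched as a small Python class. This is an illustration under stated assumptions: cosine similarity stands in for the similarity measure and the arithmetic mean for the centroid, because both formulas are unavailable in the text; the case where S_max equals α, which the text leaves open, starts a new cluster here. The class and method names are hypothetical.

```python
import math
from collections import deque
from typing import Deque, Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two dense vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors: List[Vector]) -> Vector:
    """Component-wise mean of the member vectors (assumed centroid form)."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


class SlidingWindowClusters:
    """Keeps the last window_size document vectors grouped into clusters."""

    def __init__(self, alpha: float, window_size: int) -> None:
        self.alpha = alpha
        self.window_size = window_size
        # (vector, cluster id) pairs, oldest document on the left.
        self.window: Deque[Tuple[Vector, int]] = deque()
        self.members: Dict[int, List[Vector]] = {}
        self.next_id = 0

    def _assign(self, v: Vector) -> int:
        best_id, best_sim = -1, -1.0
        for cid, vecs in self.members.items():
            sim = cosine(v, centroid(vecs))
            if sim > best_sim:
                best_id, best_sim = cid, sim
        if best_id >= 0 and best_sim > self.alpha:
            self.members[best_id].append(v)  # join the most similar cluster
            return best_id
        cid = self.next_id  # otherwise v seeds a new cluster
        self.next_id += 1
        self.members[cid] = [v]
        return cid

    def add(self, v: Vector) -> int:
        """Add a newly browsed document and evict the oldest one if needed."""
        cid = self._assign(v)
        self.window.append((v, cid))
        if len(self.window) > self.window_size:
            old_v, old_cid = self.window.popleft()
            self.members[old_cid].remove(old_v)
            if not self.members[old_cid]:
                del self.members[old_cid]
        return cid
```

Centroids are recomputed from the member lists on demand, so removing the oldest document automatically updates the centroid of the cluster it belonged to.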
As a further improvement of the above technical solution, a cluster is determined to be valid when the ratio of the number of word vectors in the cluster to the total number of documents in the sliding time window exceeds a ratio $\zeta$, where $\zeta$ is the behavior factor;
the normalized value of the document access times of a valid cluster is taken as the behavior freshness, denoted $W_F$; the average access time $E(t)$ of valid cluster $D_i$ is calculated as
[formula image: BDA0004153982190000052]
Let $V_E = (E_1(t), E_2(t), \ldots, E_i(t), \ldots, E_k(t))$ denote the vector of the average access times of all valid clusters, where $k$ is the number of valid clusters; the behavior freshness [formula image: BDA0004153982190000055] of valid cluster $D_i$ is given by
[formula image: BDA0004153982190000053]
The user's long-term interest is measured by the degree of dispersion of the web page accesses corresponding to the vectors, and the behavior dispersion measures the degree to which a valid cluster reflects this behavior characteristic;
the normalized value of the mean square error of the document access times of a valid cluster is taken as the behavior dispersion, denoted $W_D$; the mean square error of the browsing times of valid cluster $D_i$ is
[formula image: BDA0004153982190000054]
where $n$ is the total number of documents in the valid cluster. Let $V_D = (D_1(t), D_2(t), \ldots, D_i(t), \ldots, D_k(t))$ denote the vector of the mean square errors of the access times of all valid clusters, where $k$ is the number of valid clusters;
the behavior dispersion [formula image: BDA0004153982190000061] of valid cluster $D_i$ is given by
[formula image: BDA0004153982190000062]
[formula image: BDA0004153982190000063]
The final weight [formula image: BDA0004153982190000064] of a valid cluster is determined by the behavior dispersion, which measures the user's long-term behavior characteristics, and the behavior freshness, which measures the user's short-term behavior characteristics. The dispersion factor $\lambda \in [0, 1]$ is the proportion of the behavior dispersion in the final weight, and the final weight of valid cluster $D_i$ is calculated as
[formula image: BDA0004153982190000065]
[formula image: BDA0004153982190000066]
When $0 < \lambda < 1$, the user's long-term and short-term interests are both taken into account; when $\lambda = 0$, only long-term interest is considered.
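The cluster-weighting steps above can be illustrated with a short Python sketch. Several details are assumptions because the formulas are not available in the text: the average access time is the arithmetic mean, the dispersion is the population standard deviation of access times, both are normalized by dividing by their sum over the valid clusters, and the final weight is the convex combination λ·W_D + (1 − λ)·W_F. Under that assumed combination, λ = 0 leaves only the freshness term, so the patent's own formula may assign λ differently given its remark about λ = 0. All function names are hypothetical.

```python
import math
from typing import Dict, List


def valid_clusters(clusters: Dict[str, List[float]], window_total: int, zeta: float) -> Dict[str, List[float]]:
    """Keep clusters whose share of the window's documents exceeds zeta.

    clusters maps a cluster id to the access times of its documents.
    """
    return {k: t for k, t in clusters.items() if t and len(t) / window_total > zeta}


def _normalize(values: Dict[str, float]) -> Dict[str, float]:
    """Divide every value by the sum of all values (assumed normalization)."""
    total = sum(values.values())
    if total == 0:
        return {k: 0.0 for k in values}
    return {k: v / total for k, v in values.items()}


def final_weights(clusters: Dict[str, List[float]], lam: float) -> Dict[str, float]:
    """Assumed final weight: lam * dispersion + (1 - lam) * freshness."""
    mean = {k: sum(t) / len(t) for k, t in clusters.items()}
    std = {
        k: math.sqrt(sum((x - mean[k]) ** 2 for x in t) / len(t))
        for k, t in clusters.items()
    }
    freshness = _normalize(mean)   # W_F: later average access time scores higher
    dispersion = _normalize(std)   # W_D: more spread-out access times score higher
    return {k: lam * dispersion[k] + (1 - lam) * freshness[k] for k in clusters}
```

With access times expressed as offsets from the start of the window, a cluster visited recently and in a burst scores high on freshness, while a cluster visited repeatedly across the whole window scores high on dispersion.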
As a further improvement of the technical scheme, the click rate model is constructed according to the advertisement click action log of the user, the advertisement self-characteristics and the user self-characteristics, and the method comprises the following steps:
data collection, wherein the collected data comprises offline data and online data, used for offline and online training of the model respectively, and is derived from the relevant types of behavior logs;
and constructing features related to users and advertisements from the collected data, and selecting features to form the input format required by the click rate model.
The invention provides a big-data-based automatic advertisement delivery management system. Stored log data of a mobile terminal is acquired and preprocessed to obtain a data set. A click rate model is constructed from the users' advertisement click behavior logs, the features of the advertisements, and the features of the users, and advertiser resources are matched with users based on the users' shopping behavior and the click rate model to obtain advertisement recommendation information. Content of interest to a user is analyzed and mined from the user's web browsing behavior to construct a user interest model, the user's interests are divided into short-term and long-term interests according to the user's features and shopping behavior, and the user interest model is updated using a sliding time window. Advertisements are automatically delivered to mobile terminal users according to the click rate model and the user interest model, which improves the accuracy of advertisement delivery, the identification of user interests, and the degree of matching between users and advertisements.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a block diagram of a big data based advertisement automation delivery management system provided by the invention;
FIG. 2 is a flow chart of the automated advertisement delivery management method based on big data provided by the invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.
Referring to fig. 1, the invention provides an advertisement automation delivery management system based on big data, comprising:
a data acquisition unit, configured to acquire stored log data of the mobile terminal and preprocess the log data to obtain a data set, wherein the data set comprises the users' advertisement click behavior logs, the advertisements' own features, the users' own features, and the users' shopping behavior;
a model building unit, configured to construct a click rate model from the users' advertisement click behavior logs, the advertisements' own features, and the users' own features, and to match advertiser resources with users based on the users' shopping behavior and the click rate model to obtain advertisement recommendation information;
an interest mining unit, configured to analyze and mine content of interest to a user from the user's web browsing behavior to construct a user interest model, to divide the user's interests into short-term and long-term interests according to the user's features and shopping behavior, and to update the user interest model using a sliding time window;
and an advertisement delivery unit, configured to automatically deliver advertisements to mobile terminal users according to the click rate model and the user interest model.
In this embodiment, the specific implementation process of the advertisement delivery unit comprises: the advertiser uploads advertisements through a subscription service provided on the basis of the mobile terminal's click rate model and user interest model, and sets advertisement keywords, advertisement categories, and delivery dimensions; when a user accesses a web page that has an advertisement slot, the advertisement delivery platform sends information about the slot to the mobile terminals that meet the conditions; and the advertisement set of the category corresponding to the interests of the mobile terminal's user is selected from the advertisement delivery platform, a score is calculated for each advertisement with an advertisement matching algorithm, and the highest-scoring advertisement, i.e. the one that best matches the user's interests, is pushed to the user. Constructing the click rate model from the users' advertisement click behavior logs, the advertisements' own features, and the users' own features comprises: data collection, wherein the collected data comprises offline data and online data, used for offline and online training of the model respectively, and is derived from the relevant types of behavior logs; and constructing features related to users and advertisements from the collected data, and selecting features to form the input format required by the click rate model.
The word vector clustering algorithm clusters the semantically expanded vector space model by keyword. The weight [formula image: BDA0004153982190000081] calculated for each word vector cluster is combined with the centroid $d_{QEi}$ of that cluster, and the matching score between an advertisement and the user is evaluated as
[formula image: BDA0004153982190000082]
where the similarity $sim(d_{QEi}, \alpha_k)$ between word vector cluster centroid $d_{QEi}$ and advertisement $\alpha_k$ is calculated with the improved vector space model. Each advertisement has a score with respect to each interest cluster centroid; the largest of these is taken as the advertisement's final score, the candidate advertisements are sorted by final score, and the highest-scoring advertisement is delivered to the user. Behavior-targeted advertising takes the user as its starting point: content of interest to the user is analyzed and mined from the user's web browsing behavior to construct a user interest model, and advertisements corresponding to the user's interests are delivered, so that users with different interests see different advertisements when browsing the same web page, achieving precisely targeted delivery. User interests are classified into short-term and long-term interests through analysis of user behavior, and the user interest model is updated using a sliding time window mechanism.
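The scoring and ranking step described above can be sketched as follows. The text states that an advertisement is scored against every interest cluster centroid and that the largest score becomes its final score; multiplying the cluster weight by the similarity, and using cosine similarity, are assumptions because the score formula is not available in the text. The function names and the example vectors are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two dense vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def ad_score(ad_vec: Vector, clusters: List[Tuple[float, Vector]]) -> float:
    """Largest value of weight * sim(centroid, ad) over the interest clusters.

    clusters holds (final weight, centroid) pairs for the user's valid clusters.
    """
    return max((w * cosine(c, ad_vec) for w, c in clusters), default=0.0)


def rank_ads(ads: Dict[str, Vector], clusters: List[Tuple[float, Vector]]) -> List[Tuple[str, float]]:
    """Candidate advertisements sorted by final score, best first."""
    scored = [(ad_id, ad_score(vec, clusters)) for ad_id, vec in ads.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

The first entry of the ranked list is the advertisement that would be delivered to the user.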
It should be appreciated that the importance of browsed content differs markedly across browsing behaviors. Printing a page, saving a page, or adding a bookmark shows that the user has a strong interest in the page, so the corresponding effective browsed content receives a larger weight. Editing behavior in an editable area usually carries more weight than link-clicking behavior; for example, a keyword typed by the user is more important than a keyword in a clicked link, because content the user edits expresses the user's degree of interest in it. Taking the characteristics of the user's browsing behavior into account when calculating keyword weights therefore expresses the user's interests more accurately. A user's interests are reflected in two kinds of behavior. One is long-term behavior, corresponding to the user's long-term interest: the user maintains a steady interest in something that does not change over a long period. The other is short-term behavior, corresponding to the user's short-term interest: the user shows a strong interest in something for a short period, accesses a large number of related web pages during that period, and stops paying attention to the topic afterwards. To mine the user's long-term and short-term behavior characteristics, the browsed documents in the sliding time window must be clustered. Because web page access is highly random, some web pages do not reflect the user's behavior characteristics and form small clusters after clustering; these noise clusters are removed by introducing a behavior factor.
When measuring the importance of a user's short-term interest, the more recently a concentrated group of documents was accessed, the better it represents the user's short-term interest, so short-term interest can be measured by the web page access times corresponding to the vectors. When measuring the importance of a user's long-term interest, the more dispersed the documents are, the better they represent the user's long-term interest. Content-targeted advertising simply extracts web page keywords and delivers advertisements according to the topic of the page; because it considers only the page content, it does not fully mine the user's interests, so its delivery results are unsatisfactory and advertisements of no interest to the user are often delivered. Mining user interests from browsing behavior therefore improves the accuracy of advertisement delivery and the user experience.
Optionally, analyzing and mining the content of interest of the user from the web browsing behavior of the user to construct a user interest model includes:
the user's browsing behavior is reflected by mouse movement, mouse clicks, wheel scrolling, and key presses; this browsing behavior information is collected by a front-end script embedded in the web page, and the user's browsing speed over different paragraphs of the web page is used to determine the user's degree of interest in the browsed content;
the user's relative browsing speed for a web page is the ratio of the speed at which the user browses the target web page to the average web page browsing speed, $S_i / S_{over}$, where $S_i$ is the browsing speed of web page $i$:
$$S_i = \begin{cases} size_i / t_i, & t_1 \le t_i \le t_2 \\ size_i / t_2, & t_i > t_2 \end{cases}$$
Here $size_i$ is the size of the text of web page $i$, $t_i$ is the time the user stays on the page, $t_1$ is the minimum dwell time, and $t_2$ is the maximum dwell time. When $t_i$ is less than the minimum dwell time, the user is determined not to have browsed the page and the page is filtered out; when $t_i$ is greater than the maximum dwell time, the user is determined to have left the page open for a long time, and $t_i$ is set to the maximum dwell time. $S_{over}$ is the average browsing speed:
$$S_{over} = \frac{size_{all}}{time_{all}}$$
where $size_{all}$ is the sum of the sizes of all pages browsed by the user and $time_{all}$ is the sum of the times the user spent browsing all pages;
let $\varepsilon$ be the reference weight of effective browsing content at the preset average browsing speed; the weight of effective browsing content at browsing speed $s_t$ is given by
[formula image: BDA0004153982190000104]
In this embodiment, determining the user's degree of interest in the browsed content from the browsing speed at different paragraphs of the web page includes: assuming the user has n different browsing behaviors, let ε_1, ε_2, ..., ε_n denote the weights of keyword w_i in the effectively browsed documents generated by the n different browsing behaviors, and let f_i1, f_i2, ..., f_in denote the term frequencies of the feature keyword in the n effectively browsed documents. In the vector space model, tf_i is expressed as
Figure BDA0004153982190000105
In the vector space model combined with user browsing behavior, the new tf_i' is calculated as
Figure BDA0004153982190000106
The weight is expressed as w'(w_i) = tf_i' * idf_i, and the similarity between web page d_i and advertisement d_j is calculated as
Figure BDA0004153982190000107
where w'(w_ki) denotes the weight of keyword w_k in web page d_i, and w'(w_kj) denotes the weight of keyword w_k in advertisement d_j.
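By way of a non-limiting illustration, the behavior-weighted keyword weights and the page-to-advertisement similarity may be sketched as follows. Only w'(w_i) = tf_i' * idf_i is stated in the text; the ε-weighted sum used for tf_i' and the cosine form of the similarity are assumptions of this sketch, and all function names are hypothetical.

```python
import math
from typing import Dict, List


def weighted_tf(freqs: List[float], eps: List[float]) -> float:
    """Assumed tf_i': term frequencies f_i1..f_in weighted by eps_1..eps_n."""
    return sum(e * f for e, f in zip(eps, freqs))


def keyword_weight(tf_prime: float, idf: float) -> float:
    """w'(w_i) = tf_i' * idf_i, as stated in the text."""
    return tf_prime * idf


def cosine_similarity(page: Dict[str, float], ad: Dict[str, float]) -> float:
    """Assumed similarity: cosine of the two keyword-weight vectors."""
    dot = sum(w * ad[k] for k, w in page.items() if k in ad)
    norm_page = math.sqrt(sum(w * w for w in page.values()))
    norm_ad = math.sqrt(sum(w * w for w in ad.values()))
    if norm_page == 0 or norm_ad == 0:
        return 0.0  # an empty vector is treated as unrelated
    return dot / (norm_page * norm_ad)
```

Representing each document as a keyword-to-weight dictionary keeps the vectors sparse, which suits pages and advertisements that share only a few keywords.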
It should be noted that analyzing and mining the content of interest to the user from the user's web browsing behavior to construct the user interest model includes: using the user interest model to extract the user's historical behaviors and obtain an abstract representation of the interest state, where the interest state is the hidden-layer state output h_t at each time step. In the model, the click on the target item is triggered by the final interest, so the intermediate hidden states h_t cannot be properly supervised. The input b_{t+1} of the next time step is therefore used as the label to supervise the learning of the hidden state h_t of the current time step. The loss function is calculated as
Figure BDA0004153982190000111
where N denotes the number of behavior sequence pairs, t denotes the number of time steps (that is, the length of the user behavior list), h_t denotes the hidden state of the current time step,
Figure BDA0004153982190000112
denotes the input of the next time step, and σ denotes the sigmoid function. This approach matches advertiser resources to users for recommendation and continually optimizes the match. On the advertiser side, it achieves the most efficient conversion under a limited budget; on the platform side, it matches advertisements to users accurately, which raises platform revenue and the advertiser's conversion rate. It also addresses the ranking of candidate resources (advertisement resources) for each user, on which basis matching can be further optimized, the advertisement conversion rate improved, and bidding balanced against delivery.
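By way of a non-limiting illustration, the supervision of each hidden state h_t by the next input b_{t+1} may be sketched as a binary cross-entropy over behavior sequences. The negative-sampled term (a behavior the user did not perform at step t+1) is an assumption of this sketch, as are the function names; the loss in the formula image may differ, and the sketch has no numerical safeguards against saturated sigmoid values.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def auxiliary_loss(hidden: List[List[Vector]],
                   next_pos: List[List[Vector]],
                   next_neg: List[List[Vector]]) -> float:
    """hidden[i][t] is h_t of sequence i, next_pos[i][t] is the embedding of
    the real input b_{t+1}, and next_neg[i][t] is an assumed sampled negative."""
    n = len(hidden)  # N: number of behavior sequence pairs
    total = 0.0
    for h_seq, pos_seq, neg_seq in zip(hidden, next_pos, next_neg):
        for h_t, pos, neg in zip(h_seq, pos_seq, neg_seq):
            total += math.log(sigmoid(dot(h_t, pos)))        # real next behavior
            total += math.log(1.0 - sigmoid(dot(h_t, neg)))  # sampled negative
    return -total / n
```

The loss falls as h_t aligns with the behavior that actually follows it, which is the supervision signal the paragraph above describes.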
Optionally, classifying the interests of the user into short-term interests and long-term interests according to the characteristics of the user and shopping behavior includes:
introducing an attention mechanism on top of the interest extraction layer of the model, where the attention score is obtained from the relevance between the output h_t of the interest extraction layer at each time step and the current candidate advertisement, calculated as
Figure BDA0004153982190000113
e_a denotes the input vector of the candidate advertisement, with dimension n_A × 1, where n_A is the input dimension of the candidate advertisement features; W denotes a parameter matrix of dimension n_H × n_A, where n_H is the dimension of the hidden state of the interest extraction layer; and h_t denotes the hidden state of the interest extraction layer at time step t, with dimension n_H × 1.
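By way of a non-limiting illustration, an attention score consistent with the stated dimensions may be sketched as the bilinear form h_t^T W e_a, normalized over the time steps. The softmax normalization and the function names are assumptions of this sketch; the formula image may define the score differently.

```python
import math
from typing import List, Sequence


def raw_score(h_t: Sequence[float],
              W: Sequence[Sequence[float]],
              e_a: Sequence[float]) -> float:
    """h_t^T W e_a with h_t of length n_H, W of shape n_H x n_A, e_a of length n_A."""
    w_e = [sum(w * e for w, e in zip(row, e_a)) for row in W]  # W e_a, length n_H
    return sum(h * x for h, x in zip(h_t, w_e))


def attention_weights(hidden_states: Sequence[Sequence[float]],
                      W: Sequence[Sequence[float]],
                      e_a: Sequence[float]) -> List[float]:
    """Softmax over time steps; hidden_states must be non-empty."""
    scores = [raw_score(h, W, e_a) for h in hidden_states]
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [x / z for x in exps]
```

Time steps whose hidden state is more relevant to the candidate advertisement receive a larger share of the attention.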
In this embodiment, updating the user interest model using a sliding time window includes: the sliding time window is the set of documents in the user's web access history that is used for analyzing the user's behavior characteristics. The m browsed documents d_1, d_2, d_3, ..., d_{m-1}, d_m most recently accessed by the user are selected as the sliding time window, with access times t_1, t_2, t_3, ..., t_{m-1}, t_m in turn, and the browsed documents are arranged from left to right in increasing order of access time. The clustering algorithm computes the word vector v_i of each document with a semantically expanded vector space model and computes the similarity between two word vectors; if the similarity is greater than a threshold α, the two word vectors are placed in the same word vector cluster. The similarity is expressed as
Figure BDA0004153982190000121
where d_c denotes the centroid of a word vector cluster, given by
Figure BDA0004153982190000122
It should be noted that C denotes the set of word vector clusters. At initialization, the leftmost word vector d_1 is selected as the centroid d_QE1 of a cluster, so that C = {d_QE1}. Documents d_j, 1 < j < m, are then taken from the sliding time window from left to right, and the word vector v_j of each is computed with the semantically expanded vector space model. For every d_QEi ∈ C, the similarity S_ij = sim(v_j, d_QEi) is computed with the vector space model, and the maximum over all results is recorded as S_max = max(S_ij). If S_max > α, v_j is added to the cluster whose centroid is d_QEi, and the centroid of that cluster is recomputed as d'_QEi; if S_max < α, v_j becomes the centroid d_QEj of a new cluster, and C = C ∪ {d_QEj}. All browsed documents in the sliding window are processed in this way. When the sliding time window slides to the right, the new browsed document d_{m+1} is added to it and, following the steps above, either joins an existing cluster or forms a new cluster; the leftmost document d_1 is removed from the window, and the centroid of the cluster that contained d_1 is recomputed.
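By way of a non-limiting illustration, the single-pass clustering of the documents in the window may be sketched as follows. Cosine similarity, the component-wise mean as centroid, and the function names are assumptions of this sketch; the case S_max = α, which the text leaves open, is treated here as starting a new cluster.

```python
import math
from typing import Dict, List

Vector = Dict[str, float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(members: List[Vector]) -> Vector:
    """Assumed centroid: component-wise mean of the member vectors."""
    keys = {k for v in members for k in v}
    return {k: sum(v.get(k, 0.0) for v in members) / len(members) for k in keys}


def cluster_window(vectors: List[Vector], alpha: float) -> List[List[int]]:
    """Cluster the window left to right; returns lists of document indices."""
    clusters: List[List[int]] = []
    centroids: List[Vector] = []
    for j, v in enumerate(vectors):
        best, best_sim = -1, float("-inf")
        for i, c in enumerate(centroids):
            s = cosine(v, c)
            if s > best_sim:
                best, best_sim = i, s
        if best >= 0 and best_sim > alpha:
            clusters[best].append(j)  # join the most similar cluster
            centroids[best] = centroid([vectors[i] for i in clusters[best]])
        else:
            clusters.append([j])      # v becomes the centroid of a new cluster
            centroids.append(dict(v))
    return clusters
```

When the window slides, the simplest option is to call `cluster_window(window[1:] + [new_vector], alpha)` again; this re-clusters the whole window rather than updating only the affected clusters as the text describes, so it is a simplification.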
To mine the long-term and short-term interests of users, the algorithm introduces a sliding time window mechanism and clusters the semantically expanded vector space model by keyword. Behavior freshness and behavior dispersion are introduced to distinguish short-term interests from long-term interests, and the weight and centroid of each valid cluster formed by clustering are calculated. The similarity between each centroid and a text advertisement is calculated with the vector space model, and the matching score of the advertisement is calculated by combining this similarity with the weight of the valid cluster. Because the same advertisement has a different score against each valid cluster, the highest score is taken as the final score of the advertisement. The scores of all advertisements are sorted from high to low, and the advertisement with the highest score is delivered to the user.
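By way of a non-limiting illustration, the scoring and ranking of advertisements against the valid clusters may be sketched as follows. Combining similarity and cluster weight by multiplication is an assumption of this sketch, since the text only says that the two are combined; `rank_ads` and its arguments are hypothetical names.

```python
from typing import Dict, List, Tuple


def rank_ads(ad_sims: Dict[str, List[float]],
             cluster_weights: List[float]) -> List[Tuple[str, float]]:
    """ad_sims[ad][i] is the similarity between the advertisement and the
    centroid of valid cluster i; cluster_weights[i] is that cluster's weight."""
    scores = {}
    for ad, sims in ad_sims.items():
        # One score per valid cluster; the best one is the ad's final score.
        scores[ad] = max(s * w for s, w in zip(sims, cluster_weights))
    # Highest final score first; the first entry is the ad to deliver.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

For example, with cluster weights `[0.3, 0.7]`, an advertisement that is strongly similar to the lighter cluster can still rank below one that is moderately similar to the heavier cluster.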
Optionally, a cluster is determined to be valid when the ratio of the number of word vectors in the word vector cluster formed by clustering to the total number of documents in the sliding time window exceeds a threshold ζ, where ζ denotes the behavior factor;
the normalized value of the document access times of a valid cluster is taken as the behavior freshness, denoted W_F, and the average access time E(t) of valid cluster D_i is calculated as
Figure BDA0004153982190000131
Let V_E = (E_1(t), E_2(t), ..., E_i(t), ..., E_k(t)) denote the vector composed of the average access times of all valid clusters, where k denotes the number of valid clusters. The behavior freshness of valid vector cluster D_i, denoted
Figure BDA0004153982190000132
is expressed as
Figure BDA0004153982190000133
The long-term interest of the user is measured by the dispersion of the web page accesses corresponding to the vectors, and the behavior dispersion measures how widely spread the behavior reflected by a valid cluster is;
the normalized value of the mean square error of the document access times of a valid cluster is taken as the behavior dispersion, denoted W_D, and the mean square error of the browsing times of valid vector cluster D_i is expressed as
Figure BDA0004153982190000134
where n denotes the total number of documents in the valid cluster. Let V_D = (D_1(t), D_2(t), ..., D_i(t), ..., D_k(t)) denote the vector composed of the mean square errors of the access times of all valid clusters, where k denotes the number of valid clusters;
the behavior dispersion of valid vector cluster D_i, denoted
Figure BDA0004153982190000135
is expressed as
Figure BDA0004153982190000136
Figure BDA0004153982190000137
The behavior dispersion, which measures the long-term behavior characteristics of the user, and the behavior freshness, which measures the short-term behavior characteristics of the user, jointly determine the final weight of the valid cluster, denoted
Figure BDA0004153982190000138
The dispersion factor, that is, the proportion of the behavior dispersion in the final weight, is denoted λ, with λ ∈ [0,1]. The final weight of valid cluster D_i is calculated as
Figure BDA0004153982190000139
Figure BDA00041539821900001310
When 0 < λ < 1, both the long-term and the short-term interests of the user are taken into account; when λ = 0, only the long-term interests are taken into account.
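By way of a non-limiting illustration, the validity test and the cluster weights may be sketched as follows. The text states only that W_F and W_D are normalized values and that λ is the share of behavior dispersion in the final weight. Everything else here is an assumption of this sketch: normalizing by the sum over valid clusters, using the population variance as the mean square error, the convex combination λ·W_D + (1 − λ)·W_F, and the function names. The formula images may assign these terms differently.

```python
from statistics import mean, pvariance
from typing import List


def is_valid(cluster_size: int, window_size: int, zeta: float) -> bool:
    """A cluster is valid when its share of the window exceeds zeta."""
    return cluster_size / window_size > zeta


def normalize(values: List[float]) -> List[float]:
    """Assumed normalization: divide each value by the sum over valid clusters."""
    total = sum(values)
    return [v / total for v in values] if total else [0.0] * len(values)


def final_weights(cluster_times: List[List[float]], lam: float) -> List[float]:
    """cluster_times[i] holds the access times of the documents in valid
    cluster D_i (each list must be non-empty); lam is the dispersion factor."""
    avg = [mean(ts) for ts in cluster_times]          # E_i(t)
    spread = [pvariance(ts) for ts in cluster_times]  # D_i(t), assumed variance
    w_f = normalize(avg)     # behavior freshness W_F
    w_d = normalize(spread)  # behavior dispersion W_D
    return [lam * d + (1.0 - lam) * f for d, f in zip(w_d, w_f)]
```

Under these assumptions, a cluster whose documents were accessed recently gains weight through W_F, while a cluster whose accesses are spread over a long period gains weight through W_D.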
Referring to fig. 2, the invention further provides an advertisement automatic delivery management method based on big data, which specifically comprises the following steps:
S1: acquiring stored log data of a mobile terminal, and preprocessing the log data to obtain a data set, wherein the data set comprises advertisement click behavior logs of users, advertisement self-characteristics, user self-characteristics and shopping behaviors of the users;
S2: constructing a click rate model according to the advertisement click behavior log of the user, the advertisement self-characteristics and the user self-characteristics, and matching the resources of the advertiser with the user based on the shopping behavior of the user and the click rate model to obtain advertisement recommendation information;
S3: analyzing and mining the content of interest to the user from the web browsing behavior of the user to construct a user interest model, dividing the interests of the user into short-term interests and long-term interests according to the characteristics and shopping behavior of the user, and updating the user interest model by using a sliding time window;
S4: automatically delivering advertisements to the mobile terminal user according to the click rate model and the user interest model.
In this embodiment, stored log data of the mobile terminal is acquired and preprocessed to obtain a data set. A click rate model is constructed from the advertisement click behavior log of the user, the advertisement self-characteristics and the user self-characteristics, and advertiser resources are matched with the user based on the shopping behavior of the user and the click rate model to obtain advertisement recommendation information. The content of interest to the user is analyzed and mined from the web browsing behavior of the user to construct a user interest model; the interests of the user are divided into short-term interests and long-term interests according to the characteristics and shopping behavior of the user, and the user interest model is updated by using a sliding time window. Advertisements are then automatically delivered to the mobile terminal user according to the click rate model and the user interest model. This improves the accuracy of advertisement delivery, identifies user interests accurately, and improves the match between users and advertisements.
Any particular values in all examples shown and described herein are to be construed as merely illustrative and not a limitation, and thus other examples of exemplary embodiments may have different values.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
The above examples merely represent a few embodiments of the present invention, which are described in more detail and are not to be construed as limiting the scope of the present invention. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the invention, which are all within the scope of the invention.

Claims (10)

1. An automated advertisement delivery management system based on big data, comprising:
the data acquisition unit is used for acquiring stored log data of the mobile terminal and preprocessing the log data to obtain a data set, wherein the data set comprises advertisement click behavior logs of users, advertisement self-characteristics, user self-characteristics and shopping behaviors of the users;
the model building unit is used for building a click rate model according to the advertisement click behavior log of the user, the advertisement self-characteristics and the user self-characteristics, and matching the advertiser resources with the user based on the shopping behavior of the user and the click rate model to obtain advertisement recommendation information;
the interest mining unit is used for analyzing and mining the content of interest of the user from the webpage browsing behaviors of the user to construct a user interest model, dividing the interest of the user into short-term interest and long-term interest according to the characteristics of the user and shopping behaviors, and completing updating of the user interest model by adopting a sliding time window;
and the advertisement putting unit is used for automatically putting advertisements to the mobile terminal user according to the click rate model and the user interest model.
2. The automated big data based advertisement delivery management system of claim 1, wherein the specific implementation of the advertisement delivery unit comprises:
uploading advertisements by an advertiser through a subscription service obtained by a click rate model of the mobile terminal and a user interest model, and setting advertisement keywords, advertisement categories and delivery dimensions;
when a user accesses a webpage with an advertisement space, the advertisement putting platform sends information of the advertisement space to a mobile terminal meeting the condition;
and selecting an advertisement set of the corresponding category from the advertisement delivery platform according to the user interests corresponding to the mobile terminal, calculating the score of each advertisement by using an advertisement matching algorithm, and pushing to the user the advertisement with the highest score, that is, the advertisement that best matches the user interests.
3. The automated big data based advertising management system of claim 1, wherein building the user interest model from analyzing and mining content of interest to the user from the user's web browsing behavior comprises:
the browsing behavior of the user is reflected in mouse movement, mouse clicks, scroll-wheel scrolling and key presses; the browsing behavior information of the user is collected by a front-end script embedded in the web page, and the browsing speed of the user at different paragraphs of the web page is used to determine the user's degree of interest in the browsed content;
the relative browsing speed of the user for a web page is the ratio of the speed at which the user browses the target web page to the average web page browsing speed, and is calculated as
Figure FDA0004153982160000021
wherein S_i denotes the browsing speed of web page i, given by
Figure FDA0004153982160000022
wherein size_i denotes the text size of web page i, t_i denotes the time the user stays on the page, t_1 denotes the minimum dwell time, and t_2 denotes the maximum dwell time; when the dwell time t_i is less than the minimum dwell time, the user is determined not to have browsed the page and the page is filtered out; when t_i is greater than the maximum dwell time, the user is determined to have left the page open for a long time, and t_i is set to the maximum dwell time; S_over denotes the average browsing speed, given by
Figure FDA0004153982160000023
wherein size_all denotes the sum of the sizes of all pages browsed by the user, and time_all denotes the total time the user spent browsing all pages;
the reference weight of effectively browsed content at the preset average browsing speed is set to ε, and the weight of effectively browsed content at browsing speed s_t is expressed as
Figure FDA0004153982160000024
4. The automated big data based advertising delivery management system of claim 3, wherein the user's speed of browsing at different paragraphs of the web page to determine the user's interest level in the browsed content comprises:
assuming the user has n different browsing behaviors, letting ε_1, ε_2, ..., ε_n denote the weights of keyword w_i in the effectively browsed documents generated by the n different browsing behaviors, and letting f_i1, f_i2, ..., f_in denote the term frequencies of the feature keyword in the n effectively browsed documents, wherein tf_i in the vector space model is expressed as
Figure FDA0004153982160000027
in the vector space model combined with user browsing behavior, the new tf_i' is calculated as
Figure FDA0004153982160000025
the weight is expressed as w'(w_i) = tf_i' * idf_i, and the similarity between web page d_i and advertisement d_j is calculated as
Figure FDA0004153982160000026
wherein w'(w_ki) denotes the weight of keyword w_k in web page d_i, and w'(w_kj) denotes the weight of keyword w_k in advertisement d_j.
5. The automated big data based advertising management system of claim 1, wherein building the user interest model from analyzing and mining content of interest to the user from the user's web browsing behavior comprises:
extracting the historical behaviors of the user by using the user interest model to obtain an abstract representation of the interest state, wherein the interest state is the hidden-layer state output h_t at each time step; in the model, the click on the target item is triggered by the final interest, so the intermediate hidden states h_t cannot be properly supervised; the input b_{t+1} of the next time step is therefore used as the label to supervise the learning of the hidden state h_t of the current time step, and the loss function is calculated as
Figure FDA0004153982160000031
wherein N denotes the number of behavior sequence pairs, t denotes the number of time steps, that is, the length of the user behavior list, h_t denotes the hidden state of the current time step,
Figure FDA0004153982160000032
denotes the input of the next time step, and σ denotes the sigmoid function.
6. The automated big data based advertising management system of claim 1, wherein classifying the interests of the user into short-term interests and long-term interests based on the user's own characteristics and shopping behavior comprises:
introducing an attention mechanism on top of the interest extraction layer of the model, wherein the attention score is obtained from the relevance between the output h_t of the interest extraction layer at each time step and the current candidate advertisement, calculated as
Figure FDA0004153982160000033
wherein e_a denotes the input vector of the candidate advertisement, with dimension n_A × 1, n_A denotes the input dimension of the candidate advertisement features, W denotes a parameter matrix of dimension n_H × n_A, n_H denotes the dimension of the hidden state of the interest extraction layer, and h_t denotes the hidden state of the interest extraction layer at time step t, with dimension n_H × 1.
7. The automated big data based advertising management system of claim 1, wherein updating the user interest model is accomplished using a sliding time window, comprising:
the sliding time window is the set of documents in the user's web access history that is used for analyzing the user's behavior characteristics; the m browsed documents d_1, d_2, d_3, ..., d_{m-1}, d_m most recently accessed by the user are selected as the sliding time window, with access times t_1, t_2, t_3, ..., t_{m-1}, t_m in turn, and the browsed documents are arranged from left to right in increasing order of access time;
the clustering algorithm computes the word vector v_i of each document with a semantically expanded vector space model and computes the similarity between two word vectors, and if the similarity is greater than a threshold α, the two word vectors are placed in the same word vector cluster, wherein the similarity is expressed as
Figure FDA0004153982160000041
wherein d_c denotes the centroid of a word vector cluster, given by
Figure FDA0004153982160000042
8. The automated big data based advertisement delivery management system of claim 7, further comprising:
letting C denote the set of word vector clusters, and at initialization selecting the leftmost word vector d_1 as the centroid d_QE1 of a cluster, so that C = {d_QE1}; taking documents d_j, 1 < j < m, from the sliding time window from left to right, and computing the word vector v_j of each with the semantically expanded vector space model; for every d_QEi ∈ C, computing the similarity S_ij = sim(v_j, d_QEi) with the vector space model, and recording the maximum over all results as S_max = max(S_ij); if S_max > α, adding v_j to the cluster whose centroid is d_QEi and recomputing the centroid of that cluster as d'_QEi; if S_max < α, making v_j the centroid d_QEj of a new cluster, with C = C ∪ {d_QEj};
processing all browsed documents in the sliding window according to the above steps, wherein when the sliding time window slides to the right, the new browsed document d_{m+1} is added to it and, following the above steps, either joins an existing cluster or forms a new cluster, the leftmost document d_1 is removed from the sliding time window, and the centroid of the cluster that contained d_1 is recomputed.
9. The automated big data based advertisement delivery management system of claim 7, wherein a cluster is determined to be valid when the ratio of the number of word vectors in the word vector cluster formed by clustering to the total number of documents in the sliding time window exceeds a threshold ζ, ζ denoting the behavior factor;
the normalized value of the document access times of a valid cluster is taken as the behavior freshness, denoted W_F, and the average access time E(t) of valid cluster D_i is calculated as
Figure FDA0004153982160000043
wherein V_E = (E_1(t), E_2(t), ..., E_i(t), ..., E_k(t)) denotes the vector composed of the average access times of all valid clusters, k denotes the number of valid clusters, and the behavior freshness of valid vector cluster D_i, denoted
Figure FDA0004153982160000057
is expressed as
Figure FDA0004153982160000051
the long-term interest of the user is measured by the dispersion of the web page accesses corresponding to the vectors, and the behavior dispersion measures how widely spread the behavior reflected by a valid cluster is;
the normalized value of the mean square error of the document access times of a valid cluster is taken as the behavior dispersion, denoted W_D, and the mean square error of the browsing times of valid vector cluster D_i is expressed as
Figure FDA0004153982160000052
wherein n denotes the total number of documents in the valid cluster, V_D = (D_1(t), D_2(t), ..., D_i(t), ..., D_k(t)) denotes the vector composed of the mean square errors of the access times of all valid clusters, and k denotes the number of valid clusters;
the behavior dispersion of valid vector cluster D_i, denoted
Figure FDA0004153982160000058
is expressed as
Figure FDA0004153982160000053
Figure FDA0004153982160000054
the behavior dispersion, which measures the long-term behavior characteristics of the user, and the behavior freshness, which measures the short-term behavior characteristics of the user, jointly determine the final weight of the valid cluster, denoted
Figure FDA0004153982160000059
the dispersion factor, that is, the proportion of the behavior dispersion in the final weight, is denoted λ, with λ ∈ [0,1], and the final weight of valid cluster D_i is calculated as
Figure FDA0004153982160000055
Figure FDA0004153982160000056
wherein when 0 < λ < 1, both the long-term and the short-term interests of the user are taken into account, and when λ = 0, only the long-term interests are taken into account.
10. The automated big data based advertisement delivery management system of claim 1, wherein constructing the click rate model from the user's advertisement click behavior log, the advertisement self-characteristics, and the user self-characteristics comprises:
collecting data, wherein the collected data comprises offline data and online data used for offline training and online training of the model, and the data is derived from the relevant behavior logs;
and constructing the relevant features of the user and the advertisement from the collected data, and selecting features to form the input format required by the click rate model.
CN202310327938.3A 2023-03-24 2023-03-24 Automatic advertisement delivery management system based on big data Pending CN116362811A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310327938.3A CN116362811A (en) 2023-03-24 2023-03-24 Automatic advertisement delivery management system based on big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310327938.3A CN116362811A (en) 2023-03-24 2023-03-24 Automatic advertisement delivery management system based on big data

Publications (1)

Publication Number Publication Date
CN116362811A true CN116362811A (en) 2023-06-30

Family

ID=86919269

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310327938.3A Pending CN116362811A (en) 2023-03-24 2023-03-24 Automatic advertisement delivery management system based on big data

Country Status (1)

Country Link
CN (1) CN116362811A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116843393A (en) * 2023-07-18 2023-10-03 北京吉欣科技有限公司 Intelligent advertisement management method and system
CN117575700A (en) * 2024-01-15 2024-02-20 太逗科技集团有限公司 Advertisement delivery system based on delivery effect monitoring
CN117670435A (en) * 2024-02-01 2024-03-08 威海双子星软件科技有限公司 Web application cross popularization system based on computer software and hardware integration

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116843393A (en) * 2023-07-18 2023-10-03 北京吉欣科技有限公司 Intelligent advertisement management method and system
CN116843393B (en) * 2023-07-18 2024-04-19 成都红户里科技有限公司 Intelligent advertisement management method and system
CN117575700A (en) * 2024-01-15 2024-02-20 太逗科技集团有限公司 Advertisement delivery system based on delivery effect monitoring
CN117575700B (en) * 2024-01-15 2024-03-15 太逗科技集团有限公司 Advertisement delivery system based on delivery effect monitoring
CN117670435A (en) * 2024-02-01 2024-03-08 威海双子星软件科技有限公司 Web application cross popularization system based on computer software and hardware integration

Legal Events

Date Code Title Description
PB01 Publication