CN115619457A - Advertisement putting method based on user browsing habit data analysis - Google Patents

Info

Publication number
CN115619457A
Authority
CN
China
Prior art keywords
commodity
user
candidate
idf
commodities
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211463099.XA
Other languages
Chinese (zh)
Other versions
CN115619457B (en)
Inventor
刘晓东
嵇晨
於雯雯
冯思雨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jingfu Technology Co ltd
Information Technology Nanjing Co ltd
Original Assignee
Jingfu Technology Co ltd
Information Technology Nanjing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jingfu Technology Co ltd, Information Technology Nanjing Co ltd
Priority to CN202211463099.XA
Publication of CN115619457A
Application granted
Publication of CN115619457B
Status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0251 Targeted advertisements
    • G06Q30/0255 Targeted advertisements based on user history
    • G06Q30/0256 User search
    • G06Q30/0277 Online advertisement
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the technical field of marketing data processing, and in particular to an advertisement delivery method based on user browsing habit data analysis. The method obtains a commodity visual dwell time histogram and a hot word extension feature benchmark for each user by counting the visual dwell time and TF-IDF information of each commodity. The candidate cost of each candidate advertisement in the candidate commodity advertisement set is calculated according to the user's hot word extension feature benchmark. The target user is then matched against other users using the commodity visual dwell time histograms, the hot word extension feature benchmarks and the candidate cost information to obtain the matched user of the target user, and the candidate commodity with the maximum candidate cost in the intersection between the target user's candidate commodity advertisement set and the matched user's browsing record set is taken as the pushed commodity. The invention avoids the information limitation of advertisement delivery and can guide the user to browse a richer variety of commodities according to the user's browsing habits and browsing content.

Description

Advertisement putting method based on user browsing habit data analysis
Technical Field
The invention relates to the technical field of marketing data processing, and in particular to an advertisement delivery method based on user browsing habit data analysis.
Background
Extracting value from user data is a long-established practice, but traditional data was mainly structured. With the development of network technology, unstructured data on the order of zettabytes (ZB) is now generated on the Internet every day. This data continuously shapes users' online experience and has also opened a breakthrough for advertising and marketing technology.
User browsing data is unstructured and sparse. Current advertising systems mainly recommend products of similar types based on user classification, and this passive-response delivery technique easily limits the information a user sees. To reduce this effect, current advertisement delivery systems randomly add some highly popular advertisements to enrich what the user sees, but doing so causes a persistently poor experience, and advertisements irrelevant to the user may even offend the user.
Disclosure of Invention
In order to solve the above technical problems, an object of the present invention is to provide an advertisement delivery method based on user browsing habit data analysis, which adopts the following technical scheme:
The invention provides an advertisement delivery method based on user browsing habit data analysis, which comprises the following steps:
obtaining the visual dwell time of each user on each commodity in the historical database, wherein the commodities comprise browsing page commodities and retrieval page commodities; constructing a browsing page commodity TF-IDF set and a retrieval page commodity TF-IDF set according to the browsing records of each user;
obtaining the cross heat of each commodity according to the difference distance between elements of the browsing page commodity TF-IDF set and elements of the retrieval page commodity TF-IDF set; screening the commodities according to the cross heat to obtain hot word commodities, and taking the average TF-IDF of the hot word commodities as the hot word extension feature benchmark of the corresponding user; obtaining a commodity visual dwell time histogram of each user;
obtaining a first cost according to the difference between the TF-IDF of each candidate commodity in the candidate commodity advertisement set and the hot word extension feature benchmark; obtaining a second cost according to the TF-IDF difference between each candidate commodity and the other candidate commodities in the candidate commodity advertisement set; taking the sum of the first cost and the second cost as the candidate cost of each candidate commodity;
judging whether an intersection exists between the candidate commodity advertisement set of the target user and the browsing record sets of other users; if so, obtaining a matching distance between the target user and each such user according to the similarity of their commodity visual dwell time histograms, the similarity of their hot word extension feature benchmarks and the maximum candidate cost in the intersection, and obtaining the matched user of the target user according to the matching distance; and delivering to the target user the advertisement of the candidate commodity corresponding to the maximum candidate cost in the intersection between the matched user and the target user.
Further, obtaining the visual dwell time of each user on each commodity in the historical database comprises:
obtaining the visual dwell time of a retrieval page browsed by the user, wherein the visual dwell time of every retrieval page commodity on that page is equal to the visual dwell time of the page;
and obtaining the visual dwell time of a commodity detail page browsed by the user, and taking it as the visual dwell time of the corresponding browsing page commodity.
Further, obtaining the cross heat of each commodity according to the difference distance between elements of the browsing page commodity TF-IDF set and elements of the retrieval page commodity TF-IDF set comprises:
obtaining, for each commodity, a first neighbor sample set from the TF-IDF set to which the commodity does not belong;
obtaining the cross heat of each commodity according to a cross heat formula, the cross heat formula being:
$$H_i = \frac{1}{1 + \frac{1}{N_i}\sum_{j=1}^{N_i}\bigl(1 - \cos(T_i, T_j)\bigr)}$$
wherein $H_i$ is the cross heat of the $i$-th commodity, $N_i$ is the number of samples in the first neighbor sample set corresponding to the $i$-th commodity, $T_i$ is the TF-IDF corresponding to the $i$-th commodity, $T_j$ is the $j$-th TF-IDF in the first neighbor sample set, and $\cos(\cdot,\cdot)$ is the cosine similarity function.
Further, screening the commodities according to the cross heat to obtain the hot word commodities comprises:
obtaining a difference distance according to the cross heat difference and the TF-IDF difference between commodities, and grouping the commodities with a GMM algorithm according to the difference distance to obtain at least two commodity categories; sorting the commodity categories according to the cross heat within each commodity category, selecting a preset number of top-ranked commodity categories as hot word categories, and taking the commodities in the hot word categories as hot word commodities.
Further, obtaining the difference distance according to the cross heat difference and the TF-IDF difference between commodities comprises:
taking the cosine distance between the TF-IDFs of two commodities as the TF-IDF difference; taking the absolute value of the difference between their cross heats as the cross heat difference; and taking the product of the cross heat difference and the TF-IDF difference as the difference distance between the commodities.
Further, obtaining the first cost according to the difference between the TF-IDF of each candidate commodity in the candidate commodity advertisement set and the hot word extension feature benchmark comprises:
taking the Mahalanobis distance between the TF-IDF of each candidate commodity in the candidate commodity advertisement set and the hot word extension feature benchmark as the first cost.
Further, obtaining the second cost according to the TF-IDF difference between each candidate commodity and the other candidate commodities in the candidate commodity advertisement set comprises:
obtaining a second neighbor set of each candidate commodity in the candidate commodity advertisement set, and obtaining the maximum TF-IDF difference between each candidate commodity and the samples in its second neighbor set; taking the median of all the maximum TF-IDF differences in the candidate commodity advertisement set as base data, and obtaining, for each candidate commodity, the ratio of its maximum TF-IDF difference to the base data; setting the second cost of a candidate commodity whose ratio is smaller than one to one; and setting the second cost of a candidate commodity whose ratio is larger than one to the corresponding ratio.
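The second cost rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the text does not say how the second neighbor set is built or which measure the "TF-IDF difference" uses, so the sketch assumes the `k` nearest other candidates and cosine distance. The names `cosine_distance`, `second_costs` and `k` are hypothetical.

```python
import math
from statistics import median

def cosine_distance(u, v):
    """1 - cosine similarity; zero vectors are treated as maximally distant (a choice made here)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 if norm == 0.0 else 1.0 - dot / norm

def second_costs(vectors, k):
    """Second cost per candidate: ratio of its max neighbor distance to the median, floored at 1."""
    max_diffs = []
    for i, v in enumerate(vectors):
        # Assumption: the second neighbor set is the k nearest other candidates.
        dists = sorted(cosine_distance(v, w) for j, w in enumerate(vectors) if j != i)
        max_diffs.append(max(dists[:k]))
    base = median(max_diffs)  # the "base data" of the text
    # Ratios below one are set to one; ratios above one are kept as they are.
    return [max(1.0, d / base) if base > 0 else 1.0 for d in max_diffs]

# Hypothetical TF-IDF vectors: two near-duplicates and one outlier.
candidates = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]
costs = second_costs(candidates, k=1)
```

In this toy input the two near-duplicate candidates receive a second cost of one, while the outlier receives a much larger value, which matches the idea that the second cost reflects how unique a candidate is within the set.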
Further, the method for obtaining the matching distance comprises:
obtaining the matching distance according to a matching distance formula:
[formula image DEST_PATH_IMAGE009]
wherein the terms of the formula are: the matching distance between user $i$ and user $j$; the commodity visual dwell time histogram of user $i$ and that of user $j$, and the similarity between the two histograms; the hot word extension feature benchmark of user $i$ and that of user $j$; the maximum candidate cost in the intersection between user $i$ and user $j$; and a cosine distance function.
The invention has the following beneficial effects:
1. The embodiment of the invention obtains the TF-IDF information and visual dwell time of each commodity browsed by each user from a historical database of user browsing data, represents the distribution of visual dwell time during browsing with the commodity visual dwell time histogram, and represents the browsing semantics of each user with the hot word extension feature benchmark. It then combines the candidate cost of each candidate commodity in the candidate commodity advertisement set with the matching relation between users to recommend advertisements to the target user. Because the recommendation process considers both the user's browsing and retrieval cost and the browsing habits of matched users, it can provide advertisement combinations that are attractive and representative of a user group. This can improve user experience and the advertisement hit rate, dynamically guides users toward more numerous and more novel products, and avoids the information limitation while still matching users' browsing habits.
2. The embodiment of the invention divides browsed commodities into browsing page commodities and retrieval page commodities according to the user's browsing type, and obtains the cross heat from the differences between elements of the TF-IDF sets of the two commodity types, so that the resulting hot word extension feature benchmark is closer to the user's browsing habits and is more useful as a reference.
Drawings
To illustrate the embodiments of the present invention or the technical solutions and advantages of the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of an advertisement delivery method based on user browsing habit data analysis according to an embodiment of the present invention.
Detailed Description
To further explain the technical means adopted by the present invention to achieve its intended purpose and the resulting effects, the advertisement delivery method based on user browsing habit data analysis, together with its specific implementation, structure, features and effects, is described in detail below with reference to the accompanying drawings and preferred embodiments. In the following description, different references to "one embodiment" or "another embodiment" do not necessarily refer to the same embodiment. Furthermore, particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The following describes a specific scheme of the advertisement delivery method based on user browsing habit data analysis, which is provided by the present invention, in detail with reference to the accompanying drawings.
Referring to fig. 1, a flowchart of an advertisement delivery method based on user browsing habit data analysis according to an embodiment of the present invention is shown, where the method includes:
Step S1: obtain the visual dwell time of each user on each commodity in the historical database, where the commodities comprise browsing page commodities and retrieval page commodities; construct a browsing page commodity TF-IDF set and a retrieval page commodity TF-IDF set according to the browsing records of each user.
When a user browses commodities on a shopping website, the website back end can build a historical database for each user from information such as the user's retrieval content and browsing duration; that is, the data in the historical database contain the user's browsing habit characteristics. Browsing behavior and retrieval behavior accompany each other but serve different purposes for the user: a user may browse for a while based on a retrieval, and browsing may in turn lead the user to modify the retrieval keywords. The commodities in the user's historical database are therefore divided into browsing page commodities and retrieval page commodities. A browsing page commodity is a commodity whose detail page the user has browsed. A retrieval page commodity is a commodity the user has only seen on a retrieval page; a retrieval page carries less information about each commodity and contains many commodities.
During browsing, the browsing time reflects how much attention the user pays to a commodity: the longer the user browses a commodity, the higher the attention. The browsing time can also represent the user's shopping habits: the longer the browsing time, the more carefully the user selects commodities. Therefore, the historical data in the historical database are counted to obtain the visual dwell time of each user on each commodity, which specifically comprises:
Obtain the visual dwell time of a retrieval page browsed by the user; the visual dwell time of every commodity on the retrieval page is equal to the visual dwell time of that page. Obtain the visual dwell time of a commodity detail page browsed by the user, and take it as the visual dwell time of the corresponding browsing page commodity. It should be noted that, for a browsing page commodity, the dwell time on its detail page is related to the total interaction time on the page, that is, the time spans represented by scrolling the detail page are accumulated. In implementation, the implementer may add a delay of several seconds after each scroll so that the visual dwell time of the browsing page commodity is represented more accurately; the specific delay can be set according to the actual situation and is not limited here.
It should be noted that, to facilitate the subsequent histogram statistics, all visual dwell times are range-normalized (min-max normalized) in the embodiment of the present invention. A longer visual dwell time is close to 1; a shorter visual dwell time, which indicates that the user pays little attention to the commodity, is close to 0.
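The normalization can be sketched as follows. This is a minimal illustration assuming raw dwell times in seconds; the function name `min_max_normalize` and the handling of the all-equal case are assumptions made for the example, not details given in the text.

```python
def min_max_normalize(dwell_times):
    """Scale raw dwell times into [0, 1] by range (min-max) normalization."""
    lo, hi = min(dwell_times), max(dwell_times)
    if hi == lo:
        # All values equal: the range is zero, so map everything to 0.0 (a choice made here).
        return [0.0 for _ in dwell_times]
    return [(t - lo) / (hi - lo) for t in dwell_times]

raw_seconds = [2.0, 10.0, 42.0, 6.0]  # hypothetical dwell times
normalized = min_max_normalize(raw_seconds)
```

The shortest dwell time maps to 0 and the longest to 1, so values from different users become comparable when they are later binned into a histogram.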
Because browsing and retrieval behaviors differ, the word set features of each commodity in the historical database need to be counted in order to find potential features. After word set statistics are performed on the historical database, the word frequencies of all content words, such as titles and introduction texts, of all commodities are obtained.
TF-IDF is a statistical method for evaluating how important a word is to one document in a document set or corpus; it obtains the semantic information of a term from the term frequency (TF) and the inverse document frequency (IDF). Analyzing the historical database yields the word frequency information of the different keywords in each commodity, so the TF-IDF of each commodity is a multi-element vector. A browsing page commodity TF-IDF set and a retrieval page commodity TF-IDF set are constructed according to the browsing records of each user in the historical database. TF-IDF is well known to those skilled in the art, and the detailed algorithm is not described here.
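For readers who want a concrete picture of a per-commodity TF-IDF vector, here is a minimal sketch. The text does not fix a particular TF-IDF variant, so this example assumes one common form: term frequency as count divided by document length, and IDF as the natural logarithm of (number of commodities / number of commodities containing the term). The function name and the sample data are hypothetical.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """docs: dict of commodity id -> list of tokens. Returns (vocab, dict id -> vector)."""
    vocab = sorted({tok for toks in docs.values() for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of commodities whose text contains the term.
    df = Counter(tok for toks in docs.values() for tok in set(toks))
    vectors = {}
    for cid, toks in docs.items():
        counts = Counter(toks)
        total = len(toks)
        vectors[cid] = [
            (counts[t] / total) * math.log(n_docs / df[t]) if total else 0.0
            for t in vocab
        ]
    return vocab, vectors

# Hypothetical tokenized titles of three commodities.
docs = {
    "kettle": ["electric", "kettle", "steel"],
    "teapot": ["ceramic", "teapot", "kettle"],
    "mug": ["ceramic", "mug"],
}
vocab, vectors = tf_idf_vectors(docs)
```

Each commodity ends up with one vector whose length equals the vocabulary size, which is the "multi-element vector" the later steps compare with cosine similarity.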
Step S2: obtain the cross heat of each commodity according to the difference distance between elements of the browsing page commodity TF-IDF set and elements of the retrieval page commodity TF-IDF set; screen the commodities according to the cross heat to obtain hot word commodities, and take the average TF-IDF of the hot word commodities as the hot word extension feature benchmark of the corresponding user; obtain a commodity visual dwell time histogram for each user.
In actual browsing, a browsing page commodity should first appear on a retrieval page and is then opened as a detail page by the user's click, so browsing page commodities and retrieval page commodities have cross characteristics. For a browsing page commodity, the more similar commodities there are in the retrieval page commodity set, the higher the heat of that browsing page commodity; the same holds for a retrieval page commodity. Therefore, the cross heat of each commodity is obtained according to the difference distance between elements of the browsing page commodity TF-IDF set and elements of the retrieval page commodity TF-IDF set. The cross heat reflects the degree of attention the user pays to a commodity while browsing: the larger the cross heat, the more often the user has retrieved or browsed it. The specific method for obtaining the cross heat comprises:
For each commodity, a first neighbor sample set is obtained from the TF-IDF set to which the commodity does not belong. For a target browsing page commodity, the first neighbor sample set is selected from the retrieval page commodity TF-IDF set according to TF-IDF similarity; that is, all samples in the first neighbor sample set are TF-IDFs of retrieval page commodities, namely the several samples most similar in TF-IDF to the target browsing page commodity. The same applies to a retrieval page commodity, whose first neighbor sample set is selected from the browsing page commodity TF-IDF set. It should be noted that the number of samples in the first neighbor sample set can be set according to the specific implementation scenario and is not limited here.
The cross heat of each commodity is obtained according to the cross heat formula:
$$H_i = \frac{1}{1 + \frac{1}{N_i}\sum_{j=1}^{N_i}\bigl(1 - \cos(T_i, T_j)\bigr)}$$
where $H_i$ is the cross heat of the $i$-th commodity, $N_i$ is the number of samples in the first neighbor sample set corresponding to the $i$-th commodity, $T_i$ is the TF-IDF corresponding to the $i$-th commodity, $T_j$ is the $j$-th TF-IDF in the first neighbor sample set, and $\cos(\cdot,\cdot)$ is the cosine similarity function.
In the cross heat formula, $1 - \cos(T_i, T_j)$ represents the cosine distance, and the 1 in the denominator prevents the denominator from being 0. The whole formula is thus the reciprocal of the average cosine distance of a commodity: the larger the average distance, the colder the corresponding commodity and the smaller its cross heat.
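The cross heat computation described above can be sketched in Python. This is an illustrative reading of the prose (the reciprocal of the average cosine distance to the first neighbor sample set, with 1 added to the denominator), not a verified transcription of the original formula image. The helper names and the neighbor count `k` are assumptions.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)

def first_neighbor_set(target, other_set, k):
    """Pick the k vectors most similar to target from the set it does not belong to."""
    ranked = sorted(other_set, key=lambda v: cosine_similarity(target, v), reverse=True)
    return ranked[:k]

def cross_heat(target, neighbors):
    """Reciprocal of (1 + mean cosine distance between target and its neighbors)."""
    if not neighbors:
        raise ValueError("the first neighbor sample set must not be empty")
    mean_dist = sum(1.0 - cosine_similarity(target, v) for v in neighbors) / len(neighbors)
    return 1.0 / (1.0 + mean_dist)

# Hypothetical data: one browsing page commodity against three retrieval page commodities.
browse_item = [1.0, 0.0]
search_set = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = first_neighbor_set(browse_item, search_set, k=2)
heat = cross_heat(browse_item, neighbors)
```

With this form the cross heat lies in (0, 1]: it equals 1 when every neighbor is identical in direction to the target and shrinks as the neighbors become less similar.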
The larger the cross heat, the more attention the user pays to the corresponding commodity, so the commodities can be screened according to the cross heat to obtain the hot word commodities. The semantic information reflected by the TF-IDF of the hot word commodities represents the semantics the user pays most attention to, so the average TF-IDF of the hot word commodities is taken as the hot word extension feature benchmark of the corresponding user. The hot word extension feature benchmark represents the user's attention to commodity keywords. It should be noted that, because a TF-IDF is a vector, the average is computed separately for each dimension over all the hot word commodities to obtain the hot word extension feature benchmark.
Specifically, screening the commodities according to the cross heat to obtain the hot word commodities comprises:
A difference distance is obtained according to the cross heat difference and the TF-IDF difference between commodities, and the commodities are grouped with a GMM algorithm according to the difference distance to obtain at least two commodity categories. The commodity categories are sorted according to the cross heat within each category, a preset number of top-ranked commodity categories are selected as hot word categories, and the commodities in the hot word categories are taken as hot word commodities. Obtaining the difference distance according to the cross heat difference and the TF-IDF difference between commodities comprises:
The cosine distance between the TF-IDFs of two commodities is taken as the TF-IDF difference; the absolute value of the difference between their cross heats is taken as the cross heat difference; and the product of the cross heat difference and the TF-IDF difference is taken as the difference distance between the commodities. That is, the difference distance is expressed as:
$$d_{ab} = \lvert H_a - H_b \rvert \cdot \bigl(1 - \cos(T_a, T_b)\bigr)$$
where $d_{ab}$ is the difference distance between commodity a and commodity b, $H_a$ is the cross heat of commodity a, $H_b$ is the cross heat of commodity b, $T_a$ is the TF-IDF of commodity a, $T_b$ is the TF-IDF of commodity b, and $\cos(\cdot,\cdot)$ is the cosine similarity function. The cosine distance between the TF-IDFs of the commodities constrains the cross heat difference. If the semantic similarity between two commodities is small and their cross heat difference is also small, the small cross heat difference is probably caused by a small overlap between the user's browsing and retrieval. The constraint therefore corrects the errors that would arise from separating commodities by cross heat difference alone, and further distinguishes different commodity types.
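As a small worked illustration of the difference distance (absolute cross heat difference multiplied by the cosine distance between TF-IDFs), consider the following sketch. The function names and the sample values are hypothetical.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity; zero vectors are treated as maximally distant (a choice made here)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 if norm == 0.0 else 1.0 - dot / norm

def difference_distance(heat_a, heat_b, tfidf_a, tfidf_b):
    """Product of the cross heat difference and the TF-IDF (cosine) difference."""
    return abs(heat_a - heat_b) * cosine_distance(tfidf_a, tfidf_b)

# Two commodities with different cross heats and orthogonal TF-IDF vectors.
d_far = difference_distance(0.9, 0.5, [1.0, 0.0], [0.0, 1.0])
# Same cross heats as above, but identical TF-IDF directions.
d_same = difference_distance(0.9, 0.5, [1.0, 2.0], [1.0, 2.0])
```

Because the two factors are multiplied, commodities with the same TF-IDF direction have a difference distance of (numerically almost) zero regardless of their cross heats, and commodities with equal cross heats have a difference distance of zero regardless of their TF-IDFs.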
It should be noted that the GMM algorithm is a clustering algorithm well known to those skilled in the art, and its specific steps are not described again. The number of commodity categories obtained after the algorithm is executed can be set according to the specific implementation scenario and is not limited in the embodiment of the present invention. In the embodiment of the invention, the preset number is set to half of the number of commodity categories, and the commodity categories are sorted according to the average cross heat of all samples in each category in order to select the hot word categories.
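The selection of hot word categories and the per-dimension averaging of TF-IDF vectors can be sketched as below. The sketch assumes the category labels have already been produced by some clustering step (the GMM fit itself is not shown), and the rule of keeping at least one category when the count is odd or equal to one is a choice made for the example. All names are hypothetical.

```python
from collections import defaultdict

def select_hot_word_items(labels, heats):
    """Return indices of items in the top half of categories ranked by mean cross heat."""
    by_cat = defaultdict(list)
    for idx, label in enumerate(labels):
        by_cat[label].append(idx)
    ranked = sorted(
        by_cat,
        key=lambda c: sum(heats[i] for i in by_cat[c]) / len(by_cat[c]),
        reverse=True,
    )
    keep = ranked[: max(1, len(ranked) // 2)]  # half of the categories, at least one
    return sorted(i for c in keep for i in by_cat[c])

def hot_word_benchmark(vectors, indices):
    """Per-dimension mean of the selected TF-IDF vectors."""
    dims = len(vectors[0])
    return [sum(vectors[i][d] for i in indices) / len(indices) for d in range(dims)]

# Hypothetical inputs: four commodities in two categories.
labels = [0, 0, 1, 1]
heats = [0.9, 0.8, 0.3, 0.2]
tfidf = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.0, 0.8]]
hot_items = select_hot_word_items(labels, heats)
benchmark = hot_word_benchmark(tfidf, hot_items)
```

Here category 0 has the higher average cross heat, so its two commodities become the hot word commodities and the benchmark is the mean of their TF-IDF vectors.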
Further, a commodity visual dwell time histogram is obtained for each user every month. Given the meaning of the visual dwell time described in step S1, the distribution of visual dwell time reflected by the histogram expresses the user's browsing habits and browsing style. In the embodiment of the invention, the visual dwell time is divided into 10 levels, so the commodity visual dwell time histogram has 10 bars; the abscissa is the visual dwell time level and the ordinate is the corresponding number of occurrences.
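A 10-level histogram over normalized dwell times can be sketched as follows. The text does not state how the levels are delimited, so equal-width bins over [0, 1] are assumed here; the function name is hypothetical.

```python
def dwell_time_histogram(normalized_times, levels=10):
    """Count normalized dwell times (in [0, 1]) per equal-width level."""
    counts = [0] * levels
    for t in normalized_times:
        # Clamp so that t == 1.0 falls into the last level instead of overflowing.
        idx = min(int(t * levels), levels - 1)
        counts[idx] += 1
    return counts

# Hypothetical normalized dwell times for one user over a month.
histogram = dwell_time_histogram([0.0, 0.05, 0.31, 0.95, 1.0])
```

A user who mostly skims produces a histogram concentrated in the low levels, while a careful shopper shifts mass toward the high levels, which is what the later user matching compares.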
Step S3: obtaining a first cost according to the difference between the TF-IDF of each candidate commodity in the candidate commodity advertisement set and the hot word extension feature benchmark; obtaining a second cost according to the TF-IDF difference between each candidate commodity and the other candidate commodities in the candidate commodity advertisement set; and taking the sum of the first cost and the second cost as the candidate cost of each candidate commodity.
The candidate cost of each candidate commodity in the candidate commodity advertisement set reflects how much the candidate commodity differs from the commodities the user frequently browses. Internet marketing practice shows that the home page of a shopping website often features some products that the user would otherwise rarely see and does not frequently browse, in order to attract the user to keep browsing commodities on the platform. Therefore, the larger the candidate cost, the harder it is for the user to come across the corresponding candidate commodity while keeping the existing browsing habits, and the more that candidate commodity should be pushed in subsequent advertisement delivery, so that the user becomes interested in the shopping platform and the candidate commodity and the exposure of the candidate advertisement increases.
The candidate cost has two parts. The first cost is obtained according to the difference between the TF-IDF of each candidate commodity in the candidate commodity advertisement set and the hot word extension feature benchmark; that is, the first cost reflects the difference between the semantic features of what the user commonly browses and the semantic features of the candidate commodity. The second cost is obtained according to the TF-IDF difference between each candidate commodity and the other candidate commodities in the candidate commodity advertisement set, and reflects how unique the candidate commodity is within that set.
The specific method for acquiring the first cost comprises the following steps: the Mahalanobis distance between the TF-IDF of each candidate commodity in the candidate commodity advertisement set and the hot word extension feature benchmark is taken as the first cost. It should be noted that the Mahalanobis distance is a technical means well known to those skilled in the art and is not described here. The larger the first cost, the less related the corresponding candidate commodity is to the types of commodity the user usually pays attention to.
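The first cost can be illustrated with the sketch below. The specification does not say which covariance the Mahalanobis distance uses, so this sketch assumes a diagonal covariance estimated from the TF-IDF vectors of the user's hot word commodities, with a small constant added to avoid division by zero. Under that assumption the Mahalanobis distance reduces to a variance-scaled Euclidean distance. All names are hypothetical.

```python
import math


def mean_vector(vectors):
    """Component-wise mean; applied to hot word commodities this is the benchmark."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def diagonal_variances(vectors, mean, eps=1e-6):
    """Per-dimension variance plus eps (assumption: off-diagonal covariance ignored)."""
    n = len(vectors)
    return [
        sum((x - m) ** 2 for x in col) / n + eps
        for col, m in zip(zip(*vectors), mean)
    ]


def first_cost(candidate, benchmark, variances):
    """Mahalanobis distance between a candidate TF-IDF vector and the benchmark,
    computed with a diagonal covariance matrix."""
    return math.sqrt(
        sum((c - b) ** 2 / v for c, b, v in zip(candidate, benchmark, variances))
    )
```

A full covariance matrix would capture correlations between terms, at the price of requiring a matrix inverse and enough hot word commodities to estimate it reliably.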
The specific method for acquiring the second cost comprises the following steps: obtaining a second neighbor set of each candidate commodity in the candidate commodity advertisement set, and obtaining the maximum TF-IDF difference between each candidate commodity and the samples in its second neighbor set; taking the median of all the maximum TF-IDF differences in the candidate commodity advertisement set as base data, and obtaining, for each candidate commodity, the ratio of its maximum TF-IDF difference to the base data; setting the second cost of a candidate commodity whose ratio is smaller than one to one; and setting the second cost of a candidate commodity whose ratio is larger than one to the corresponding ratio. It should be noted that the second neighbor set is obtained according to the TF-IDF similarity between each candidate commodity and the other candidate commodities; that is, the several other candidate commodities whose TF-IDF is most similar to that of the target candidate commodity are selected as its second neighbor set. The number of samples in the second neighbor set may be set according to the specific scenario and is not limited here. For each candidate commodity, the larger the maximum TF-IDF difference in the second neighbor set, the more dispersed the distribution of that set, that is, the more unusual the candidate commodity is in the word set space and the harder it is for the user to retrieve it.
The sum of the first cost and the second cost is taken as the candidate cost of each candidate commodity.
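The second cost steps above can be sketched as follows. The sketch assumes that the TF-IDF difference is measured by cosine distance and that the second neighbor set contains the `k` nearest other candidates; both choices, along with the function names, are illustrative assumptions. At least two candidate commodities are required.

```python
import math
from statistics import median


def cosine_distance(u, v):
    """1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def second_costs(tfidf, k=2):
    """Second cost of every candidate commodity.

    tfidf -- list of TF-IDF vectors, one per candidate commodity
    k     -- assumed size of the second neighbor set
    """
    max_diffs = []
    for i, vec in enumerate(tfidf):
        # Distances to all other candidates, nearest first.
        dists = sorted(
            cosine_distance(vec, other) for j, other in enumerate(tfidf) if j != i
        )
        neighbours = dists[:k]            # second neighbor set (as distances)
        max_diffs.append(neighbours[-1])  # largest difference within that set
    base = median(max_diffs)              # base data
    if base == 0.0:
        return [1.0] * len(tfidf)         # all candidates identical to their neighbours
    # Ratios below one are raised to one; larger ratios are kept.
    return [max(1.0, d / base) for d in max_diffs]
```

A candidate that sits far from every other candidate receives a ratio well above one, while candidates in dense regions are all assigned a second cost of exactly one.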
Step S4: judging whether an intersection exists between the candidate commodity advertisement set of the target user and the browsing record sets of other users; if so, obtaining a matching distance between the target user and the other users according to the commodity visual retention time histogram similarity, the hot word extension feature benchmark similarity and the maximum candidate cost in the intersection, and obtaining the matching user of the target user according to the matching distance; and delivering to the target user the advertisement of the candidate commodity corresponding to the maximum candidate cost in the intersection between the matching user and the target user.
When advertisements are delivered, in order to avoid the limitation of information, the browsing habits of different users are considered together with the candidate commodity advertisement set. This ensures both that the finally pushed candidate commodity feels fresh to the target user and that the target user is receptive to it.
First, it must be judged whether an intersection exists between the candidate commodity advertisement set of the target user and the browsing record set of another user. If no intersection exists, the two users belong to two completely different user groups, and advertisements cannot be pushed to the target user according to the browsing information of that other user. If an intersection exists, the other user can serve as a reference for pushing to the target user. The matching distance between the target user and each such user is then obtained according to the commodity visual retention time histogram similarity, the hot word extension feature benchmark similarity and the maximum candidate cost in the intersection, and the matching user of the target user is obtained according to the matching distance; that is, the matching user is the one among the other users that best matches the target user. The advertisement of the candidate commodity corresponding to the maximum candidate cost in the intersection between the matching user and the target user can then be delivered to the target user.
In the embodiment of the present invention, the KM matching algorithm is applied to the matching distances to obtain the matching user of the target user. The KM algorithm is a technical means well known to those skilled in the art and is not described here.
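The intersection check and the final choice of commodity can be sketched as below. For brevity the sketch handles a single target user and simply picks the other user with the smallest matching distance; this is a stand-in for the KM matching used in the embodiment, not an implementation of it. The matching distances are taken as given, and all names and data layouts are hypothetical.

```python
def pick_advertisement(candidate_costs, browsing_records, distance_to):
    """Choose the matching user and the commodity to push to the target user.

    candidate_costs  -- {commodity: candidate cost} for the target user's
                        candidate commodity advertisement set
    browsing_records -- {user: set of commodities browsed} for the other users
    distance_to      -- {user: matching distance to the target user}
    Returns (matching_user, commodity), or None if no user shares a commodity.
    """
    candidates = set(candidate_costs)
    best_user = None
    for user, browsed in browsing_records.items():
        if not (candidates & browsed):
            continue  # no intersection: this user cannot serve as a reference
        if best_user is None or distance_to[user] < distance_to[best_user]:
            best_user = user
    if best_user is None:
        return None
    shared = candidates & browsing_records[best_user]
    # Push the shared commodity with the largest candidate cost.
    commodity = max(shared, key=lambda c: candidate_costs[c])
    return best_user, commodity
```

When many target users must be matched at once under a one-to-one constraint, an assignment solver would replace the simple minimum used here.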
The specific method for obtaining the matching distance comprises the following steps:
obtaining the matching distance according to a matching distance formula:
[Formula image: matching distance formula]
wherein the formula involves the following quantities: the matching distance between the two users being compared; the commodity visual retention time histogram of each of the two users; the similarity between the two histograms; the hot word extension feature benchmark of each of the two users; the maximum candidate cost in the intersection between the two users; and a cosine distance function.
In the matching distance formula, one term expresses the difference between the commodity visual retention time histograms. This difference reflects differences in how long users view the commodities they open, and also differences in browsing habits, such as whether a user skims a large number of commodities or examines individual commodities closely. Another term represents the hot word extension feature benchmark difference; the larger it is, the larger the difference between the two users' preferences and the content they pay attention to. The larger these two differences, the less related the browsing habits and browsed commodity types of the two users, and the larger the matching distance. The larger the maximum candidate cost, the more the corresponding candidate commodity is one that should be pushed, and the smaller the matching distance.
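The behaviour described above can be illustrated with the following sketch. The exact matching distance formula is not reproduced in this text, so the combination used here (the sum of the two differences divided by the maximum candidate cost) is only an assumption chosen to match the stated monotonic behaviour. Histogram intersection as the histogram similarity is likewise an assumption, and all names are hypothetical.

```python
import math


def histogram_similarity(h1, h2):
    """Histogram intersection of two normalised histograms (assumed similarity measure)."""
    t1, t2 = sum(h1), sum(h2)
    if t1 == 0 or t2 == 0:
        return 0.0
    return sum(min(a / t1, b / t2) for a, b in zip(h1, h2))


def cosine_distance(u, v):
    """1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def matching_distance(hist_i, hist_j, bench_i, bench_j, max_cost):
    """Assumed form: grows with both differences, shrinks as max_cost grows.

    max_cost is the maximum candidate cost in the intersection; it is at least
    one because the second cost is never below one, so the division is safe.
    """
    hist_diff = 1.0 - histogram_similarity(hist_i, hist_j)
    bench_diff = cosine_distance(bench_i, bench_j)
    return (hist_diff + bench_diff) / max_cost
```

Two users with identical histograms and identical benchmarks obtain a distance of zero under this sketch, and doubling the maximum candidate cost halves the distance.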
The reason for obtaining the matching distance is that different commodities carry different amounts of information. For example, browsing duration characteristics differ considerably between technology products or cosmetics and everyday necessities, and they indicate the user's degree of attention and how carefully the user reads. Therefore, once browsing habits and browsed commodity types are similar, the matching further takes into account differences in how commodity information is read.
Delivering to the target user the advertisement of the candidate commodity with the maximum candidate cost in the intersection between the matching user and the target user provides an advertisement push result with a heuristic effect. Because the matching user, whose browsing content and browsing habits are similar to those of the target user, has already browsed the product, the product is more likely to draw the target user's attention to other, more distinctive products. The target user is thereby guided to browse commodities, maximizing both the advertisement delivery effect and the effect of attracting the user to browse.
In summary, the embodiment of the invention obtains the commodity visual retention time histogram and the hot word extension feature benchmark of each user by counting the visual retention time and TF-IDF information of each commodity. The candidate cost of each candidate commodity in the candidate commodity advertisement set is calculated according to the user's hot word extension feature benchmark. The target user is then matched against other users using the commodity visual retention time histograms, the hot word extension feature benchmarks and the candidate cost information, to obtain the matching user of the target user, and the candidate commodity with the maximum candidate cost in the intersection between the target user's candidate commodity advertisement set and the matching user's browsing record set is taken as the commodity to push. The embodiment of the invention avoids the information limitation of advertisement delivery and can guide the user to browse a richer variety of commodities according to the user's browsing habits and browsing content.
It should be noted that: the sequence of the above embodiments of the present invention is only for description, and does not represent the advantages or disadvantages of the embodiments. The processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (8)

1. An advertisement delivery method based on user browsing habit data analysis is characterized by comprising the following steps:
obtaining the visual stay time of each user on each commodity in the historical database; the commodities comprise browsing page commodities and retrieval page commodities; constructing a browsing page commodity TF-IDF set and a retrieval page commodity TF-IDF set according to the browsing record of each user;
obtaining the cross heat of each commodity according to the difference distance between the elements of the browsing page commodity TF-IDF set and the elements of the retrieval page commodity TF-IDF set; screening the commodities according to the cross heat to obtain hot word commodities, and taking the average TF-IDF of the hot word commodities as the hot word extension feature benchmark of the corresponding user; obtaining a commodity visual retention time histogram of each user;
obtaining a first cost according to the difference between the TF-IDF of each candidate commodity in the candidate commodity advertisement set and the hot word extension feature benchmark; obtaining a second cost according to the TF-IDF difference between each candidate commodity and the other candidate commodities in the candidate commodity advertisement set; taking the sum of the first cost and the second cost as the candidate cost of each candidate commodity;
judging whether an intersection exists between the candidate commodity advertisement set of the target user and the browsing record sets of other users; if so, obtaining a matching distance between the target user and the other users according to the commodity visual retention time histogram similarity, the hot word extension feature benchmark similarity and the maximum candidate cost in the intersection, and obtaining the matching user of the target user according to the matching distance; and delivering to the target user the advertisement of the candidate commodity corresponding to the maximum candidate cost in the intersection between the matching user and the target user.
2. The advertisement delivery method based on analysis of user browsing habit data according to claim 1, wherein said obtaining the visual stay time of each user on each commodity in the history database comprises:
acquiring the retrieval page visual retention time of a commodity retrieval page browsed by a user, wherein the visual retention time of every retrieval page commodity on the retrieval page is equal to the retrieval page visual retention time of that retrieval page;
and obtaining the visual retention time of a browsing page of a commodity detail page browsed by a user, and taking the visual retention time of the browsing page as the visual retention time of the commodity of the corresponding browsing page.
3. The method of claim 1, wherein the obtaining the cross heat of each commodity according to the difference distance between the elements of the browsing page commodity TF-IDF set and the elements of the retrieval page commodity TF-IDF set comprises:
obtaining, for each commodity, a first neighbor sample set in the TF-IDF set to which the commodity does not belong;
obtaining the cross heat of each commodity according to a cross heat formula:
[Formula image: cross heat formula]
wherein the formula involves the following quantities: the cross heat of each commodity; the number of samples in the first neighbor sample set corresponding to that commodity; the TF-IDF corresponding to that commodity; each TF-IDF in the first neighbor sample set; and a cosine similarity function.
4. The advertisement delivery method based on user browsing habit data analysis as claimed in claim 1, wherein the screening the commodities according to the cross heat to obtain the hot word commodities comprises:
obtaining a difference distance according to the cross heat difference and the TF-IDF difference between the commodities, and grouping the commodities by using a GMM algorithm according to the difference distance to obtain at least two commodity categories; and sorting the commodity categories according to the cross heat in each commodity category, selecting a preset number of commodity categories as hot word categories, and taking the commodities in the hot word categories as hot word commodities.
5. The advertisement delivery method based on analysis of user browsing habit data according to claim 4, wherein said obtaining the difference distance according to the cross-heat difference and TF-IDF difference between commodities comprises:
taking the cosine distance of TF-IDF between commodities as TF-IDF difference; taking the absolute value of the difference value of the cross heat degrees between the commodities as the difference of the cross heat degrees; the product of the cross-heat difference and the TF-IDF difference is taken as the difference distance between the commodities.
6. The method of claim 1, wherein the obtaining a first cost according to the difference between the TF-IDF of each candidate commodity in the candidate commodity advertisement set and the hot word extension feature benchmark comprises:
taking the Mahalanobis distance between the TF-IDF of each candidate commodity in the candidate commodity advertisement set and the hot word extension feature benchmark as the first cost.
7. The advertisement delivery method based on analysis of user browsing habit data according to claim 1, wherein said obtaining the second cost according to the TF-IDF difference between each candidate commodity and the other candidate commodities in the candidate commodity advertisement set comprises:
obtaining a second neighbor set of each candidate commodity in the candidate commodity advertisement set, and obtaining the maximum TF-IDF difference between each candidate commodity and the samples in its second neighbor set; taking the median of all the maximum TF-IDF differences in the candidate commodity advertisement set as base data, and obtaining, for each candidate commodity, the ratio of its maximum TF-IDF difference to the base data; setting the second cost of a candidate commodity whose ratio is smaller than one to one; and setting the second cost of a candidate commodity whose ratio is larger than one to the corresponding ratio.
8. The advertisement delivery method based on the analysis of the user browsing habit data according to claim 1, wherein the method for obtaining the matching distance comprises:
obtaining the matching distance according to a matching distance formula:
[Formula image: matching distance formula]
wherein the formula involves the following quantities: the matching distance between the two users being compared; the commodity visual retention time histogram of each of the two users; the similarity between the two histograms; the hot word extension feature benchmark of each of the two users; the maximum candidate cost in the intersection between the two users; and a cosine distance function.
CN202211463099.XA 2022-11-22 2022-11-22 Advertisement putting method based on user browsing habit data analysis Active CN115619457B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211463099.XA CN115619457B (en) 2022-11-22 2022-11-22 Advertisement putting method based on user browsing habit data analysis

Publications (2)

Publication Number Publication Date
CN115619457A true CN115619457A (en) 2023-01-17
CN115619457B CN115619457B (en) 2023-03-28

Family

ID=84877625

Country Status (1)

Country Link
CN (1) CN115619457B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116703485A (en) * 2023-08-04 2023-09-05 山东创亿智慧信息科技发展有限责任公司 Advertisement accurate marketing method and system based on big data
CN116703485B (en) * 2023-08-04 2023-10-20 山东创亿智慧信息科技发展有限责任公司 Advertisement accurate marketing method and system based on big data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant