CN115619457A

CN115619457A - Advertisement delivery method based on user browsing habit data analysis

Info

Publication number: CN115619457A
Application number: CN202211463099.XA
Authority: CN
Inventors: 刘晓东; 嵇晨; 於雯雯; 冯思雨
Original assignee: Jingfu Technology Co ltd; Information Technology Nanjing Co ltd
Current assignee: Jingfu Technology Co ltd; Information Technology Nanjing Co ltd
Priority date: 2022-11-22
Filing date: 2022-11-22
Publication date: 2023-01-17
Anticipated expiration: 2042-11-22
Also published as: CN115619457B

Abstract

The invention relates to the technical field of marketing data processing, and in particular to an advertisement delivery method based on user browsing habit data analysis. The method obtains a commodity visual dwell time histogram and a hot word extension feature benchmark for each user by collecting statistics on the visual dwell time and TF-IDF information of each commodity. The candidate cost of each candidate commodity in the candidate commodity advertisement set is calculated according to the user's hot word extension feature benchmark. The target user is then matched against other users using the commodity visual dwell time histograms, the hot word extension feature benchmarks and the candidate cost information, which yields the matched user of the target user. The candidate commodity with the maximum candidate cost in the intersection between the target user's candidate commodity advertisement set and the matched user's browsing record set is taken as the commodity to push. The invention avoids the information limitation of advertisement delivery and can guide users to browse a richer variety of commodities according to their browsing habits and browsing content.

Description

Advertisement delivery method based on user browsing habit data analysis

Technical Field

The invention relates to the technical field of marketing data processing, and in particular to an advertisement delivery method based on user browsing habit data analysis.

Background

Methods for extracting value from user data have long been known, but traditional data was mainly structured data. With the development of network technology, unstructured data measured in ZB is generated on the Internet every day. This data continuously shapes the user's experience on the Internet and has also become a point of breakthrough for advertisement marketing technology.

User browsing data is unstructured and sparse. Current advertisement systems mainly recommend products of similar types based on user classification, so the information a user sees is easily limited by this passive-response delivery technique. To reduce this effect, current delivery systems randomly add some highly popular advertisements to enrich the content the user sees, but this brings a persistently poor experience, and advertisements irrelevant to the user can even become a source of annoyance.

Disclosure of Invention

In order to solve the above technical problems, an object of the present invention is to provide an advertisement delivery method based on user browsing habit data analysis, which adopts the following technical scheme:

the invention provides an advertisement delivery method based on user browsing habit data analysis, which comprises the following steps:

obtaining the visual dwell time of each user on each commodity in a historical database, wherein the commodities comprise browsing page commodities and search page commodities; and constructing a browsing page commodity TF-IDF set and a search page commodity TF-IDF set according to the browsing records of each user;

obtaining the cross heat of each commodity according to the difference distance between elements of the browsing page commodity TF-IDF set and elements of the search page commodity TF-IDF set; screening the commodities according to the cross heat to obtain hot word commodities, and taking the average TF-IDF of the hot word commodities as the hot word extension feature benchmark of the corresponding user; and obtaining a commodity visual dwell time histogram of each user;

obtaining a first cost according to the difference between the TF-IDF of each candidate commodity in a candidate commodity advertisement set and the hot word extension feature benchmark; obtaining a second cost according to the TF-IDF difference between each candidate commodity and the other candidate commodities in the candidate commodity advertisement set; and taking the sum of the first cost and the second cost as the candidate cost of each candidate commodity;

judging whether an intersection exists between the candidate commodity advertisement set of a target user and the browsing record sets of other users; if so, obtaining a matching distance between the target user and each such other user according to the similarity of their commodity visual dwell time histograms, the similarity of their hot word extension feature benchmarks and the maximum candidate cost in the intersection, and obtaining the matched user of the target user according to the matching distance; and delivering to the target user the advertisement of the candidate commodity corresponding to the maximum candidate cost in the intersection between the matched user and the target user.

Further, obtaining the visual dwell time of each user on each commodity in the historical database comprises:

obtaining the search page visual dwell time of each commodity search page browsed by the user, wherein the visual dwell time of every search page commodity on a search page is equal to the visual dwell time of that search page;

and obtaining the browsing page visual dwell time of each commodity detail page browsed by the user, and taking it as the visual dwell time of the corresponding browsing page commodity.

Further, obtaining the cross heat of each commodity according to the difference distance between elements of the browsing page commodity TF-IDF set and elements of the search page commodity TF-IDF set comprises:

obtaining, for each commodity, a first neighbor sample set from the TF-IDF set to which the commodity does not belong;

obtaining the cross heat of each commodity according to a cross heat formula, wherein the cross heat formula is:

$$H_i = \frac{1}{1 + \frac{1}{K_i}\sum_{j=1}^{K_i}\bigl(1 - \cos(T_i, T_j)\bigr)}$$

wherein $H_i$ is the cross heat of the $i$-th commodity, $K_i$ is the number of samples in the first neighbor sample set corresponding to the $i$-th commodity, $T_i$ is the TF-IDF corresponding to the $i$-th commodity, $T_j$ is the $j$-th TF-IDF in the first neighbor sample set, and $\cos(\cdot,\cdot)$ is the cosine similarity function.

Further, screening the commodities according to the cross heat to obtain the hot word commodities comprises:

obtaining a difference distance according to the cross heat difference and the TF-IDF difference between commodities, and grouping the commodities with a GMM algorithm according to the difference distance to obtain at least two commodity categories; and sorting the commodity categories according to the cross heat within each commodity category, selecting a preset number of top-ranked commodity categories as hot word categories, and taking the commodities in the hot word categories as hot word commodities.

Further, obtaining the difference distance according to the cross heat difference and the TF-IDF difference between commodities comprises:

taking the cosine distance between the TF-IDF of two commodities as the TF-IDF difference; taking the absolute value of the difference between the cross heats of the two commodities as the cross heat difference; and taking the product of the cross heat difference and the TF-IDF difference as the difference distance between the commodities.

Further, obtaining the first cost according to the difference between the TF-IDF of each candidate commodity in the candidate commodity advertisement set and the hot word extension feature benchmark comprises:

taking the Mahalanobis distance between the TF-IDF of each candidate commodity in the candidate commodity advertisement set and the hot word extension feature benchmark as the first cost.

Further, obtaining the second cost according to the TF-IDF difference between each candidate commodity and the other candidate commodities in the candidate commodity advertisement set comprises:

obtaining a second neighbor set of each candidate commodity in the candidate commodity advertisement set, and obtaining the maximum TF-IDF difference between each candidate commodity and the samples in its second neighbor set; taking the median of all the maximum TF-IDF differences in the candidate commodity advertisement set as base data, and obtaining, for each candidate commodity, the ratio of its maximum TF-IDF difference to the base data; setting the second cost of a candidate commodity whose ratio is smaller than one to one; and setting the second cost of a candidate commodity whose ratio is larger than one to the ratio.

Further, the method for obtaining the matching distance comprises:

obtaining the matching distance according to a matching distance formula, wherein, in the matching distance formula, $M_{u,v}$ is the matching distance between user $u$ and user $v$; $G_u$ is the commodity visual dwell time histogram of user $u$; $G_v$ is the commodity visual dwell time histogram of user $v$; $S(G_u, G_v)$ is the similarity between $G_u$ and $G_v$; $B_u$ is the hot word extension feature benchmark of user $u$; $B_v$ is the hot word extension feature benchmark of user $v$; $C_{u,v}$ is the maximum candidate cost in the intersection between user $u$ and user $v$; and $d_{\cos}(\cdot,\cdot)$ is the cosine distance function.

The invention has the following beneficial effects:

1. The embodiment of the invention obtains the TF-IDF information and visual dwell time of each commodity browsed by each user from a historical database of user browsing data, represents the distribution of visual dwell time during browsing with a commodity visual dwell time histogram, and represents the semantic characteristics of each user's browsing with a hot word extension feature benchmark. It further combines the candidate cost of each candidate commodity in the candidate commodity advertisement set with the matching relationship between users to recommend advertisements to the target user. The recommendation process takes into account both the browsing and search cost to the user and the browsing habits of the matched user, and can provide advertisement combinations that are attractive and representative of a user group. This improves the user experience and the advertisement hit rate, dynamically guides the user toward more numerous and more novel products, and respects the user's browsing habits while avoiding information limitation.

2. The embodiment of the invention divides browsed commodities into browsing page commodities and search page commodities according to the type of browsing, and obtains the cross heat from the differences between elements of the TF-IDF sets of the two types of commodity, so that the resulting hot word extension feature benchmark is closer to the user's browsing habits and more useful as a reference.

Drawings

In order to illustrate the embodiments of the present invention, or the technical solutions and advantages of the prior art, more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flowchart of an advertisement delivery method based on user browsing habit data analysis according to an embodiment of the present invention.

Detailed Description

To further explain the technical means adopted by the present invention to achieve its intended purpose, and their effects, the specific implementation, structure, features and effects of the advertisement delivery method based on user browsing habit data analysis are described in detail below with reference to the accompanying drawings and preferred embodiments. In the following description, different references to "one embodiment" or "another embodiment" do not necessarily refer to the same embodiment. Furthermore, particular features, structures or characteristics may be combined in any suitable manner in one or more embodiments.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

The following describes a specific scheme of the advertisement delivery method based on user browsing habit data analysis, which is provided by the present invention, in detail with reference to the accompanying drawings.

Referring to Fig. 1, which shows a flowchart of the advertisement delivery method based on user browsing habit data analysis according to an embodiment of the present invention, the method includes:

Step S1: obtaining the visual dwell time of each user on each commodity in the historical database, wherein the commodities comprise browsing page commodities and search page commodities; and constructing a browsing page commodity TF-IDF set and a search page commodity TF-IDF set according to the browsing records of each user.

When a user browses commodities on a shopping website, the website back end can build a historical database for each user from information such as the user's search content and browsing duration; the data in the historical database therefore contains the user's browsing habit characteristics. Browsing behavior and search behavior accompany each other but differ in purpose: a user may browse for a while starting from a search, and browsing may in turn lead the user to modify the search keywords. The commodities in the user's historical database are therefore divided into browsing page commodities and search page commodities. A browsing page commodity is a commodity whose detail page the user has browsed. A search page commodity is a commodity that the user has seen only on a search page; a search page carries less information per commodity, and one search page contains many commodities.

During browsing, the browsing time reflects the user's degree of attention to a commodity: the longer the user browses a commodity, the more attention the user pays to it. The browsing time can also characterize the user's shopping habits: the longer the browsing time, the more carefully the user selects commodities. Statistics are therefore computed over the historical data to obtain the visual dwell time of each user on each commodity in the historical database, specifically as follows:

The search page visual dwell time is obtained for each commodity search page browsed by the user, and the visual dwell time of every search page commodity on a search page is equal to the visual dwell time of that search page. The browsing page visual dwell time is obtained for each commodity detail page browsed by the user and is taken as the visual dwell time of the corresponding browsing page commodity. It should be noted that, for a browsing page commodity, the dwell time on its detail page is related to the total interaction time on that page, that is, it is the integral of the time represented by scrolling the detail page. In implementation, the implementer may add a delay of several seconds after each scroll of the detail page so as to represent the visual dwell time of the browsing page commodity more accurately; the specific delay can be set according to the actual situation and is not limited here.

It should be noted that, to facilitate the subsequent histogram statistics, all visual dwell times are range-normalized (min-max normalized) in the embodiment of the present invention. A longer visual dwell time is close to 1; a shorter visual dwell time, which indicates that the user pays little attention to such commodities, is close to 0.
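The normalization described above can be sketched as follows. This is a minimal illustration, not the embodiment's implementation; the function name `min_max_normalize` and the handling of the case where all dwell times are equal are assumptions.

```python
def min_max_normalize(dwell_seconds):
    """Scale raw dwell times to [0, 1] by range (min-max) normalization."""
    lo, hi = min(dwell_seconds), max(dwell_seconds)
    if hi == lo:
        # Assumption: with no spread in the data, map every value to 0.0.
        return [0.0 for _ in dwell_seconds]
    return [(t - lo) / (hi - lo) for t in dwell_seconds]


# Hypothetical dwell times in seconds for three commodities.
normalized = min_max_normalize([2.0, 10.0, 42.0])
```

The longest dwell time maps to 1 and the shortest to 0, which matches the interpretation given in the text.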

Because browsing and search behaviors differ, the word set features of each commodity in the historical database need to be counted in order to find latent features. After word set statistics are computed over the historical database, the word frequencies of all content words, such as those in titles and introduction texts, are available for all commodities.

TF-IDF is a statistical method for evaluating the importance of a word to a document in a document set or corpus; it derives semantic information about a term from the term frequency (TF) and the inverse document frequency (IDF). Analyzing the historical database yields the word frequency information of the different keywords in each commodity, so the TF-IDF of each commodity is a vector with multiple elements. A browsing page commodity TF-IDF set and a search page commodity TF-IDF set are constructed according to the browsing records of each user in the historical database. TF-IDF is a technique well known to those skilled in the art, and the detailed algorithm is not described here.
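As an illustration of how each commodity ends up with a TF-IDF vector, the sketch below uses one common TF-IDF variant (relative term frequency multiplied by the logarithm of the inverse document frequency). The text does not say which variant is used, so the weighting, the tokenized input format and the name `tf_idf_vectors` are assumptions.

```python
import math
from collections import Counter


def tf_idf_vectors(documents):
    """documents: one token list per commodity. Returns (vocabulary, vectors)."""
    vocabulary = sorted({tok for doc in documents for tok in doc})
    n_docs = len(documents)
    # Number of commodities whose text contains each token.
    doc_freq = Counter(tok for doc in documents for tok in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty texts
        vectors.append([
            (counts[tok] / total) * math.log(n_docs / doc_freq[tok])
            for tok in vocabulary
        ])
    return vocabulary, vectors


# Hypothetical tokenized titles of two commodities.
vocabulary, vectors = tf_idf_vectors([["red", "shoe"], ["blue", "shoe"]])
```

With this variant a word that occurs in every commodity, such as "shoe" here, receives weight 0, while words that distinguish a commodity receive positive weight.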

Step S2: obtaining the cross heat of each commodity according to the difference distance between elements of the browsing page commodity TF-IDF set and elements of the search page commodity TF-IDF set; screening the commodities according to the cross heat to obtain hot word commodities, and taking the average TF-IDF of the hot word commodities as the hot word extension feature benchmark of the corresponding user; and obtaining a commodity visual dwell time histogram of each user.

In practice, a browsing page commodity first appears on a search page, and the user then enters its detail page by clicking, so browsing page commodities and search page commodities have cross characteristics. For a browsing page commodity, the more similar commodities there are in the search page commodity set, the hotter the browsing page commodity; the same holds for a search page commodity. The cross heat of each commodity is therefore obtained according to the difference distance between elements of the browsing page commodity TF-IDF set and elements of the search page commodity TF-IDF set. The cross heat reflects the user's degree of attention to a commodity during browsing: the larger the cross heat, the more often the user has searched for or browsed it. The cross heat is obtained as follows:

For each commodity, a first neighbor sample set is obtained from the TF-IDF set to which the commodity does not belong. For a target browsing page commodity, the first neighbor sample set is selected from the search page commodity TF-IDF set according to TF-IDF similarity; that is, all samples in the first neighbor sample set are TF-IDF vectors of search page commodities, namely the several samples with the largest TF-IDF similarity to the target browsing page commodity. The same applies to search page commodities, whose neighbors are selected from the browsing page commodity TF-IDF set. It should be noted that the number of samples in the first neighbor sample set can be set according to the specific implementation scenario and is not limited here.

The cross heat of each commodity is then obtained according to the cross heat formula:

$$H_i = \frac{1}{1 + \frac{1}{K_i}\sum_{j=1}^{K_i}\bigl(1 - \cos(T_i, T_j)\bigr)}$$

where $H_i$ is the cross heat of the $i$-th commodity, $K_i$ is the number of samples in its first neighbor sample set, $T_i$ is the TF-IDF corresponding to the $i$-th commodity, $T_j$ is the $j$-th TF-IDF in the first neighbor sample set, and $\cos(\cdot,\cdot)$ is the cosine similarity function.

In the cross heat formula, $1 - \cos(T_i, T_j)$ represents the cosine distance, and the 1 in the denominator prevents the denominator from being 0. The formula as a whole is the reciprocal of the average cosine distance of a commodity: the larger the average distance, the colder the commodity and the smaller its cross heat.
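The following sketch shows one way to compute a cross heat that fits the description: the reciprocal of the average cosine distance to the nearest neighbors in the other TF-IDF set, with 1 added in the denominator. The exact placement of the averaging, the default neighbor count `k` and the function names are assumptions.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cross_heat(item_vec, other_set, k=3):
    """Reciprocal of (1 + mean cosine distance to the k most similar vectors
    in the TF-IDF set that the commodity does not belong to)."""
    distances = sorted(1.0 - cosine_similarity(item_vec, v) for v in other_set)[:k]
    if not distances:
        return 0.0  # assumption: no neighbors means no cross heat
    return 1.0 / (1.0 + sum(distances) / len(distances))


# Hypothetical browsing page commodity compared against two search page commodities.
heat = cross_heat([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], k=2)
```

A commodity whose neighbors are all identical in direction reaches the maximum value 1, and the value falls toward 1/2 as the neighbors become orthogonal.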

The larger the cross heat, the more attention the user pays to the corresponding commodity, so the commodities can be screened by cross heat to obtain the hot word commodities. The semantic information reflected by the TF-IDF of the hot word commodities represents the semantics the user cares about, so the average TF-IDF of the hot word commodities is taken as the hot word extension feature benchmark of the corresponding user. The hot word extension feature benchmark represents the user's attention to commodity keywords. Because a TF-IDF is a vector, the average is computed separately for each dimension over all the hot word commodities to obtain the hot word extension feature benchmark.
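The per-dimension averaging can be sketched in a few lines. The function name `hot_word_benchmark` and the example vectors are illustrative assumptions.

```python
def hot_word_benchmark(hot_word_vectors):
    """Per-dimension mean of the TF-IDF vectors of a user's hot word commodities."""
    if not hot_word_vectors:
        raise ValueError("at least one hot word commodity is required")
    n = len(hot_word_vectors)
    # zip(*...) groups the values of each dimension across all commodities.
    return [sum(column) / n for column in zip(*hot_word_vectors)]


# Hypothetical TF-IDF vectors of two hot word commodities.
benchmark = hot_word_benchmark([[0.0, 0.2], [0.4, 0.6]])
```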

Specifically, screening the commodities according to the cross heat to obtain the hot word commodities comprises the following steps:

A difference distance is obtained according to the cross heat difference and the TF-IDF difference between commodities, and the commodities are grouped with the GMM algorithm according to the difference distance to obtain at least two commodity categories. The commodity categories are sorted according to the cross heat within each category, a preset number of top-ranked categories are selected as hot word categories, and the commodities in the hot word categories are taken as hot word commodities. Obtaining the difference distance according to the cross heat difference and the TF-IDF difference between commodities comprises:

The cosine distance between the TF-IDF of two commodities is taken as the TF-IDF difference; the absolute value of the difference between their cross heats is taken as the cross heat difference; and the product of the cross heat difference and the TF-IDF difference is taken as the difference distance between the commodities. That is, the difference distance is expressed as:

$$D_{a,b} = \lvert H_a - H_b \rvert \cdot \bigl(1 - \cos(T_a, T_b)\bigr)$$

where $D_{a,b}$ is the difference distance between commodity $a$ and commodity $b$, $H_a$ is the cross heat of commodity $a$, $H_b$ is the cross heat of commodity $b$, $T_a$ is the TF-IDF of commodity $a$, $T_b$ is the TF-IDF of commodity $b$, and $\cos(\cdot,\cdot)$ is the cosine similarity function. The cosine distance between the TF-IDF vectors is used to constrain the cross heat difference. If the semantic similarity between two commodities is small while their cross heat difference is also small, the small cross heat difference is probably caused by little overlap between the user's browsing and searching. The constraint corrects the error that would arise from comparing commodities by cross heat difference alone, so that different commodity categories can be further distinguished.
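The difference distance described above (absolute cross heat difference multiplied by the cosine distance between the TF-IDF vectors) can be sketched as follows. The function names and the treatment of zero vectors are assumptions.

```python
import math


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Assumption: a zero vector is treated as maximally distant.
    return 1.0 - dot / norm if norm else 1.0


def difference_distance(heat_a, vec_a, heat_b, vec_b):
    """Product of the cross heat difference and the TF-IDF cosine distance."""
    return abs(heat_a - heat_b) * cosine_distance(vec_a, vec_b)


# Hypothetical commodities with different heat and orthogonal TF-IDF vectors.
d = difference_distance(0.9, [1.0, 0.0], 0.4, [0.0, 1.0])
```

Because the two factors are multiplied, two commodities with the same TF-IDF direction or the same cross heat have difference distance 0.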

It should be noted that the GMM (Gaussian mixture model) algorithm is a clustering algorithm well known to those skilled in the art, and its specific steps are not described here. The number of commodity categories produced by the algorithm can be set according to the specific implementation scenario and is not limited in the embodiment of the present invention. In the embodiment, the preset number is set to half of the number of commodity categories, and the commodity categories can be ranked by the average cross heat of all samples in each category in order to select the hot word categories.
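The selection of hot word categories after clustering can be sketched as below. The sketch does not run GMM itself; it assumes that cluster labels have already been produced by some clustering step. The function name and the rounding rule for an odd number of categories (round down, keep at least one) are assumptions.

```python
from collections import defaultdict


def select_hot_word_items(labels, heats):
    """Return indices of commodities in the top half of categories by mean cross heat.

    labels[i] is the category of commodity i (assumed to come from a prior
    clustering step); heats[i] is its cross heat.
    """
    groups = defaultdict(list)
    for index, label in enumerate(labels):
        groups[label].append(index)

    def mean_heat(label):
        members = groups[label]
        return sum(heats[i] for i in members) / len(members)

    ranked = sorted(groups, key=mean_heat, reverse=True)
    keep = ranked[: max(1, len(ranked) // 2)]  # assumption: floor, at least one
    return sorted(i for label in keep for i in groups[label])


# Hypothetical labels and cross heats for eight commodities in four categories.
hot_items = select_hot_word_items(
    [0, 0, 1, 1, 2, 2, 3, 3],
    [0.9, 0.8, 0.1, 0.2, 0.7, 0.6, 0.3, 0.2],
)
```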

A commodity visual dwell time histogram is further obtained for each user every month. As described for the visual dwell time in step S1, the dwell time distribution reflected by the histogram expresses the user's browsing habits and browsing style. In the embodiment of the invention, the visual dwell time is divided into 10 levels, so the commodity visual dwell time histogram has 10 bars; the abscissa is the visual dwell time level and the ordinate is the corresponding frequency of occurrence.
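A 10-level histogram over normalized dwell times can be sketched as follows. Equal-width levels over [0, 1] and the function name are assumptions; the text only fixes the number of levels.

```python
def dwell_time_histogram(normalized_times, levels=10):
    """Count how many normalized dwell times (each in [0, 1]) fall into each level."""
    counts = [0] * levels
    for t in normalized_times:
        # A value of exactly 1.0 is placed in the last level.
        index = min(int(t * levels), levels - 1)
        counts[index] += 1
    return counts


# Hypothetical normalized dwell times of four browsed commodities.
histogram = dwell_time_histogram([0.0, 0.05, 0.5, 1.0])
```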

Step S3: obtaining a first cost according to the difference between the TF-IDF of each candidate commodity in the candidate commodity advertisement set and the hot word extension feature benchmark; obtaining a second cost according to the TF-IDF difference between each candidate commodity and the other candidate commodities in the candidate commodity advertisement set; and taking the sum of the first cost and the second cost as the candidate cost of each candidate commodity.

The candidate cost of each candidate commodity in the candidate commodity advertisement set reflects how different the candidate commodity is from the commodities the user frequently browses. Internet marketing practice shows that placing on the home page of a shopping website some products that the user rarely sees and does not often browse attracts the user to keep browsing commodities on the platform. Therefore, the larger the candidate cost, the harder it is for the user to come across the candidate commodity while keeping existing browsing habits, and the more it should be promoted in subsequent advertisement pushes, so that the user takes an interest in the shopping platform and the candidate commodity and the exposure of the candidate advertisement increases.

The candidate cost has two parts. The first cost is obtained according to the difference between the TF-IDF of each candidate commodity in the candidate commodity advertisement set and the hot word extension feature benchmark; it reflects the difference between the semantic characteristics of what the user commonly browses and those of the candidate commodity. The second cost is obtained according to the TF-IDF difference between each candidate commodity and the other candidate commodities in the candidate commodity advertisement set; it reflects how unique the candidate commodity is within the set.

The first cost is obtained as follows: the Mahalanobis distance between the TF-IDF of each candidate commodity in the candidate commodity advertisement set and the hot word extension feature benchmark is taken as the first cost. Mahalanobis distance is a technique well known to those skilled in the art and is not described here. The larger the first cost, the less related the candidate commodity is to the types of commodity the user usually pays attention to.
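As a rough illustration of the first cost, the sketch below computes a Mahalanobis distance under the simplifying assumption of a diagonal covariance matrix, with the per-dimension variances estimated from the user's hot word commodity vectors. The text does not state which covariance is used, so the diagonal form, the source of the variances, the small `eps` stabilizer and the function names are all assumptions.

```python
import math


def per_dimension_variance(vectors):
    """Population variance of each TF-IDF dimension over the given vectors."""
    n = len(vectors)
    columns = list(zip(*vectors))
    means = [sum(col) / n for col in columns]
    return [sum((x - m) ** 2 for x in col) / n for col, m in zip(columns, means)]


def first_cost(candidate, benchmark, variances, eps=1e-9):
    """Mahalanobis distance with a diagonal covariance (assumed simplification)."""
    return math.sqrt(sum(
        (c - b) ** 2 / (v + eps)  # eps guards against zero variance
        for c, b, v in zip(candidate, benchmark, variances)
    ))


# Hypothetical hot word commodity vectors, benchmark and candidate.
variances = per_dimension_variance([[0.0, 1.0], [2.0, 1.0]])
cost = first_cost([3.0, 0.0], [0.0, 0.0], [9.0, 1.0])
```

A full covariance matrix would also account for correlations between keywords; the diagonal form is used here only to keep the sketch dependency-free.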

The second cost is obtained as follows: a second neighbor set is obtained for each candidate commodity in the candidate commodity advertisement set, and the maximum TF-IDF difference between each candidate commodity and the samples in its second neighbor set is obtained. The median of all the maximum TF-IDF differences in the candidate commodity advertisement set is taken as base data, and the ratio of each candidate commodity's maximum TF-IDF difference to the base data is computed. The second cost of a candidate commodity whose ratio is smaller than one is set to one, and the second cost of a candidate commodity whose ratio is larger than one is set to the ratio. It should be noted that the second neighbor set is obtained according to the TF-IDF similarity between each candidate commodity and the other candidate commodities: the several other candidate commodities whose TF-IDF is most similar to that of the target candidate commodity form its second neighbor set, and the number of samples in the second neighbor set can be set according to the specific scenario. For each candidate commodity, the larger the maximum TF-IDF difference within the second neighbor set, the more dispersed the second neighbor set is; that is, the more unusual the candidate commodity is in the word set space, and the harder it is for the user to find it by searching.
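The second cost procedure can be sketched as follows. The text does not name the measure used for the TF-IDF difference here, so cosine distance is an assumption, as are the default neighbor count `k`, the fallback when the median is 0, and the function names.

```python
import math
from statistics import median


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def second_costs(candidate_vectors, k=2):
    """Second cost per candidate: max(1, max neighbor difference / median of maxima)."""
    max_diffs = []
    for i, vec in enumerate(candidate_vectors):
        distances = sorted(
            cosine_distance(vec, other)
            for j, other in enumerate(candidate_vectors) if j != i
        )
        neighbors = distances[:k]  # the k most similar other candidates
        max_diffs.append(max(neighbors) if neighbors else 0.0)
    base = median(max_diffs) if max_diffs else 0.0
    if base == 0.0:
        return [1.0] * len(max_diffs)  # assumption: no spread, every cost is 1
    return [max(1.0, d / base) for d in max_diffs]


# Two near-duplicate candidates and one outlier (hypothetical vectors).
costs = second_costs([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]], k=1)
```

The two rules in the text (ratio below one gives one, ratio above one gives the ratio) collapse to `max(1, ratio)`, so the outlier receives a cost well above 1 while the near-duplicates receive exactly 1.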

The sum of the first cost and the second cost is taken as the candidate cost of each candidate commodity.

Step S4: judging whether an intersection exists between the candidate commodity advertisement set of the target user and the browsing record sets of other users; if so, obtaining a matching distance between the target user and each such other user according to the similarity of their commodity visual dwell time histograms, the similarity of their hot word extension feature benchmarks and the maximum candidate cost in the intersection, and obtaining the matched user of the target user according to the matching distance; and delivering to the target user the advertisement of the candidate commodity corresponding to the maximum candidate cost in the intersection between the matched user and the target user.

When delivering advertisements, in order to avoid information limitation, the browsing habits of different users are considered together with the candidate commodity advertisement set. This ensures both that the candidate commodity finally pushed brings novelty to the target user and that the target user is likely to accept it.

First, it is judged whether an intersection exists between the candidate commodity advertisement set of the target user and the browsing record set of each other user. If there is no intersection, the two users belong to completely different user groups, and advertisements cannot be pushed to the target user on the basis of that user's browsing information. If an intersection exists, the other user can serve as a reference for pushing to the target user. The matching distance between the target user and the other user is then obtained according to the similarity of their commodity visual dwell time histograms, the similarity of their hot word extension feature benchmarks and the maximum candidate cost in the intersection, and the matched user of the target user is obtained according to the matching distance; the matched user is the other user that best matches the target user. The advertisement of the candidate commodity corresponding to the maximum candidate cost in the intersection between the matched user and the target user is then delivered to the target user.
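The final selection step, taking the candidate with the maximum candidate cost in the intersection, can be sketched as follows. The data layout (a dictionary from commodity identifier to candidate cost, and a set of browsed identifiers) and the function name are assumptions.

```python
def pick_push_commodity(candidate_costs, matched_user_browsed):
    """Return the shared commodity with the highest candidate cost, or None.

    candidate_costs: commodity id -> candidate cost for the target user.
    matched_user_browsed: commodity ids in the matched user's browsing records.
    """
    shared = set(candidate_costs) & set(matched_user_browsed)
    if not shared:
        return None  # no intersection: this user gives no basis for a push
    # Ties, if any, are broken arbitrarily in this sketch.
    return max(shared, key=lambda item: candidate_costs[item])


# Hypothetical candidate costs and browsing records.
pushed = pick_push_commodity(
    {"tent": 3.2, "lamp": 1.4, "kayak": 5.0},
    {"lamp", "tent", "socks"},
)
```

Here "kayak" has the highest cost overall but is not in the matched user's records, so "tent" is selected.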

In the embodiment of the present invention, the KM (Kuhn-Munkres) matching algorithm is applied to the matching distances to obtain the matched user of the target user. The KM algorithm is a technical means well known to those skilled in the art and is not described in detail here.
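For intuition only, the result that a KM-style matching produces on a small distance matrix can be reproduced by brute force. The sketch below is not the KM algorithm itself: it enumerates every one-to-one assignment and keeps the one with the smallest total matching distance, which is only practical for a handful of users. The function name `min_distance_assignment` and the sample matrix are assumptions made for the example.

```python
from itertools import permutations


def min_distance_assignment(distance):
    """distance[i][j] is the matching distance between target user i and
    other user j. Returns (columns, total), where columns[i] is the other
    user assigned to target user i. Brute-force stand-in for KM; requires
    no more rows than columns."""
    n_rows, n_cols = len(distance), len(distance[0])
    best_total, best_cols = float("inf"), None
    for cols in permutations(range(n_cols), n_rows):
        total = sum(distance[i][j] for i, j in enumerate(cols))
        if total < best_total:
            best_total, best_cols = total, cols
    return list(best_cols), best_total


# Hypothetical matching distances: 2 target users, 3 other users.
distance = [
    [0.2, 0.9, 0.5],
    [0.3, 0.4, 0.8],
]
assignment, total = min_distance_assignment(distance)
print(assignment, total)
```

Pairs with no intersection could be given an infinite distance so that they are never selected; a production system would replace the enumeration with an actual KM (Hungarian) implementation.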

The specific method for obtaining the matching distance comprises the following steps:

wherein the quantities in the formula are, in order: the matching distance between the two users being compared; the commodity visual stay time histogram of the first user; the commodity visual stay time histogram of the second user; the similarity between the two histograms; the hot word extension feature benchmark of the first user; the hot word extension feature benchmark of the second user; the maximum candidate cost in the intersection between the two users; and the cosine distance function.

In the matching distance formula, one term expresses the difference between the commodity visual stay time histograms. This difference reflects how long each user views a commodity after opening it, and also reflects differences in browsing habits, such as whether a user browses many commodities in bulk or views commodities one at a time. Another term expresses the difference between the hot word extension feature benchmarks; the larger this difference is, the more the two users differ in the commodities they browse and the content they pay attention to. The larger these two differences are, the less related the browsing habits and browsed commodity types of the two users are, and the larger the matching distance is. Conversely, the larger the maximum candidate cost is, the more the corresponding candidate commodity is the one that should be pushed, and the smaller the matching distance is.
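The monotonic behavior described above can be illustrated with a small sketch. The exact formula is not reproduced in this text, so the combination used here is an assumption: the two differences are summed and divided by the maximum candidate cost. Histogram similarity is likewise assumed to be the intersection of normalized histograms. All function names are hypothetical, and input vectors are assumed to be non-zero.

```python
import math


def cosine_distance(u, v):
    """1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def histogram_similarity(h1, h2):
    """Assumed similarity: histogram intersection after normalizing each
    histogram to sum to 1, giving a value in [0, 1]."""
    s1, s2 = sum(h1), sum(h2)
    return sum(min(a / s1, b / s2) for a, b in zip(h1, h2))


def matching_distance(hist_a, hist_b, bench_a, bench_b, max_cost):
    """Assumed form: grows with both differences, shrinks as the maximum
    candidate cost in the intersection grows."""
    hist_diff = 1.0 - histogram_similarity(hist_a, hist_b)
    bench_diff = cosine_distance(bench_a, bench_b)
    return (hist_diff + bench_diff) / max_cost


# Hypothetical stay time histograms (counts per duration bin) and benchmarks.
far = matching_distance([4, 3, 1], [4, 3, 1], [1.0, 0.0], [0.0, 1.0], 2.0)
near = matching_distance([4, 3, 1], [4, 3, 1], [1.0, 0.0], [1.0, 0.0], 2.0)
print(far, near)
```

Any other combination with the same monotonic behavior, for example a product of the two differences, would fit the description equally well.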

The matching distance is constructed in this way because different commodities carry different amounts of information. For example, browsing durations for technology products or cosmetics differ markedly from those for daily necessities, and these durations reflect how much attention the user pays and how carefully the user reads. Therefore, when the browsing habits and the types of browsed commodities are similar, the matching further takes into account the differences in how the commodity information is read.

Delivering to the target user the advertisement of the candidate commodity corresponding to the maximum candidate cost in the intersection between the matched user and the target user provides the target user with a heuristic commodity advertisement push result. Because the matched user, whose browsing content and browsing habits are similar to those of the target user, has already browsed this commodity, the commodity can more easily draw the target user's attention to other commodities with richer features. The target user is thereby guided to browse further commodities, which maximizes both the advertisement delivery effect and the effect of attracting the user to browse.
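The final selection step can be sketched as below, assuming the candidate costs of the target user are already available as a dictionary and the matched user has already been found. The function name `select_push_commodity` and the sample values are hypothetical.

```python
def select_push_commodity(candidate_costs, matched_browsed):
    """Return the commodity with the maximum candidate cost among those
    that appear both in the target user's candidate commodity advertisement
    set and in the matched user's browsing record set, or None if the two
    sets do not intersect."""
    shared = set(candidate_costs) & set(matched_browsed)
    if not shared:
        return None
    return max(shared, key=lambda item: candidate_costs[item])


# Hypothetical candidate costs for the target user.
candidate_costs = {"tent": 0.7, "stove": 1.4, "lamp": 1.1}
push = select_push_commodity(candidate_costs, ["stove", "lamp", "rope"])
print(push)
```

If two shared commodities have exactly the same candidate cost, this sketch returns either of them; a tie-breaking rule would be an additional design choice.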

In summary, the embodiment of the invention obtains the commodity visual stay time histogram and the hot word extension feature benchmark of each user by counting the visual stay time and the TF-IDF information of each commodity. The candidate cost of each candidate advertisement in the candidate commodity advertisement set is calculated according to the hot word extension feature benchmark of the user. The target user is then matched using the commodity visual stay time histograms, the hot word extension feature benchmarks, and the candidate cost information among users to obtain the matched user of the target user, and the candidate commodity with the maximum candidate cost in the intersection between the candidate commodity advertisement set of the target user and the browsing record set of the matched user is taken as the push commodity. The embodiment of the invention avoids the information limitation of advertisement delivery and can guide the user to browse a richer variety of commodities according to the user's browsing habits and browsing content.

It should be noted that: the sequence of the above embodiments of the present invention is only for description, and does not represent the advantages or disadvantages of the embodiments. The processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.

The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims

1. An advertisement delivery method based on user browsing habit data analysis is characterized by comprising the following steps:

obtaining the cross heat of each commodity according to the difference distance between the TF-IDF set elements of the commodities on the browsing page and the TF-IDF set elements of the commodities on the retrieval page; screening the commodities according to the cross heat to obtain hot word commodities, and taking the average TF-IDF of the hot word commodities as the hot word extension feature benchmark of the corresponding user; obtaining a commodity visual stay time histogram of each user;

obtaining a first cost according to the difference between the TF-IDF of each candidate commodity in the candidate commodity advertisement set and the hot word extension feature benchmark; obtaining a second cost according to the TF-IDF difference between each candidate commodity and the other candidate commodities in the candidate commodity advertisement set; taking the sum of the first cost and the second cost as the candidate cost of each candidate commodity;

2. The advertisement delivery method based on analysis of user browsing habit data according to claim 1, wherein said obtaining the visual stay time of each user on each commodity in the history database comprises:

3. The advertisement delivery method based on user browsing habit data analysis according to claim 1, wherein the obtaining the cross heat of each commodity according to the difference distance between the TF-IDF set elements of the commodities on the browsing page and the TF-IDF set elements of the commodities on the retrieval page comprises:

wherein the quantities in the formula are, in order: the cross heat of a given commodity; the TF-IDF corresponding to that commodity; a TF-IDF in the neighbor sample set of that commodity; and the cosine similarity function.

4. The advertisement delivery method based on user browsing habit data analysis according to claim 1, wherein the screening the commodities according to the cross heat to obtain hot word commodities comprises:

obtaining a difference distance according to the cross heat difference and the TF-IDF difference between the commodities, and grouping the commodities by using a GMM algorithm according to the difference distance to obtain at least two commodity categories; and sorting the commodity categories according to the cross heat degree in each commodity category, selecting the commodity categories with the preset number as hot word categories, and taking the commodities in the hot word categories as hot word commodities.

5. The advertisement delivery method based on analysis of user browsing habit data according to claim 4, wherein said obtaining the difference distance according to the cross-heat difference and TF-IDF difference between commodities comprises:

6. The advertisement delivery method based on user browsing habit data analysis according to claim 1, wherein the obtaining a first cost according to the difference between the TF-IDF of each candidate commodity in the candidate commodity advertisement set and the hot word extension feature benchmark comprises:

7. The advertisement delivery method based on user browsing habit data analysis according to claim 1, wherein the obtaining a second cost according to the TF-IDF difference between each candidate commodity and the other candidate commodities in the candidate commodity advertisement set comprises:

8. The advertisement delivery method based on the analysis of the user browsing habit data according to claim 1, wherein the method for obtaining the matching distance comprises: