WO2019128311A1 - Advertisement similarity processing method and apparatus, calculation device, and storage medium

Info

Publication number: WO2019128311A1
Application number: PCT/CN2018/105093
Authority: WO
Inventors: 刘夏龙
Original assignee: 广东神马搜索科技有限公司
Priority date: 2017-12-29
Filing date: 2018-09-11
Publication date: 2019-07-04
Also published as: CN108269122A; CN108269122B

Abstract

An advertisement similarity processing method and apparatus, a calculation device, and a storage medium. The method comprises: obtaining an advertisement text set, and obtaining a user click set (101); determining semantic similarity between a first advertisement and a second advertisement according to the advertisement text set (102); determining click similarity between the first advertisement and the second advertisement according to the user click set (103); and determining similarity information between the first advertisement and the second advertisement according to the semantic similarity and the click similarity (104). The similarity among all advertisements can thus be determined, so that similar advertisements can be pushed to a user during advertisement pushing.

Description

Method and device for processing similarity of advertisement, computing device and storage medium

Technical field

The present invention relates to the field of advertising technologies, and in particular, to a similarity processing method and apparatus for an advertisement, a computing device, and a storage medium.

Background

With the development of media technology, advertisements are increasingly delivered through media channels. Advertising is widely used as an important means of promoting products. When advertisements are delivered, the similarity between advertisements needs to be considered, so that advertisements for similar products can be pushed to a user's terminal and the user can obtain more product information.

In the prior art, when the similarity between advertisements is analyzed, keyword information of each advertisement is generally obtained, the advertisement information is determined according to that keyword information, and similar advertisements are then pushed to the appropriate user group.

However, in the prior art, because advertising users change constantly and advertisement text is complex, wrong keyword information is easily extracted when the similarity between advertisements is analyzed, so the resulting similarity is not accurate. As a consequence, the advertisements pushed to a user group are not actually similar advertisements, and advertisements are pushed incorrectly.

Summary of the invention

The present invention provides an advertisement similarity processing method and apparatus, a computing device, and a storage medium, which are used to solve the problem that the similarity determined between advertisements is not accurate.

In one aspect, the present invention provides a similarity processing method for an advertisement, including:

Obtaining an advertisement text set and a user click set, wherein the advertisement text set includes feature information of the entire advertisement text of a first advertisement, feature information of each word in the first advertisement, feature information of the entire advertisement text of a second advertisement, feature information of each word in the second advertisement, feature information of the entire advertisement text of at least one other advertisement, and feature information of each word in each of the at least one other advertisement; and wherein the user click set includes feature information of the first advertisement, feature information of the second advertisement, and feature information of other advertisements clicked by at least one user, the first advertisement and the second advertisement also being advertisements clicked by the at least one user;

Determining a semantic similarity between the first advertisement and the second advertisement according to the set of advertisement texts;

Determining a click similarity between the first advertisement and the second advertisement according to the user click set;

Determining similarity information between the first advertisement and the second advertisement according to the semantic similarity and the click similarity.

Further, determining a semantic similarity between the first advertisement and the second advertisement according to the advertisement text set, including:

Establishing a semantic similarity objective function according to the set of advertisement texts;

Solving the semantic similarity objective function to determine, in an optimal state of the semantic similarity objective function, a vectorized representation of the entire advertisement text of the first advertisement and a vectorized representation of the entire advertisement text of the second advertisement;

The semantic similarity is determined according to a vectorized representation of the entire advertisement text of the first advertisement and a vectorized representation of the entire advertisement text of the second advertisement.

Further, the establishing a semantic similarity objective function according to the set of advertisement texts includes:

Establishing, according to the advertisement text set, a first preset function of the t-th feature information in the advertisement text set

Where b denotes a preset first deviation value, U denotes a preset first parameter vector, h(w_{t-k}, ..., w_{t+k}; W) denotes a formalized function, W denotes the t-th feature information in the advertisement text set, w_{t-k} denotes the (t-k)-th feature information in the advertisement text set, w_{t+k} denotes the (t+k)-th feature information in the advertisement text set, k denotes the window size of the semantic similarity objective function to be established, t ∈ [k, T], T denotes the total number of pieces of feature information in the advertisement text set, and k, t, and T are positive integers;

Establishing a first probability distribution function according to the advertisement text set

Where i ∈ [t-k, t+k], i is a positive integer, and w_t denotes the t-th feature information in the advertisement text set;

Establishing the semantic similarity objective function according to the first preset function of the t-th feature information w_t in the advertisement text set and the first probability distribution function
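The formulas referred to in the three steps above are not reproduced in this text. One reading that is consistent with the symbol definitions given (b, U, h, W, k, t, T) is the familiar windowed softmax form below. The symbol y, the range of the index i in the denominator, and the summation limits of the objective are assumptions made for illustration, not the formulas as filed:

```latex
\begin{aligned}
y &= b + U\,h(w_{t-k},\ldots,w_{t+k};\,W) \\[4pt]
p(w_t \mid w_{t-k},\ldots,w_{t+k}) &= \frac{e^{y_{w_t}}}{\sum_{i} e^{y_i}} \\[4pt]
\max \;\; & \frac{1}{T}\sum_{t=k}^{T-k} \log p(w_t \mid w_{t-k},\ldots,w_{t+k})
\end{aligned}
```

Under this reading, solving the objective means adjusting the feature vectors so that each piece of feature information is predicted well from its window of 2k neighbours.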

Further, the semantic similarity is

Wherein, A represents a vectorized representation of the entire advertisement text of the first advertisement, and B represents a vectorized representation of the entire advertisement text of the second advertisement.
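The formula for the semantic similarity is not reproduced in this text. Since the description's definition of Word Embedding mentions comparing vectors by a cosine metric, a minimal sketch, assuming the similarity of A and B is their cosine similarity, could look as follows. The function name `cosine_similarity` is hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b (assumed similarity measure)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # The cosine is undefined when either vector has zero length.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Under the same assumption, the function would apply unchanged to the click-side vectors C and D.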

Further, determining a click similarity between the first advertisement and the second advertisement according to the user click set, including:

Establishing a click similarity objective function according to the user click set;

Solving the click similarity objective function to determine, in an optimal state of the click similarity objective function, a vectorized representation of the first advertisement and a vectorized representation of the second advertisement;

The click similarity is determined based on the vectorized representation of the first advertisement and the vectorized representation of the second advertisement.

Further, establishing a click similarity objective function according to the user click set, including:

Establishing, according to the user click set, a second preset function of the feature information w'_{t'} of the t'-th advertisement in the user click set

Where b' denotes a preset second deviation value, U' denotes a preset second parameter vector, h'(w'_{t'-k'}, ..., w'_{t'+k'}; W') denotes a formalized function, W' denotes the feature information of the t'-th advertisement in the user click set, w'_{t'-k'} denotes the feature information of the (t'-k')-th advertisement in the user click set, w'_{t'+k'} denotes the feature information of the (t'+k')-th advertisement in the user click set, k' denotes the window size of the click similarity objective function to be established, t' ∈ [k', T'], T' denotes the total number of advertisements in the user click set, and k', t', and T' are all positive integers;

Establishing a second probability distribution function according to the user click set

Where i' ∈ [t'-k', t'+k'], i' is a positive integer, and w'_{t'} denotes the feature information of the t'-th advertisement in the user click set;

Establishing the click similarity objective function according to the second preset function of the feature information w'_{t'} of the t'-th advertisement in the user click set and the second probability distribution function

Further, the click similarity is

Where C represents a vectorized representation of the first advertisement and D represents a vectorized representation of the second advertisement.

Further, determining the similarity information between the first advertisement and the second advertisement according to the semantic similarity and the click similarity, including:

Obtaining a frequency of user clicks of the second advertisement;

The similarity information is determined according to the user click frequency, the semantic similarity, and the click similarity.

Further, the similarity information is Sim = (1/log(TF)) * Sim_content + Sim_session;

Wherein, TF represents the user click frequency, Sim_content represents the semantic similarity, and Sim_session represents the click similarity.

In another aspect, the present invention provides an advertisement similarity processing apparatus, including:

an obtaining unit, configured to obtain an advertisement text set and a user click set, wherein the advertisement text set includes feature information of the entire advertisement text of a first advertisement, feature information of each word in the first advertisement, feature information of the entire advertisement text of a second advertisement, feature information of each word in the second advertisement, feature information of the entire advertisement text of at least one other advertisement, and feature information of each word in each of the at least one other advertisement; and wherein the user click set includes feature information of the first advertisement, feature information of the second advertisement, and feature information of other advertisements clicked by at least one user, the first advertisement and the second advertisement also being advertisements clicked by the at least one user;

a first determining unit, configured to determine a semantic similarity between the first advertisement and the second advertisement according to the advertisement text set;

a second determining unit, configured to determine, according to the user click set, a click similarity between the first advertisement and the second advertisement;

a third determining unit, configured to determine similarity information between the first advertisement and the second advertisement according to the semantic similarity and the click similarity.

Further, the first determining unit includes:

a first establishing module, configured to establish a semantic similarity objective function according to the set of advertisement texts;

a first solving module, configured to solve the semantic similarity objective function to determine a vectorized representation of the overall advertisement text of the first advertisement in an optimal state of the semantic similarity objective function, and a vectorized representation of the overall advertising text of the second advertisement;

a first determining module, configured to determine the semantic similarity according to the vectorized representation of the entire advertisement text of the first advertisement and the vectorized representation of the entire advertisement text of the second advertisement.

Further, the first establishing module includes:

a first establishing submodule, configured to establish, according to the advertisement text set, a first preset function of the t-th feature information in the advertisement text set

Where b denotes a preset first deviation value, U denotes a preset first parameter vector, h(w_{t-k}, ..., w_{t+k}; W) denotes a formalized function, W denotes the t-th feature information in the advertisement text set, w_{t-k} denotes the (t-k)-th feature information in the advertisement text set, w_{t+k} denotes the (t+k)-th feature information in the advertisement text set, k denotes the window size of the semantic similarity objective function to be established, t ∈ [k, T], T denotes the total number of pieces of feature information in the advertisement text set, and k, t, and T are positive integers;

a second establishing submodule, configured to establish a first probability distribution function according to the advertisement text set

a third establishing submodule, configured to establish the semantic similarity objective function according to the first preset function of the t-th feature information w_t in the advertisement text set and the first probability distribution function

Further, the semantic similarity is

Further, the second determining unit includes:

a second establishing module, configured to establish a click similarity objective function according to the user click set;

a second solving module, configured to solve the click similarity objective function to determine, in an optimal state of the click similarity objective function, a vectorized representation of the first advertisement and a vectorized representation of the second advertisement;

a second determining module, configured to determine the click similarity according to the vectorized representation of the first advertisement and the vectorized representation of the second advertisement.

Further, the second establishing module includes:

a fourth establishing submodule, configured to establish, according to the user click set, a second preset function of the feature information w'_{t'} of the t'-th advertisement in the user click set

a fifth establishing submodule, configured to establish a second probability distribution function according to the user click set

a sixth establishing submodule, configured to establish the click similarity objective function according to the second preset function of the feature information w'_{t'} of the t'-th advertisement in the user click set and the second probability distribution function

Further, the click similarity is

Further, the third determining unit includes:

an obtaining module, configured to acquire a user click frequency of the second advertisement;

a third determining module, configured to determine the similarity information according to the user click frequency, the semantic similarity, and the click similarity.

In another aspect, the present invention provides a computing device comprising:

a processor; and

a memory having stored thereon executable code that, when executed by the processor, causes the processor to perform any one of the methods described above.

In another aspect, the present invention provides a non-transitory machine-readable storage medium having stored thereon executable code that, when executed by a processor of an electronic device, causes the processor to perform any one of the methods described above.

In the advertisement similarity processing method and apparatus provided by the present invention, an advertisement text set and a user click set are obtained. The advertisement text set includes feature information of the entire advertisement text of the first advertisement, feature information of each word in the first advertisement, feature information of the entire advertisement text of the second advertisement, feature information of each word in the second advertisement, feature information of the entire advertisement text of at least one other advertisement, and feature information of each word in each of the at least one other advertisement. The user click set includes feature information of the first advertisement, feature information of the second advertisement, and feature information of other advertisements clicked by at least one user, the first advertisement and the second advertisement also being advertisements clicked by the at least one user. A semantic similarity between the first advertisement and the second advertisement is determined according to the advertisement text set; a click similarity between the first advertisement and the second advertisement is determined according to the user click set; and similarity information between the first advertisement and the second advertisement is determined according to the semantic similarity and the click similarity. Because the words in a massive number of advertisements are extracted and analyzed with a neural network model, both short-text and long-text advertisements can be analyzed, and the topics and key information in the advertisements can be extracted easily. In addition, from the perspective of users' advertisement click behavior, a large number of advertisements clicked by users belonging to the same group are obtained and form a user click set; analyzing the features of all advertisements in the user click set facilitates the classification of advertisements. Because this process analyzes a large amount of advertisement data, the similarity between advertisements can be determined more accurately. The similarity information between the first advertisement and the second advertisement, that is, the degree to which the second advertisement resembles the first advertisement, is calculated from the semantic similarity obtained from the advertisement text set and the click similarity obtained from the user click set, so the similarity between advertisements can be determined accurately. Further, the similarity between all advertisements can be determined by this process, so that similar advertisements can be pushed to a user when advertisements are pushed.

Brief description of the drawings

The accompanying drawings, which are incorporated in the specification

FIG. 1 is a schematic flowchart of a method for processing similarity of an advertisement according to an embodiment of the present application;

FIG. 2 is a schematic diagram of a click session log in an advertisement similarity processing method according to an embodiment of the present application;

FIG. 3 is a schematic structural diagram of a neural network model in a similarity processing method for an advertisement according to an embodiment of the present disclosure;

FIG. 4 is a schematic flowchart diagram of another method for processing similarity of an advertisement according to an embodiment of the present disclosure;

FIG. 5 is a schematic structural diagram of an apparatus for processing similarity of an advertisement according to an embodiment of the present disclosure;

FIG. 6 is a schematic structural diagram of another similarity processing apparatus for an advertisement according to an embodiment of the present invention.

FIG. 7 is a schematic structural diagram of a computing device according to an embodiment of the present invention.

The above drawings show specific embodiments of the present disclosure, which are described in more detail below. The drawings and the accompanying text are not intended to limit the scope of the concepts of the present disclosure in any way, but to explain those concepts to those skilled in the art by reference to specific embodiments.

Detailed description

Exemplary embodiments will be described in detail herein, examples of which are illustrated in the accompanying drawings. The following description of the drawings refers to the same or similar elements in the different figures unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present disclosure. Instead, they are merely examples of devices and methods consistent with aspects of the present disclosure as detailed in the appended claims.

First, the terms involved in the present invention are explained:

Word Embedding: a technique that represents words as vectors. Vectorization turns an abstract entity into a mathematical description that can be modeled and applied to many tasks; for example, the similarity between words can be determined directly by the cosine distance between their vectors.

DSSM (Deep Structured Semantic Model): This is a neural network model, also known as sent2vec.

Stochastic Gradient Descent (SGD): a common method for solving unconstrained optimization problems, with the advantage of being simple to implement. It is an iterative algorithm, and each step requires computing the gradient vector of the objective function.

The specific application scenario of the present invention is as follows. With the development of media technology and terminal technology, more and more advertisements need to be delivered through media. Advertisements can be pushed to users: users can be divided into multiple user groups according to user characteristics, and similar advertisements can then be pushed to each user group; alternatively, a series of similar advertisements can be pushed directly to a user. How to accurately determine which advertisements are similar, that is, the similarity between advertisements, is therefore a problem that needs to be solved.
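As an illustration of the iterative procedure just described, the following minimal sketch runs SGD on a toy objective. The function names, the decaying learning-rate schedule, and the toy squared-error objective are all assumptions chosen for the example; none of them comes from the application.

```python
import random
from typing import Callable, List, Sequence


def sgd(grad: Callable[[List[float], float], List[float]],
        params: Sequence[float],
        samples: Sequence[float],
        learning_rate: float = 0.1,
        epochs: int = 200,
        seed: int = 0) -> List[float]:
    """Minimize an average of per-sample losses with plain SGD."""
    rng = random.Random(seed)
    theta = list(params)
    order = list(range(len(samples)))
    step = 0
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a new random order each epoch
        for idx in order:
            step += 1
            g = grad(theta, samples[idx])  # gradient of one sample's loss
            lr = learning_rate / (1.0 + 0.01 * step)  # assumed decay schedule
            theta = [p - lr * gi for p, gi in zip(theta, g)]
    return theta


def squared_error_grad(theta: List[float], x: float) -> List[float]:
    """Gradient of the toy per-sample loss (theta[0] - x) ** 2."""
    return [2.0 * (theta[0] - x)]


# The average of these losses is minimized at the sample mean, 3.0.
estimate = sgd(squared_error_grad, [0.0], [1.0, 2.0, 3.0, 6.0])
```

Each update uses the gradient of a single sample rather than of the full objective, which is what makes the method cheap per step.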

The similarity processing method and apparatus for advertising provided by the present invention are intended to solve the above technical problems of the prior art.

The technical solutions of the present invention and how the technical solutions of the present application solve the above technical problems will be described in detail below with reference to specific embodiments. The following specific embodiments may be combined with each other, and the same or similar concepts or processes may not be described in some embodiments. Embodiments of the present invention will be described below with reference to the accompanying drawings.

FIG. 1 is a schematic flowchart diagram of a method for processing similarity of an advertisement according to an embodiment of the present application. As shown in Figure 1, the method includes:

Step 101: Obtain an advertisement text set and a user click set. The advertisement text set includes feature information of the entire advertisement text of the first advertisement, feature information of each word in the first advertisement, feature information of the entire advertisement text of the second advertisement, feature information of each word in the second advertisement, feature information of the entire advertisement text of at least one other advertisement, and feature information of each word in each of the at least one other advertisement. The user click set includes feature information of the first advertisement, feature information of the second advertisement, and feature information of other advertisements clicked by at least one user. The first advertisement and the second advertisement are also advertisements clicked by the at least one user.

In this embodiment, specifically, the execution subject of the embodiment may be an advertisement similarity processing device, a server, or other device that can perform the method of the embodiment.

First, every advertisement provided by advertisers is obtained; each advertisement is then analyzed and split into multiple words, yielding an advertisement text set. The advertisement text set includes feature information of the entire advertisement text of each of a plurality of advertisements and feature information of each word in each of those advertisements; the plurality of advertisements include the first advertisement and the second advertisement to be analyzed. The feature information of the entire advertisement text of each advertisement is a vector, and the feature information of each word is also a vector.

For example, an advertisement text set is generated from ten thousand advertisements. The advertisement text set includes feature information of the entire advertisement text of advertisement 1, feature information of word 1 of advertisement 1, feature information of word 2 of advertisement 1, and feature information of word 3 of advertisement 1; feature information of the entire advertisement text of advertisement 2, feature information of word 2 of advertisement 2, feature information of word 3 of advertisement 2, and feature information of word 4 of advertisement 2; feature information of the entire advertisement text of advertisement 3, feature information of word 2 of advertisement 3, feature information of word 3 of advertisement 3, and feature information of word 4 of advertisement 3; feature information of the entire advertisement text of advertisement 4, feature information of word 4 of advertisement 4, feature information of word 5 of advertisement 4, and feature information of word 6 of advertisement 4; and so on. Different word labels denote different words, advertisement 1 is the first advertisement, and advertisement 2 is the second advertisement. The similarity between advertisement 1 and advertisement 2 is to be analyzed.
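The construction of the advertisement text set described above can be sketched as follows. This is a minimal illustration under stated assumptions: whitespace splitting stands in for whatever word segmentation is actually used, the vectors are given small random initial values, and a word that occurs in several advertisements shares one vector. The function name `build_ad_text_set` and its parameters are hypothetical.

```python
import random
from typing import Dict, List, Tuple


def build_ad_text_set(
    ads: Dict[str, str], dim: int = 8, seed: int = 0
) -> Tuple[Dict[str, List[str]], Dict[str, List[float]], Dict[str, List[float]]]:
    """Split each ad into words and give each ad and each word a vector."""
    rng = random.Random(seed)

    def random_vector() -> List[float]:
        # Small random starting values; training would adjust them later.
        return [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)]

    tokens: Dict[str, List[str]] = {}
    ad_vectors: Dict[str, List[float]] = {}
    word_vectors: Dict[str, List[float]] = {}
    for ad_id, text in ads.items():
        words = text.lower().split()  # placeholder for a real word segmenter
        tokens[ad_id] = words
        ad_vectors[ad_id] = random_vector()  # feature of the whole ad text
        for word in words:
            if word not in word_vectors:  # assumed: one vector per distinct word
                word_vectors[word] = random_vector()
    return tokens, ad_vectors, word_vectors


tokens, ad_vectors, word_vectors = build_ad_text_set(
    {"ad1": "cheap running shoes", "ad2": "running shoes sale"}
)
```

Here "running" and "shoes" play the role of the words shared between advertisement 1 and advertisement 2 in the example above.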

In addition, advertisements that have been clicked by a plurality of users need to be obtained and put into a user click set. Specifically, the click session log of each user is first obtained, and the advertisements clicked by each user are determined from that log; the advertisements clicked by each user are then put into one user click set. The user click set includes feature information of each of the advertisements clicked by the plurality of users, and these advertisements include the first advertisement and the second advertisement to be analyzed. It can be seen that the first advertisement and the second advertisement are also advertisements clicked by the users. The feature information of each advertisement is a vector. For example, because a group of users with the same interests shows a preference in the advertisements it clicks, the advertisements clicked by users belonging to the same group also reflect the similarity of the advertisements themselves; the advertisements clicked by users belonging to the same group can therefore be obtained to form a user click set, and the advertisements can then be profiled and classified. FIG. 2 is a schematic diagram of a click session log in an advertisement similarity processing method according to an embodiment of the present application. As shown in FIG. 2, the content of the advertisements that a user has clicked can be obtained by analyzing the user's click behavior. Moreover, users generate a massive number of advertisement click behaviors, each of which corresponds to one advertisement, and this massive volume of click behavior helps avoid noise bias between advertisements.

For example, 10,000 advertisements clicked by users belonging to the same group can be obtained to form a user click set. The user click set includes feature information of advertisement 1, feature information of advertisement 2, feature information of advertisement 3, feature information of advertisement 4, and so on, where advertisement 1 is the first advertisement and advertisement 2 is the second advertisement. The similarity between advertisement 1 and advertisement 2 is to be analyzed.
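The step from click session logs to a user click set can be sketched as below. The record layout (user, session, timestamp, advertisement) is a hypothetical one chosen for illustration, since the actual log format of FIG. 2 is not given in this text; the function name is likewise an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical log record: (user_id, session_id, timestamp, ad_id)
ClickRecord = Tuple[str, str, float, str]


def build_click_sequences(
    records: Iterable[ClickRecord],
) -> Dict[Tuple[str, str], List[str]]:
    """Group click records by (user, session) and order them by time."""
    grouped = defaultdict(list)
    for user_id, session_id, timestamp, ad_id in records:
        grouped[(user_id, session_id)].append((timestamp, ad_id))
    # Sorting the (timestamp, ad_id) pairs restores the click order.
    return {key: [ad for _, ad in sorted(clicks)]
            for key, clicks in grouped.items()}


sequences = build_click_sequences([
    ("u1", "s1", 2.0, "adB"),
    ("u1", "s1", 1.0, "adA"),
    ("u2", "s9", 5.0, "adC"),
])
```

Keeping the clicks of one session together and in order preserves the neighbourhood information that a windowed model over clicked advertisements relies on.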

Step 102: Determine a semantic similarity between the first advertisement and the second advertisement according to the advertisement text set.

In this embodiment, specifically, the feature information of the entire advertisement text of each advertisement in the advertisement text set and the feature information of each word in each advertisement are analyzed according to the neural network model and the Word Embedding technology. Because the advertisement text set includes the first advertisement and the second advertisement to be analyzed, the semantic similarity between the first advertisement and the second advertisement can be determined. In this embodiment, the semantic similarity characterizes the extent to which the second advertisement resembles the first advertisement.

FIG. 3 is a schematic structural diagram of a neural network model in an advertisement similarity processing method according to an embodiment of the present application. As shown in FIG. 3, the first layer in the neural network model is a classifier; the second layer is the Average/Concatenate layer, which represents the form in which the lower-layer network is connected to the upper-layer network; and the last layer represents the advertisement matrix (Paragraph matrix), that is, the vectorized representation of all advertisements. In FIG. 3, D denotes an advertisement; "paragraph" literally means a passage of text, and here a Paragraph refers to an advertisement; and W is the prefix for a word (Word) in each advertisement.
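The three layers described for FIG. 3 can be illustrated with a small forward pass. This is a sketch under assumptions: the Average variant of the Average/Concatenate layer is used, the classifier is taken to be a softmax over a linear layer with weights U and bias b (U is a matrix with one row per candidate word here), and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def average_vectors(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average layer: element-wise mean of the ad vector and word vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_word_distribution(
    ad_vector: Sequence[float],
    context_vectors: Sequence[Sequence[float]],
    U: Sequence[Sequence[float]],
    b: Sequence[float],
) -> List[float]:
    """Classifier layer: probability of each candidate word given the window."""
    h = average_vectors([ad_vector, *context_vectors])
    scores = [sum(u * x for u, x in zip(row, h)) + bias
              for row, bias in zip(U, b)]
    return softmax(scores)


probs = predict_word_distribution(
    ad_vector=[1.0, 0.0],
    context_vectors=[[0.0, 1.0], [1.0, 1.0]],
    U=[[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]],
    b=[0.0, 0.0, 0.0],
)
```

Training would adjust the advertisement vector, the word vectors, U, and b so that the word actually observed in each window receives a high probability; the trained advertisement vectors are then the rows of the advertisement matrix.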

Step 103: Determine a click similarity between the first advertisement and the second advertisement according to the user clicking the set.

In this embodiment, specifically, a neural network algorithm and the Word Embedding technology are used to model the user click set. The neural network algorithm has two available structures, continuous bag of words (CBOW) and skip-gram; here, the skip-gram structure may be adopted. The feature information of each advertisement is then analyzed to obtain the click similarity between the first advertisement and the second advertisement. In this embodiment, the click similarity characterizes the extent to which the second advertisement resembles the first advertisement.
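To make the skip-gram structure concrete, the sketch below generates the (center, context) training pairs that a skip-gram model would consume when clicked advertisements are treated like words in a sentence. The function name and the treatment of one click sequence as one "sentence" are assumptions for illustration.

```python
from typing import List, Sequence, Tuple


def skipgram_pairs(clicked_ads: Sequence[str], window: int) -> List[Tuple[str, str]]:
    """Pair each clicked ad with the ads clicked within `window` positions."""
    pairs: List[Tuple[str, str]] = []
    for t, center in enumerate(clicked_ads):
        start = max(0, t - window)
        stop = min(len(clicked_ads), t + window + 1)
        for i in range(start, stop):
            if i != t:  # an ad is not its own context
                pairs.append((center, clicked_ads[i]))
    return pairs


pairs = skipgram_pairs(["ad1", "ad2", "ad3"], window=1)
```

Advertisements that are often clicked close together end up in many shared pairs, which is what pulls their learned vectors toward each other.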

Step 104: Determine similarity information between the first advertisement and the second advertisement according to the semantic similarity and the click similarity.

In an optional implementation manner, step 104 includes: acquiring a user click frequency of the second advertisement; and determining the similarity information according to the user click frequency, the semantic similarity, and the click similarity.

In an optional implementation manner, the similarity information is Sim = (1/log(TF)) * Sim_content + Sim_session, where TF represents the user click frequency, Sim_content represents the semantic similarity, and Sim_session represents the click similarity.

In this embodiment, specifically, the similarity information may be calculated from the semantic similarity and the click similarity obtained above. Specifically, because what is to be calculated is the degree to which the second advertisement resembles the first advertisement, the user click frequency TF of the second advertisement is first obtained; TF is the number of times the second advertisement has been clicked by users. The user click frequency TF, the semantic similarity Sim_content, and the click similarity Sim_session are then used to calculate the similarity information between the first advertisement and the second advertisement. Various calculation formulas may be used for the similarity information; this embodiment provides a preferred one, under which the similarity information is Sim = (1/log(TF)) * Sim_content + Sim_session.
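The formula can be evaluated as in the following minimal sketch. Two points are assumptions: the text does not state the base of the logarithm, so the natural logarithm is used here, and the text does not say how TF values of 1 or less are handled, so the sketch rejects them because log(TF) would then be zero or negative. The function name is hypothetical.

```python
import math


def combined_similarity(click_frequency: float,
                        sim_content: float,
                        sim_session: float) -> float:
    """Sim = (1 / log(TF)) * Sim_content + Sim_session, natural log assumed."""
    if click_frequency <= 1:
        # log(1) is 0 and log of smaller values is negative or undefined.
        raise ValueError("click frequency must be greater than 1")
    return (1.0 / math.log(click_frequency)) * sim_content + sim_session


rarely_clicked = combined_similarity(10, sim_content=0.5, sim_session=0.2)
often_clicked = combined_similarity(1000, sim_content=0.5, sim_session=0.2)
```

The factor 1/log(TF) shrinks as TF grows, so for a heavily clicked second advertisement the result leans more on the click similarity, while for a rarely clicked one the semantic similarity carries more weight.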

In this embodiment, an advertisement text set is obtained, wherein the advertisement text set includes feature information of the entire advertisement text of the first advertisement, feature information of each word in the first advertisement, feature information of the entire advertisement text of the second advertisement, feature information of each word in the second advertisement, feature information of the entire advertisement text of at least one other advertisement, and feature information of each word in the at least one other advertisement; and a user click set is obtained, wherein the user click set includes feature information of the first advertisement, feature information of the second advertisement, and feature information of at least one other advertisement clicked by at least one user, the first advertisement and the second advertisement also being advertisements clicked by the at least one user. A semantic similarity between the first advertisement and the second advertisement is determined according to the advertisement text set; a click similarity between the first advertisement and the second advertisement is determined according to the user click set; and similarity information between the first advertisement and the second advertisement is determined according to the semantic similarity and the click similarity. Thus, by extracting the words in a massive number of advertisements and analyzing them with a neural network model, both short-text and long-text advertisements can be analyzed, and the topics and key information in the advertisements can be extracted easily.
In addition, from the perspective of user click behavior, a large number of advertisements clicked by users belonging to the same group are obtained and form the user click set; analyzing the features of all the advertisements in the user click set facilitates the classification of advertisements. Because the above process analyzes a large amount of advertisement data, the similarity between advertisements can be determined more accurately. The similarity information between the first advertisement and the second advertisement, that is, the extent to which the second advertisement resembles the first advertisement, is calculated from the semantic similarity obtained from the advertisement text set and the click similarity obtained from the user click set, so that the similarity between advertisements can be determined. Further, the similarity between all advertisements can be determined by the above process, so that similar advertisements can be pushed to users during advertisement pushing.

FIG. 4 is a schematic flowchart diagram of another method for processing similarity of an advertisement according to an embodiment of the present application. As shown in FIG. 4, the method includes:

Step 201: Obtain an advertisement text set, wherein the advertisement text set includes feature information of the entire advertisement text of the first advertisement, feature information of each word in the first advertisement, feature information of the entire advertisement text of the second advertisement, feature information of each word in the second advertisement, feature information of the entire advertisement text of at least one other advertisement, and feature information of each word in the at least one other advertisement; and obtain a user click set, wherein the user click set includes feature information of the first advertisement, feature information of the second advertisement, and feature information of at least one other advertisement clicked by at least one user, the first advertisement and the second advertisement also being advertisements clicked by the at least one user.

In this embodiment, specifically, the execution subject may be an advertisement similarity processing apparatus, a server, or another device capable of performing the method of this embodiment. For this step, refer to step 101 of FIG. 1; details are not repeated here.

Step 202: Establish a semantic similarity objective function according to the set of advertisement texts.

In an optional implementation manner, step 202 specifically includes the following steps:

Step 2021: Establish, according to the advertisement text set, a first preset function of the $t$-th feature information $w_t$ in the advertisement text set.

Where $b$ denotes a preset first deviation value, $U$ denotes a preset first parameter vector, $h(w_{t-k}, \ldots, w_{t+k}; W)$ denotes a formalized function, $w_t$ denotes the $t$-th feature information in the advertisement text set, $w_{t-k}$ represents the $(t-k)$-th feature information in the advertisement text set, $w_{t+k}$ represents the $(t+k)$-th feature information in the advertisement text set, $k$ represents the window size of the semantic similarity objective function to be established, $t \in [k, T]$, $T$ represents the total number of feature information items in the advertisement text set, and $k$, $t$, and $T$ are all positive integers.

Step 2022: Establish a first probability distribution function according to the advertisement text set.

Where $i \in [t-k, t+k]$, $i$ is a positive integer, and $w_t$ represents the $t$-th feature information in the advertisement text set.

Step 2023: Establish the semantic similarity objective function according to the first preset function of the $t$-th feature information $w_t$ in the advertisement text set and the first probability distribution function.
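The formulas for steps 2021 to 2023 do not appear in the text above. The block below shows only one form that is consistent with the symbol definitions given for these steps, borrowed from the common context-window softmax formulation for learning text vectors. The symbol $y$ for the first preset function and the unspecified summation ranges are assumptions and may differ from the original formulas.

```latex
% First preset function (assumed form): a score computed from the context window
y = b + U\,h(w_{t-k}, \ldots, w_{t+k}; W)

% First probability distribution function (assumed form): softmax over the scores y_i
p(w_t \mid w_{t-k}, \ldots, w_{t+k}) = \frac{e^{y_{w_t}}}{\sum_{i} e^{y_i}}

% Semantic similarity objective function (assumed form): average log-probability over t
\frac{1}{T} \sum_{t} \log p(w_t \mid w_{t-k}, \ldots, w_{t+k})
```

Under this reading, substituting the preset function into the probability distribution function and averaging its logarithm over $t$ gives an objective whose maximization yields the vectorized representations.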

In this embodiment, specifically, after step 201, for the set of advertisement texts, a semantic similarity objective function to be solved needs to be established.

Specifically, for the sentences and words included in the advertisement text set, a multi-layer Deep Structured Semantic Model (DSSM) may be used, with bi-char preprocessing applied to the sentences and words; for example, the words are processed directly as text.
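The text does not define bi-char preprocessing. One plausible reading, assumed here, is that the text is split into overlapping two-character units, which avoids the need for a word segmenter. The function name `bichar_tokens` is hypothetical.

```python
from typing import List


def bichar_tokens(text: str) -> List[str]:
    """Split text into overlapping character bigrams, ignoring whitespace (assumed reading of "bi-char")."""
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        # Too short to form a bigram: keep the single character, if any.
        return ["".join(chars)] if chars else []
    return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]


tokens = bichar_tokens("神马搜索")
```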

Then, according to all the feature information in the advertisement text set, a first preset function is established for the $t$-th feature information $w_t$ in the advertisement text set.

It can be seen that a first preset function is established for each feature information in the advertisement text set. In the formula of the first preset function, $b$ represents a preset first deviation value, $U$ represents a preset first parameter vector, $h(w_{t-k}, \ldots, w_{t+k}; W)$ represents a formalized function, $w_t$ represents the $t$-th feature information in the advertisement text set, $w_{t-k}$ represents the $(t-k)$-th feature information in the advertisement text set, $w_{t+k}$ represents the $(t+k)$-th feature information in the advertisement text set, $k$ represents the window size of the semantic similarity objective function to be established, $t \in [k, T]$, $T$ represents the total number of feature information items in the advertisement text set, and $k$, $t$, and $T$ are all positive integers; each feature information in the advertisement text set is a vector.

Then, a first probability distribution function is established according to the first preset function of the $t$-th feature information $w_t$ and all the feature information in the advertisement text set.

In the first probability distribution function, $i \in [t-k, t+k]$, $i$ is a positive integer, and $w_t$ represents the $t$-th feature information in the advertisement text set.

Then, the first preset function of the $t$-th feature information $w_t$ is substituted into the first probability distribution function. Since a first preset function can be obtained for each feature information in the advertisement text set, the first preset function of each feature information can be substituted into the first probability distribution function in turn.

Thus, the semantic similarity objective function can be obtained.
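The substitution described above can be pictured numerically. The sketch below assumes that the probability distribution function is a softmax over the preset-function scores and that the objective is the average log-probability of the observed feature information; neither form is confirmed by the text, and the names `softmax` and `average_log_likelihood` are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn preset-function scores into a probability distribution."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def average_log_likelihood(score_rows: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Mean of log p(target) over positions; each row holds the scores for one position t."""
    log_probs = [math.log(softmax(row)[target]) for row, target in zip(score_rows, targets)]
    return sum(log_probs) / len(log_probs)
```

A larger value of this quantity means the model assigns higher probability to the feature information that actually occurs in each window.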

Step 203: Solve the semantic similarity objective function to determine, in the optimal state of the semantic similarity objective function, a vectorized representation of the entire advertisement text of the first advertisement and a vectorized representation of the entire advertisement text of the second advertisement.

In this embodiment, specifically, the semantic similarity objective function obtained in step 202 is solved by using a cross-entropy method to determine the vectorized representation of each feature information in the advertisement text set in the optimal state of the semantic similarity objective function, that is, a vectorized representation of the entire advertisement text of the first advertisement, a vectorized representation of each word in the first advertisement, a vectorized representation of the entire advertisement text of the second advertisement, a vectorized representation of each word in the second advertisement, a vectorized representation of the entire advertisement text of at least one other advertisement, and a vectorized representation of each word in the at least one other advertisement.

The optimal state of the semantic similarity objective function may be the maximum value of the semantic similarity objective function, or the optimal state of the semantic similarity objective function may be that the value of the semantic similarity objective function is within a preset range.

Step 204: Determine a semantic similarity according to a vectorized representation of the entire advertisement text of the first advertisement and a vectorized representation of the entire advertisement text of the second advertisement.

In an optional implementation manner, the semantic similarity is

$$Sim_{content} = \frac{\sum_{j=1}^{J} a_j b_j}{\sqrt{\sum_{j=1}^{J} a_j^2}\,\sqrt{\sum_{j=1}^{J} b_j^2}}$$

In this embodiment, specifically, after step 203, the cosine of the vectorized representation $A$ of the entire advertisement text of the first advertisement and the vectorized representation $B$ of the entire advertisement text of the second advertisement is computed, which yields the semantic similarity between the first advertisement and the second advertisement given by the formula above,

where $J$ represents the dimension of vector $A$, the dimension of vector $A$ is the same as the dimension of vector $B$, $j \in [1, J]$, $j$ and $J$ are positive integers, $a_j$ is the $j$-th value of vector $A$, and $b_j$ is the $j$-th value of vector $B$.
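The cosine computation described for step 204 can be sketched as follows. The function name `cosine_similarity` and the error handling for mismatched dimensions and zero vectors are assumptions added for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of the same dimension J."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension J")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # The cosine is undefined for a zero vector (assumed policy: reject).
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

The same function applies unchanged to the click-side vectors, since the click similarity is also described as a cosine of two vectorized representations.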

Step 205: Establish a click similarity objective function according to the user click set.

In an optional implementation manner, step 205 specifically includes the following steps:

Step 2051: Establish, according to the user click set, a second preset function of the feature information $w'_{t'}$ of the $t'$-th advertisement in the user click set.

Where $b'$ represents a preset second deviation value, $U'$ represents a preset second parameter vector, $h'(w'_{t'-k'}, \ldots, w'_{t'+k'}; W')$ represents a formalized function, $w'_{t'}$ represents the feature information of the $t'$-th advertisement in the user click set, $w'_{t'-k'}$ represents the feature information of the $(t'-k')$-th advertisement in the user click set, $w'_{t'+k'}$ represents the feature information of the $(t'+k')$-th advertisement in the user click set, $k'$ represents the window size of the click similarity objective function to be established, $t' \in [k', T']$, $T'$ represents the total number of advertisements in the user click set, and $k'$, $t'$, and $T'$ are all positive integers.

Step 2052: Establish a second probability distribution function according to the user click set.

Where $i' \in [t'-k', t'+k']$, $i'$ is a positive integer, and $w'_{t'}$ represents the feature information of the $t'$-th advertisement in the user click set.

Step 2053: Establish the click similarity objective function according to the second preset function of the feature information $w'_{t'}$ of the $t'$-th advertisement in the user click set and the second probability distribution function.

In this embodiment, specifically, the feature information in the user click set may first be normalized as a preprocessing step.

Then, according to all the feature information in the user click set, a second preset function is established for the feature information $w'_{t'}$ of the $t'$-th advertisement in the user click set.

It can be seen that a second preset function is established for each feature information in the user click set. In the formula of the second preset function, $b'$ represents a preset second deviation value, $U'$ represents a preset second parameter vector, $h'(w'_{t'-k'}, \ldots, w'_{t'+k'}; W')$ represents a formalized function, $w'_{t'}$ represents the feature information of the $t'$-th advertisement in the user click set, $w'_{t'-k'}$ represents the feature information of the $(t'-k')$-th advertisement in the user click set, $w'_{t'+k'}$ represents the feature information of the $(t'+k')$-th advertisement in the user click set, $k'$ represents the window size of the click similarity objective function to be established, $t' \in [k', T']$, $T'$ represents the total number of advertisements in the user click set, and $k'$, $t'$, and $T'$ are all positive integers; each feature information in the user click set is a vector.

Then, a second probability distribution function is established according to the second preset function of the feature information $w'_{t'}$ of the $t'$-th advertisement and all the feature information in the user click set.

In the second probability distribution function, $i' \in [t'-k', t'+k']$, $i'$ is a positive integer, and $w'_{t'}$ represents the feature information of the $t'$-th advertisement in the user click set.

Then, the second preset function of the feature information $w'_{t'}$ of the $t'$-th advertisement is substituted into the second probability distribution function. Since a second preset function can be obtained for each feature information in the user click set, the second preset function of each feature information can be substituted into the second probability distribution function in turn.

Thus, the click similarity objective function can be obtained.

Step 206: Solve the click similarity objective function to determine, in the optimal state of the click similarity objective function, a vectorized representation of the first advertisement and a vectorized representation of the second advertisement.

In this embodiment, specifically, the click similarity objective function obtained in step 205 may be solved by using the stochastic gradient descent (SGD) method to determine the vectorized representation of each feature information in the user click set in the optimal state of the click similarity objective function, that is, a vectorized representation of the first advertisement, a vectorized representation of the second advertisement, a vectorized representation of a third advertisement, and so on. Each advertisement in the user click set is an advertisement that a user has clicked. Preferably, each advertisement in the user click set is an advertisement clicked by users belonging to the same group.

The optimal state of the click similarity objective function may be the maximum value of the click similarity objective function, or the optimal state of the click similarity objective function may be that the value of the click similarity objective function is within a preset range.
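Solving a click objective with SGD, as mentioned for step 206, could look roughly like the sketch below. It assumes a skip-gram model with a full softmax over all advertisements and minimizes the negative log-likelihood of (center, context) click pairs. The function name `train_skipgram_sgd`, the hyperparameters, and the initialization are illustrative assumptions, not details from the embodiment.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def train_skipgram_sgd(
    pairs: Sequence[Tuple[str, str]],
    dim: int = 8,
    lr: float = 0.05,
    epochs: int = 20,
    seed: int = 0,
) -> Dict[str, List[float]]:
    """Fit one vector per ad by SGD on a full-softmax skip-gram loss."""
    rng = random.Random(seed)
    vocab = sorted({ad for pair in pairs for ad in pair})
    index = {ad: i for i, ad in enumerate(vocab)}
    # Input vectors start small and random; output vectors start at zero.
    w_in = [[rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for _ in vocab]
    w_out = [[0.0] * dim for _ in vocab]
    order = list(pairs)  # copy so the caller's sequence is not shuffled
    for _ in range(epochs):
        rng.shuffle(order)
        for center, context in order:
            v = w_in[index[center]]
            target = index[context]
            scores = [sum(u[d] * v[d] for d in range(dim)) for u in w_out]
            top = max(scores)
            exps = [math.exp(s - top) for s in scores]
            total = sum(exps)
            grad_v = [0.0] * dim
            for j, u in enumerate(w_out):
                # Gradient of -log p(context | center) with respect to score j.
                err = exps[j] / total - (1.0 if j == target else 0.0)
                for d in range(dim):
                    grad_v[d] += err * u[d]  # uses u[d] before it is updated
                    u[d] -= lr * err * v[d]
            for d in range(dim):
                v[d] -= lr * grad_v[d]
    return {ad: w_in[i] for ad, i in index.items()}
```

In practice the loop would stop once the objective reaches its optimal state as described above, for example when its value falls within a preset range, rather than after a fixed number of epochs.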

Step 207: Determine a click similarity according to the vectorized representation of the first advertisement and the vectorized representation of the second advertisement.

In an optional implementation manner, the click similarity is

$$Sim_{session} = \frac{\sum_{j'=1}^{J'} a_{j'} b_{j'}}{\sqrt{\sum_{j'=1}^{J'} a_{j'}^2}\,\sqrt{\sum_{j'=1}^{J'} b_{j'}^2}}$$

In this embodiment, specifically, after step 206, the cosine of the vectorized representation $C$ of the first advertisement and the vectorized representation $D$ of the second advertisement is computed, which yields the click similarity between the first advertisement and the second advertisement given by the formula above,

where $J'$ represents the dimension of vector $C$, the dimension of vector $C$ is the same as the dimension of vector $D$, $j' \in [1, J']$, $j'$ and $J'$ are positive integers, $a_{j'}$ is the $j'$-th value of vector $C$, and $b_{j'}$ is the $j'$-th value of vector $D$.

Step 208: Determine similarity information between the first advertisement and the second advertisement according to the semantic similarity and the click similarity.

In an optional implementation manner, step 208, determining the similarity information between the first advertisement and the second advertisement according to the semantic similarity and the click similarity, specifically includes: acquiring a user click frequency of the second advertisement; and determining the similarity information according to the user click frequency, the semantic similarity, and the click similarity.

In this embodiment, specifically, for this step, refer to step 104 of FIG. 1; details are not repeated here.

FIG. 5 is a schematic structural diagram of an apparatus for processing similarity of an advertisement according to an embodiment of the present invention. As shown in FIG. 5, the apparatus of this embodiment may include:

The obtaining unit 31 is configured to obtain an advertisement text set, wherein the advertisement text set includes feature information of the entire advertisement text of the first advertisement, feature information of each word in the first advertisement, feature information of the entire advertisement text of the second advertisement, feature information of each word in the second advertisement, feature information of the entire advertisement text of at least one other advertisement, and feature information of each word in the at least one other advertisement; and to obtain a user click set, wherein the user click set includes feature information of the first advertisement, feature information of the second advertisement, and feature information of at least one other advertisement clicked by at least one user, the first advertisement and the second advertisement also being advertisements clicked by the at least one user.

The first determining unit 32 is configured to determine a semantic similarity between the first advertisement and the second advertisement according to the advertisement text set.

The second determining unit 33 is configured to determine a click similarity between the first advertisement and the second advertisement according to the user click set.

The third determining unit 34 is configured to determine similarity information between the first advertisement and the second advertisement according to the semantic similarity and the click similarity.

The advertisement similarity processing apparatus of this embodiment of the present invention can perform the advertisement similarity processing method provided by the embodiments of the present invention; the implementation principle is similar, and details are not described herein again.

In this embodiment, an advertisement text set is obtained, wherein the advertisement text set includes feature information of the entire advertisement text of the first advertisement, feature information of each word in the first advertisement, feature information of the entire advertisement text of the second advertisement, feature information of each word in the second advertisement, feature information of the entire advertisement text of at least one other advertisement, and feature information of each word in the at least one other advertisement; and a user click set is obtained, wherein the user click set includes feature information of the first advertisement, feature information of the second advertisement, and feature information of at least one other advertisement clicked by at least one user, the first advertisement and the second advertisement also being advertisements clicked by the at least one user. A semantic similarity between the first advertisement and the second advertisement is determined according to the advertisement text set; a click similarity between the first advertisement and the second advertisement is determined according to the user click set; and similarity information between the first advertisement and the second advertisement is determined according to the semantic similarity and the click similarity. Thus, by extracting the words in a massive number of advertisements and analyzing them with a neural network model, both short-text and long-text advertisements can be analyzed, and the topics and key information in the advertisements can be extracted easily.
In addition, from the perspective of user click behavior, a large number of advertisements clicked by users belonging to the same group are obtained and form the user click set; analyzing the features of all the advertisements in the user click set facilitates the classification of advertisements. Because the above process analyzes a large amount of advertisement data, the similarity between advertisements can be determined more accurately. The similarity information between the first advertisement and the second advertisement, that is, the extent to which the second advertisement resembles the first advertisement, is calculated from the semantic similarity obtained from the advertisement text set and the click similarity obtained from the user click set, so that the similarity between advertisements can be determined. Further, the similarity between all advertisements can be determined by the above process, so that similar advertisements can be pushed to users during advertisement pushing.

FIG. 6 is a schematic structural diagram of another apparatus for processing similarity of an advertisement according to an embodiment of the present invention. On the basis of the embodiment shown in FIG. 5, as shown in FIG. 6, in the apparatus provided in this embodiment, the first determining unit 32 includes:

The first establishing module 321 is configured to establish a semantic similarity objective function according to the set of advertisement texts.

The first solving module 322 is configured to solve the semantic similarity objective function to determine, in the optimal state of the semantic similarity objective function, a vectorized representation of the entire advertisement text of the first advertisement and a vectorized representation of the entire advertisement text of the second advertisement.

The first determining module 323 is configured to determine a semantic similarity according to a vectorized representation of the entire advertisement text of the first advertisement and a vectorized representation of the entire advertisement text of the second advertisement.

The first establishing module 321 includes:

a first establishing sub-module 3211, configured to establish, according to the advertisement text set, a first preset function of the $t$-th feature information $w_t$ in the advertisement text set;

a second establishing sub-module 3212, configured to establish a first probability distribution function according to the advertisement text set;

a third establishing sub-module 3213, configured to establish the semantic similarity objective function according to the first preset function of the $t$-th feature information $w_t$ in the advertisement text set and the first probability distribution function.

The semantic similarity is the cosine of the two vectorized representations, as described for step 204.

The second determining unit 33 includes:

The second establishing module 331 is configured to establish a click similarity objective function according to the user click set.

The second solving module 332 is configured to solve the click similarity objective function to determine, in the optimal state of the click similarity objective function, a vectorized representation of the first advertisement and a vectorized representation of the second advertisement.

The second determining module 333 is configured to determine the click similarity according to the vectorized representation of the first advertisement and the vectorized representation of the second advertisement.

The second establishing module 331 includes:

a fourth establishing sub-module 3311, configured to establish, according to the user click set, a second preset function of the feature information $w'_{t'}$ of the $t'$-th advertisement in the user click set;

a fifth establishing sub-module 3312, configured to establish a second probability distribution function according to the user click set;

a sixth establishing sub-module 3313, configured to establish the click similarity objective function according to the second preset function of the feature information $w'_{t'}$ of the $t'$-th advertisement in the user click set and the second probability distribution function.

The click similarity is the cosine of the two vectorized representations, as described for step 207.

The third determining unit 34 includes:

The obtaining module 341 is configured to obtain a user click frequency of the second advertisement.

The third determining module 342 is configured to determine the similarity information according to the user click frequency, the semantic similarity, and the click similarity.

The similarity information is $Sim = \frac{1}{\log(TF)} \cdot Sim_{content} + Sim_{session}$, where $TF$ represents the user click frequency, $Sim_{content}$ represents the semantic similarity, and $Sim_{session}$ represents the click similarity.

The advertisement similarity processing apparatus of this embodiment of the present invention can perform the other advertisement similarity processing method provided by the embodiments of the present invention; the implementation principle is similar, and details are not described herein again.

FIG. 7 is a block diagram showing the structure of a computing device that can be used to implement the similarity processing method of the above advertisement according to an embodiment of the present invention.

Referring to FIG. 7, computing device 700 includes a memory 710 and a processor 720.

The processor 720 can be a multi-core processor or multiple processors. In some embodiments, processor 720 can include a general purpose main processor and one or more special coprocessors, such as a graphics processing unit (GPU), a digital signal processor (DSP), and the like. In some embodiments, the processor 720 can be implemented using a custom circuit, such as an Application Specific Integrated Circuit (ASIC) or Field Programmable Gate Array (FPGA).

Memory 710 can include various types of storage units, such as system memory, read-only memory (ROM), and a persistent storage device. The ROM can store static data or instructions required by the processor 720 or other modules of the computer. The persistent storage device can be a readable and writable storage device. The persistent storage device may be a non-volatile storage device that does not lose stored instructions and data even after the computer is powered off. In some embodiments, the persistent storage device is a mass storage device (e.g., a magnetic or optical disk, or flash memory). In other embodiments, the persistent storage device can be a removable storage device (e.g., a floppy disk or an optical drive). The system memory can be a readable and writable storage device or a volatile readable and writable storage device, such as dynamic random access memory. The system memory can store some or all of the instructions and data that the processor needs at runtime. Moreover, memory 710 can include any combination of computer-readable storage media, including various types of semiconductor memory chips (DRAM, SRAM, SDRAM, flash memory, programmable read-only memory); magnetic disks and/or optical disks can also be employed. In some embodiments, the memory 710 can include a readable and/or writable removable storage device, such as a compact disc (CD), a read-only digital versatile disc (e.g., a DVD-ROM or a dual-layer DVD-ROM), a read-only Blu-ray disc, an ultra-density disc, a flash card (e.g., an SD card, a mini SD card, or a Micro-SD card), a magnetic floppy disk, and so on. The computer-readable storage medium does not include carrier waves or transitory electronic signals transmitted wirelessly or by wire.

Executable code is stored in the memory 710, and when the executable code is executed by the processor 720, the processor 720 is caused to perform the advertisement similarity processing method described above.

The method and apparatus for processing similarity of an advertisement according to the present invention have been described in detail above with reference to the accompanying drawings.

In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative. For example, the division of units is only a logical function division; in actual implementation, there may be other division manners. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, apparatus, or unit, and may be in an electrical, mechanical, or other form.

The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit. The above integrated unit can be implemented in the form of hardware or in the form of hardware plus software functional units.

The above-described integrated unit implemented in the form of a software functional unit can be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform some of the steps of the methods of the various embodiments of the present invention. The foregoing storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

Other embodiments of the present disclosure will be apparent to those skilled in the art. The present invention is intended to cover any variations, uses, or adaptations of the present disclosure that follow the general principles of the present disclosure and include common general knowledge or customary technical means in the art that are not disclosed in the present disclosure. The specification and examples are to be regarded as illustrative only.

It is to be understood that the invention is not limited to the details described above. The scope of the disclosure is to be limited only by the appended claims.

Claims

A similarity processing method for an advertisement, which is characterized by comprising:

Obtaining an advertisement text set, wherein the advertisement text set includes feature information of the entire advertisement text of the first advertisement, feature information of each word in the first advertisement, feature information of the entire advertisement text of the second advertisement, feature information of each word in the second advertisement, feature information of the entire advertisement text of at least one other advertisement, and feature information of each word in the at least one other advertisement, and obtaining a user click set, wherein the user click set includes feature information of the first advertisement, feature information of the second advertisement, and feature information of at least one other advertisement clicked by at least one user, the first advertisement and the second advertisement also being advertisements clicked by the at least one user;

Determining a semantic similarity between the first advertisement and the second advertisement according to the set of advertisement texts;

Determining a click similarity between the first advertisement and the second advertisement according to the user click set;

And determining similarity information between the first advertisement and the second advertisement according to the semantic similarity and the click similarity.
The method according to claim 1, wherein determining a semantic similarity between the first advertisement and the second advertisement according to the advertisement text set comprises:

Establishing a semantic similarity objective function according to the set of advertisement texts;

Solving the semantic similarity objective function to determine, in an optimal state of the semantic similarity objective function, a vectorized representation of the entire advertisement text of the first advertisement and a vectorized representation of the entire advertisement text of the second advertisement;

The semantic similarity is determined according to a vectorized representation of the entire advertisement text of the first advertisement and a vectorized representation of the entire advertisement text of the second advertisement.
The method according to claim 2, wherein said establishing a semantic similarity objective function according to said set of advertisement texts comprises:

Establishing, according to the advertisement text set, a first preset function of the t-th feature information in the advertisement text set
Where b denotes a preset first deviation value, U denotes a preset first parameter vector, h(w_{t-k}, ..., w_{t+k}; W) denotes a formal function, W denotes the advertisement text set, w_t denotes the t-th feature information in the advertisement text set, w_{t-k} denotes the (t-k)-th feature information in the advertisement text set, w_{t+k} denotes the (t+k)-th feature information in the advertisement text set, k denotes the window size of the semantic similarity objective function to be established, t∈[k,T], T denotes the total number of pieces of feature information in the advertisement text set, and k, t, and T are positive integers;

Establishing a first probability distribution function according to the advertisement text set
Where i∈[t-k,t+k], i is a positive integer, and w_t denotes the t-th feature information in the advertisement text set;

Establishing the semantic similarity objective function according to the first preset function of the feature information w_t in the advertisement text set and the first probability distribution function
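The formulas of this claim did not survive in this text. Purely as an illustration, and assuming the claim follows the widely used paragraph-vector style of model that its symbol definitions resemble, the three functions might take the following form. The exact expressions and the name L for the objective are assumptions and are not taken from the source; the ranges of i and t are the ones stated in the claim.

```latex
\begin{aligned}
y &= b + U\,h(w_{t-k}, \ldots, w_{t+k}; W) \\
p(w_t \mid w_{t-k}, \ldots, w_{t+k}) &= \frac{e^{y_{w_t}}}{\sum_{i=t-k}^{t+k} e^{y_i}} \\
L &= \frac{1}{T} \sum_{t=k}^{T} \log p(w_t \mid w_{t-k}, \ldots, w_{t+k})
\end{aligned}
```

Under this reading, solving the objective function means choosing the vectors that maximize L, and the vectorized representation of each advertisement text is read off from the optimal state.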
The method of claim 2 wherein said semantic similarity is
Wherein, A represents a vectorized representation of the entire advertisement text of the first advertisement, and B represents a vectorized representation of the entire advertisement text of the second advertisement.
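The similarity formula itself is missing from this text. A minimal sketch follows, assuming the measure is the cosine of the angle between the two vectorized representations A and B; that choice, and the function name `cosine_similarity`, are assumptions made for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors.

    Assumed stand-in for the claimed semantic similarity between the
    vectorized representations A and B; the source formula is not shown.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)
```

With this definition, two advertisements whose text vectors point in the same direction score close to 1, and orthogonal vectors score 0.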
The method according to claim 1, wherein determining a click similarity between the first advertisement and the second advertisement according to the user click set comprises:

Establishing a click similarity objective function according to the user click set;

Solving the click similarity objective function to determine, in an optimal state of the click similarity objective function, a vectorized representation of the first advertisement and a vectorized representation of the second advertisement;

The click similarity is determined based on the vectorized representation of the first advertisement and the vectorized representation of the second advertisement.
The method according to claim 5, wherein establishing the click similarity objective function according to the user click set comprises:

Establishing, according to the user click set, a second preset function of the feature information w'_{t'} of the t'-th advertisement in the user click set
Wherein b' denotes a preset second deviation value, U' denotes a preset second parameter vector, h'(w'_{t'-k'}, ..., w'_{t'+k'}; W') denotes a formal function, W' denotes the user click set, w'_{t'} denotes the feature information of the t'-th advertisement in the user click set, w'_{t'-k'} denotes the feature information of the (t'-k')-th advertisement in the user click set, w'_{t'+k'} denotes the feature information of the (t'+k')-th advertisement in the user click set, k' denotes the window size of the click similarity objective function to be established, t'∈[k', T'], T' denotes the total number of advertisements in the user click set, and k', t', and T' are positive integers;

Establishing a second probability distribution function according to the user click set
Where i'∈[t'-k', t'+k'], i' is a positive integer, and w'_{t'} denotes the feature information of the t'-th advertisement in the user click set;

Establishing the click similarity objective function according to the second preset function of the feature information w'_{t'} of the t'-th advertisement in the user click set and the second probability distribution function
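The windowing over the user click set can be sketched as follows. This is an illustration only: it assumes the formal function h' averages the context vectors, assumes a softmax normalized over every advertisement that has an output vector, and uses hypothetical names (`context_windows`, `softmax_probability`, `in_vecs`, `out_vecs`, `bias`) that do not appear in the source.

```python
import math
from typing import Dict, List, Sequence, Tuple


def context_windows(clicks: Sequence[str], k: int) -> List[Tuple[str, List[str]]]:
    """Pair each clicked ad with the k ads clicked before and after it.

    Only positions with a full window on both sides are kept.
    """
    pairs = []
    for t in range(k, len(clicks) - k):
        context = list(clicks[t - k:t]) + list(clicks[t + 1:t + k + 1])
        pairs.append((clicks[t], context))
    return pairs


def softmax_probability(
    target: str,
    context: Sequence[str],
    in_vecs: Dict[str, List[float]],
    out_vecs: Dict[str, List[float]],
    bias: Dict[str, float],
) -> float:
    """Probability of the target ad given its context ads.

    Assumption: h' is the mean of the context vectors, and the score of
    each candidate ad is bias + (output vector . h').
    """
    vectors = [in_vecs[ad] for ad in context]
    h = [sum(col) / len(vectors) for col in zip(*vectors)]
    scores = {
        ad: bias[ad] + sum(u * x for u, x in zip(vec, h))
        for ad, vec in out_vecs.items()
    }
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {ad: math.exp(s - top) for ad, s in scores.items()}
    return exps[target] / sum(exps.values())
```

A click similarity objective in this style would sum the logarithm of `softmax_probability` over all pairs returned by `context_windows` and adjust the vectors to maximize that sum.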
The method of claim 5 wherein said click similarity is
Where C represents a vectorized representation of the first advertisement and D represents a vectorized representation of the second advertisement.
The method according to any one of claims 1 to 7, wherein determining the similarity information between the first advertisement and the second advertisement according to the semantic similarity and the click similarity comprises:

Obtaining a frequency of user clicks of the second advertisement;

The similarity information is determined according to the user click frequency, the semantic similarity, and the click similarity.
The method according to claim 8, wherein the similarity information is Sim = (1/log(TF)) * Sim_content + Sim_session;

Wherein TF denotes the user click frequency, Sim_content denotes the semantic similarity, and Sim_session denotes the click similarity.
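The combination formula can be sketched directly in code. The base of the logarithm is not stated in the claim, so the natural logarithm is assumed here; the function name and the rejection of TF values of 1 or less (where log(TF) is zero or negative) are also assumptions of this sketch.

```python
import math


def combined_similarity(sim_content: float, sim_session: float, click_frequency: float) -> float:
    """Sim = (1 / log(TF)) * Sim_content + Sim_session.

    Assumptions: natural logarithm, and TF must exceed 1 so that
    log(TF) is positive and the division is defined.
    """
    if click_frequency <= 1:
        raise ValueError("click_frequency must be greater than 1")
    return sim_content / math.log(click_frequency) + sim_session
```

As the click frequency of the second advertisement grows, the factor 1/log(TF) shrinks, so the click similarity carries relatively more weight than the semantic similarity for frequently clicked advertisements.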
An apparatus for processing similarity of an advertisement, comprising:

An obtaining unit, configured to obtain an advertisement text set, wherein the advertisement text set includes feature information of the entire advertisement text of the first advertisement, feature information of each word in the first advertisement, feature information of the entire advertisement text of the second advertisement, feature information of each word in the second advertisement, feature information of the entire advertisement text of at least one other advertisement, and feature information of each word in each of the at least one other advertisement, and to obtain a user click set, wherein the user click set includes feature information of the first advertisement, feature information of the second advertisement, and feature information of at least one other advertisement clicked by at least one user, the first advertisement and the second advertisement also being advertisements clicked by the at least one user;

a first determining unit, configured to determine a semantic similarity between the first advertisement and the second advertisement according to the advertisement text set;

a second determining unit, configured to determine, according to the user click set, a click similarity between the first advertisement and the second advertisement;

a third determining unit, configured to determine similarity information between the first advertisement and the second advertisement according to the semantic similarity and the click similarity.
The device according to claim 10, wherein the first determining unit comprises:

a first establishing module, configured to establish a semantic similarity objective function according to the set of advertisement texts;

a first solving module, configured to solve the semantic similarity objective function to determine a vectorized representation of the overall advertisement text of the first advertisement in an optimal state of the semantic similarity objective function, and a vectorized representation of the overall advertising text of the second advertisement;

a first determining module, configured to determine the semantic similarity according to the vectorized representation of the entire advertisement text of the first advertisement and the vectorized representation of the entire advertisement text of the second advertisement.
The device according to claim 11, wherein the first establishing module comprises:

a first establishing submodule, configured to establish, according to the advertisement text set, a first preset function of the t-th feature information in the advertisement text set
Where b denotes a preset first deviation value, U denotes a preset first parameter vector, h(w_{t-k}, ..., w_{t+k}; W) denotes a formal function, W denotes the advertisement text set, w_t denotes the t-th feature information in the advertisement text set, w_{t-k} denotes the (t-k)-th feature information in the advertisement text set, w_{t+k} denotes the (t+k)-th feature information in the advertisement text set, k denotes the window size of the semantic similarity objective function to be established, t∈[k,T], T denotes the total number of pieces of feature information in the advertisement text set, and k, t, and T are positive integers;

a second establishing submodule, configured to establish a first probability distribution function according to the advertisement text set
Where i∈[t-k,t+k], i is a positive integer, and w_t denotes the t-th feature information in the advertisement text set;

a third establishing submodule, configured to establish the semantic similarity objective function according to the first preset function of the feature information w_t in the advertisement text set and the first probability distribution function
The apparatus of claim 11 wherein said semantic similarity is
Wherein, A represents a vectorized representation of the entire advertisement text of the first advertisement, and B represents a vectorized representation of the entire advertisement text of the second advertisement.
The device according to claim 10, wherein the second determining unit comprises:

a second establishing module, configured to establish a click similarity objective function according to the user click set;

a second solving module, configured to solve the click similarity objective function to determine, in an optimal state of the click similarity objective function, a vectorized representation of the first advertisement and a vectorized representation of the second advertisement;

a second determining module, configured to determine the click similarity according to the vectorized representation of the first advertisement and the vectorized representation of the second advertisement.
The device according to claim 14, wherein the second establishing module comprises:

a fourth establishing submodule, configured to establish, according to the user click set, a second preset function of the feature information w'_{t'} of the t'-th advertisement in the user click set
Wherein b' denotes a preset second deviation value, U' denotes a preset second parameter vector, h'(w'_{t'-k'}, ..., w'_{t'+k'}; W') denotes a formal function, W' denotes the user click set, w'_{t'} denotes the feature information of the t'-th advertisement in the user click set, w'_{t'-k'} denotes the feature information of the (t'-k')-th advertisement in the user click set, w'_{t'+k'} denotes the feature information of the (t'+k')-th advertisement in the user click set, k' denotes the window size of the click similarity objective function to be established, t'∈[k', T'], T' denotes the total number of advertisements in the user click set, and k', t', and T' are positive integers;

a fifth establishing submodule, configured to establish a second probability distribution function according to the user click set
Where i'∈[t'-k', t'+k'], i' is a positive integer, and w'_{t'} denotes the feature information of the t'-th advertisement in the user click set;

a sixth establishing submodule, configured to establish the click similarity objective function according to the second preset function of the feature information w'_{t'} of the t'-th advertisement in the user click set and the second probability distribution function
The device according to claim 14, wherein said click similarity is
Where C represents a vectorized representation of the first advertisement and D represents a vectorized representation of the second advertisement.
The device according to any one of claims 10-16, wherein the third determining unit comprises:

An obtaining module, configured to acquire a user click frequency of the second advertisement;

And a third determining module, configured to determine the similarity information according to the user click frequency, the semantic similarity, and the click similarity.
The apparatus according to claim 17, wherein the similarity information is Sim = (1/log(TF)) * Sim_content + Sim_session;

Wherein TF denotes the user click frequency, Sim_content denotes the semantic similarity, and Sim_session denotes the click similarity.
A computing device comprising:

A processor; and

A memory having executable code stored thereon that, when executed by the processor, causes the processor to perform the method of any of claims 1-9.
A non-transitory machine-readable storage medium having stored thereon executable code that, when executed by a processor of an electronic device, causes the processor to perform the method of any of claims 1-9.