WO2019128311A1 - Advertisement similarity processing method and apparatus, calculation device, and storage medium - Google Patents

Advertisement similarity processing method and apparatus, calculation device, and storage medium

Publication number
WO2019128311A1
Authority
WO
WIPO (PCT)
Prior art keywords
advertisement
similarity
feature information
click
text
Prior art date
Application number
PCT/CN2018/105093
Other languages
French (fr)
Chinese (zh)
Inventor
刘夏龙
Original Assignee
广东神马搜索科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 广东神马搜索科技有限公司
Publication of WO2019128311A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0251 Targeted advertisements
    • G06Q30/0254 Targeted advertisements based on statistics

Definitions

  • The present invention relates to the field of advertising technologies, and in particular to an advertisement similarity processing method and apparatus, a computing device, and a storage medium.
  • Advertising is widely used as an important means of promoting products. When delivering advertisements, the similarity between advertisements must be considered so that advertisements for similar products can be pushed to the user's terminal, making it easier for users to obtain more product information.
  • Generally, the keyword information of an advertisement is obtained, the advertisement information is then determined according to that keyword information, and similar advertisements are then pushed to the appropriate user group.
  • The invention provides an advertisement similarity processing method and device to solve the problem that the similarity determined between advertisements is not accurate.
  • The present invention provides a similarity processing method for an advertisement, including:
  • acquiring an advertisement text set, wherein the advertisement text set includes feature information of the entire advertisement text of the first advertisement, feature information of each word in the first advertisement, feature information of the entire advertisement text of the second advertisement, feature information of each word in the second advertisement, feature information of the entire advertisement text of at least one other advertisement, and feature information of each word in each of the at least one other advertisement; and acquiring a user click set, wherein the user click set includes feature information of the first advertisement, feature information of the second advertisement, and feature information of other advertisements clicked by at least one user, the first advertisement and the second advertisement also being advertisements that the at least one user has clicked;
  • determining a semantic similarity between the first advertisement and the second advertisement according to the advertisement text set including:
  • the semantic similarity is determined according to a vectorized representation of the entire advertisement text of the first advertisement and a vectorized representation of the entire advertisement text of the second advertisement.
  • the establishing a semantic similarity objective function according to the set of advertisement texts includes:
  • $w_{t-k}$ represents the $(t-k)$-th feature information in the advertisement text set
  • $w_{t+k}$ represents the $(t+k)$-th feature information in the advertisement text set
  • $k$ represents the window size of the semantic similarity objective function to be established, $t \in [k, T]$, $T$ represents the total number of pieces of feature information in the advertisement text set, and $k$, $t$, and $T$ are positive integers;
  • establishing the semantic similarity objective function according to the first preset function of the $w_t$-th feature information in the advertisement text set and the first probability distribution function
  • A represents a vectorized representation of the entire advertisement text of the first advertisement
  • B represents a vectorized representation of the entire advertisement text of the second advertisement.
  • determining a click similarity between the first advertisement and the second advertisement according to the user click set including:
  • the click similarity is determined based on the vectorized representation of the first advertisement and the vectorized representation of the second advertisement.
  • a second preset function of the feature information of the $w'_{t'}$-th advertisement in the user click set is established, where $b'$ represents a preset second deviation value, $U'$ represents a preset second parameter vector, $h'(w'_{t'-k'}, \ldots, w'_{t'+k'}; W')$ represents a formalized function, $W'$ represents the feature information of the $w'_{t'}$-th advertisement in the user click set, $w'_{t'-k'}$ represents the feature information of the $(t'-k')$-th advertisement in the user click set, $w'_{t'+k'}$ represents the feature information of the $(t'+k')$-th advertisement in the user click set, $k'$ represents the window size of the click similarity objective function to be established, $t' \in [k', T']$, $T'$ represents the total number of advertisements in the user click set, and $k'$, $t'$, and $T'$ are all positive integers;
  • C represents a vectorized representation of the first advertisement
  • D represents a vectorized representation of the second advertisement
  • determining the similarity information between the first advertisement and the second advertisement according to the semantic similarity and the click similarity including:
  • the similarity information is determined according to the user click frequency, the semantic similarity, and the click similarity.
  • TF represents the user click frequency
  • $Sim_{content}$ represents the semantic similarity
  • $Sim_{session}$ represents the click similarity
  • the present invention provides an advertisement similarity processing apparatus, including:
  • an obtaining unit configured to obtain an advertisement text set, wherein the advertisement text set includes feature information of the entire advertisement text of the first advertisement, feature information of each word in the first advertisement, feature information of the entire advertisement text of the second advertisement, feature information of each word in the second advertisement, feature information of the entire advertisement text of at least one other advertisement, and feature information of each word in each of the at least one other advertisement; and to obtain a user click set, wherein the user click set includes feature information of the first advertisement, feature information of the second advertisement, and feature information of other advertisements clicked by at least one user, the first advertisement and the second advertisement also being advertisements that the at least one user has clicked;
  • a first determining unit configured to determine a semantic similarity between the first advertisement and the second advertisement according to the advertisement text set
  • a second determining unit configured to determine, according to the user click set, a click similarity between the first advertisement and the second advertisement
  • a third determining unit configured to determine similarity information between the first advertisement and the second advertisement according to the semantic similarity and the click similarity.
  • the first determining unit includes:
  • a first establishing module configured to establish a semantic similarity objective function according to the set of advertisement texts
  • a first solving module configured to solve the semantic similarity objective function to determine a vectorized representation of the overall advertisement text of the first advertisement in an optimal state of the semantic similarity objective function, and a vectorized representation of the overall advertising text of the second advertisement;
  • the first determining module is configured to determine the semantic similarity according to a vectorized representation of the entire advertisement text of the first advertisement and a vectorized representation of the entire advertisement text of the second advertisement.
  • the first establishing module includes:
  • a first establishing submodule configured to establish, according to the advertisement text set, a first preset function of the t-th feature information in the advertisement text set
  • $b$ denotes a preset first deviation value
  • $U$ denotes a preset first parameter vector, and $h(w_{t-k}, \ldots, w_{t+k}; W)$ denotes a formal function
  • $W$ denotes the $w_t$-th feature information in the advertisement text set, and $w_{t-k}$ represents the $(t-k)$-th feature information in the advertisement text set
  • $w_{t+k}$ represents the $(t+k)$-th feature information in the advertisement text set
  • $k$ represents the window size of the semantic similarity objective function to be established, $t \in [k, T]$, $T$ represents the total number of pieces of feature information in the advertisement text set, and $k$, $t$, and $T$ are positive integers;
  • a second establishing submodule configured to establish a first probability distribution function according to the advertisement text set, where $i \in [t-k, t+k]$, $i$ is a positive integer, and $w_t$ represents the t-th feature information in the advertisement text set;
  • A represents a vectorized representation of the entire advertisement text of the first advertisement
  • B represents a vectorized representation of the entire advertisement text of the second advertisement.
  • the second determining unit includes:
  • a second establishing module configured to establish a click similarity objective function according to the user click set
  • a second solving module configured to solve the click similarity objective function to determine, in an optimal state of the click similarity objective function, a vectorized representation of the first advertisement and a vectorized representation of the second advertisement
  • a second determining module configured to determine the click similarity according to the vectorized representation of the first advertisement and the vectorized representation of the second advertisement.
  • the second establishing module includes:
  • a fourth establishing submodule configured to establish, according to the user click set, a second preset function of the feature information of the $w'_{t'}$-th advertisement in the user click set
  • $b'$ represents a preset second deviation value
  • $U'$ represents a preset second parameter vector
  • $h'(w'_{t'-k'}, \ldots, w'_{t'+k'}; W')$ represents a formalized function
  • $W'$ represents the feature information of the $w'_{t'}$-th advertisement in the user click set
  • $w'_{t'-k'}$ represents the feature information of the $(t'-k')$-th advertisement in the user click set, and $w'_{t'+k'}$ represents the feature information of the $(t'+k')$-th advertisement in the user click set
  • $k'$ represents the window size of the click similarity objective function to be established, $t' \in [k', T']$, $T'$ represents the total number of advertisements in the user click set, and $k'$, $t'$, and $T'$ are all positive integers;
  • a fifth establishing submodule configured to establish a second probability distribution function according to the user click set
  • $i' \in [t'-k', t'+k']$, and $i'$ is a positive integer
  • $w'_{t'}$ represents the feature information of the $t'$-th advertisement in the user click set
  • a sixth establishing submodule configured to establish the click similarity objective function according to the second preset function of the feature information of the $w'_{t'}$-th advertisement in the user click set and the second probability distribution function
  • C represents a vectorized representation of the first advertisement
  • D represents a vectorized representation of the second advertisement
  • the third determining unit includes:
  • An obtaining module configured to acquire a user click frequency of the second advertisement
  • a third determining module configured to determine the similarity information according to the user click frequency, the semantic similarity, and the click similarity.
  • TF represents the user click frequency
  • $Sim_{content}$ represents the semantic similarity
  • $Sim_{session}$ represents the click similarity
  • the present invention provides a computing device comprising:
  • a memory having stored thereon executable code that, when executed by the processor, causes the processor to perform the method of any of the above.
  • The present invention provides a non-transitory machine readable storage medium having stored thereon executable code that, when executed by a processor of an electronic device, causes the processor to perform any one of the methods described above.
  • In the advertisement similarity processing method and device provided by the present invention, an advertisement text set is acquired, wherein the advertisement text set includes feature information of the entire advertisement text of the first advertisement, feature information of each word in the first advertisement, feature information of the entire advertisement text of the second advertisement, feature information of each word in the second advertisement, feature information of the entire advertisement text of at least one other advertisement, and feature information of each word in each of the at least one other advertisement; a user click set is acquired, wherein the user click set includes feature information of the first advertisement, feature information of the second advertisement, and feature information of other advertisements clicked by at least one user, the first advertisement and the second advertisement also being advertisements clicked by the at least one user; a semantic similarity between the first advertisement and the second advertisement is determined according to the advertisement text set; a click similarity between the first advertisement and the second advertisement is determined according to the user click set; and similarity information between the first advertisement and the second advertisement is determined according to the semantic similarity and the click similarity.
  • Both short-text advertisements and long-text advertisements can be analyzed, so that the topics and key information in the advertisements can be easily extracted.
  • The advertisements clicked by users belonging to the same group constitute a user click set.
  • Analyzing the features of all the advertisements in the user click set helps classify the advertisements; and because the above process analyzes a large amount of advertising data, the similarity between advertisements can be determined more accurately.
  • Based on the semantic similarity calculated from the advertisement text set and the click similarity calculated from the user click set, the similarity information between the first advertisement and the second advertisement is calculated, that is, the extent to which the second advertisement resembles the first advertisement, so that the similarity between advertisements can be determined accurately.
  • The similarity between all advertisements can be determined according to the above process, so that similar advertisements can be pushed to the user when advertisements are pushed to the user.
  • FIG. 1 is a schematic flowchart of a method for processing similarity of an advertisement according to an embodiment of the present application
  • FIG. 2 is a schematic diagram of a click session log in an advertisement similarity processing method according to an embodiment of the present application
  • FIG. 3 is a schematic structural diagram of a neural network model in a similarity processing method for an advertisement according to an embodiment of the present disclosure
  • FIG. 4 is a schematic flowchart diagram of another method for processing similarity of an advertisement according to an embodiment of the present disclosure
  • FIG. 5 is a schematic structural diagram of an apparatus for processing similarity of an advertisement according to an embodiment of the present disclosure
  • FIG. 6 is a schematic structural diagram of another similarity processing apparatus for an advertisement according to an embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of a computing device according to an embodiment of the present invention.
  • Word Embedding refers to word embedding technology: words are vectorized, so that abstract entities become mathematical descriptions that can be modeled and applied to many tasks. For example, the similarity between words can be determined directly by the cosine distance metric between their vectors.
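  • As an illustration of comparing vectorized words by a cosine measure, the following minimal sketch computes cosine similarity in plain Python. The three-dimensional vectors and the words in `embeddings` are invented for the example and are not taken from the application.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, invented for illustration only.
embeddings = {
    "phone": [0.9, 0.1, 0.0],
    "mobile": [0.8, 0.2, 0.1],
    "banana": [0.0, 0.1, 0.9],
}
sim_close = cosine_similarity(embeddings["phone"], embeddings["mobile"])
sim_far = cosine_similarity(embeddings["phone"], embeddings["banana"])
```

  • With these made-up vectors, `sim_close` is larger than `sim_far`, which is the sense in which nearby vectors indicate similar words.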
  • DSSM: Deep Structured Semantic Model.
  • Stochastic Gradient Descent is a common method for solving unconstrained optimization problems. Its advantage is that it is simple to implement.
  • The stochastic gradient descent method is an iterative algorithm. Each step requires computing the gradient vector of the objective function.
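  • The iterative character of stochastic gradient descent can be sketched on a toy objective. The example below is an assumption-laden illustration, not the solver of the application: it minimizes the mean of $(x - s)^2$ over a few samples, using one sample per update, with a hypothetical constant step size `lr`.

```python
import random

def sgd_minimize(samples, lr=0.05, epochs=200, seed=0):
    """Minimize the mean of (x - s)^2 over `samples` by stochastic gradient descent."""
    rng = random.Random(seed)       # fixed seed so runs are repeatable
    x = 0.0                         # arbitrary starting point
    order = list(samples)
    for _ in range(epochs):
        rng.shuffle(order)          # visit the samples in random order
        for s in order:
            grad = 2.0 * (x - s)    # gradient of (x - s)^2 for this one sample
            x -= lr * grad          # step against the gradient
    return x

estimate = sgd_minimize([1.0, 2.0, 3.0, 4.0])
```

  • Because the step size is constant, the iterate hovers near the minimizer (the sample mean, 2.5) rather than settling on it exactly; a decaying step size is a common alternative.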
  • The specific application scenario of the present invention is as follows. With the development of media technology and terminal technology, more and more advertisements need to be delivered through media. Advertisements can be pushed to users: users can be divided into multiple user groups according to user characteristics, and similar advertisements can then be pushed to each user group; alternatively, a series of similar advertisements can be pushed directly to a user. How to accurately determine which advertisements are similar, that is, the similarity between advertisements, is therefore a problem that needs to be solved.
  • The advertisement similarity processing method and apparatus provided by the present invention are intended to solve the above technical problems of the prior art.
  • FIG. 1 is a schematic flowchart diagram of a method for processing similarity of an advertisement according to an embodiment of the present application. As shown in Figure 1, the method includes:
  • Step 101: Obtain an advertisement text set, where the advertisement text set includes feature information of the entire advertisement text of the first advertisement, feature information of each word in the first advertisement, feature information of the entire advertisement text of the second advertisement, feature information of each word in the second advertisement, feature information of the entire advertisement text of at least one other advertisement, and feature information of each word in each of the at least one other advertisement; and obtain a user click set, where the user click set includes feature information of the first advertisement, feature information of the second advertisement, and feature information of other advertisements clicked by at least one user.
  • The first advertisement and the second advertisement are also advertisements that the at least one user clicked.
  • the execution subject of the embodiment may be an advertisement similarity processing device, a server, or other device that can perform the method of the embodiment.
  • Each advertisement is analyzed and split into multiple words, and an advertisement text set is then obtained.
  • The advertisement text set includes feature information of the entire advertisement text of each of a plurality of advertisements and feature information of each word of each of the plurality of advertisements; the plurality of advertisements includes the first advertisement and the second advertisement to be analyzed.
  • The feature information of the entire advertisement text of each advertisement is a vector, and the feature information of each word is also a vector.
  • For example, an advertisement text set is generated from ten thousand advertisements, and the advertisement text set includes feature information of the entire advertisement text of advertisement 1, feature information of word 1 of advertisement 1, feature information of word 2 of advertisement 1, feature information of word 3 of advertisement 1, feature information of the entire advertisement text of advertisement 2, feature information of word 2 of advertisement 2, feature information of word 3 of advertisement 2, feature information of word 4 of advertisement 2, feature information of the entire advertisement text of advertisement 3, feature information of word 2 of advertisement 3, feature information of word 3 of advertisement 3, feature information of word 4 of advertisement 3, feature information of the entire advertisement text of advertisement 4, and feature information of word 4 of advertisement 4.
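  • The composition of such an advertisement text set can be sketched as follows. This is only an illustration under stated assumptions: the function name `build_ad_text_set`, the `("AD", …)` / `("WORD", …)` item tags, and the whitespace tokenizer are hypothetical, and each item stands in for the feature vector that would be learned for it.

```python
def build_ad_text_set(ads):
    """Return (items, index): `items` holds one whole-text entry per advertisement
    followed by that advertisement's words; `index` maps each distinct item to an id."""
    items = []
    for ad_id, text in ads.items():
        items.append(("AD", ad_id))            # the advertisement text as a whole
        for word in text.lower().split():       # naive whitespace split (assumption)
            items.append(("WORD", word))
    index = {}
    for item in items:
        index.setdefault(item, len(index))      # words shared by several ads get one id
    return items, index

# Two made-up advertisements, for illustration only.
ads = {"ad1": "cheap phone sale", "ad2": "phone case sale"}
items, index = build_ad_text_set(ads)
```

  • As in the example above, where word 2 appears in several advertisements, a word shared between advertisements maps to a single entry in `index`.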
  • Each user's Click Session log is obtained, and the advertisements that each user clicked are determined according to that log; the advertisements clicked by the users then make up the user click set.
  • The user click set includes feature information of each of the advertisements that a plurality of users clicked, and the advertisements that the plurality of users clicked include the first advertisement and the second advertisement to be analyzed. It can be seen that the first advertisement and the second advertisement are also advertisements that users clicked.
  • The feature information of each advertisement is a vector.
  • Advertisements clicked by users belonging to the same group also reflect the similarity of the advertisements themselves; the advertisements clicked by users who belong to the same group can therefore be obtained.
  • FIG. 2 is a schematic diagram of a click session log in an advertisement similarity processing method according to an embodiment of the present application. As shown in FIG. 2, the content of the advertisements that a user has clicked can be obtained by analyzing the user's click behavior. Moreover, users produce a large number of advertisement click behaviors, each of which corresponds to one advertisement, and the massive click behavior can avoid the noise deviation problem between advertisements.
  • For example, the 10,000 advertisements clicked by users belonging to the same group can be obtained; these 10,000 advertisements constitute a user click set, and the user click set includes the feature information of advertisement 1, the feature information of advertisement 2, and so on.
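  • A minimal sketch of turning click session logs into a user click set is given below. The log row layout `(user_id, timestamp, ad_id)` and both function names are assumptions made for the example; the application does not specify a log format in this text.

```python
from collections import defaultdict

def build_click_sessions(log_rows):
    """Group clicked advertisement ids by user, in click order for each user."""
    sessions = defaultdict(list)
    for user_id, _timestamp, ad_id in sorted(log_rows, key=lambda r: (r[0], r[1])):
        sessions[user_id].append(ad_id)
    return dict(sessions)

def build_user_click_set(sessions):
    """Concatenate the sessions of one user group into a single advertisement sequence."""
    sequence = []
    for user_id in sorted(sessions):
        sequence.extend(sessions[user_id])
    return sequence

# Hypothetical log rows: (user_id, timestamp, ad_id).
log_rows = [
    ("u1", 3, "ad2"), ("u1", 1, "ad1"), ("u2", 2, "ad3"), ("u2", 5, "ad1"),
]
sessions = build_click_sessions(log_rows)
click_set = build_user_click_set(sessions)
```

  • Here `click_set` plays the role of the ordered collection of clicked advertisements whose entries would each be represented by a feature vector.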
  • Step 102: Determine a semantic similarity between the first advertisement and the second advertisement according to the advertisement text set.
  • The feature information of the entire advertisement text of each advertisement in the advertisement text set and the feature information of each word in each advertisement are analyzed.
  • Because the advertisement text set includes the first advertisement and the second advertisement to be analyzed, the semantic similarity between the first advertisement and the second advertisement can be determined.
  • the semantic similarity characterizes the extent to which the second advertisement is like the first advertisement.
  • FIG. 3 is a schematic structural diagram of a neural network model in an advertisement similarity processing method according to an embodiment of the present application.
  • The first layer in the neural network model is a classifier.
  • The second layer in the neural network model is the Average/Concatenate layer, which represents a connection form from the lower-layer network to the upper-layer network.
  • The last layer in the neural network model represents the advertisement matrix (Paragraph matrix), that is, the vectorized representation of all advertisements.
  • D represents an advertisement.
  • "Paragraph" literally means a paragraph of text; here, a Paragraph refers to an advertisement.
  • W is the prefix of a word (Word) in each advertisement.
  • Step 103: Determine a click similarity between the first advertisement and the second advertisement according to the user click set.
  • A neural network algorithm and the Word Embedding technology are used to model the user click set. The neural network algorithm has a continuous bag of words (CBOW) structure and a skip-gram structure; here, the neural network algorithm can adopt the skip-gram structure. The feature information of each advertisement is then analyzed to obtain the click similarity between the first advertisement and the second advertisement.
  • the click similarity characterizes the extent to which the second advertisement is like the first advertisement.
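  • To make the skip-gram idea concrete, the sketch below lists the (center, context) pairs that a skip-gram style model would be trained on for a sequence of clicked advertisements. The function name `skipgram_pairs`, the `window` parameter, and the sample sequence are illustrative assumptions; the training of the vectors themselves is not shown.

```python
def skipgram_pairs(sequence, window):
    """Return (center, context) pairs for every item and its neighbours within `window`."""
    pairs = []
    for t, center in enumerate(sequence):
        lo = max(0, t - window)                   # clip the window at the sequence start
        hi = min(len(sequence), t + window + 1)   # and at the sequence end
        for i in range(lo, hi):
            if i != t:                            # an item is not its own context
                pairs.append((center, sequence[i]))
    return pairs

# Hypothetical click sequence, for illustration only.
clicks = ["ad1", "ad2", "ad3", "ad1"]
pairs = skipgram_pairs(clicks, window=1)
```

  • Advertisements that are often clicked close together appear in many shared pairs, which is what pulls their learned vectors toward each other.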
  • Step 104: Determine similarity information between the first advertisement and the second advertisement according to the semantic similarity and the click similarity.
  • Step 104 specifically includes: acquiring a user click frequency of the second advertisement; and determining the similarity information according to the user click frequency, the semantic similarity, and the click similarity.
  • The similarity information may be calculated from the calculated semantic similarity and click similarity. Specifically, because the goal is to calculate how closely the second advertisement resembles the first advertisement, the user click frequency TF of the second advertisement is obtained first, where TF is the number of times the second advertisement has been clicked by users; the user click frequency TF, the semantic similarity $Sim_{content}$, and the click similarity $Sim_{session}$ are then used to calculate the similarity information between the first advertisement and the second advertisement.
  • Various calculation formulas may be used for the similarity information; this embodiment provides a preferred one.
  • In this embodiment, an advertisement text set is obtained, where the advertisement text set includes feature information of the entire advertisement text of the first advertisement, feature information of each word in the first advertisement, feature information of the entire advertisement text of the second advertisement, feature information of each word in the second advertisement, feature information of the entire advertisement text of at least one other advertisement, and feature information of each word in each of the at least one other advertisement; and a user click set is obtained, where the user click set includes feature information of the first advertisement, feature information of the second advertisement, and feature information of other advertisements clicked by at least one user.
  • The first advertisement and the second advertisement are also advertisements that the at least one user clicked. A semantic similarity between the first advertisement and the second advertisement is determined according to the advertisement text set; a click similarity between the first advertisement and the second advertisement is determined according to the user click set; and similarity information between the first advertisement and the second advertisement is determined according to the semantic similarity and the click similarity. Therefore, by extracting the words in massive numbers of advertisements and analyzing those words with the neural network model, both short-text advertisements and long-text advertisements can be analyzed, so that the topics and key information in the advertisements can be easily extracted.
  • The advertisements clicked by users belonging to the same group constitute a user click set.
  • Analyzing the features of all the advertisements in the user click set helps classify the advertisements.
  • The above process analyzes a large amount of advertising data, so the similarity between advertisements can be determined more accurately.
  • Based on the semantic similarity calculated from the advertisement text set and the click similarity calculated from the user click set, the similarity information between the first advertisement and the second advertisement is calculated, that is, the extent to which the second advertisement resembles the first advertisement, so that the similarity between advertisements can be determined. Further, the similarity between all advertisements can be determined according to the above process, so that similar advertisements can be pushed to the user when advertisements are pushed to the user.
  • FIG. 4 is a schematic flowchart diagram of another method for processing similarity of an advertisement according to an embodiment of the present application. As shown in FIG. 4, the method includes:
  • Step 201: Obtain an advertisement text set, where the advertisement text set includes feature information of the entire advertisement text of the first advertisement, feature information of each word in the first advertisement, feature information of the entire advertisement text of the second advertisement, feature information of each word in the second advertisement, feature information of the entire advertisement text of at least one other advertisement, and feature information of each word in each of the at least one other advertisement; and obtain a user click set, where the user click set includes feature information of the first advertisement, feature information of the second advertisement, and feature information of other advertisements clicked by at least one user.
  • the first advertisement and the second advertisement are also advertisements that the at least one user clicked.
  • the execution subject of the embodiment may be an advertisement similarity processing device, a server, or other device that can perform the method of the embodiment. This step can be referred to step 101 of FIG. 1 and will not be described again.
  • Step 202: Establish a semantic similarity objective function according to the advertisement text set.
  • Step 202 specifically includes the following steps:
  • Step 2021: Establish, according to the advertisement text set, a first preset function of the t-th feature information in the advertisement text set.
  • $b$ denotes a preset first deviation value
  • $U$ denotes a preset first parameter vector
  • $h(w_{t-k}, \ldots, w_{t+k}; W)$ denotes a formal function
  • $W$ denotes the $w_t$-th feature information in the advertisement text set
  • $w_{t-k}$ represents the $(t-k)$-th feature information in the advertisement text set
  • $w_{t+k}$ represents the $(t+k)$-th feature information in the advertisement text set
  • $k$ represents the window size of the semantic similarity objective function to be established
  • $t \in [k, T]$, and $T$ represents the total number of pieces of feature information in the advertisement text set
  • $k$, $t$, and $T$ are all positive integers.
  • Step 2022: Establish a first probability distribution function according to the advertisement text set, where $i \in [t-k, t+k]$, $i$ is a positive integer, and $w_t$ represents the t-th feature information in the advertisement text set.
  • Step 2023 the first predetermined function of information in accordance with a first set of advertising texts w t of features, and a first probability distribution function, a semantic similarity objective function
  • A multi-layer Deep Structured Semantic Model (DSSM) may be used, with bi-char preprocessing applied to the sentences and words; for example, the words are processed directly as text.
  • Here, b represents the preset first deviation value, U represents the preset first parameter vector, k represents the window size of the semantic similarity objective function to be established, $t \in [k, T]$, T represents the total number of pieces of feature information in the advertisement text set, and k, t, and T are positive integers; each piece of feature information in the advertisement text set is a vector.
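The formulas for the first preset function, the first probability distribution function, and the objective function are not reproduced in this text. The sketch below therefore assumes the common paragraph-vector form: a score vector y = b + U·h(context; W), a softmax over y as the probability of $w_t$, and the average log-probability over all window positions as the objective. Taking h as the average of the context vectors, and all function names, are assumptions for illustration only.

```python
import math
from typing import List, Sequence

Vector = List[float]


def h_average(context: Sequence[Vector]) -> Vector:
    # Assumed formalized function h: average of the context vectors.
    dim = len(context[0])
    return [sum(v[d] for v in context) / len(context) for d in range(dim)]


def preset_function(b: Vector, U: List[Vector], hidden: Vector) -> Vector:
    # Assumed first preset function: y = b + U h, one score per vocabulary entry.
    return [b_i + sum(u * x for u, x in zip(row, hidden)) for b_i, row in zip(b, U)]


def log_probability(y: Vector, target: int) -> float:
    # Log of the softmax probability of the target entry (numerically stable).
    m = max(y)
    log_z = m + math.log(sum(math.exp(v - m) for v in y))
    return y[target] - log_z


def objective(token_ids: List[int], W: List[Vector], U: List[Vector],
              b: Vector, k: int) -> float:
    # Average log-probability of each token given its 2k neighbours.
    # Indices are 0-based here, so t runs over positions with a full window.
    positions = range(k, len(token_ids) - k)
    if len(positions) == 0:
        raise ValueError("sequence is shorter than one full window")
    total = 0.0
    for t in positions:
        context = [W[token_ids[i]] for i in range(t - k, t + k + 1) if i != t]
        y = preset_function(b, U, h_average(context))
        total += log_probability(y, token_ids[t])
    return total / len(positions)
```

With all parameters at zero, every entry is equally likely, so the objective equals the negative logarithm of the vocabulary size; training would raise it from there.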
  • Step 203: Solve the semantic similarity objective function to determine, in an optimal state of the semantic similarity objective function, a vectorized representation of the entire advertisement text of the first advertisement and a vectorized representation of the entire advertisement text of the second advertisement.
  • The semantic similarity objective function obtained in step 202 is solved by using a cross-entropy method to determine, in the optimal state of the semantic similarity objective function, a vectorized representation of each piece of feature information in the advertisement text set, that is, a vectorized representation of the entire advertisement text of the first advertisement, a vectorized representation of each word in the first advertisement, a vectorized representation of the entire advertisement text of the second advertisement, a vectorized representation of each word in the second advertisement, and so on.
  • The optimal state of the semantic similarity objective function may be the maximum value of the semantic similarity objective function, or the optimal state may be that the value of the semantic similarity objective function is within a preset range.
  • Step 204: Determine the semantic similarity according to the vectorized representation of the entire advertisement text of the first advertisement and the vectorized representation of the entire advertisement text of the second advertisement.
  • The semantic similarity is $\cos(A, B)$, where A represents the vectorized representation of the entire advertisement text of the first advertisement, and B represents the vectorized representation of the entire advertisement text of the second advertisement.
  • After the vectorized representation A of the entire advertisement text of the first advertisement and the vectorized representation B of the entire advertisement text of the second advertisement are obtained, the cosine of the two is computed, giving the semantic similarity between the first advertisement and the second advertisement as $\cos(A, B) = \frac{\sum_{j=1}^{J} a_j b_j}{\sqrt{\sum_{j=1}^{J} a_j^2}\,\sqrt{\sum_{j=1}^{J} b_j^2}}$, where:
  • J represents the dimension of vector A;
  • the dimension of vector A is the same as the dimension of vector B;
  • $j \in [1, J]$, and j and J are positive integers;
  • $a_j$ is the j-th value of vector A;
  • $b_j$ is the j-th value of vector B.
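The cosine computation of step 204 can be sketched as follows. The function name is hypothetical, and returning 0.0 for a zero vector is an assumption, since the text does not say how that case is handled.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    # Both vectors must have the same dimension J.
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)
```

The same function applies unchanged to the click vectors C and D of step 207, because that step also takes the cosine of two vectors of equal dimension.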
  • Step 205: Establish a click similarity objective function according to the user click set.
  • Step 205 specifically includes the following steps:
  • Step 2051: Establish, according to the user click set, a second preset function of the feature information $w'_{t'}$ of the t'-th advertisement in the user click set, where:
  • b' represents a preset second deviation value;
  • U' represents a preset second parameter vector;
  • $h(w'_{t'-k'}, \ldots, w'_{t'+k'}; W')$ represents a formalized function;
  • W' represents the feature information of the $w'_{t'}$-th advertisement in the user click set;
  • $w'_{t'-k'}$ represents the feature information of the (t'-k')-th advertisement in the user click set;
  • $w'_{t'+k'}$ represents the feature information of the (t'+k')-th advertisement in the user click set;
  • k' represents the window size of the click similarity objective function to be established;
  • $t' \in [k', T']$, where T' represents the total number of advertisements in the user click set;
  • k', t', and T' are all positive integers.
  • Step 2052: Establish a second probability distribution function according to the user click set, where $i' \in [t'-k', t'+k']$, i' is a positive integer, and $w'_{t'}$ represents the feature information of the t'-th advertisement in the user click set.
  • Step 2053: Establish the click similarity objective function according to the second preset function of the feature information $w'_{t'}$ of the advertisement in the user click set and the second probability distribution function.
  • Normalization preprocessing may be performed first.
  • A second preset function is established for the feature information $w'_{t'}$ of the t'-th advertisement in the user click set. It follows that a second preset function is established for each piece of feature information in the user click set.
  • Here, b' represents the preset second deviation value, U' represents the preset second parameter vector, $h(w'_{t'-k'}, \ldots, w'_{t'+k'}; W')$ represents the formalized function, W' represents the feature information of the $w'_{t'}$-th advertisement in the user click set, $w'_{t'-k'}$ represents the feature information of the (t'-k')-th advertisement in the user click set, $w'_{t'+k'}$ represents the feature information of the (t'+k')-th advertisement in the user click set, k' represents the window size of the click similarity objective function to be established, $t' \in [k', T']$, T' represents the total number of advertisements in the user click set, and k', t', and T' are all positive integers; each piece of feature information in the user click set is a vector.
  • Then, the second preset function of the feature information $w'_{t'}$ of the advertisement is substituted into the second probability distribution function. Because a second preset function exists for each piece of feature information in the user click set, the second preset function of each piece of feature information can be substituted into the second probability distribution function, which yields the click similarity objective function.
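The window structure used in step 205 can be illustrated by listing, for each position t' in a click sequence, the k' advertisements clicked before it and the k' clicked after it. This is a sketch under the assumption that the user click set is an ordered sequence of advertisement identifiers; the function name is hypothetical and positions are 0-based.

```python
from typing import List, Tuple


def click_windows(sequence: List[str], k: int) -> List[Tuple[List[str], str]]:
    # For each position with a full window, pair the surrounding clicks
    # (the context) with the advertisement clicked at that position (the target).
    pairs: List[Tuple[List[str], str]] = []
    for t in range(k, len(sequence) - k):
        context = sequence[t - k:t] + sequence[t + 1:t + k + 1]
        pairs.append((context, sequence[t]))
    return pairs
```

Each (context, target) pair would then feed the second preset function and the second probability distribution function in the same way that word windows feed their counterparts in step 202.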
  • Step 206: Solve the click similarity objective function to determine, in an optimal state of the click similarity objective function, a vectorized representation of the first advertisement and a vectorized representation of the second advertisement.
  • The stochastic gradient descent (SGD) method may be used to solve the click similarity objective function and determine, in its optimal state, a vectorized representation of each piece of feature information in the user click set, that is, a vectorized representation of the first advertisement, a vectorized representation of the second advertisement, a vectorized representation of the third advertisement, and so on.
  • Each advertisement in the user click set is an advertisement that users belonging to the same group have clicked.
  • The optimal state of the click similarity objective function may be the maximum value of the click similarity objective function, or the optimal state may be that the value of the click similarity objective function is within a preset range.
  • Step 207: Determine the click similarity according to the vectorized representation of the first advertisement and the vectorized representation of the second advertisement.
  • The click similarity is $\cos(C, D)$, where C represents the vectorized representation of the first advertisement and D represents the vectorized representation of the second advertisement.
  • The cosine of the two is computed, giving the click similarity between the first advertisement and the second advertisement as $\cos(C, D) = \frac{\sum_{j'=1}^{J'} a_{j'} b_{j'}}{\sqrt{\sum_{j'=1}^{J'} a_{j'}^2}\,\sqrt{\sum_{j'=1}^{J'} b_{j'}^2}}$, where:
  • J' represents the dimension of vector C;
  • the dimension of vector C is the same as the dimension of vector D;
  • $j' \in [1, J']$, and j' and J' are positive integers;
  • $a_{j'}$ is the j'-th value of vector C;
  • $b_{j'}$ is the j'-th value of vector D.
  • Step 208: Determine similarity information between the first advertisement and the second advertisement according to the semantic similarity and the click similarity.
  • Step 208 specifically includes: obtaining a user click frequency of the second advertisement; and determining the similarity information according to the user click frequency, the semantic similarity, and the click similarity.
  • For this step, refer to step 104 of FIG. 1.
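This section does not state how the user click frequency, the semantic similarity, and the click similarity are combined. Purely as an illustration, the sketch below blends the two similarities with a hypothetical weight `alpha` and scales the result by a click frequency normalized to [0, 1]; the formula, the parameter names, and the normalization are all assumptions, not the method of the source.

```python
def combined_similarity(semantic: float, click: float,
                        click_frequency: float, alpha: float = 0.5) -> float:
    # alpha is a hypothetical weight between the two similarities.
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    # Assumption: the click frequency has been normalized to [0, 1].
    if not 0.0 <= click_frequency <= 1.0:
        raise ValueError("click_frequency must be in [0, 1]")
    blended = alpha * semantic + (1.0 - alpha) * click
    return click_frequency * blended
```

Under these assumptions, a second advertisement that is rarely clicked contributes a lower similarity score even when its text and click vectors are close to those of the first advertisement.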
  • In this embodiment, an advertisement text set is obtained, where the advertisement text set includes feature information of the entire advertisement text of the first advertisement, feature information of each word in the first advertisement, feature information of the entire advertisement text of the second advertisement, feature information of each word in the second advertisement, feature information of the entire advertisement text of at least one other advertisement, and feature information of each word in each of the at least one other advertisement; and a user click set is obtained, where the user click set includes feature information of the first advertisement, feature information of the second advertisement, and feature information of other advertisements that at least one user has clicked.
  • The first advertisement and the second advertisement are also advertisements that the at least one user has clicked. A semantic similarity between the first advertisement and the second advertisement is determined according to the advertisement text set; a click similarity between the first advertisement and the second advertisement is determined according to the user click set; and similarity information between the first advertisement and the second advertisement is determined according to the semantic similarity and the click similarity. By extracting the words in a large number of advertisements and analyzing them with a neural network model, both short-text and long-text advertisements can be analyzed, so that the topics and key information in the advertisements can be extracted easily.
  • The advertisements clicked by users belonging to the same group constitute the user click set.
  • Analyzing the features of all the advertisements in the user click set helps to classify the advertisements.
  • The above process analyzes a large amount of advertising data, which allows the similarity between advertisements to be determined more accurately.
  • The similarity information between the first advertisement and the second advertisement, that is, the extent to which the second advertisement is similar to the first advertisement, is calculated from the semantic similarity calculated according to the advertisement text set and the click similarity calculated according to the user click set, so that the similarity between the advertisements can be determined. Further, the similarity between all the advertisements can be determined according to the above process, so that similar advertisements can be pushed to the user when advertisements are pushed.
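Once pairwise similarity information is available, pushing similar advertisements reduces to a ranking step. The sketch below assumes the similarity information is stored as a nested mapping from advertisement id to scores; the function name and the storage layout are hypothetical.

```python
from typing import Dict, List


def most_similar_ads(target: str, similarity: Dict[str, Dict[str, float]],
                     n: int = 3) -> List[str]:
    # Rank the other advertisements by their similarity to the target
    # and keep the n best candidates for pushing.
    scores = similarity.get(target, {})
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [ad_id for ad_id, _ in ranked if ad_id != target][:n]
```

For example, with scores of 0.9, 0.2, and 0.7 for three candidate advertisements and `n=2`, the two candidates with scores 0.9 and 0.7 are returned in that order.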
  • FIG. 5 is a schematic structural diagram of an apparatus for processing similarity of an advertisement according to an embodiment of the present invention. As shown in FIG. 5, the apparatus of this embodiment may include:
  • the obtaining unit 31, configured to obtain an advertisement text set, where the advertisement text set includes feature information of the entire advertisement text of the first advertisement, feature information of each word in the first advertisement, feature information of the entire advertisement text of the second advertisement, feature information of each word in the second advertisement, feature information of the entire advertisement text of at least one other advertisement, and feature information of each word in each of the at least one other advertisement, and to obtain a user click set.
  • The first advertisement and the second advertisement are also advertisements that the at least one user has clicked.
  • the first determining unit 32, configured to determine a semantic similarity between the first advertisement and the second advertisement according to the advertisement text set.
  • the second determining unit 33, configured to determine a click similarity between the first advertisement and the second advertisement according to the user click set.
  • the third determining unit 34, configured to determine similarity information between the first advertisement and the second advertisement according to the semantic similarity and the click similarity.
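The unit structure of FIG. 5 can be mirrored in code as four pluggable callables. This is a sketch of the composition only: the class name, the field names, and the callable signatures are assumptions, and the reference numerals appear in comments merely to tie each field to the unit it stands for.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class SimilarityApparatus:
    obtain: Callable[[], Tuple[Any, Any]]     # obtaining unit 31
    semantic: Callable[[Any], float]          # first determining unit 32
    click: Callable[[Any], float]             # second determining unit 33
    combine: Callable[[float, float], float]  # third determining unit 34

    def run(self) -> float:
        # Obtain both sets, compute the two similarities, then combine them.
        text_set, click_set = self.obtain()
        return self.combine(self.semantic(text_set), self.click(click_set))
```

Keeping the units as separate callables reflects the later remark that the division into units is only a logical function division and that units may be combined or distributed in an actual implementation.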
  • The advertisement similarity processing device of this embodiment of the present invention can be used to perform the advertisement similarity processing method provided by the embodiments of the present invention. The implementation principle is similar, and details are not described herein again.
  • FIG. 6 is a schematic structural diagram of another similarity processing apparatus for an advertisement according to an embodiment of the present invention. On the basis of the embodiment shown in FIG. 5, as shown in FIG. 6, in the apparatus provided in this embodiment, the first determining unit 32 includes:
  • the first establishing module 321, configured to establish a semantic similarity objective function according to the advertisement text set.
  • the first solving module 322, configured to solve the semantic similarity objective function to determine, in an optimal state of the semantic similarity objective function, a vectorized representation of the entire advertisement text of the first advertisement and a vectorized representation of the entire advertisement text of the second advertisement.
  • the first determining module 323, configured to determine the semantic similarity according to the vectorized representation of the entire advertisement text of the first advertisement and the vectorized representation of the entire advertisement text of the second advertisement.
  • The first establishing module 321 includes:
  • a first establishing sub-module 3211, configured to establish, according to the advertisement text set, a first preset function of the t-th feature information $w_t$ in the advertisement text set, where:
  • b denotes a preset first deviation value;
  • U denotes a preset first parameter vector;
  • $h(w_{t-k}, \ldots, w_{t+k}; W)$ denotes a formalized function;
  • W denotes the $w_t$-th feature information in the advertisement text set;
  • $w_{t-k}$ denotes the (t-k)-th feature information in the advertisement text set;
  • $w_{t+k}$ denotes the (t+k)-th feature information in the advertisement text set;
  • k denotes the window size of the semantic similarity objective function to be established;
  • $t \in [k, T]$, where T denotes the total number of pieces of feature information in the advertisement text set;
  • k, t, and T are all positive integers.
  • a second establishing sub-module 3212, configured to establish a first probability distribution function according to the advertisement text set, where $i \in [t-k, t+k]$, i is a positive integer, and $w_t$ denotes the t-th feature information in the advertisement text set.
  • a third establishing sub-module, configured to establish the semantic similarity objective function according to the first preset function of the feature information $w_t$ in the advertisement text set and the first probability distribution function.
  • In the semantic similarity, A represents the vectorized representation of the entire advertisement text of the first advertisement;
  • B represents the vectorized representation of the entire advertisement text of the second advertisement.
  • The second determining unit 33 includes:
  • the second establishing module 331, configured to establish a click similarity objective function according to the user click set.
  • the second solving module 332, configured to solve the click similarity objective function to determine, in an optimal state of the click similarity objective function, a vectorized representation of the first advertisement and a vectorized representation of the second advertisement.
  • the second determining module 333, configured to determine the click similarity according to the vectorized representation of the first advertisement and the vectorized representation of the second advertisement.
  • The second establishing module 331 includes:
  • a fourth establishing sub-module 3311, configured to establish, according to the user click set, a second preset function of the feature information $w'_{t'}$ of the t'-th advertisement in the user click set, where:
  • b' represents a preset second deviation value;
  • U' represents a preset second parameter vector;
  • $h(w'_{t'-k'}, \ldots, w'_{t'+k'}; W')$ represents a formalized function;
  • W' represents the feature information of the $w'_{t'}$-th advertisement in the user click set;
  • $w'_{t'-k'}$ represents the feature information of the (t'-k')-th advertisement in the user click set;
  • $w'_{t'+k'}$ represents the feature information of the (t'+k')-th advertisement in the user click set;
  • k' represents the window size of the click similarity objective function to be established;
  • $t' \in [k', T']$, where T' represents the total number of advertisements in the user click set;
  • k', t', and T' are all positive integers.
  • a fifth establishing sub-module 3312, configured to establish a second probability distribution function according to the user click set, where $i' \in [t'-k', t'+k']$, i' is a positive integer, and $w'_{t'}$ represents the feature information of the t'-th advertisement in the user click set.
  • In the click similarity, C represents the vectorized representation of the first advertisement;
  • D represents the vectorized representation of the second advertisement.
  • the third determining unit 34 includes:
  • the obtaining module 341 is configured to obtain a user click frequency of the second advertisement.
  • the third determining module 342 is configured to determine the similarity information according to the user click frequency, the semantic similarity, and the click similarity.
  • the similarity processing device of the advertisement of the embodiment of the present invention can perform another similarity processing method for the advertisement provided by the embodiment of the present invention, and the implementation principle thereof is similar, and details are not described herein again.
  • FIG. 7 is a block diagram showing the structure of a computing device that can be used to implement the similarity processing method of the above advertisement according to an embodiment of the present invention.
  • computing device 700 includes a memory 710 and a processor 720.
  • the processor 720 can be a multi-core processor or multiple processors.
  • processor 720 can include a general purpose main processor and one or more special coprocessors, such as a graphics processing unit (GPU), a digital signal processor (DSP), and the like.
  • the processor 720 can be implemented using a custom circuit, such as an Application Specific Integrated Circuit (ASIC) or Field Programmable Gate Array (FPGA).
  • Memory 710 can include various types of storage units, such as system memory, read only memory (ROM), and persistent storage.
  • the ROM can store static data or instructions required by the processor 720 or other modules of the computer.
  • the persistent storage device can be a readable and writable storage device.
  • the persistent storage device may be a non-volatile storage device that does not lose stored instructions and data even after the computer is powered off.
  • the persistent storage device employs a mass storage device (e.g., a magnetic or optical disk, or flash memory) as the permanent storage device.
  • the persistent storage device can be a removable storage device (e.g., a floppy disk or an optical drive).
  • the system memory can be a readable and writable storage device or a volatile read/write storage device, such as dynamic random access memory.
  • System memory can store instructions and data that some or all of the processors need at runtime.
  • memory 710 can include any combination of computer readable storage media, including various types of semiconductor memory chips (DRAM, SRAM, SDRAM, flash memory, programmable read only memory), and magnetic disks and/or optical disks can also be employed.
  • the memory 710 can include a readable and/or writable removable storage device, such as a compact disc (CD), a read-only digital versatile disc (e.g., a DVD-ROM or a dual-layer DVD-ROM), a read-only Blu-ray disc, an ultra-density disc, a flash card (such as an SD card, a mini SD card, or a Micro-SD card), a magnetic floppy disk, and so on.
  • the computer readable storage medium does not include a carrier wave and an instantaneous electronic signal transmitted by wireless or wire.
  • the executable code is stored on the memory 710, and when the executable code is processed by the processor 720, the processor 720 can be caused to perform the similarity processing method of the advertisement described above.
  • the disclosed apparatus and method may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • the division of units is only a logical function division.
  • multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
  • the units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of hardware plus software functional units.
  • the above-described integrated unit implemented in the form of a software functional unit can be stored in a computer readable storage medium.
  • the software functional unit described above is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) or a processor to perform some of the steps of the methods of the various embodiments of the present invention.
  • the foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

Abstract

An advertisement similarity processing method and apparatus, a calculation device, and a storage medium. The method comprises: obtaining an advertisement text set, and obtaining an advertisement click set (101); determining semantic similarity between a first advertisement and a second advertisement according to the advertisement text set (102); determining click similarity between the first advertisement and the second advertisement according to a user click set (103); and determining similarity information between the first advertisement and the second advertisement according to the semantic similarity and the click similarity (104). The similarity among all advertisements is determined, so that similar advertisements can be pushed to a user during advertisement pushing.

Description

广告的相似度处理方法和装置、计算设备及存储介质Method and device for processing similarity of advertisement, computing device and storage medium 技术领域Technical field
本发明涉及广告技术领域,尤其涉及一种广告的相似度处理方法和装置、计算设备及存储介质。The present invention relates to the field of advertising technologies, and in particular, to a similarity processing method and apparatus for an advertisement, a computing device, and a storage medium.
背景技术Background technique
随着媒体技术的发展,广告也越来越多的应用到媒体技术中。广告作为推销产品的一种重要手段被广泛使用;在投放广告的时候,需要考虑到广告之间相似性,以便于向用户的终端推送相似产品的广告,进而便于用户获知更多的产品信息。With the development of media technology, advertising is increasingly being applied to media technology. Advertising is widely used as an important means of promoting products; when advertising, it is necessary to consider the similarity between advertisements, so as to push advertisements of similar products to the user's terminal, thereby facilitating users to obtain more product information.
现有技术中,在分析广告之间的相似性的时候,一般是获取到广告的关键词信息,然后根据广告的关键词信息确定广告之间是否相似,然后将相似的广告推送给适合的用户群体。In the prior art, when analyzing the similarity between advertisements, the keyword information of the advertisement is generally obtained, and then the advertisement information is determined according to the keyword information of the advertisement, and then the similar advertisement is pushed to the appropriate user. group.
然而现有技术中,由于广告用户的不断变化以及广告行文的复杂性,进而在分析广告之间的相似性的时候,容易提取出错误的关键词信息,进而分析出的广告之间的相似性并不准确,进一步的,推送给用户群体的广告并不是相似的广告,进而推送广告错误。However, in the prior art, due to the constant change of the advertising user and the complexity of the advertising text, when analyzing the similarity between the advertisements, it is easy to extract the wrong keyword information, and then analyze the similarity between the advertisements. It’s not accurate. Further, the ads that are pushed to the user community are not similar ads, and the ads are pushed incorrectly.
Summary of the Invention
The present invention provides a method and an apparatus for processing advertisement similarity, to address the problem that the similarity determined between advertisements is inaccurate.
In one aspect, the present invention provides a method for processing advertisement similarity, including:
obtaining an advertisement text set, where the advertisement text set includes feature information of the whole advertisement text of a first advertisement, feature information of each word in the first advertisement, feature information of the whole advertisement text of a second advertisement, feature information of each word in the second advertisement, feature information of the whole advertisement text of at least one other advertisement, and feature information of each word in each of the at least one other advertisement; and obtaining a user click set, where the user click set includes feature information of the first advertisement, feature information of the second advertisement, and feature information of other advertisements clicked by at least one user, the first advertisement and the second advertisement also being advertisements clicked by the at least one user;
determining a semantic similarity between the first advertisement and the second advertisement according to the advertisement text set;
determining a click similarity between the first advertisement and the second advertisement according to the user click set;
determining similarity information between the first advertisement and the second advertisement according to the semantic similarity and the click similarity.
Further, determining the semantic similarity between the first advertisement and the second advertisement according to the advertisement text set includes:
establishing a semantic similarity objective function according to the advertisement text set;
solving the semantic similarity objective function to determine, in the optimal state of the semantic similarity objective function, a vectorized representation of the whole advertisement text of the first advertisement and a vectorized representation of the whole advertisement text of the second advertisement;
determining the semantic similarity according to the vectorized representation of the whole advertisement text of the first advertisement and the vectorized representation of the whole advertisement text of the second advertisement.
Further, establishing the semantic similarity objective function according to the advertisement text set includes:
establishing, according to the advertisement text set, a first preset function of the $w_t$-th feature information in the advertisement text set
Figure PCTCN2018105093-appb-000001
where $b$ denotes a preset first bias value, $U$ denotes a preset first parameter vector, $h(w_{t-k}, \ldots, w_{t+k}; W)$ denotes a formalization function, $W$ denotes the $w_t$-th feature information in the advertisement text set, $w_{t-k}$ denotes the $(t-k)$-th feature information in the advertisement text set, $w_{t+k}$ denotes the $(t+k)$-th feature information in the advertisement text set, $k$ denotes the window size of the semantic similarity objective function to be established, $t \in [k, T]$, $T$ denotes the total number of items of feature information in the advertisement text set, and $k$, $t$, and $T$ are all positive integers;
establishing a first probability distribution function according to the advertisement text set
Figure PCTCN2018105093-appb-000002
where $i \in [t-k, t+k]$, $i$ is a positive integer, and $w_t$ denotes the $t$-th feature information in the advertisement text set;
establishing the semantic similarity objective function according to the first preset function of the $w_t$-th feature information in the advertisement text set and the first probability distribution function
Figure PCTCN2018105093-appb-000003
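The three formulas referenced above are images, and their content is not reproduced in the text. As a sketch only, assuming they follow the standard paragraph-vector formulation that the symbol definitions suggest, they would take roughly the form below. The symbol $y$, the subscript notation $y_{w_t}$, the $1/T$ averaging, and the summation limits of the objective are assumptions and are not taken from the source; the index $i$ is assumed to range over $[t-k, t+k]$ as the text states.

```latex
\begin{align}
y &= b + U\,h(w_{t-k}, \ldots, w_{t+k}; W) \\
p(w_t \mid w_{t-k}, \ldots, w_{t+k}) &= \frac{e^{y_{w_t}}}{\sum_{i} e^{y_i}} \\
\text{maximize}\quad & \frac{1}{T} \sum_{t=k}^{T-k} \log p(w_t \mid w_{t-k}, \ldots, w_{t+k})
\end{align}
```

Under this reading, the first line scores each candidate item from its context, the second turns the scores into a probability, and the third is the average log-probability that solving the objective function would maximize.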
Further, the semantic similarity is
Figure PCTCN2018105093-appb-000004
where $A$ denotes the vectorized representation of the whole advertisement text of the first advertisement, and $B$ denotes the vectorized representation of the whole advertisement text of the second advertisement.
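The similarity formula itself is an image and is not reproduced in the text. A minimal sketch, assuming it is the cosine similarity of the two vectors (the document's definition of word embedding mentions a cosine measure between vectors), might look as follows. The function name, the zero-vector handling, and the sample vectors are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)

# Hypothetical vectorized representations A and B of two advertisement texts.
A = [1.0, 2.0, 0.0]
B = [2.0, 4.0, 0.0]
sim_content = cosine_similarity(A, B)
```

Because B is a scaled copy of A in this toy input, the two vectors point in the same direction and the result is approximately 1.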
Further, determining the click similarity between the first advertisement and the second advertisement according to the user click set includes:
establishing a click similarity objective function according to the user click set;
solving the click similarity objective function to determine, in the optimal state of the click similarity objective function, a vectorized representation of the first advertisement and a vectorized representation of the second advertisement;
determining the click similarity according to the vectorized representation of the first advertisement and the vectorized representation of the second advertisement.
Further, establishing the click similarity objective function according to the user click set includes:
establishing, according to the user click set, a second preset function of the feature information of the $w'_{t'}$-th advertisement in the user click set
Figure PCTCN2018105093-appb-000005
where $b'$ denotes a preset second bias value, $U'$ denotes a preset second parameter vector, $h'(w'_{t'-k'}, \ldots, w'_{t'+k'}; W')$ denotes a formalization function, $W'$ denotes the feature information of the $w'_{t'}$-th advertisement in the user click set, $w'_{t'-k'}$ denotes the feature information of the $(t'-k')$-th advertisement in the user click set, $w'_{t'+k'}$ denotes the feature information of the $(t'+k')$-th advertisement in the user click set, $k'$ denotes the window size of the click similarity objective function to be established, $t' \in [k', T']$, $T'$ denotes the total number of advertisements in the user click set, and $k'$, $t'$, and $T'$ are all positive integers;
establishing a second probability distribution function according to the user click set
Figure PCTCN2018105093-appb-000006
where $i' \in [t'-k', t'+k']$, $i'$ is a positive integer, and $w'_{t'}$ denotes the feature information of the $t'$-th advertisement in the user click set;
establishing the click similarity objective function according to the second preset function of the feature information of the $w'_{t'}$-th advertisement in the user click set and the second probability distribution function
Figure PCTCN2018105093-appb-000007
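The window of size $k'$ around position $t'$ can be illustrated with a short sketch. It assumes that the advertisement at position $t'$ is the prediction target and is excluded from its own context, which the text does not state explicitly; the function name and the click sequence are hypothetical.

```python
def context_windows(sequence, k):
    """Yield (target, context) pairs for a symmetric window of size k.

    Positions are 0-based here, so the target index t runs from k to
    len(sequence) - k - 1; the source text indexes positions from 1.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    for t in range(k, len(sequence) - k):
        # k items before the target plus k items after it
        context = sequence[t - k:t] + sequence[t + 1:t + k + 1]
        yield sequence[t], context

# Hypothetical click sequence of advertisement identifiers.
clicks = ["ad1", "ad7", "ad2", "ad9", "ad4"]
pairs = list(context_windows(clicks, k=1))
```

Each pair is one training example for an objective of this kind: the context advertisements are the inputs and the target advertisement is what the model is asked to predict.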
Further, the click similarity is
Figure PCTCN2018105093-appb-000008
where $C$ denotes the vectorized representation of the first advertisement, and $D$ denotes the vectorized representation of the second advertisement.
Further, determining the similarity information between the first advertisement and the second advertisement according to the semantic similarity and the click similarity includes:
obtaining a user click frequency of the second advertisement;
determining the similarity information according to the user click frequency, the semantic similarity, and the click similarity.
Further, the similarity information is $Sim = (1/\log(TF)) \cdot Sim_{content} + Sim_{session}$;
where $TF$ denotes the user click frequency, $Sim_{content}$ denotes the semantic similarity, and $Sim_{session}$ denotes the click similarity.
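The combination formula can be written out as a short sketch. The source does not give the base of the logarithm or say what happens when $TF \le 1$, where $\log(TF)$ is zero or negative; the natural logarithm and the explicit error below are assumptions, as are the function name and sample values.

```python
import math

def combined_similarity(tf, sim_content, sim_session):
    """Sim = (1 / log(TF)) * Sim_content + Sim_session."""
    if tf <= 1:
        # log(1) is 0 and the log of smaller values is negative or undefined;
        # the source does not say how to handle this case.
        raise ValueError("tf must be greater than 1 for 1/log(tf) to be usable")
    return (1.0 / math.log(tf)) * sim_content + sim_session

# Hypothetical inputs: click frequency 100, semantic 0.8, click 0.6.
sim = combined_similarity(tf=100, sim_content=0.8, sim_session=0.6)
```

With this weighting, the semantic term counts for less as the second advertisement's click frequency grows, so frequently clicked advertisements are ranked mainly by click similarity.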
In another aspect, the present invention provides an apparatus for processing advertisement similarity, including:
an obtaining unit, configured to obtain an advertisement text set, where the advertisement text set includes feature information of the whole advertisement text of a first advertisement, feature information of each word in the first advertisement, feature information of the whole advertisement text of a second advertisement, feature information of each word in the second advertisement, feature information of the whole advertisement text of at least one other advertisement, and feature information of each word in each of the at least one other advertisement, and to obtain a user click set, where the user click set includes feature information of the first advertisement, feature information of the second advertisement, and feature information of other advertisements clicked by at least one user, the first advertisement and the second advertisement also being advertisements clicked by the at least one user;
a first determining unit, configured to determine a semantic similarity between the first advertisement and the second advertisement according to the advertisement text set;
a second determining unit, configured to determine a click similarity between the first advertisement and the second advertisement according to the user click set;
a third determining unit, configured to determine similarity information between the first advertisement and the second advertisement according to the semantic similarity and the click similarity.
Further, the first determining unit includes:
a first establishing module, configured to establish a semantic similarity objective function according to the advertisement text set;
a first solving module, configured to solve the semantic similarity objective function to determine, in the optimal state of the semantic similarity objective function, a vectorized representation of the whole advertisement text of the first advertisement and a vectorized representation of the whole advertisement text of the second advertisement;
a first determining module, configured to determine the semantic similarity according to the vectorized representation of the whole advertisement text of the first advertisement and the vectorized representation of the whole advertisement text of the second advertisement.
Further, the first establishing module includes:
a first establishing submodule, configured to establish, according to the advertisement text set, a first preset function of the $w_t$-th feature information in the advertisement text set
Figure PCTCN2018105093-appb-000009
where $b$ denotes a preset first bias value, $U$ denotes a preset first parameter vector, $h(w_{t-k}, \ldots, w_{t+k}; W)$ denotes a formalization function, $W$ denotes the $w_t$-th feature information in the advertisement text set, $w_{t-k}$ denotes the $(t-k)$-th feature information in the advertisement text set, $w_{t+k}$ denotes the $(t+k)$-th feature information in the advertisement text set, $k$ denotes the window size of the semantic similarity objective function to be established, $t \in [k, T]$, $T$ denotes the total number of items of feature information in the advertisement text set, and $k$, $t$, and $T$ are all positive integers;
a second establishing submodule, configured to establish a first probability distribution function according to the advertisement text set
Figure PCTCN2018105093-appb-000010
where $i \in [t-k, t+k]$, $i$ is a positive integer, and $w_t$ denotes the $t$-th feature information in the advertisement text set;
a third establishing submodule, configured to establish the semantic similarity objective function according to the first preset function of the $w_t$-th feature information in the advertisement text set and the first probability distribution function
Figure PCTCN2018105093-appb-000011
Further, the semantic similarity is
Figure PCTCN2018105093-appb-000012
where $A$ denotes the vectorized representation of the whole advertisement text of the first advertisement, and $B$ denotes the vectorized representation of the whole advertisement text of the second advertisement.
Further, the second determining unit includes:
a second establishing module, configured to establish a click similarity objective function according to the user click set;
a second solving module, configured to solve the click similarity objective function to determine, in the optimal state of the click similarity objective function, a vectorized representation of the first advertisement and a vectorized representation of the second advertisement;
a second determining module, configured to determine the click similarity according to the vectorized representation of the first advertisement and the vectorized representation of the second advertisement.
Further, the second establishing module includes:
a fourth establishing submodule, configured to establish, according to the user click set, a second preset function of the feature information of the $w'_{t'}$-th advertisement in the user click set
Figure PCTCN2018105093-appb-000013
where $b'$ denotes a preset second bias value, $U'$ denotes a preset second parameter vector, $h'(w'_{t'-k'}, \ldots, w'_{t'+k'}; W')$ denotes a formalization function, $W'$ denotes the feature information of the $w'_{t'}$-th advertisement in the user click set, $w'_{t'-k'}$ denotes the feature information of the $(t'-k')$-th advertisement in the user click set, $w'_{t'+k'}$ denotes the feature information of the $(t'+k')$-th advertisement in the user click set, $k'$ denotes the window size of the click similarity objective function to be established, $t' \in [k', T']$, $T'$ denotes the total number of advertisements in the user click set, and $k'$, $t'$, and $T'$ are all positive integers;
a fifth establishing submodule, configured to establish a second probability distribution function according to the user click set
Figure PCTCN2018105093-appb-000014
where $i' \in [t'-k', t'+k']$, $i'$ is a positive integer, and $w'_{t'}$ denotes the feature information of the $t'$-th advertisement in the user click set;
a sixth establishing submodule, configured to establish the click similarity objective function according to the second preset function of the feature information of the $w'_{t'}$-th advertisement in the user click set and the second probability distribution function
Figure PCTCN2018105093-appb-000015
Further, the click similarity is
Figure PCTCN2018105093-appb-000016
where $C$ denotes the vectorized representation of the first advertisement, and $D$ denotes the vectorized representation of the second advertisement.
Further, the third determining unit includes:
an obtaining module, configured to obtain a user click frequency of the second advertisement;
a third determining module, configured to determine the similarity information according to the user click frequency, the semantic similarity, and the click similarity.
Further, the similarity information is $Sim = (1/\log(TF)) \cdot Sim_{content} + Sim_{session}$;
where $TF$ denotes the user click frequency, $Sim_{content}$ denotes the semantic similarity, and $Sim_{session}$ denotes the click similarity.
In another aspect, the present invention provides a computing device, including:
a processor; and
a memory storing executable code that, when executed by the processor, causes the processor to perform the method according to any one of the above.
In another aspect, the present invention provides a non-transitory machine-readable storage medium storing executable code that, when executed by a processor of an electronic device, causes the processor to perform the method according to any one of the above.
According to the method and apparatus for processing advertisement similarity provided by the present invention, an advertisement text set is obtained, where the advertisement text set includes feature information of the whole advertisement text of a first advertisement, feature information of each word in the first advertisement, feature information of the whole advertisement text of a second advertisement, feature information of each word in the second advertisement, feature information of the whole advertisement text of at least one other advertisement, and feature information of each word in each of the at least one other advertisement; a user click set is obtained, where the user click set includes feature information of the first advertisement, feature information of the second advertisement, and feature information of other advertisements clicked by at least one user, the first advertisement and the second advertisement also being advertisements clicked by the at least one user; a semantic similarity between the first advertisement and the second advertisement is determined according to the advertisement text set; a click similarity between the first advertisement and the second advertisement is determined according to the user click set; and similarity information between the first advertisement and the second advertisement is determined according to the semantic similarity and the click similarity. Words are thus extracted from a massive number of advertisements and analyzed with a neural network model, so that both short-text and long-text advertisements can be analyzed and the topics and key information in the advertisements can be extracted. In addition, the analysis can start from users' advertisement-click behavior: the massive number of advertisements clicked by users belonging to the same group is obtained, the advertisements clicked by those users form a user click set, and the features of all advertisements in the user click set are analyzed, which facilitates advertisement classification. Because both processes analyze massive advertisement data, the similarity between advertisements can be determined more accurately. The semantic similarity computed from the advertisement text set and the click similarity computed from the user click set are then combined to obtain the similarity information between the first advertisement and the second advertisement, that is, the degree to which the second advertisement is similar to the first advertisement. The similarity between all advertisements can be determined in this way, so that similar advertisements can be pushed to a user when advertisements are pushed.
Brief Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the present disclosure.
FIG. 1 is a schematic flowchart of a method for processing advertisement similarity according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a click session log in a method for processing advertisement similarity according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a neural network model in a method for processing advertisement similarity according to an embodiment of the present application;
FIG. 4 is a schematic flowchart of another method for processing advertisement similarity according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an apparatus for processing advertisement similarity according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of another apparatus for processing advertisement similarity according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a computing device according to an embodiment of the present invention.
The above drawings show specific embodiments of the present disclosure, which are described in more detail below. The drawings and the written description are not intended to limit the scope of the inventive concept in any way, but to explain the concept of the present disclosure to those skilled in the art by reference to specific embodiments.
Detailed Description
Exemplary embodiments are described in detail here, and examples of them are shown in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
First, the terms used in the present invention are explained:
Word Embedding: a technique that represents words as vectors. Once an entity has been abstracted into a mathematical description, it can be modeled and applied to many tasks. For example, the similarity between words can be determined directly from a cosine distance measure between their vectors.
DSSM (Deep Structured Semantic Model): a neural network model, also known as sent2vec.
Stochastic Gradient Descent (SGD): a common method for solving unconstrained optimization problems that has the advantage of being simple to implement. It is an iterative algorithm, and each step requires computing the gradient vector of the objective function.
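The iterative nature of stochastic gradient descent can be illustrated with a toy sketch. It is not the training procedure of the patent; the function name, learning rate, epoch count, and the quadratic toy objective are all assumptions chosen for illustration.

```python
import random

def sgd_minimize(grad_fn, samples, x0, lr=0.05, epochs=200, seed=0):
    """Minimize an average loss by stepping along one sample's gradient at a time."""
    rng = random.Random(seed)
    x = x0
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)  # visit the samples in a new random order each epoch
        for s in data:
            x -= lr * grad_fn(x, s)  # one stochastic gradient step
    return x

# Toy objective: the mean of (x - s)^2 over the samples, minimized at their mean.
samples = [1.0, 2.0, 3.0, 4.0]
x_min = sgd_minimize(lambda x, s: 2.0 * (x - s), samples, x0=0.0)
```

Because the step size is constant, the result settles near the sample mean of 2.5 rather than exactly on it; a decaying step size would be the usual way to tighten this.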
A specific application scenario of the present invention is as follows. With the development of media technology and terminal technology, more and more advertisements need to be delivered through media. Advertisements can be pushed to users in two ways: users can be divided into multiple user groups according to user characteristics, and similar advertisements can be pushed to each user group; or a series of similar advertisements can be pushed directly to a user. How to accurately determine which advertisements are similar, that is, the similarity between advertisements, is therefore a problem that needs to be solved.
The method and apparatus for processing advertisement similarity provided by the present invention are intended to solve the above technical problem of the prior art.
The technical solutions of the present invention, and how they solve the above technical problem, are described in detail below with specific embodiments. The following specific embodiments may be combined with one another, and the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present invention are described below with reference to the accompanying drawings.
FIG. 1 is a schematic flowchart of a method for processing advertisement similarity according to an embodiment of the present application. As shown in FIG. 1, the method includes:
Step 101: Obtain an advertisement text set, where the advertisement text set includes feature information of the whole advertisement text of a first advertisement, feature information of each word in the first advertisement, feature information of the whole advertisement text of a second advertisement, feature information of each word in the second advertisement, feature information of the whole advertisement text of at least one other advertisement, and feature information of each word in each of the at least one other advertisement; and obtain a user click set, where the user click set includes feature information of the first advertisement, feature information of the second advertisement, and feature information of other advertisements clicked by at least one user, the first advertisement and the second advertisement also being advertisements clicked by the at least one user.
Specifically, in this embodiment, the execution body may be an apparatus for processing advertisement similarity, a server, or another device capable of performing the method of this embodiment.
First, each advertisement provided by advertisers needs to be obtained. Each advertisement is then analyzed and split into multiple words, which yields an advertisement text set. The advertisement text set includes, for each of multiple advertisements, feature information of the whole advertisement text and feature information of each word of that advertisement. These advertisements include the first advertisement and the second advertisement to be analyzed. The feature information of the whole advertisement text of each advertisement is a vector, and the feature information of each word is also a vector.
For example, an advertisement text set is generated from ten thousand advertisements. The advertisement text set includes: feature information of the whole advertisement text of advertisement 1 and feature information of words 1, 2, and 3 of advertisement 1; feature information of the whole advertisement text of advertisement 2 and feature information of words 2, 3, and 4 of advertisement 2; feature information of the whole advertisement text of advertisement 3 and feature information of words 2, 3, and 4 of advertisement 3; feature information of the whole advertisement text of advertisement 4 and feature information of words 4, 5, and 6 of advertisement 4; and so on. Different word numbers denote different words. Advertisement 1 is the first advertisement and advertisement 2 is the second advertisement, and the similarity between advertisement 1 and advertisement 2 is to be analyzed.
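The structure of such an advertisement text set can be sketched as follows. This is only an illustration: the `TaggedAd` type, the function name, and the whitespace split are assumptions (Chinese advertisement text would need a real word segmenter), and the feature vectors themselves are not shown because they are obtained later by solving the objective function.

```python
from dataclasses import dataclass

@dataclass
class TaggedAd:
    ad_id: str   # stands in for the whole-text feature of the advertisement
    words: list  # the advertisement's words, in order

def build_ad_text_set(ads):
    """Split each advertisement text into words and keep its identifier."""
    return [TaggedAd(ad_id, text.split()) for ad_id, text in ads.items()]

# Hypothetical advertisement texts, already segmented with spaces.
ads = {
    "ad1": "word1 word2 word3",
    "ad2": "word2 word3 word4",
}
ad_text_set = build_ad_text_set(ads)

# Words shared across advertisements appear once in the vocabulary.
vocabulary = sorted({w for ad in ad_text_set for w in ad.words})
```

Keeping one identifier per advertisement alongside its words mirrors the text's distinction between the feature information of the whole advertisement text and the feature information of each word.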
In addition, the advertisements clicked by a plurality of users are obtained and assembled into a user click set. Specifically, the click session (Click Session) log of each user is obtained first, and the advertisements each user has clicked are determined from that log; the advertisements clicked by each user are then placed into one user click set. The user click set includes the feature information of each advertisement clicked by the plurality of users, and the first advertisement and the second advertisement to be analyzed are among them; that is, the first advertisement and the second advertisement are also advertisements that users have clicked. The feature information of each advertisement is a vector. For example, users who share the same interests also show preferences in the advertisements they click, so advertisements clicked by users of the same group reflect the similarity of the advertisements themselves. The advertisements clicked by users belonging to the same group can therefore be obtained and assembled into a user click set, and these advertisements can then be profiled and categorized. FIG. 2 is a schematic diagram of a click session log in an advertisement similarity processing method according to an embodiment of the present application. As shown in FIG. 2, the content of the advertisements a user has clicked can be obtained by analyzing the user's click behavior. A massive number of advertisement click behaviors are collected, each corresponding to one advertisement; this volume of click behavior helps avoid the problem of noise bias between advertisements.
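As a rough illustration of how a user click set could be assembled from click session logs, the following sketch groups clicked advertisements by user and session. The record layout `(user_id, session_id, timestamp, ad_id)` and the function name `build_click_sessions` are hypothetical; the actual log format shown in FIG. 2 is not given in the text.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical log record: (user_id, session_id, timestamp, ad_id).
ClickRecord = Tuple[str, str, int, str]


def build_click_sessions(log: Iterable[ClickRecord]) -> List[List[str]]:
    """Group clicked advertisement ids by (user, session), ordered by click time."""
    grouped: Dict[Tuple[str, str], List[Tuple[int, str]]] = defaultdict(list)
    for user_id, session_id, timestamp, ad_id in log:
        grouped[(user_id, session_id)].append((timestamp, ad_id))
    # Sort by key so that the output order is deterministic,
    # and sort each session's clicks by timestamp.
    return [[ad for _, ad in sorted(clicks)] for _, clicks in sorted(grouped.items())]


log = [
    ("u1", "s1", 3, "ad3"),
    ("u1", "s1", 1, "ad1"),
    ("u1", "s1", 2, "ad2"),
    ("u2", "s1", 1, "ad2"),
    ("u2", "s1", 2, "ad4"),
]
sessions = build_click_sessions(log)
```

Each inner list is one click sequence; the union of all sequences plays the role of the user click set.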
For example, ten thousand advertisements clicked by users belonging to the same group can be obtained and assembled into a user click set. The user click set includes the feature information of advertisement 1, the feature information of advertisement 2, the feature information of advertisement 3, the feature information of advertisement 4, and so on. Advertisement 1 is the first advertisement and advertisement 2 is the second advertisement, and the similarity between advertisement 1 and advertisement 2 is to be analyzed.
Step 102: Determine the semantic similarity between the first advertisement and the second advertisement according to the advertisement text set.
In this embodiment, specifically, a neural network model and Word Embedding technology are used to analyze the feature information of the whole advertisement text of each advertisement in the advertisement text set and the feature information of each word in each advertisement. Because the advertisement text set includes the first advertisement and the second advertisement to be analyzed, the semantic similarity between them can be determined. In this embodiment, the semantic similarity characterizes the degree to which the second advertisement resembles the first advertisement.
FIG. 3 is a schematic structural diagram of the neural network model in an advertisement similarity processing method according to an embodiment of the present application. As shown in FIG. 3, the first layer of the neural network model is a classifier (Classifier); the second layer is an averaging/concatenation (Average/Concatenate) layer, which represents a form of connection from the lower-layer network to the upper-layer network; the last layer represents the advertisement matrix (Paragraph matrix), that is, the vectorized representation of all advertisements. For example, D denotes a particular advertisement; "paragraph" means a passage of text, and here Paragraph refers to an advertisement; W is the prefix of the words (Word) in each advertisement.
Step 103: Determine the click similarity between the first advertisement and the second advertisement according to the user click set.
In this embodiment, specifically, a neural network algorithm and Word Embedding technology are used to model the user click set. The neural network algorithm offers the continuous bag-of-words (Continuous Bag of Words, CBOW for short) structure and the skip-gram structure; here the skip-gram structure may be used. The feature information of each advertisement is then analyzed to obtain the click similarity between the first advertisement and the second advertisement. In this embodiment, the click similarity characterizes the degree to which the second advertisement resembles the first advertisement.
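To make the skip-gram idea concrete, the sketch below treats one click sequence as a "sentence" of advertisement ids and lists the (center, context) pairs that a skip-gram model would be trained on. The function name `skip_gram_pairs` and the use of advertisement ids as tokens are assumptions for illustration; the text does not describe how training pairs are formed.

```python
from typing import List, Tuple


def skip_gram_pairs(session: List[str], window: int) -> List[Tuple[str, str]]:
    """Return (center_ad, context_ad) pairs within `window` positions of each other."""
    pairs = []
    for t, center in enumerate(session):
        lo = max(0, t - window)
        hi = min(len(session), t + window + 1)
        for i in range(lo, hi):
            if i != t:  # an advertisement is not its own context
                pairs.append((center, session[i]))
    return pairs


pairs = skip_gram_pairs(["ad1", "ad2", "ad3"], window=1)
```

Advertisements that are often clicked close together in the same sessions appear in many shared pairs, which is what pulls their learned vectors toward each other.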
Step 104: Determine the similarity information between the first advertisement and the second advertisement according to the semantic similarity and the click similarity.
In an optional implementation, step 104 specifically includes: obtaining the user click frequency of the second advertisement; and determining the similarity information according to the user click frequency, the semantic similarity, and the click similarity.
In an optional implementation, the similarity information is Sim = (1/log(TF)) * Sim_content + Sim_session, where TF denotes the user click frequency, Sim_content denotes the semantic similarity, and Sim_session denotes the click similarity.
In this embodiment, specifically, the similarity information can be calculated from the computed semantic similarity and click similarity. Because the goal is to calculate how closely the second advertisement resembles the first advertisement, the user click frequency TF of the second advertisement is obtained first; TF is the number of times the second advertisement has been clicked by users. The similarity information between the first advertisement and the second advertisement is then calculated from the user click frequency TF, the semantic similarity Sim_content, and the click similarity Sim_session. Various formulas may be used to calculate the similarity information; this embodiment provides a preferred one, which gives Sim = (1/log(TF)) * Sim_content + Sim_session.
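The preferred formula can be written as a small function. This is a sketch under stated assumptions: the text does not give the base of the logarithm (the natural logarithm is assumed here) and does not say how TF = 1, where log(TF) = 0, should be handled (the sketch rejects it). The function name `combined_similarity` is hypothetical.

```python
import math


def combined_similarity(tf: int, sim_content: float, sim_session: float) -> float:
    """Compute Sim = (1 / log(TF)) * Sim_content + Sim_session.

    Assumes the natural logarithm; the source does not specify the base.
    """
    if tf <= 1:
        # log(1) == 0 would divide by zero; this case is not covered by the text.
        raise ValueError("TF must be greater than 1 for 1/log(TF) to be defined")
    return (1.0 / math.log(tf)) * sim_content + sim_session


score = combined_similarity(100, 0.8, 0.6)
```

Because 1/log(TF) shrinks as TF grows, the semantic term carries less weight for advertisements with many clicks, while the click-based term is added unweighted.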
In this embodiment, an advertisement text set is obtained, which includes the feature information of the whole advertisement text of the first advertisement, the feature information of each word in the first advertisement, the feature information of the whole advertisement text of the second advertisement, the feature information of each word in the second advertisement, the feature information of the whole advertisement text of at least one other advertisement, and the feature information of each word in each of the at least one other advertisement. A user click set is also obtained, which includes the feature information of the first advertisement, the feature information of the second advertisement, and the feature information of other advertisements clicked by at least one user; the first advertisement and the second advertisement are also advertisements clicked by the at least one user. The semantic similarity between the first advertisement and the second advertisement is determined according to the advertisement text set; the click similarity between them is determined according to the user click set; and the similarity information between them is determined according to the semantic similarity and the click similarity. By extracting the words from a massive number of advertisements and analyzing them with a neural network model, both short-text and long-text advertisements can be analyzed, which makes it easier to extract the topics and key information in the advertisements. The analysis can also proceed from the perspective of user click behavior: the massive number of advertisements clicked by users of the same group are obtained and assembled into a user click set, and the features of all advertisements in the set are analyzed, which facilitates advertisement categorization. Because both processes analyze massive advertisement data, the similarity between advertisements can be determined more accurately. The semantic similarity computed from the advertisement text set and the click similarity computed from the user click set are then combined to obtain the similarity information between the first advertisement and the second advertisement, that is, the degree to which the second advertisement resembles the first advertisement, so that the similarity between advertisements is determined accurately. The similarity between all advertisements can be determined by the same process, so that similar advertisements can be pushed to users when advertisements are delivered.
FIG. 4 is a schematic flowchart of another advertisement similarity processing method according to an embodiment of the present application. As shown in FIG. 4, the method includes:
Step 201: Obtain an advertisement text set, which includes the feature information of the whole advertisement text of the first advertisement, the feature information of each word in the first advertisement, the feature information of the whole advertisement text of the second advertisement, the feature information of each word in the second advertisement, the feature information of the whole advertisement text of at least one other advertisement, and the feature information of each word in each of the at least one other advertisement; and obtain a user click set, which includes the feature information of the first advertisement, the feature information of the second advertisement, and the feature information of other advertisements clicked by at least one user, where the first advertisement and the second advertisement are also advertisements clicked by the at least one user.
In this embodiment, specifically, the execution body of this embodiment may be an advertisement similarity processing apparatus, a server, or another device capable of performing the method of this embodiment. For this step, refer to step 101 of FIG. 1; details are not repeated here.
Step 202: Establish a semantic similarity objective function according to the advertisement text set.
In an optional implementation, step 202 specifically includes the following steps:
Step 2021: According to the advertisement text set, establish a first preset function for the w_t-th feature information in the advertisement text set:
Figure PCTCN2018105093-appb-000017
where b denotes a preset first bias value, U denotes a preset first parameter vector, and h(w_{t-k}, …, w_{t+k}; W) denotes a formalization function; W denotes the w_t-th feature information in the advertisement text set, w_{t-k} denotes the (t-k)-th feature information in the advertisement text set, w_{t+k} denotes the (t+k)-th feature information in the advertisement text set, k denotes the window size of the semantic similarity objective function to be established, t ∈ [k, T], T denotes the total number of pieces of feature information in the advertisement text set, and k, t, and T are all positive integers.
Step 2022: According to the advertisement text set, establish a first probability distribution function:
Figure PCTCN2018105093-appb-000018
where i ∈ [t-k, t+k], i is a positive integer, and w_t denotes the t-th feature information in the advertisement text set.
Step 2023: According to the first preset function of the w_t-th feature information in the advertisement text set and the first probability distribution function, establish the semantic similarity objective function:
Figure PCTCN2018105093-appb-000019
In this embodiment, specifically, after step 201, a semantic similarity objective function to be solved needs to be established for the advertisement text set.
Specifically, for the sentences and words included in the advertisement text set, a multi-layer neural network deep learning model (Deep Structured Semantic Models, DSSM for short) may be used to preprocess text such as sentences and words in a bi-char manner, for example by preprocessing the text directly at the character level.
Then, based on all the feature information in the advertisement text set, a first preset function is established for the w_t-th feature information in the advertisement text set:
Figure PCTCN2018105093-appb-000020
A first preset function is thus established for every piece of feature information in the advertisement text set. In the formula of the first preset function, b denotes a preset first bias value and U denotes a preset first parameter vector; h(w_{t-k}, …, w_{t+k}; W) denotes a formalization function, where W denotes the w_t-th feature information in the advertisement text set, w_{t-k} denotes the (t-k)-th feature information in the advertisement text set, w_{t+k} denotes the (t+k)-th feature information in the advertisement text set, k denotes the window size of the semantic similarity objective function to be established, t ∈ [k, T], T denotes the total number of pieces of feature information in the advertisement text set, and k, t, and T are all positive integers. Each piece of feature information in the advertisement text set is a vector.
Then, according to the first preset function of the w_t-th feature information
Figure PCTCN2018105093-appb-000021
and all the feature information in the advertisement text set, a first probability distribution function is established:
Figure PCTCN2018105093-appb-000022
In the first probability distribution function, i ∈ [t-k, t+k], i is a positive integer, and w_t denotes the t-th feature information in the advertisement text set.
Then, the first preset function of the w_t-th feature information
Figure PCTCN2018105093-appb-000023
is substituted into the first probability distribution function
Figure PCTCN2018105093-appb-000024
Because a first preset function can be obtained for every piece of feature information in the advertisement text set, the first preset function of each piece of feature information can likewise be substituted into the first probability distribution function
Figure PCTCN2018105093-appb-000025
which yields the semantic similarity objective function
Figure PCTCN2018105093-appb-000026
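The construction in steps 2021 to 2023 can be illustrated numerically. The formulas themselves appear only as figures, so the following sketch rests on assumptions rather than on the patent's exact equations: it assumes the preset function has the form y = b + U·h, that h averages the context vectors (FIG. 3 mentions an Average/Concatenate layer), that the probability distribution is a softmax over y, and that the objective is the average log-probability of each target given its context. All function names are hypothetical.

```python
import math
from typing import List


def softmax(values: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def output_scores(context: List[List[float]], U: List[List[float]], b: List[float]) -> List[float]:
    """Assumed preset function y = b + U h, with h the average of the context vectors."""
    dim = len(context[0])
    h = [sum(vec[d] for vec in context) / len(context) for d in range(dim)]
    return [b_i + sum(u * x for u, x in zip(row, h)) for row, b_i in zip(U, b)]


def average_log_prob(targets: List[int], contexts: List[List[List[float]]],
                     U: List[List[float]], b: List[float]) -> float:
    """Assumed objective: mean of log p(target | context) over all positions."""
    total = 0.0
    for target, context in zip(targets, contexts):
        probs = softmax(output_scores(context, U, b))
        total += math.log(probs[target])
    return total / len(targets)


# With U and b all zero, every one of the 3 candidates gets probability 1/3.
uniform = average_log_prob(
    targets=[0, 2],
    contexts=[[[1.0, 0.0]], [[0.0, 1.0]]],
    U=[[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]],
    b=[0.0, 0.0, 0.0],
)
```

Training would adjust U, b, and the vectors themselves so that this value increases; the vectors at the optimum are the vectorized representations used in the following steps.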
Step 203: Solve the semantic similarity objective function to determine, in the optimal state of the semantic similarity objective function, the vectorized representation of the whole advertisement text of the first advertisement and the vectorized representation of the whole advertisement text of the second advertisement.
In this embodiment, specifically, the semantic similarity objective function obtained in step 202 is solved by a cross-entropy method to determine the vectorized representation of each piece of feature information in the advertisement text set in the optimal state of the objective function. This gives the vectorized representation of the whole advertisement text of the first advertisement, the vectorized representation of each word in the first advertisement, the vectorized representation of the whole advertisement text of the second advertisement, the vectorized representation of each word in the second advertisement, the vectorized representation of the whole advertisement text of at least one other advertisement, and the vectorized representation of each word in each of the at least one other advertisement.
The optimal state of the semantic similarity objective function may be the state in which its value is at the maximum, or the state in which its value falls within a preset range.
Step 204: Determine the semantic similarity according to the vectorized representation of the whole advertisement text of the first advertisement and the vectorized representation of the whole advertisement text of the second advertisement.
In an optional implementation, the semantic similarity is
Figure PCTCN2018105093-appb-000027
where A denotes the vectorized representation of the whole advertisement text of the first advertisement, and B denotes the vectorized representation of the whole advertisement text of the second advertisement.
In this embodiment, specifically, after step 203, the cosine of the vectorized representation A of the whole advertisement text of the first advertisement and the vectorized representation B of the whole advertisement text of the second advertisement is computed, which gives the semantic similarity between the first advertisement and the second advertisement as
Figure PCTCN2018105093-appb-000028
where J denotes the dimension of vector A, the dimension of vector A is the same as that of vector B, j ∈ [1, J], j and J are positive integers, a_j is the j-th value of vector A, and b_j is the j-th value of vector B.
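The cosine computation described above can be sketched as follows, using the standard definition of cosine similarity (the sum of a_j * b_j divided by the product of the two vector norms). The function name `cosine_similarity` and the error handling for mismatched or zero vectors are assumptions; the text does not address those cases. The same computation applies to the click similarity between vectors C and D in step 207.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of the same dimension J."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # The cosine is undefined for a zero vector.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


sim_content = cosine_similarity([3.0, 4.0], [4.0, 3.0])
```

Here the dot product is 24 and both norms are 5, so the result is 24/25.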
Step 205: Establish a click similarity objective function according to the user click set.
In an optional implementation, step 205 specifically includes the following steps:
Step 2051: According to the user click set, establish a second preset function for the feature information of the w′_{t′}-th advertisement in the user click set:
Figure PCTCN2018105093-appb-000029
where b′ denotes a preset second bias value, U′ denotes a preset second parameter vector, and h′(w′_{t′-k′}, …, w′_{t′+k′}; W′) denotes a formalization function; W′ denotes the feature information of the w′_{t′}-th advertisement in the user click set, w′_{t′-k′} denotes the feature information of the (t′-k′)-th advertisement in the user click set, w′_{t′+k′} denotes the feature information of the (t′+k′)-th advertisement in the user click set, k′ denotes the window size of the click similarity objective function to be established, t′ ∈ [k′, T′], T′ denotes the total number of advertisements in the user click set, and k′, t′, and T′ are all positive integers.
Step 2052: According to the user click set, establish a second probability distribution function:
Figure PCTCN2018105093-appb-000030
where i′ ∈ [t′-k′, t′+k′], i′ is a positive integer, and w′_{t′} denotes the feature information of the t′-th advertisement in the user click set.
Step 2053: According to the second preset function of the feature information of the w′_{t′}-th advertisement in the user click set and the second probability distribution function, establish the click similarity objective function:
Figure PCTCN2018105093-appb-000031
In this embodiment, specifically, the feature information in the user click set may first be normalized as a preprocessing step.
Then, based on all the feature information in the user click set, a second preset function is established for the feature information of the w′_{t′}-th advertisement in the user click set:
Figure PCTCN2018105093-appb-000032
A second preset function is thus established for every piece of feature information in the user click set. In the formula of the second preset function, b′ denotes a preset second bias value and U′ denotes a preset second parameter vector; h′(w′_{t′-k′}, …, w′_{t′+k′}; W′) denotes a formalization function, where W′ denotes the feature information of the w′_{t′}-th advertisement in the user click set, w′_{t′-k′} denotes the feature information of the (t′-k′)-th advertisement in the user click set, w′_{t′+k′} denotes the feature information of the (t′+k′)-th advertisement in the user click set, k′ denotes the window size of the click similarity objective function to be established, t′ ∈ [k′, T′], T′ denotes the total number of advertisements in the user click set, and k′, t′, and T′ are all positive integers. Each piece of feature information in the user click set is a vector.
Then, according to the second preset function of the feature information of the w′_{t′}-th advertisement
Figure PCTCN2018105093-appb-000033
and all the feature information in the user click set, a second probability distribution function is established:
Figure PCTCN2018105093-appb-000034
In the second probability distribution function, i′ ∈ [t′-k′, t′+k′], i′ is a positive integer, and w′_{t′} denotes the feature information of the t′-th advertisement in the user click set.
Then, the second preset function of the feature information of the w′_{t′}-th advertisement
Figure PCTCN2018105093-appb-000035
is substituted into the second probability distribution function
Figure PCTCN2018105093-appb-000036
Because a second preset function can be obtained for every piece of feature information in the user click set, the second preset function of each piece of feature information can likewise be substituted into the second probability distribution function
Figure PCTCN2018105093-appb-000037
which yields the click similarity objective function
Figure PCTCN2018105093-appb-000038
Step 206: Solve the click similarity objective function to determine, in the optimal state of the click similarity objective function, the vectorized representation of the first advertisement and the vectorized representation of the second advertisement.
In this embodiment, specifically, the click similarity objective function obtained in step 205 may be solved by the SGD method to determine the vectorized representation of each piece of feature information in the user click set in the optimal state of the objective function. This gives the vectorized representation of the first advertisement, the vectorized representation of the second advertisement, the vectorized representation of the third advertisement, and so on. Each advertisement in the user click set is an advertisement that users have clicked; preferably, each advertisement in the user click set is an advertisement clicked by users belonging to the same group.
The optimal state of the click similarity objective function may be the state in which its value is at the maximum, or the state in which its value falls within a preset range.
Step 207: Determine the click similarity according to the vectorized representation of the first advertisement and the vectorized representation of the second advertisement.
In an optional implementation, the click similarity is
Figure PCTCN2018105093-appb-000039
where C denotes the vectorized representation of the first advertisement, and D denotes the vectorized representation of the second advertisement.
In this embodiment, specifically, after step 206, the cosine of the vectorized representation C of the first advertisement and the vectorized representation D of the second advertisement is computed, which gives the click similarity between the first advertisement and the second advertisement as
Figure PCTCN2018105093-appb-000040
where J′ denotes the dimension of vector C, the dimension of vector C is the same as that of vector D, j′ ∈ [1, J′], j′ and J′ are positive integers, a_{j′} is the j′-th value of vector C, and b_{j′} is the j′-th value of vector D.
Step 208: Determine similarity information between the first advertisement and the second advertisement according to the semantic similarity and the click similarity.
In an optional implementation, step 208 specifically includes: acquiring the user click frequency of the second advertisement; and determining the similarity information according to the user click frequency, the semantic similarity, and the click similarity.
In an optional implementation, the similarity information is $\mathrm{Sim} = \frac{1}{\log(TF)} \cdot \mathrm{Sim}_{content} + \mathrm{Sim}_{session}$, where $TF$ denotes the user click frequency, $\mathrm{Sim}_{content}$ denotes the semantic similarity, and $\mathrm{Sim}_{session}$ denotes the click similarity.
In this embodiment, specifically, for this step refer to step 104 of FIG. 1; the details are not repeated here.
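The combination formula for step 208 can be written as a short function. This is a sketch under stated assumptions: the source does not give the base of the logarithm, so the natural logarithm is assumed here, and the function name `combined_similarity` is hypothetical. The formula is undefined when the click frequency is 1 (log(1) = 0), so the sketch rejects values of 1 or less instead of guessing a fallback.

```python
import math


def combined_similarity(tf, sim_content, sim_session):
    """Sim = (1 / log(TF)) * Sim_content + Sim_session.

    Assumptions: natural logarithm; TF must exceed 1 because
    log(1) = 0 would divide by zero and smaller values flip the sign.
    """
    if tf <= 1:
        raise ValueError("click frequency TF must be greater than 1")
    return (1.0 / math.log(tf)) * sim_content + sim_session
```

As the click frequency of the second advertisement grows, the weight of the semantic term shrinks and the click similarity dominates the result.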
In this embodiment, an advertisement text set and a user click set are acquired. The advertisement text set includes feature information of the entire advertisement text of the first advertisement, feature information of each word in the first advertisement, feature information of the entire advertisement text of the second advertisement, feature information of each word in the second advertisement, feature information of the entire advertisement text of at least one other advertisement, and feature information of each word in each of the at least one other advertisement. The user click set includes feature information of the first advertisement, feature information of the second advertisement, and feature information of other advertisements clicked by at least one user; the first advertisement and the second advertisement are also advertisements clicked by the at least one user. The semantic similarity between the first advertisement and the second advertisement is determined according to the advertisement text set; the click similarity between them is determined according to the user click set; and the similarity information between them is determined according to the semantic similarity and the click similarity. Because words are extracted from a massive number of advertisements and analyzed with a neural network model, both short-text and long-text advertisements can be analyzed, which makes it easier to extract the topics and key information of an advertisement. The analysis can also start from users' click behavior: the large number of advertisements clicked by users belonging to the same group are collected into a user click set, and the features of all advertisements in that set are analyzed, which facilitates classifying advertisements. Because both analyses operate on massive advertisement data, the similarity between advertisements can be determined more accurately. The semantic similarity computed from the advertisement text set and the click similarity computed from the user click set are then combined to obtain the similarity information between the first advertisement and the second advertisement, that is, the degree to which the second advertisement is similar to the first advertisement. The similarity between all advertisements can be determined by the above process, so that similar advertisements can be pushed to a user when advertisements are pushed.
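Once similarity information has been computed for a first advertisement against every candidate, choosing which advertisements to push reduces to a ranking. The sketch below illustrates that last step only; the function name `top_similar_ads`, the dictionary layout, and the advertisement identifiers are hypothetical and do not come from the source.

```python
def top_similar_ads(similarity_to_first, n):
    """Return the n advertisement ids with the highest similarity score.

    similarity_to_first maps a candidate advertisement id to its
    similarity information relative to the first advertisement.
    """
    ranked = sorted(similarity_to_first, key=similarity_to_first.get, reverse=True)
    return ranked[:n]


# Hypothetical scores for three candidate advertisements.
scores = {"ad_b": 0.9, "ad_c": 0.2, "ad_d": 0.7}
print(top_similar_ads(scores, 2))
```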
FIG. 5 is a schematic structural diagram of an advertisement similarity processing apparatus according to an embodiment of the present invention. As shown in FIG. 5, the apparatus of this embodiment may include:
An acquiring unit 31, configured to acquire an advertisement text set and a user click set. The advertisement text set includes feature information of the entire advertisement text of the first advertisement, feature information of each word in the first advertisement, feature information of the entire advertisement text of the second advertisement, feature information of each word in the second advertisement, feature information of the entire advertisement text of at least one other advertisement, and feature information of each word in each of the at least one other advertisement. The user click set includes feature information of the first advertisement, feature information of the second advertisement, and feature information of other advertisements clicked by at least one user; the first advertisement and the second advertisement are also advertisements clicked by the at least one user.
A first determining unit 32, configured to determine the semantic similarity between the first advertisement and the second advertisement according to the advertisement text set.
A second determining unit 33, configured to determine the click similarity between the first advertisement and the second advertisement according to the user click set.
A third determining unit 34, configured to determine the similarity information between the first advertisement and the second advertisement according to the semantic similarity and the click similarity.
The advertisement similarity processing apparatus of this embodiment can perform the advertisement similarity processing method provided by the embodiments of the present invention; the implementation principle is similar and is not repeated here.
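The division into units 31 to 34 can be pictured as a small pipeline in which each unit is a pluggable function. The following is only an illustrative sketch: the class name `SimilarityApparatus`, its field names, and the placeholder callables in the usage lines are hypothetical and say nothing about how the units are actually implemented.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class SimilarityApparatus:
    acquire: Callable[[], Tuple[Any, Any]]     # acquiring unit 31
    semantic: Callable[[Any], float]           # first determining unit 32
    click: Callable[[Any], float]              # second determining unit 33
    combine: Callable[[float, float], float]   # third determining unit 34

    def run(self) -> float:
        # Unit 31 supplies both data sets; units 32 and 33 each use one.
        ad_text_set, user_click_set = self.acquire()
        sim_content = self.semantic(ad_text_set)
        sim_session = self.click(user_click_set)
        return self.combine(sim_content, sim_session)


# Placeholder units that return fixed values, for illustration only.
apparatus = SimilarityApparatus(
    acquire=lambda: (["ad text features"], ["click features"]),
    semantic=lambda texts: 0.25,
    click=lambda clicks: 0.5,
    combine=lambda content, session: content + session,
)
print(apparatus.run())
```

Keeping each unit behind a callable mirrors the later statement that the unit division is only a logical one and may be regrouped in an actual implementation.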
FIG. 6 is a schematic structural diagram of another advertisement similarity processing apparatus according to an embodiment of the present invention. Building on the embodiment shown in FIG. 5, and as shown in FIG. 6, in the apparatus of this embodiment the first determining unit 32 includes:
A first establishing module 321, configured to establish a semantic similarity objective function according to the advertisement text set.
A first solving module 322, configured to solve the semantic similarity objective function to determine the vectorized representation of the entire advertisement text of the first advertisement and the vectorized representation of the entire advertisement text of the second advertisement in the optimal state of the semantic similarity objective function.
A first determining module 323, configured to determine the semantic similarity according to the vectorized representation of the entire advertisement text of the first advertisement and the vectorized representation of the entire advertisement text of the second advertisement.
The first establishing module 321 includes:
A first establishing sub-module 3211, configured to establish, according to the advertisement text set, a first preset function of the $w_t$-th feature information in the advertisement text set:
Figure PCTCN2018105093-appb-000041
where $b$ denotes a preset first bias value, $U$ denotes a preset first parameter vector, $h(w_{t-k}, \dots, w_{t+k}; W)$ denotes a formalized function, $W$ denotes the $w_t$-th feature information in the advertisement text set, $w_{t-k}$ denotes the $(t-k)$-th feature information in the advertisement text set, $w_{t+k}$ denotes the $(t+k)$-th feature information in the advertisement text set, $k$ denotes the window size of the semantic similarity objective function to be established, $t \in [k, T]$, $T$ denotes the total number of feature information items in the advertisement text set, and $k$, $t$, and $T$ are positive integers.
A second establishing sub-module 3212, configured to establish a first probability distribution function according to the advertisement text set:
Figure PCTCN2018105093-appb-000042
where $i \in [t-k, t+k]$, $i$ is a positive integer, and $w_t$ denotes the $t$-th feature information in the advertisement text set.
A third establishing sub-module 3213, configured to establish the semantic similarity objective function according to the first preset function of the $w_t$-th feature information in the advertisement text set and the first probability distribution function:
Figure PCTCN2018105093-appb-000043
The semantic similarity is
Figure PCTCN2018105093-appb-000044
where A denotes the vectorized representation of the entire advertisement text of the first advertisement, and B denotes the vectorized representation of the entire advertisement text of the second advertisement.
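The equation images referenced above are not reproduced in this text. Purely as orientation, and assuming the functions follow the familiar window-based embedding formulation that the symbol definitions ($b$, $U$, $h$, window size $k$, length $T$) suggest, they might take roughly the following form. The summation limits, the softmax normalization, and the cosine form of the last line are assumptions, not a transcription of the figures.

```latex
% Assumed form of the first preset function (score for position t)
y = b + U\,h(w_{t-k}, \dots, w_{t+k}; W)

% Assumed form of the first probability distribution function (softmax over scores)
p(w_t \mid w_{t-k}, \dots, w_{t+k}) = \frac{e^{y_{w_t}}}{\sum_{i} e^{y_i}}

% Assumed form of the semantic similarity objective (average log-probability, to be maximized)
\frac{1}{T} \sum_{t=k}^{T-k} \log p(w_t \mid w_{t-k}, \dots, w_{t+k})

% Assumed form of the semantic similarity (cosine of the two text vectors A and B)
\mathrm{Sim}_{content} = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}
```

Under this reading, solving the objective yields one vector per advertisement text, and A and B are simply the vectors learned for the first and second advertisements.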
The second determining unit 33 includes:
A second establishing module 331, configured to establish a click similarity objective function according to the user click set.
A second solving module 332, configured to solve the click similarity objective function to determine the vectorized representation of the first advertisement and the vectorized representation of the second advertisement in the optimal state of the click similarity objective function.
A second determining module 333, configured to determine the click similarity according to the vectorized representation of the first advertisement and the vectorized representation of the second advertisement.
The second establishing module 331 includes:
A fourth establishing sub-module 3311, configured to establish, according to the user click set, a second preset function of the feature information of the $w'_{t'}$-th advertisement in the user click set:
Figure PCTCN2018105093-appb-000045
where $b'$ denotes a preset second bias value, $U'$ denotes a preset second parameter vector, $h'(w'_{t'-k'}, \dots, w'_{t'+k'}; W')$ denotes a formalized function, $W'$ denotes the feature information of the $w'_{t'}$-th advertisement in the user click set, $w'_{t'-k'}$ denotes the feature information of the $(t'-k')$-th advertisement in the user click set, $w'_{t'+k'}$ denotes the feature information of the $(t'+k')$-th advertisement in the user click set, $k'$ denotes the window size of the click similarity objective function to be established, $t' \in [k', T']$, $T'$ denotes the total number of advertisements in the user click set, and $k'$, $t'$, and $T'$ are positive integers.
A fifth establishing sub-module 3312, configured to establish a second probability distribution function according to the user click set:
Figure PCTCN2018105093-appb-000046
where $i' \in [t'-k', t'+k']$, $i'$ is a positive integer, and $w'_{t'}$ denotes the feature information of the $t'$-th advertisement in the user click set.
A sixth establishing sub-module 3313, configured to establish the click similarity objective function according to the second preset function of the feature information of the $w'_{t'}$-th advertisement in the user click set and the second probability distribution function:
Figure PCTCN2018105093-appb-000047
The click similarity is
Figure PCTCN2018105093-appb-000048
where C denotes the vectorized representation of the first advertisement, and D denotes the vectorized representation of the second advertisement.
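The window of size $k'$ around position $t'$ in the user click set can be illustrated by enumerating each clicked advertisement together with its neighbors. This is a sketch under assumptions: the function name `context_windows` is hypothetical, only positions with a full window on both sides are produced, and the target advertisement is left out of its own context, none of which the source states explicitly.

```python
def context_windows(clicked_ads, k):
    """Yield (target, context) pairs from one click sequence.

    The context holds the k advertisements before and the k after the
    target. Positions without a full window are skipped (an assumption).
    """
    for t in range(k, len(clicked_ads) - k):
        context = clicked_ads[t - k:t] + clicked_ads[t + 1:t + k + 1]
        yield clicked_ads[t], context


# Hypothetical click sequence of one user group, window size 1.
for target, context in context_windows(["ad1", "ad2", "ad3", "ad4"], 1):
    print(target, context)
```

Pairs of this kind are what a window-based objective would score: advertisements that tend to be clicked near one another end up with similar vectors.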
The third determining unit 34 includes:
An acquiring module 341, configured to acquire the user click frequency of the second advertisement.
A third determining module 342, configured to determine the similarity information according to the user click frequency, the semantic similarity, and the click similarity.
The similarity information is $\mathrm{Sim} = \frac{1}{\log(TF)} \cdot \mathrm{Sim}_{content} + \mathrm{Sim}_{session}$, where $TF$ denotes the user click frequency, $\mathrm{Sim}_{content}$ denotes the semantic similarity, and $\mathrm{Sim}_{session}$ denotes the click similarity.
The advertisement similarity processing apparatus of this embodiment can perform the other advertisement similarity processing method provided by the embodiments of the present invention; the implementation principle is similar and is not repeated here.
FIG. 7 is a schematic structural diagram of a computing device that can be used to implement the above advertisement similarity processing method according to an embodiment of the present invention.
Referring to FIG. 7, the computing device 700 includes a memory 710 and a processor 720.
The processor 720 may be a multi-core processor or may include multiple processors. In some embodiments, the processor 720 may include a general-purpose main processor and one or more special-purpose coprocessors, such as a graphics processing unit (GPU) or a digital signal processor (DSP). In some embodiments, the processor 720 may be implemented with custom circuitry, such as an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA).
The memory 710 may include various types of storage units, such as system memory, read-only memory (ROM), and a persistent storage device. The ROM may store static data or instructions required by the processor 720 or other modules of the computer. The persistent storage device may be a readable and writable storage device, and may be a non-volatile storage device that does not lose stored instructions and data even when the computer is powered off. In some embodiments, a mass storage device (for example a magnetic disk, an optical disk, or flash memory) is used as the persistent storage device. In other embodiments, the persistent storage device may be a removable storage device (for example a floppy disk or an optical drive). The system memory may be a readable and writable storage device or a volatile readable and writable storage device, such as dynamic random access memory. The system memory may store some or all of the instructions and data that the processor needs at run time. In addition, the memory 710 may include any combination of computer-readable storage media, including various types of semiconductor memory chips (DRAM, SRAM, SDRAM, flash memory, programmable read-only memory); magnetic disks and/or optical disks may also be used. In some embodiments, the memory 710 may include a readable and/or writable removable storage device, such as a compact disc (CD), a read-only digital versatile disc (for example DVD-ROM or dual-layer DVD-ROM), a read-only Blu-ray disc, an ultra-density optical disc, a flash memory card (for example an SD card, mini SD card, or Micro-SD card), or a magnetic floppy disk. Computer-readable storage media do not include carrier waves or transitory electronic signals transmitted wirelessly or by wire.
Executable code is stored on the memory 710; when the executable code is processed by the processor 720, it causes the processor 720 to perform the advertisement similarity processing method described above.
The advertisement similarity processing method and apparatus according to the present invention have been described in detail above with reference to the accompanying drawings.
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is only a division by logical function; in an actual implementation there may be other ways of dividing them. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in another form.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
An integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform some of the steps of the methods of the embodiments of the present invention. The storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
Other embodiments of the present disclosure will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed herein. The present invention is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common general knowledge or customary technical means in the art that are not disclosed in the present disclosure. The specification and embodiments are to be regarded as exemplary only; the true scope and spirit of the present disclosure are indicated by the following claims.
It should be understood that the present disclosure is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims (20)

  1. An advertisement similarity processing method, comprising:
    acquiring an advertisement text set, wherein the advertisement text set includes feature information of the entire advertisement text of a first advertisement, feature information of each word in the first advertisement, feature information of the entire advertisement text of a second advertisement, feature information of each word in the second advertisement, feature information of the entire advertisement text of at least one other advertisement, and feature information of each word in each of the at least one other advertisement; and acquiring a user click set, wherein the user click set includes feature information of the first advertisement, feature information of the second advertisement, and feature information of other advertisements clicked by at least one user, the first advertisement and the second advertisement also being advertisements clicked by the at least one user;
    determining a semantic similarity between the first advertisement and the second advertisement according to the advertisement text set;
    determining a click similarity between the first advertisement and the second advertisement according to the user click set; and
    determining similarity information between the first advertisement and the second advertisement according to the semantic similarity and the click similarity.
  2. The method according to claim 1, wherein determining the semantic similarity between the first advertisement and the second advertisement according to the advertisement text set comprises:
    establishing a semantic similarity objective function according to the advertisement text set;
    solving the semantic similarity objective function to determine the vectorized representation of the entire advertisement text of the first advertisement and the vectorized representation of the entire advertisement text of the second advertisement in the optimal state of the semantic similarity objective function; and
    determining the semantic similarity according to the vectorized representation of the entire advertisement text of the first advertisement and the vectorized representation of the entire advertisement text of the second advertisement.
  3. The method according to claim 2, wherein establishing the semantic similarity objective function according to the advertisement text set comprises:
    establishing, according to the advertisement text set, a first preset function of the w_t-th feature information in the advertisement text set
    Figure PCTCN2018105093-appb-100001
    wherein b denotes a preset first bias value, U denotes a preset first parameter vector, h(w_{t-k}, ..., w_{t+k}; W) denotes a formalization function, W denotes the w_t-th feature information in the advertisement text set, w_{t-k} denotes the (t-k)-th feature information in the advertisement text set, w_{t+k} denotes the (t+k)-th feature information in the advertisement text set, k denotes the window size of the semantic similarity objective function to be established, t ∈ [k, T], T denotes the total number of items of feature information in the advertisement text set, and k, t, and T are all positive integers;
    establishing a first probability distribution function according to the advertisement text set
    Figure PCTCN2018105093-appb-100002
    wherein i ∈ [t-k, t+k], i is a positive integer, and w_t denotes the t-th feature information in the advertisement text set;
    establishing the semantic similarity objective function according to the first preset function of the w_t-th feature information in the advertisement text set and the first probability distribution function
    Figure PCTCN2018105093-appb-100003
  4. The method according to claim 2, wherein the semantic similarity is
    Figure PCTCN2018105093-appb-100004
    wherein A denotes the vectorized representation of the advertisement text as a whole of the first advertisement, and B denotes the vectorized representation of the advertisement text as a whole of the second advertisement.
  5. The method according to claim 1, wherein determining the click similarity between the first advertisement and the second advertisement according to the user click set comprises:
    establishing a click similarity objective function according to the user click set;
    solving the click similarity objective function to determine, in an optimal state of the click similarity objective function, a vectorized representation of the first advertisement and a vectorized representation of the second advertisement;
    determining the click similarity according to the vectorized representation of the first advertisement and the vectorized representation of the second advertisement.
  6. The method according to claim 5, wherein establishing the click similarity objective function according to the user click set comprises:
    establishing, according to the user click set, a second preset function of the feature information of the w′_t-th advertisement in the user click set
    Figure PCTCN2018105093-appb-100005
    wherein b′ denotes a preset second bias value, U′ denotes a preset second parameter vector, h′(w′_{t-k}, ..., w′_{t+k}; W′) denotes a formalization function, W′ denotes the feature information of the w′_t-th advertisement in the user click set, w′_{t′-k′} denotes the feature information of the (t′-k′)-th advertisement in the user click set, w′_{t′+k′} denotes the feature information of the (t′+k′)-th advertisement in the user click set, k′ denotes the window size of the click similarity objective function to be established, t′ ∈ [k′, T′], T′ denotes the total number of advertisements in the user click set, and k′, t′, and T′ are all positive integers;
    establishing a second probability distribution function according to the user click set
    Figure PCTCN2018105093-appb-100006
    wherein i′ ∈ [t′-k′, t′+k′], i′ is a positive integer, and w′_{t′} denotes the feature information of the t′-th advertisement in the user click set;
    establishing the click similarity objective function according to the second preset function of the feature information of the w′_t-th advertisement in the user click set and the second probability distribution function
    Figure PCTCN2018105093-appb-100007
  7. The method according to claim 5, wherein the click similarity is
    Figure PCTCN2018105093-appb-100008
    wherein C denotes the vectorized representation of the first advertisement, and D denotes the vectorized representation of the second advertisement.
  8. The method according to any one of claims 1 to 7, wherein determining the similarity information between the first advertisement and the second advertisement according to the semantic similarity and the click similarity comprises:
    obtaining a user click frequency of the second advertisement;
    determining the similarity information according to the user click frequency, the semantic similarity, and the click similarity.
  9. The method according to claim 8, wherein the similarity information is Sim = (1/log(TF)) * Sim_content + Sim_session;
    wherein TF denotes the user click frequency, Sim_content denotes the semantic similarity, and Sim_session denotes the click similarity.
  10. An advertisement similarity processing apparatus, comprising:
    an obtaining unit, configured to obtain an advertisement text set, wherein the advertisement text set comprises feature information of the advertisement text as a whole of a first advertisement, feature information of each word in the first advertisement, feature information of the advertisement text as a whole of a second advertisement, feature information of each word in the second advertisement, feature information of the advertisement text as a whole of at least one other advertisement, and feature information of each word in each of the at least one other advertisement; and to obtain a user click set, wherein the user click set comprises feature information of the first advertisement, feature information of the second advertisement, and feature information of other advertisements clicked by at least one user, the first advertisement and the second advertisement also being advertisements clicked by the at least one user;
    a first determining unit, configured to determine a semantic similarity between the first advertisement and the second advertisement according to the advertisement text set;
    a second determining unit, configured to determine a click similarity between the first advertisement and the second advertisement according to the user click set;
    a third determining unit, configured to determine similarity information between the first advertisement and the second advertisement according to the semantic similarity and the click similarity.
  11. The apparatus according to claim 10, wherein the first determining unit comprises:
    a first establishing module, configured to establish a semantic similarity objective function according to the advertisement text set;
    a first solving module, configured to solve the semantic similarity objective function to determine, in an optimal state of the semantic similarity objective function, a vectorized representation of the advertisement text as a whole of the first advertisement and a vectorized representation of the advertisement text as a whole of the second advertisement;
    a first determining module, configured to determine the semantic similarity according to the vectorized representation of the advertisement text as a whole of the first advertisement and the vectorized representation of the advertisement text as a whole of the second advertisement.
  12. The apparatus according to claim 11, wherein the first establishing module comprises:
    a first establishing submodule, configured to establish, according to the advertisement text set, a first preset function of the w_t-th feature information in the advertisement text set
    Figure PCTCN2018105093-appb-100009
    wherein b denotes a preset first bias value, U denotes a preset first parameter vector, h(w_{t-k}, ..., w_{t+k}; W) denotes a formalization function, W denotes the w_t-th feature information in the advertisement text set, w_{t-k} denotes the (t-k)-th feature information in the advertisement text set, w_{t+k} denotes the (t+k)-th feature information in the advertisement text set, k denotes the window size of the semantic similarity objective function to be established, t ∈ [k, T], T denotes the total number of items of feature information in the advertisement text set, and k, t, and T are all positive integers;
    a second establishing submodule, configured to establish a first probability distribution function according to the advertisement text set
    Figure PCTCN2018105093-appb-100010
    wherein i ∈ [t-k, t+k], i is a positive integer, and w_t denotes the t-th feature information in the advertisement text set;
    a third establishing submodule, configured to establish the semantic similarity objective function according to the first preset function of the w_t-th feature information in the advertisement text set and the first probability distribution function
    Figure PCTCN2018105093-appb-100011
  13. The apparatus according to claim 11, wherein the semantic similarity is
    Figure PCTCN2018105093-appb-100012
    wherein A denotes the vectorized representation of the advertisement text as a whole of the first advertisement, and B denotes the vectorized representation of the advertisement text as a whole of the second advertisement.
  14. The apparatus according to claim 10, wherein the second determining unit comprises:
    a second establishing module, configured to establish a click similarity objective function according to the user click set;
    a second solving module, configured to solve the click similarity objective function to determine, in an optimal state of the click similarity objective function, a vectorized representation of the first advertisement and a vectorized representation of the second advertisement;
    a second determining module, configured to determine the click similarity according to the vectorized representation of the first advertisement and the vectorized representation of the second advertisement.
  15. The apparatus according to claim 14, wherein the second establishing module comprises:
    a fourth establishing submodule, configured to establish, according to the user click set, a second preset function of the feature information of the w′_t-th advertisement in the user click set
    Figure PCTCN2018105093-appb-100013
    wherein b′ denotes a preset second bias value, U′ denotes a preset second parameter vector, h′(w′_{t-k}, ..., w′_{t+k}; W′) denotes a formalization function, W′ denotes the feature information of the w′_t-th advertisement in the user click set, w′_{t′-k′} denotes the feature information of the (t′-k′)-th advertisement in the user click set, w′_{t′+k′} denotes the feature information of the (t′+k′)-th advertisement in the user click set, k′ denotes the window size of the click similarity objective function to be established, t′ ∈ [k′, T′], T′ denotes the total number of advertisements in the user click set, and k′, t′, and T′ are all positive integers;
    a fifth establishing submodule, configured to establish a second probability distribution function according to the user click set
    Figure PCTCN2018105093-appb-100014
    wherein i′ ∈ [t′-k′, t′+k′], i′ is a positive integer, and w′_{t′} denotes the feature information of the t′-th advertisement in the user click set;
    a sixth establishing submodule, configured to establish the click similarity objective function according to the second preset function of the feature information of the w′_t-th advertisement in the user click set and the second probability distribution function
    Figure PCTCN2018105093-appb-100015
  16. The apparatus according to claim 14, wherein the click similarity is
    Figure PCTCN2018105093-appb-100016
    wherein C denotes the vectorized representation of the first advertisement, and D denotes the vectorized representation of the second advertisement.
  17. The apparatus according to any one of claims 10 to 16, wherein the third determining unit comprises:
    an obtaining module, configured to obtain a user click frequency of the second advertisement;
    a third determining module, configured to determine the similarity information according to the user click frequency, the semantic similarity, and the click similarity.
  18. The apparatus according to claim 17, wherein the similarity information is Sim = (1/log(TF)) * Sim_content + Sim_session;
    wherein TF denotes the user click frequency, Sim_content denotes the semantic similarity, and Sim_session denotes the click similarity.
  19. A computing device, comprising:
    a processor; and
    a memory storing executable code which, when executed by the processor, causes the processor to perform the method according to any one of claims 1 to 9.
  20. A non-transitory machine-readable storage medium storing executable code which, when executed by a processor of an electronic device, causes the processor to perform the method according to any one of claims 1 to 9.
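
The combination recited in claims 9 and 18, Sim = (1/log(TF)) * Sim_content + Sim_session, can be illustrated with a minimal Python sketch. Several points are assumptions and do not come from the claims: the function names `cosine_similarity` and `combined_similarity` are hypothetical; the similarity formulas of claims 4 and 7 are images that are not reproduced in this text, so cosine similarity is used only as a stand-in measure between two vectorized representations; the base of the logarithm is not stated, so the natural logarithm is assumed; and rejecting TF <= 1 is a choice made here because log(TF) would be zero or negative.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors.

    ASSUMPTION: the claims give the vector similarity only as a formula
    image; cosine similarity is a stand-in, not the claimed formula.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def combined_similarity(tf: float, sim_content: float, sim_session: float) -> float:
    """Sim = (1 / log(TF)) * Sim_content + Sim_session (claims 9 and 18).

    ASSUMPTION: natural logarithm; the claims do not state the base.
    """
    if tf <= 1:
        # log(1) is 0 and log of smaller values is negative or undefined;
        # the claims do not say how such click frequencies are handled.
        raise ValueError("TF must be greater than 1")
    return sim_content / math.log(tf) + sim_session


if __name__ == "__main__":
    # Hypothetical vectors standing in for A, B (text) and C, D (clicks).
    text_a, text_b = [0.2, 0.7, 0.1], [0.3, 0.6, 0.2]
    click_c, click_d = [0.9, 0.1], [0.8, 0.3]
    sim_content = cosine_similarity(text_a, text_b)
    sim_session = cosine_similarity(click_c, click_d)
    print(combined_similarity(50, sim_content, sim_session))
```

With this weighting, a larger click frequency TF for the second advertisement shrinks the contribution of the semantic term, so the click-based term carries relatively more weight for frequently clicked advertisements.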
PCT/CN2018/105093 2017-12-29 2018-09-11 Advertisement similarity processing method and apparatus, calculation device, and storage medium WO2019128311A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711480255.2 2017-12-29
CN201711480255.2A CN108269122B (en) 2017-12-29 2017-12-29 Advertisement similarity processing method and device

Publications (1)

Publication Number Publication Date
WO2019128311A1 (en) 2019-07-04

Family

ID=62773136

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/105093 WO2019128311A1 (en) 2017-12-29 2018-09-11 Advertisement similarity processing method and apparatus, calculation device, and storage medium

Country Status (2)

Country Link
CN (1) CN108269122B (en)
WO (1) WO2019128311A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110866095A (en) * 2019-10-10 2020-03-06 重庆金融资产交易所有限责任公司 Text similarity determination method and related equipment
CN112381166A (en) * 2020-11-20 2021-02-19 北京百度网讯科技有限公司 Information point identification method and device and electronic equipment

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108269122B (en) * 2017-12-29 2021-08-06 阿里巴巴(中国)有限公司 Advertisement similarity processing method and device
CN109189915B (en) * 2018-09-17 2021-10-15 重庆理工大学 Information retrieval method based on depth correlation matching model
CN110780968B (en) * 2019-10-31 2022-03-11 腾讯科技(深圳)有限公司 Information display method, device, equipment and storage medium
CN111681107A (en) * 2020-06-11 2020-09-18 黄锐 Real-time personalized financial product recommendation algorithm based on Embedding
CN111899049A (en) * 2020-07-23 2020-11-06 广州视源电子科技股份有限公司 Advertisement putting method, device and equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101520878A (en) * 2009-04-03 2009-09-02 华为技术有限公司 Method, device and system for pushing advertisements to users
CN102929928A (en) * 2012-09-21 2013-02-13 北京格致璞科技有限公司 Multidimensional-similarity-based personalized news recommendation method
CN105302880A (en) * 2015-10-14 2016-02-03 合一网络技术(北京)有限公司 Content correlation recommendation method and apparatus
US20170140429A1 (en) * 2015-11-12 2017-05-18 Yahoo! Inc. Method and system for providing advertisements based on semantic representations
CN107464132A (en) * 2017-07-04 2017-12-12 北京三快在线科技有限公司 A kind of similar users method for digging and device, electronic equipment
CN108269122A (en) * 2017-12-29 2018-07-10 广东神马搜索科技有限公司 The similarity treating method and apparatus of advertisement

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102831234B (en) * 2012-08-31 2015-04-22 北京邮电大学 Personalized news recommendation device and method based on news content and theme feature
CN103793390B (en) * 2012-10-29 2018-05-29 阿里巴巴集团控股有限公司 Querying condition similarity determines method, Object Query method and relevant apparatus
CN103838789A (en) * 2012-11-27 2014-06-04 大连灵动科技发展有限公司 Text similarity computing method
CN104268130B (en) * 2014-09-24 2017-02-15 南开大学 Social advertising facing Twitter feasibility analysis method
CN106156023B (en) * 2015-03-23 2020-02-21 华为技术有限公司 Semantic matching method, device and system
CN105183772A (en) * 2015-08-07 2015-12-23 百度在线网络技术(北京)有限公司 Release information click rate estimation method and apparatus
CN105893484A (en) * 2016-03-29 2016-08-24 西安交通大学 Microblog Spammer recognition method based on text characteristics and behavior characteristics
CN106095841B (en) * 2016-06-05 2019-05-03 西华大学 A kind of mobile Internet advertisement recommended method based on collaborative filtering
CN107194434B (en) * 2017-06-16 2020-06-30 中国矿业大学 Moving object similarity calculation method and system based on space-time data

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110866095A (en) * 2019-10-10 2020-03-06 重庆金融资产交易所有限责任公司 Text similarity determination method and related equipment
CN112381166A (en) * 2020-11-20 2021-02-19 北京百度网讯科技有限公司 Information point identification method and device and electronic equipment
CN112381166B (en) * 2020-11-20 2024-03-05 北京百度网讯科技有限公司 Information point identification method and device and electronic equipment

Also Published As

Publication number Publication date
CN108269122A (en) 2018-07-10
CN108269122B (en) 2021-08-06

Similar Documents

Publication Publication Date Title
WO2019128311A1 (en) Advertisement similarity processing method and apparatus, calculation device, and storage medium
US11301525B2 (en) Method and apparatus for processing information
CN107463605B (en) Method and device for identifying low-quality news resource, computer equipment and readable medium
US10692019B2 (en) Failure feedback system for enhancing machine learning accuracy by synthetic data generation
JP5351958B2 (en) Semantic event detection for digital content recording
US11403532B2 (en) Method and system for finding a solution to a provided problem by selecting a winner in evolutionary optimization of a genetic algorithm
US11238364B2 (en) Learning from distributed data
US9183293B2 (en) Systems and methods for scalable topic detection in social media
US20170300575A1 (en) Methods and systems for quantifying and tracking software application quality
Malik et al. Applied unsupervised learning with R: Uncover hidden relationships and patterns with k-means clustering, hierarchical clustering, and PCA
CN107193974B (en) Regional information determination method and device based on artificial intelligence
US20180239986A1 (en) Image Clustering Method, Image Clustering System, And Image Clustering Server
US11373117B1 (en) Artificial intelligence service for scalable classification using features of unlabeled data and class descriptors
CN111783039B (en) Risk determination method, risk determination device, computer system and storage medium
US11455364B2 (en) Clustering web page addresses for website analysis
Caruso et al. Deprivation and the dimensionality of welfare: a variable‐selection cluster‐analysis approach
CN107608980A (en) Information-pushing method and system based on the analysis of DPI big datas
CN111783126B (en) Private data identification method, device, equipment and readable medium
CN110929525B (en) Network loan risk behavior analysis and detection method, device, equipment and storage medium
TW201820173A (en) De-identification data generation apparatus, method, and computer program product thereof
JP2019028984A (en) System and method for clustering near-duplicate images in very large image collections, method and system for clustering multiple images, program, method for clustering multiple content items
CN110363206B (en) Clustering of data objects, data processing and data identification method
US11030539B1 (en) Consumer insights analysis using word embeddings
Jain et al. Generator based approach to analyze mutations in genomic datasets
CN107665442A (en) Obtain the method and device of targeted customer

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18896736

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18896736

Country of ref document: EP

Kind code of ref document: A1