CN106529996A

CN106529996A - Deep learning-based advertisement display method and device

Info

Publication number: CN106529996A
Application number: CN201610939786.2A
Authority: CN
Inventors: 张军
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2016-10-24
Filing date: 2016-10-24
Publication date: 2017-03-22

Abstract

The present invention provides a deep learning-based advertisement display method and device. The method includes the following steps: picture sampling is performed on video content to obtain a video picture sequence; an advertisement corresponding to the video picture sequence is determined according to a generated video and advertisement matching model, wherein the video and advertisement matching model is generated through deep learning-based training; and the advertisement is displayed. With this method, advertisements related to the video content being played can be displayed, and manual operation can be reduced.

Description

Advertisement display method and device based on deep learning

Technical Field

The application relates to the technical field of networks, in particular to an advertisement display method and device based on deep learning.

Background

With the popularization of network technology, more and more users watch videos through a network. While the video is being played to the user over the network, the advertisement is also typically displayed. In the related art, one way is to play a preset advertisement to a user before the video is played, and the advertisement content played in this way cannot be related to the video content in general. In order to solve the relevance problem, another way is to play advertisements related to video contents at the same time of video playing, and the relevance of the video contents and the advertisements is manually pre-associated by editors. However, this approach requires a significant amount of manual operation.

Disclosure of Invention

The present application is directed to solving, at least to some extent, one of the technical problems in the related art.

To this end, an object of the present application is to provide an advertisement presentation method based on deep learning, which can present advertisements related to played video content and reduce manual operations.

Another object of the present application is to provide an advertisement display device based on deep learning.

In order to achieve the above object, an advertisement display method based on deep learning provided in an embodiment of a first aspect of the present application includes: carrying out picture sampling on video content to obtain a video picture sequence; determining advertisements corresponding to the video picture sequence according to the generated video and advertisement matching model, wherein the video and advertisement matching model is generated based on deep learning training; and displaying the advertisement.

According to the advertisement display method based on deep learning, the advertisements relevant to the video content are determined through the video and advertisement matching model, the advertisements relevant to the video content can be played, a large amount of manual association operation is not needed, and manual operation is reduced.

In order to achieve the above object, an advertisement display device based on deep learning according to an embodiment of a second aspect of the present application includes: the sampling module is used for carrying out picture sampling on the video content to obtain a video picture sequence; the determining module is used for determining advertisements corresponding to the video picture sequence according to the generated video and advertisement matching model, and the video and advertisement matching model is generated based on deep learning training; and the display module is used for displaying the advertisement.

According to the advertisement display device based on deep learning, the advertisement related to the video content is determined through the video and advertisement matching model, the advertisement related to the video content can be played, a large amount of manual association operation is not needed, and manual operation is reduced.

The embodiment of the present application further provides an advertisement display device based on deep learning, including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to: carrying out picture sampling on video content to obtain a video picture sequence; determining advertisements corresponding to the video picture sequence according to the generated video and advertisement matching model, wherein the video and advertisement matching model is generated based on deep learning training; and displaying the advertisement.

An embodiment of the present application also provides a non-transitory computer-readable storage medium, where instructions in the storage medium, when executed by a processor of a terminal, enable the terminal to perform a deep learning-based advertisement presentation method, the method including: carrying out picture sampling on video content to obtain a video picture sequence; determining advertisements corresponding to the video picture sequence according to the generated video and advertisement matching model, wherein the video and advertisement matching model is generated based on deep learning training; and displaying the advertisement.

An embodiment of the present application further provides a computer program product, wherein, when instructions in the computer program product are executed by a processor, a deep learning-based advertisement presentation method is performed, the method including: carrying out picture sampling on video content to obtain a video picture sequence; determining advertisements corresponding to the video picture sequence according to the generated video and advertisement matching model, wherein the video and advertisement matching model is generated based on deep learning training; and displaying the advertisement.

Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.

Drawings

The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:

FIG. 1 is a flowchart illustrating a deep learning-based advertisement presentation method according to an embodiment of the present application;

FIG. 2 is a flowchart illustrating a deep learning-based advertisement displaying method according to another embodiment of the present application;

FIG. 3 is a schematic flow chart diagram of a method of constructing training data in an embodiment of the present application;

FIG. 4 is a schematic diagram of an RNN structure according to an embodiment of the present application;

FIG. 5 is a schematic structural diagram of an advertisement displaying apparatus based on deep learning according to an embodiment of the present application;

FIG. 6 is a schematic structural diagram of an advertisement displaying apparatus based on deep learning according to another embodiment of the present application.

Detailed Description

Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar modules or modules having the same or similar functionality throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application. On the contrary, the embodiments of the application include all changes, modifications and equivalents coming within the spirit and terms of the claims appended hereto.

As described above, when displaying an advertisement related to video content, the related art adopts a manual association manner, but this manner requires a large amount of manual operations. To address this problem, the present application introduces deep learning into the field of advertisement presentation.

The concept of deep learning stems from the study of artificial neural networks. A multi-layer perceptron with multiple hidden layers is a deep learning structure. Deep learning combines low-level features to form more abstract high-level representations (attribute classes or features), thereby discovering distributed feature representations of the data. Deep learning is a new field in machine learning research. Its motivation is to build neural networks that simulate how the human brain analyzes and learns, imitating the mechanisms by which the human brain interprets data such as images, sounds and text.

Fig. 1 is a flowchart illustrating a deep learning-based advertisement presentation method according to an embodiment of the present application.

As shown in fig. 1, the method of the present embodiment includes:

S11: Carry out picture sampling on the video content to obtain a video picture sequence.

The video content refers to video content to be played, such as a movie, a television episode, and the like.

The video content may specifically include network video content, that is, video content played through a network.

The network may be an internet accessed by a Personal Computer (PC), or may also be a mobile internet accessed by a mobile device (such as a mobile phone or a tablet Computer), and accordingly, the network video content may be played on the PC or the mobile device.

When the video content is subjected to picture sampling, the whole video content can be segmented in a preset time period to obtain a plurality of segmented time periods, one video picture is selected in each time period to serve as a video picture corresponding to the corresponding time period, and the video pictures corresponding to all the segmented time periods form a video picture sequence. The preset time period is, for example, 0.5 seconds. When the video pictures in each time period are selected, a mode of randomly selecting one picture can be adopted, or an existing or future picture sampling algorithm can be adopted.
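The segmentation and per-period selection described above can be sketched as follows. This is a minimal illustration, not the sampling algorithm of the application: the function name `sample_picture_times` and its parameters are hypothetical, and it returns timestamps rather than decoded pictures, leaving frame extraction to whatever video library is in use.

```python
import random


def sample_picture_times(duration_s, period_s=0.5, rng=None):
    """Return one randomly chosen timestamp (seconds) per time period.

    The video is cut into consecutive periods of length period_s; the last
    period may be shorter if the duration is not an exact multiple.
    """
    rng = rng or random.Random()
    times = []
    index = 0
    start = 0.0
    while start < duration_s:
        end = min(start + period_s, duration_s)
        # Random selection within the period (one of the options in the text).
        times.append(rng.uniform(start, end))
        index += 1
        start = index * period_s  # multiply rather than accumulate to limit float drift
    return times


if __name__ == "__main__":
    print(sample_picture_times(2.0, rng=random.Random(0)))
```

Each returned timestamp identifies the video picture that stands for its time period; taken in order, they form the video picture sequence.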

S12: Determine the advertisements corresponding to the video picture sequence according to the generated video and advertisement matching model, wherein the video and advertisement matching model is generated based on deep learning training.

The video and advertisement matching model can be generated in advance before the video content is played, and the specific flow can refer to the following description.

The input of the video and advertisement matching model is a video picture sequence, the output is advertisement information corresponding to the input, and the advertisement information can be specifically a probability value of each advertisement category in the existing advertisement categories.

For example, the video picture sequence includes T video pictures, and there are M types of existing advertisement categories, so that there are T input nodes of the video and advertisement matching model, and each input node corresponds to one video picture; the number of output nodes of the video and advertisement matching model is also T, and the output nodes correspond to one video picture respectively; the output of each output node is a vector with dimension M, and each element of the vector is a probability value of each of the M advertisement categories.

Therefore, after the video picture sequence is obtained, the video picture sequence can be used as the input of the video and advertisement matching model, and each output of the model is a vector formed by the probability value of each advertisement category corresponding to each video picture in the video picture sequence.

After the vector corresponding to each video picture in the video picture sequence is determined, an advertisement in the advertisement category with the maximum probability value in that vector may be selected as the advertisement corresponding to that video picture; the advertisements corresponding to all the video pictures in the video picture sequence then form the advertisement sequence corresponding to the video picture sequence.

Alternatively, after the vectors corresponding to the video pictures in the video picture sequence are determined, the N largest probability values (N is a preset natural number greater than or equal to 1) may be selected from the vectors corresponding to all the video pictures, and one advertisement may be selected from the advertisement category corresponding to each selected probability value, giving N advertisements in total corresponding to the video picture sequence.

In the above flow, the selection of the advertisement in the advertisement category may be performed in a random manner or in other manners.
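The two selection strategies above can be sketched as follows. The function names and the `ads_by_category` structure (a list in which entry m holds the advertisements of category m) are assumptions made for illustration; the application does not prescribe any data layout.

```python
import random


def ads_by_argmax(prob_vectors, ads_by_category, rng=None):
    """For each picture, pick an ad from its highest-probability category."""
    rng = rng or random.Random()
    chosen = []
    for probs in prob_vectors:
        best = max(range(len(probs)), key=lambda m: probs[m])
        chosen.append(rng.choice(ads_by_category[best]))  # random pick within the category
    return chosen


def ads_by_top_n(prob_vectors, ads_by_category, n, rng=None):
    """Pick the n largest probabilities over all pictures and categories.

    Returns (picture_index, ad) pairs, highest probability first.
    """
    rng = rng or random.Random()
    scored = [
        (p, t, m)
        for t, probs in enumerate(prob_vectors)
        for m, p in enumerate(probs)
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(t, rng.choice(ads_by_category[m])) for _, t, m in scored[:n]]
```

With T pictures and M categories, the first function returns T advertisements (one per picture) and the second returns N advertisements together with the picture each one was selected for.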

S13: Display the advertisement.

Displaying the advertisement may refer to playing the corresponding advertisement while playing the video content.

Specifically, when determining the advertisement, the playing time corresponding to each advertisement may also be determined, and the playing time may be a time period to which the video picture corresponding to the advertisement belongs. For example, the video pictures sampled in the order of time from first to last are video picture-1, video picture-2, …, and video picture-T, respectively, the determined advertisement sequence includes advertisement-1 and advertisement-2, and advertisement-1 and advertisement-2 correspond to video picture-1 and video picture-T, respectively, then advertisement-1 is played in the video time period (e.g., the first 0.5 second time period of the entire video content) of video picture-1, and advertisement-2 is played in the video time period (e.g., the last 0.5 second time period of the entire video content) of video picture-T.
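The mapping from a sampled video picture to its playing time can be sketched as below, assuming fixed-length time periods and 0-based picture indices. The function name `schedule_ads` is hypothetical.

```python
def schedule_ads(picture_ads, period_s=0.5):
    """Map (picture_index, ad) pairs to (start_s, end_s, ad) playing periods.

    picture_index is 0-based: index 0 is the first sampled video picture.
    """
    schedule = []
    for picture_index, ad in picture_ads:
        start = picture_index * period_s
        schedule.append((start, start + period_s, ad))
    return schedule
```

For a 2-second video sampled every 0.5 seconds, an advertisement matched to the first picture is scheduled for the first 0.5 seconds, and one matched to the fourth picture for the last 0.5 seconds.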

Further, the advertisement may be displayed in an internal preset area of the video content or in an area outside the video content.

Furthermore, when the advertisement is displayed in an area outside the video content, the advertisement can be displayed at different positions according to different playing terminals. For example, if the playing terminal is a PC, an advertisement can be played on the right side of the video player; and if the playing terminal is a mobile device, playing the advertisement below the played video content.

In the embodiment, the advertisement sequence corresponding to the video picture sequence of the video content is determined through the video and advertisement matching model, the advertisement related to the video content can be displayed, a large amount of manual association operation is not needed, and manual operation is reduced.

Fig. 2 is a flowchart illustrating a deep learning-based advertisement presentation method according to another embodiment of the present application.

As shown in fig. 2, the method of the present embodiment includes:

S21: Construct training data and store the training data in a video and advertisement log library, wherein the training data is constructed according to existing video content containing subtitles.

The training data is used to train and generate the video and advertisement matching model.

In this embodiment, the training data is constructed and stored in the video and advertisement log library, so that it can be obtained directly from the log library when the video and advertisement matching model is subsequently trained.

The training data is constructed from video content containing subtitles. Such video content can be collected from existing data, and as much of it as possible should be collected so that a large amount of training data can be constructed.

Specifically, referring to fig. 3, the flow of the method of constructing the training data may include:

S31: Take each collected video content containing subtitles in turn as the current video content.

The following process may be performed for each current video content:

S32: Sample pictures from the current video content to obtain a video picture sequence.

The picture sampling method can be referred to the relevant contents of the above embodiments, and is not described in detail here.

S33: Extract the subtitles of each video picture in the video picture sequence and, by text string similarity matching, determine the advertisement matching the extracted subtitles as the advertisement matching that video picture.

The extraction of subtitles from pictures can use existing or future text extraction techniques.

The text string similarity matching mode can be realized in various modes, and one specific mode comprises the following steps: respectively converting the extracted caption content (simply called caption content) and the content of each advertisement to be matched (simply called advertisement content) into vectors, calculating the similarity value between the two vectors, and selecting the advertisement corresponding to the advertisement content with the maximum similarity value as the matched advertisement. The method for converting the caption content and the advertisement content into the vector may adopt various existing or future technologies for converting the text into the vector, for example, word2vec algorithm. The similarity value between vectors is calculated, for example: the distance between vectors (e.g., cosine distance) is taken as the value of similarity between vectors.
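The vector-based matching described above can be sketched as follows. As a stand-in for word2vec, which the text names only as one example, this sketch uses simple word-count vectors so that it runs with the standard library alone, and it uses cosine similarity as the similarity value. The function names and the `ads` dictionary (advertisement id mapped to advertisement text) are hypothetical.

```python
import math
from collections import Counter


def text_to_vector(text):
    """Word-count vector; a simplified stand-in for a word2vec embedding."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as matching nothing
    return dot / (norm_a * norm_b)


def match_ad(subtitle, ads):
    """Return the id of the ad whose content is most similar to the subtitle."""
    subtitle_vec = text_to_vector(subtitle)
    return max(
        ads,
        key=lambda ad_id: cosine_similarity(subtitle_vec, text_to_vector(ads[ad_id])),
    )
```

Swapping `text_to_vector` for a learned embedding changes the vectors but leaves the matching step, selecting the advertisement with the maximum similarity value, unchanged.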

S34: Determine the advertisement information corresponding to each video picture according to the advertisement matched with that video picture, and take the video picture sequence and the corresponding advertisement information sequence as one set of training data, wherein the advertisement information sequence is composed of the advertisement information corresponding to each video picture in the video picture sequence.

The advertisement information corresponding to each video picture can be the advertisement category to which the advertisement matched with the video picture belongs, so that the video picture sequence and the advertisement category sequence form training data.

Therefore, a set of training data can be obtained for each video content including subtitles, and a large amount of training data can be obtained by performing the above processing on a large amount of video content including subtitles.
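Steps S32 to S34 for one video can be sketched as a single function. The three collaborators passed in (`extract_subtitle`, `match_ad`, `category_of`) are hypothetical placeholders for the subtitle extraction, the similarity matching and the advertisement-to-category lookup, none of which the application fixes to a particular implementation.

```python
def build_training_example(pictures, extract_subtitle, match_ad, category_of):
    """Build one (video picture sequence, ad category sequence) training pair.

    pictures         -- sampled video pictures, in time order
    extract_subtitle -- callable: picture -> subtitle text
    match_ad         -- callable: subtitle text -> matched advertisement id
    category_of      -- mapping: advertisement id -> advertisement category
    """
    categories = [category_of[match_ad(extract_subtitle(p))] for p in pictures]
    return list(pictures), categories
```

Running this over every collected video containing subtitles yields one set of training data per video, labeled without manual annotation.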

S22: When the video and advertisement matching model needs to be trained and generated, acquire training data from the video and advertisement log library.

The training data may be constructed in advance and stored in the video and advertisement log library. When the video and advertisement matching model needs to be trained and generated, the training data can be obtained directly from the log library, which speeds up acquisition of the training data. In addition, storing the training data in the video and advertisement log library allows a large amount of data to be accumulated to meet subsequent needs.

S23: Determine a model structure.

In this embodiment, a deep learning manner is adopted, and thus the model structure is specifically a deep learning network structure.

One particular deep learning network structure is a Recurrent Neural Network (RNN) structure.

The RNN may be a simple RNN, a gated RNN (Gated RNN), or a Long Short-Term Memory (LSTM) network.

S24: Train according to the training data and the model structure to generate the video and advertisement matching model.

As shown in fig. 4, an RNN structure includes: an input layer, a convolution and pooling (Pooling) layer, a hidden layer, and an output layer.

In this embodiment, the input layer is used for inputting a video picture sequence, which may be represented in time order by Image1, Image2, …, ImageT. The convolution and pooling layer is used to convert each video picture in the video picture sequence into a picture representation Rep1, Rep2, …, RepT, respectively. The hidden layer is used to convert the picture representations into hidden layer representations h1, h2, …, hT. The output layer is configured to output the advertisement information sequence output1, output2, …, outputT corresponding to the video picture sequence.
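The hidden and output layers of such a structure can be sketched in plain Python as a simple RNN. This is an illustration under stated assumptions, not the network of the application: the picture representations Rep1..RepT are taken as already computed (the convolution and pooling layer is omitted), the hidden activation is assumed to be tanh, the output is assumed to be a softmax over the M advertisement categories, and the weight names `w_in`, `w_rec` and `w_out` are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rnn_forward(reps, w_in, w_rec, w_out):
    """Map Rep1..RepT to output1..outputT (one probability vector per picture).

    w_in:  hidden_size x rep_size    (picture representation -> hidden)
    w_rec: hidden_size x hidden_size (previous hidden -> hidden)
    w_out: M x hidden_size           (hidden -> category scores)
    """
    hidden = [0.0] * len(w_rec)  # initial hidden state, before the first picture
    outputs = []
    for rep in reps:
        pre = [a + b for a, b in zip(matvec(w_in, rep), matvec(w_rec, hidden))]
        hidden = [math.tanh(v) for v in pre]  # h_t depends on Rep_t and h_{t-1}
        outputs.append(softmax(matvec(w_out, hidden)))
    return outputs
```

Because each hidden state depends on the previous one, the prediction for a picture can draw on the pictures that came before it, which is the reason for choosing a recurrent structure for a picture sequence.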

Taking one set of training data as an example (the remaining sets are handled in the same way): as described above, a set of training data includes a video picture sequence and an advertisement category sequence. During training, the video picture sequence in the training data is used as the input-layer data and passed through a network with the structure shown in FIG. 4 to obtain the model output. The output is the model's prediction: one prediction vector per video picture, each consisting of advertisement category probability values. The advertisement category sequence in the training data supplies the true values, that is, the advertisement category matched with each video picture.

From the predicted and true values, a loss function can be determined, and the formula of the loss function J (θ) can be expressed as:

where P(AD = ad_i | image_i) denotes, for the i-th video picture image_i, the probability value at the position of the true value ad_i in the prediction vector AD that the model outputs for that video picture.
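The formula itself is not reproduced in this text. One form consistent with the description (a function of the probability assigned to the true category of each picture, which training minimizes) is the negative log-likelihood over the T video pictures of a sequence. It is offered here as an assumption, not as the formula of the application:

```latex
J(\theta) = -\sum_{i=1}^{T} \log P(AD = ad_i \mid image_i;\, \theta)
```

Under this form, J(θ) decreases as the model assigns higher probability to the true advertisement category ad_i of each picture, and it reaches 0 only when every true category receives probability 1.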

The training targets are: and minimizing the loss function to obtain the model parameter when the loss function value is minimum. Specifically, a gradient-based optimization algorithm may be used to perform the operation of the minimization loss function. Further, in the gradient-based optimization algorithm, a Back Propagation Through Time (BPTT) algorithm may be used to calculate the gradient of the parameter.

The idea of a gradient-based optimization algorithm is to iteratively update randomly initialized parameters by computing the gradient (the partial derivatives with respect to the parameters) over a set of training samples (a mini-batch), so that after a number of iterations the difference between the value computed by the deep learning network from the parameters and the true value, as measured by the defined loss function, is minimized.
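The iterative update can be illustrated on a deliberately small problem. The sketch below fits a single parameter w of the model y = w * x under a squared-error loss using mini-batch gradient descent; the model, the loss, the learning rate and the function name are all assumptions chosen to keep the example short, and in an RNN the gradient would instead be computed by BPTT.

```python
import random


def minibatch_gradient_descent(samples, batch_size=2, learning_rate=0.05,
                               epochs=200, seed=0):
    """Fit w in y = w * x by mini-batch gradient descent on squared error."""
    rng = random.Random(seed)
    w = rng.uniform(-1.0, 1.0)  # randomly initialized parameter
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Gradient of mean((w*x - y)^2) over the mini-batch with respect to w.
            grad = sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= learning_rate * grad  # step against the gradient
    return w


if __name__ == "__main__":
    print(minibatch_gradient_descent([(1, 2), (2, 4), (3, 6), (4, 8)]))
```

Each step uses only the samples in the current mini-batch, so the cost of an update does not grow with the size of the full training set.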

The BPTT algorithm is an effective method in RNN networks to calculate the gradient of a parameter.

After the model parameters are determined, a video and advertisement matching model having the determined model parameters and the adopted model structure is obtained.

After the video and advertisement matching model is generated, corresponding advertisements can be played when video content is played.

S25: Carry out picture sampling on the video content to obtain a video picture sequence.

S26: Determine the advertisements corresponding to the video picture sequence according to the generated video and advertisement matching model, wherein the video and advertisement matching model is generated based on deep learning training.

S27: Display the advertisement.

The details of S25-S27 can be found in S11-S13, and will not be described in detail.

In the embodiment, the advertisement sequence corresponding to the video picture sequence of the video content is determined through the video and advertisement matching model, the advertisement related to the video content can be displayed, a large amount of manual association operation is not needed, and manual operation is reduced. Training data are constructed through video content with subtitles, weakly labeled training data can be constructed, and workload of manual labeling is reduced.

Fig. 5 is a schematic structural diagram of an advertisement displaying apparatus based on deep learning according to an embodiment of the present application.

As shown in fig. 5, the apparatus 50 of the present embodiment includes: a sampling module 51, a determination module 52 and a presentation module 53.

The sampling module 51 is configured to perform picture sampling on video content to obtain a video picture sequence;

a determining module 52, configured to determine, according to the generated video and advertisement matching model, an advertisement corresponding to the video picture sequence, where the video and advertisement matching model is generated based on deep learning training;

and a display module 53 for displaying the advertisement.
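The division into three modules can be sketched as a small composition of callables. The class name `AdDisplayDevice` and the callable-based interface are hypothetical; the application describes the modules only by their function.

```python
class AdDisplayDevice:
    """Composes a sampling module, a determining module and a display module."""

    def __init__(self, sampling_module, determining_module, display_module):
        self.sampling_module = sampling_module        # video content -> picture sequence
        self.determining_module = determining_module  # picture sequence -> advertisements
        self.display_module = display_module          # advertisements -> display result

    def run(self, video_content):
        pictures = self.sampling_module(video_content)
        ads = self.determining_module(pictures)
        return self.display_module(ads)
```

Keeping the modules separate lets the matching model behind the determining module be retrained or replaced without changing how pictures are sampled or how advertisements are displayed.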

In some embodiments, referring to fig. 6, the apparatus 50 further comprises: a training module 54 for training to generate the video and advertisement matching model, the training module 54 comprising:

an obtaining sub-module 541, configured to obtain training data from an existing video and advertisement log library, where the video and advertisement log library is used to store the training data, and the training data is constructed according to existing video content containing subtitles;

a determination submodule 542 for determining a model structure;

and the training submodule 543 is used for training according to the training data and the model structure to generate a video and advertisement matching model.

In some embodiments, referring to fig. 6, the apparatus 50 further comprises: a construction module 55 for constructing training data, the construction module 55 being specifically configured to:

respectively taking the collected video contents containing the subtitles as current video contents, and executing the following steps corresponding to the current video contents:

sampling current video content to obtain a video picture sequence;

extracting subtitles of all video pictures in the video picture sequence, and determining advertisements matched with the extracted subtitles as advertisements matched with all the video pictures in a text string similarity matching mode;

and determining advertisement information corresponding to each video picture according to the advertisement matched with each video picture, and taking a video picture sequence and a corresponding advertisement information sequence as a group of training data, wherein the advertisement information sequence is composed of the advertisement information corresponding to each video picture in the video picture sequence.

In some embodiments, the model structure determined by the determination sub-module 542 comprises: a deep learning network structure.

In some embodiments, the deep learning network structure comprises: an RNN structure.

In some embodiments, the training sub-module 543 is specifically configured to:

determining a loss function according to the training data and the model structure, wherein parameters in the loss function comprise model parameters;

minimizing a loss function by adopting a gradient-based optimization algorithm to determine model parameters, wherein the gradient of the parameters is calculated by adopting a BPTT algorithm;

and after the model parameters are determined, obtaining a video and advertisement matching model with the model parameters and the model structure.

It is understood that the apparatus of the present embodiment corresponds to the method embodiment described above, and specific contents may be referred to the related description of the method embodiment, and are not described in detail herein.

In the embodiment, the advertisements relevant to the video content are determined through the video and advertisement matching model, the advertisements relevant to the video content can be played, a large amount of manual association operation is not needed, and manual operation is reduced.

It is understood that the same or similar parts in the above embodiments may be mutually referred to, and the same or similar parts in other embodiments may be referred to for the content which is not described in detail in some embodiments.

It should be noted that, in the description of the present application, the terms "first", "second", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Further, in the description of the present application, the meaning of "a plurality" means at least two unless otherwise specified.

Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of the preferred embodiments of the present application includes other implementations in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.

It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.

It will be understood by those skilled in the art that all or part of the steps of the methods in the above embodiments may be carried out by a program instructing the relevant hardware. The program may be stored in a computer readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.

In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.

The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.

In the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.

Claims

1. An advertisement display method based on deep learning, characterized by comprising the following steps:

carrying out picture sampling on video content to obtain a video picture sequence;

determining an advertisement corresponding to the video picture sequence according to a generated video and advertisement matching model, wherein the video and advertisement matching model is generated through deep learning-based training;

and displaying the advertisement.

2. The method of claim 1, further comprising: training to generate a video and advertisement matching model, wherein the training to generate the video and advertisement matching model comprises the following steps:

acquiring training data from an existing video and advertisement log library, wherein the video and advertisement log library is used for storing the training data, and the training data is constructed according to existing video content containing subtitles;

determining a model structure;

and training according to the training data and the model structure to generate a video and advertisement matching model.

3. The method of claim 2, further comprising: constructing training data and storing the training data in a video and advertisement log library, wherein the constructing of the training data comprises:

sampling current video content to obtain a video picture sequence;

4. The method of claim 2, wherein the model structure comprises: a deep learning network structure.

5. The method of claim 4, wherein the deep learning network structure comprises: an RNN structure.

6. The method of claim 2, wherein training based on the training data and the model structure to generate a video and advertisement matching model comprises:

7. An advertisement display device based on deep learning, comprising:

a sampling module, configured to perform picture sampling on video content to obtain a video picture sequence;

a determining module, configured to determine an advertisement corresponding to the video picture sequence according to a generated video and advertisement matching model, wherein the video and advertisement matching model is generated through deep learning-based training;

and a display module, configured to display the advertisement.

8. The apparatus of claim 7, further comprising: a training module for training to generate the video and advertisement matching model, the training module comprising:

an acquisition submodule, configured to acquire training data from an existing video and advertisement log library, wherein the video and advertisement log library is used for storing the training data, and the training data is constructed according to existing video content containing subtitles;

a determining submodule, configured to determine a model structure;

and a training submodule, configured to train according to the training data and the model structure to generate the video and advertisement matching model.

9. The apparatus of claim 8, further comprising: a construction module for constructing training data, the construction module being specifically configured to:

sampling current video content to obtain a video picture sequence;

10. The apparatus of claim 8, wherein the model structure determined by the determining submodule comprises: a deep learning network structure.

11. The apparatus of claim 10, wherein the deep learning network structure comprises: an RNN structure.

12. The apparatus of claim 8, wherein the training submodule is specifically configured to: