CN106792003B - Intelligent advertisement insertion method and device and server - Google Patents


Info

Publication number
CN106792003B
CN106792003B (application CN201611224892.9A)
Authority
CN
China
Prior art keywords
audio
advertisement
sub
cluster
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201611224892.9A
Other languages
Chinese (zh)
Other versions
CN106792003A (en)
Inventor
张仙伟
谢文昊
王静怡
王潇潇
朱养鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Shiyou University
Original Assignee
Xian Shiyou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Shiyou University filed Critical Xian Shiyou University
Priority to CN201611224892.9A priority Critical patent/CN106792003B/en
Publication of CN106792003A publication Critical patent/CN106792003A/en
Application granted granted Critical
Publication of CN106792003B publication Critical patent/CN106792003B/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/25 Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N 21/266 Channel or content management, e.g. generation and management of keys and entitlement messages in a conditional access system, merging a VOD unicast channel into a multicast channel
    • H04N 21/2668 Creating a channel for a dedicated end-user group, e.g. insertion of targeted commercials based on end-user profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F 16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F 16/7834 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using audio features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F 16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F 16/7844 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using original textual content or text extracted from visual content or transcript of audio data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/45 Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N 21/458 Scheduling content for creating a personalised stream, e.g. by combining a locally stored advertisement with an incoming stream; Updating operations, e.g. for OS modules; time-related management operations
    • H04N 21/4586 Content update operation triggered locally, e.g. by comparing the version of software modules in a DVB carousel to the version stored locally
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N 21/81 Monomedia components thereof
    • H04N 21/812 Monomedia components thereof involving advertisement data

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Library & Information Science (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Business, Economics & Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Marketing (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention discloses an intelligent advertisement insertion method, device, and server in the field of network communication. The intelligent advertisement insertion method comprises the following steps: acquiring an inter-cut object, wherein the inter-cut object is a video; acquiring audio information of the inter-cut object; querying a preset advertisement library for advertisements matching the audio information; and inserting the advertisement into the video according to the matching result. The method automatically matches advertisements to a video and inserts them into it through text-comparison or neural-network techniques; insertion accuracy is high, the relevance between advertisement and video is strengthened, and the advertising effect is good.

Description

Intelligent advertisement insertion method and device and server
Technical Field
The invention relates to the field of network communication, in particular to an intelligent advertisement inserting method and device.
Background
With the rapid development of the internet and advances in wireless communication technology, internet protocol television is increasingly popular. Current internet protocol television adopts P2P streaming media technology, so user connections are faster, buffering time is shorter, and playback fluency and the viewing experience are better. In the prior art, network advertisements are generally placed before a network television program starts, or inserted between two streaming media items, in carousel form: digitally processed and encoded streaming media files are automatically arranged into a program according to a certain play sequence, which is generally specified by a text file or an XML playlist file in a certain format.
However, a user intent on watching a network television program generally blocks or skips an advertisement placed at the beginning of the program, or opens the web page in advance and only starts watching after the advertisement has finished. The advertising effect of this prior-art delivery form is therefore very poor.
The prior art has at least the following disadvantages:
1. the advertisement is unrelated to the video program being watched, so viewer interest is low;
2. the advertisement is played before the program, so it is easily blocked or skipped and the advertising effect is weak.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides an intelligent advertisement inserting method and device, which insert related advertisements according to video contents, are vivid and interesting and improve the advertisement effect. The technical scheme is as follows:
in one aspect, the present invention provides a method for inserting an intelligent advertisement, including:
acquiring an inter-cut object, wherein the inter-cut object is a video;
acquiring audio information of the inter-cut object;
inquiring advertisements matched with the audio information in a preset advertisement library;
and inserting the advertisement in the video according to the matching result.
The following two ways are available for inquiring the advertisement matched with the audio information in a preset advertisement library:
First way:
Advertisement videos are stored in the preset advertisement library, and each advertisement video is correspondingly provided with a matching tag;
the querying of the advertisement matched with the audio information in the preset advertisement library comprises:
converting the audio information into text information;
performing word segmentation processing on the text information to obtain word segmentation texts;
matching the word segmentation text with matching labels in an advertisement library;
and acquiring the corresponding advertisement video according to the matching label obtained by matching.
Second way:
Dividing the audio information into sub-audios;
selecting target sub-audio from the sub-audio to form an audio set;
inputting the audio set into a matching deep neural network model to search for matched advertisements in an advertisement library;
and acquiring advertisements corresponding to the matching results according to the matching results output by the model.
The intelligent advertisement insertion method in the second mode further includes pre-training the matching deep neural network, including:
acquiring audio set sample data, wherein the audio set sample data is marked with a matching type;
minimizing a loss function by a stochastic gradient descent method;
and training the matched deep neural network through audio set sample data and a loss function which completes minimization to obtain a model.
Preferably, minimizing the loss function by the stochastic gradient descent method comprises:
obtaining the gradient of the loss function by back propagation from all the weights of the neural network and the loss function;
updating the weights of the neural network by stochastic gradient descent according to the gradient;
and iterating the weight updates a preset number of times to minimize the loss function.
In another aspect, the present invention provides an intelligent advertisement insertion device, including:
the inter cut object module is used for acquiring inter cut objects, and the inter cut objects are videos;
the audio module is used for acquiring the audio information of the inter-cut object;
the query module is used for querying advertisements matched with the audio information in a preset advertisement library;
and the inserting module is used for inserting the advertisement in the video according to the matching result.
Optionally, the query module is implemented in two ways:
Mode one
The intelligent advertisement insertion device also comprises an advertisement library module which is used for presetting and storing advertisement videos, and each advertisement video is correspondingly provided with a matching tag;
correspondingly, the query module comprises:
the conversion submodule is used for converting the audio information into text information;
the word segmentation sub-module is used for carrying out word segmentation processing on the text information to obtain a word segmentation text;
the matching sub-module is used for matching the word segmentation text with the matching tags in the advertisement library;
and the first obtaining submodule is used for obtaining the corresponding advertisement video according to the matching tag obtained by matching.
Mode two
The intelligent advertisement insertion device also comprises a pre-training module which is used for pre-training the matching deep neural network;
correspondingly, the query module comprises:
a division submodule for dividing the audio information into sub-audios;
the audio set sub-module is used for selecting target sub-audio from the sub-audio to form an audio set;
the network model submodule is used for inputting the audio set into a matched deep neural network model so as to search matched advertisements in an advertisement library;
and the second obtaining submodule is used for obtaining the advertisement corresponding to the matching result according to the matching result output by the model.
The pre-training module comprises:
the sample submodule is used for acquiring sample data of an audio set, and the sample data of the audio set is marked with a matching type;
a minimization submodule for minimizing the loss function using a stochastic gradient descent method;
and the learning submodule is used for training the matched deep neural network through the audio set sample data and the loss function which completes the minimization to obtain a model.
Preferably, the minimization submodule comprises:
the gradient unit is used for obtaining the gradient of the loss function by back propagation from all the weights of the neural network and the loss function;
the weight updating unit is used for updating the weights of the neural network by stochastic gradient descent according to the gradient;
and the iteration unit is used for iterating the updated weights a preset number of times so as to minimize the loss function.
In another aspect, the present invention provides a smart advertisement insertion server comprising one or more smart advertisement insertion devices as described above.
The technical scheme provided by the invention has the following beneficial effects:
1) the advertisement is inserted in a floating window on the video page, following the video's playing content, so the form is novel and highly watchable;
2) associated advertisements are inserted to follow the video's playing content automatically, which is vivid and interesting;
3) compared with the traditional advertisement, the advertisement effect is remarkable.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a flow chart of a method for intelligent advertisement insertion according to an embodiment of the present invention;
fig. 2 is a flowchart of a method for advertisement insertion according to a text-to-text conversion manner according to an embodiment of the present invention;
FIG. 3 is a flowchart of a method for advertisement insertion according to a split sub-audio method according to an embodiment of the present invention;
FIG. 4 is a flowchart of a training method for a matched deep neural network according to an embodiment of the present invention;
FIG. 5 is a flowchart of a method for obtaining sample data of an audio set to be trained according to an embodiment of the present invention;
FIG. 6 is a flow chart of a method for minimizing a model loss function according to an embodiment of the present invention;
FIG. 7 is a block diagram of an advertisement insertion device according to an embodiment of the present invention;
FIG. 8 is a block diagram of an advertisement insertion server according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of a neuron in a CNN network model provided in an embodiment of the present invention;
FIG. 10 is a block diagram of an LSTM memory unit in the RNN network model according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, apparatus, article, or device that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or device.
In an embodiment of the present invention, a method for inserting a smart advertisement is provided, and referring to fig. 1, the method flow includes:
S101, acquiring the inter-cut object.
Specifically, the inter-cut object is a network video. In this embodiment, according to the specific content of the network video, an advertisement related to that content is inserted during video playback; the advertisement's form includes, but is not limited to, a small floating-window video advertisement or a text advertisement.
S102, audio information of the inter-cut object is obtained.
Specifically, video is a combination of picture information and audio information, and the audio information is acquired from the video.
S103, inquiring the matched advertisement.
Advertisements related to the currently playing video (audio) content are searched in a preset advertisement library; the advertisement forms include, but are not limited to, floating-window video, floating-window animation, and bullet-screen text advertisements.
And S104, inserting the advertisement according to the matching result.
If an advertisement related to the current playing content is matched in the advertisement library, the playing position of the related content in the video is located and the advertisement is inserted there.
In an embodiment of the present invention, a method for performing advertisement insertion according to a text-to-text conversion manner is provided, and referring to fig. 2, the flow of the method includes:
s201, obtaining a video object to be inserted with an advertisement.
S202, acquiring audio information of the video object.
S203, converting the audio information into text information.
Specifically, this is achieved with speech recognition technology: first, a training set is established, comprising audio together with manually produced transcripts; then the audio is segmented into phoneme pieces using the training set, and a specific algorithm finds the most likely character combination in the training set. Through this training, good parameters are found and a speech-conversion model is built.
And S204, performing word segmentation processing to obtain word segmentation texts.
The converted text information is segmented: the text is first divided at sentence boundaries to obtain independent sentences, and each independent sentence can be further segmented into independent words.
Besides the flow of speech conversion followed by word segmentation provided in this embodiment, the method can also first divide the audio at audio nodes to obtain complete sentence-level sub-audios and then convert each sub-audio into text; the processing order is not specifically limited here.
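As an illustration of the word-segmentation step S204, the sketch below uses greedy forward maximum matching against a dictionary, one common segmentation strategy; the function name, toy vocabulary, and window size are illustrative assumptions, not details from the patent.

```python
def forward_max_match(text, vocab, max_len=4):
    """Greedy forward maximum matching: at each position, take the longest
    dictionary word that matches; fall back to a single character."""
    words, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab or length == 1:  # length-1 fallback always fires
                words.append(piece)
                i += length
                break
    return words
```

For example, with the toy vocabulary `{"ab", "cd"}`, the input `"abcd"` is segmented as `["ab", "cd"]`; real systems would use a full lexicon and statistical disambiguation.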
S205, matching the word segmentation text with the matching labels of the advertisement videos in the advertisement library.
Specifically, an advertisement library is preset, storing advertisements delivered by cooperating advertisers; each advertisement has a corresponding matching tag, and the word-segmentation text is matched against the matching tags in the library. If all matching tags in the advertisement library are traversed without matching the current word-segmentation text, there is currently no playing point for advertisement insertion; if a tag matches, there is currently a playing point where a related advertisement can be inserted.
And S206, acquiring the corresponding advertisement video according to the matching result.
Specifically, according to the corresponding relation between the matching tag and the advertisement in the advertisement library, the matching tag is used as an index to find the corresponding advertisement.
And S207, finding the audio position corresponding to the matched word segmentation text, and inserting the advertisement video.
The word-segmentation text matched by the tag is stored, and the corresponding position in the audio is found; that is the playing point where the related advertisement can be inserted, and the related advertisement is inserted there. For example, the advertisement library contains a snack brand's advertisement with the matching tag (nut OR pistachio OR snack). When the audio plays "You are my pistachio!", the tag matches, a floating window pops up on the current video page, and the advertisement video or slogan is displayed; the floating window has a close button and a timed auto-close function.
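The tag-matching step of this first way can be sketched as a set intersection between the segmented words and each advertisement's tags. The ad library contents, tag sets, and the `match_ads` helper below are illustrative assumptions, not from the patent.

```python
# Illustrative ad library: each entry carries a tag set and a video file.
AD_LIBRARY = {
    "ad_001": {"tags": {"nut", "pistachio", "snack"}, "video": "snack_ad.mp4"},
    "ad_002": {"tags": {"car", "engine"}, "video": "car_ad.mp4"},
}

def match_ads(segmented_words, ad_library):
    """Return (ad_id, video) pairs whose tags intersect the segmented text."""
    words = set(segmented_words)
    return [(ad_id, ad["video"])
            for ad_id, ad in ad_library.items()
            if ad["tags"] & words]

# "You are my pistachio!" after word segmentation:
hits = match_ads(["you", "are", "my", "pistachio"], AD_LIBRARY)
```

An empty result corresponds to the "no advertisement insertion at this playing point" branch above; a non-empty result gives the playing point and the advertisement to insert.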
In an embodiment of the present invention, a method for advertisement insertion in a split sub-audio manner is provided, and referring to fig. 3, the flow of the method includes:
and S31, training the matched deep neural network model.
A specific training method flow is shown in fig. 4, and the method flow includes:
and S311, obtaining the sample data of the audio set to be trained. The audio set sample data is composed of audio sets, each audio set is extracted from a complete audio medium, or all audio segments of the complete audio medium are combined into a video set. In contrast, extracting target sub-audio from an audio medium to form a target audio set is a preferred scheme for obtaining an audio set, and by extracting representative and important target sub-audio, sub-audio which is not beneficial to matching advertisements is effectively eliminated, so that not only is the processing burden of a neural network model reduced, the processing speed of the neural network model increased, but also interference options are removed, and the accuracy of audio matching advertisements is improved, as shown in fig. 5, the method for extracting target sub-audio has the following flows:
S3111, acquiring the audio to be trained;
S3112, dividing the audio into sub-audios;
S3113, clustering the sub-audios to obtain a plurality of clusters;
S3114, selecting the sub-audio closest to each cluster center as the target sub-audio;
S3115, the target sub-audios form an audio set and are included in the sample.
Specifically, clustering methods include the Mean Shift algorithm, the fuzzy C-means clustering algorithm, hierarchical clustering algorithms, and the like. The Mean Shift algorithm randomly selects, within the sample, a region with center o and radius h and computes the mean of all sample points in the region; the sample density at the center is necessarily less than or equal to the density at the mean, and the step is repeated until the samples converge to a density maximum. The fuzzy C-means clustering algorithm divides n samples into C groups, obtains each group's cluster center, and finally minimizes an objective function of a dissimilarity index; the algorithm assigns each sample point a membership degree between 0 and 1, and the degree to which a sample belongs to each class is judged from that membership value. In this embodiment, a K-means clustering algorithm is adopted to cluster the sub-audios: for an audio set X = {x1, x2, …, xn}, where n is the number of sub-audios, to be divided into k clusters V = {V1, V2, …, Vk}, k objects are randomly selected as initial cluster centers, the distance between each object and each seed cluster center is calculated, and each object is assigned to its closest cluster center.
A cluster center and the objects assigned to it represent a cluster. Once all objects have been assigned, the cluster center of each cluster is recalculated from the objects currently in it. This process repeats until the cluster centers no longer change, at which point the algorithm terminates, yielding k well-separated audio classes. Correspondingly, the one or more sub-audios closest to the cluster center in each audio class are taken as target sub-audios (if a cluster center is itself a sub-audio, that sub-audio is the target sub-audio).
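The K-means clustering (S3113) and target-sub-audio selection (S3114) described above can be sketched as follows. The sketch assumes each sub-audio has already been reduced to a numeric feature vector (e.g. spectral features); the feature representation and function names are illustrative, not from the patent.

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(points, centers):
    """Index of the closest center for every point."""
    return [min(range(len(centers)), key=lambda c: dist(p, centers[c]))
            for p in points]

def kmeans(points, k, iters=50, seed=0):
    """Plain k-means (step S3113); returns (centers, assignments)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        assign = nearest(points, centers)
        new_centers = []
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # recompute the center as the mean of its cluster
                new_centers.append([sum(xs) / len(members) for xs in zip(*members)])
            else:        # an empty cluster keeps its old center
                new_centers.append(centers[c])
        if new_centers == centers:  # centers stopped changing: terminate
            break
        centers = new_centers
    return centers, nearest(points, centers)

def target_sub_audio(points, centers, assign):
    """Step S3114: per cluster, the index of the sub-audio nearest the center."""
    targets = []
    for c, center in enumerate(centers):
        members = [i for i, a in enumerate(assign) if a == c]
        if members:
            targets.append(min(members, key=lambda i: dist(points[i], center)))
    return sorted(targets)
```

With well-separated feature vectors, one representative sub-audio per cluster is returned, matching the "one or more sub-audios closest to the cluster center" rule above.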
S312, computing the gradient of the neural network's loss function so as to minimize the loss function.
And S313, training a network to finally obtain a matched deep neural network model.
The loss function in S312 is the loss function of the deep neural network; it is closely related to the accuracy of the model's classification results. To improve the classification accuracy of the matching deep neural network model, the loss function must be minimized. The specific method is shown in fig. 6; the method for minimizing the loss function includes:
S3121, calculating the gradient of the loss function by back propagation: the Back Propagation (BP) method is an algorithm used in combination with an optimization method; it calculates the gradient of the loss function with respect to all weights in the network. In vector calculus, the gradient at a point in a scalar field points in the direction in which the field increases fastest, and its magnitude is that greatest rate of change (a directional derivative);
S3122, feeding the gradient to a stochastic gradient descent method: the optimization method is not limited to stochastic gradient descent; ordinary gradient descent or stochastic parallel gradient descent may also be used;
S3123, updating the weights;
S3124, judging whether the set number of iterations has been reached; if so, execute S3125; if not, feed the updated weights back into the back-propagation step, i.e., continue executing S3121-S3124 with the updated weights;
S3125, the minimization of the loss function is complete, and the current loss function value is the minimized result.
The manually specified number of iterations is obtained through repeated tests and experience. For example, if the iteration count is set to 1000 during a test and the loss value stops decreasing after 200 iterations, the count can be set to 300 for the next test, saving test time.
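The loop S3121-S3125 can be sketched for the simplest possible model, a single linear neuron trained with squared loss: each step back-propagates the gradient for one randomly chosen sample and updates the weights, up to a preset iteration budget. The learning rate, data, and function name are illustrative assumptions, not values from the patent.

```python
import random

def sgd_train(samples, lr=0.1, iters=1000, seed=0):
    """Fit y ≈ w*x + b by stochastic gradient descent on squared loss."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(iters):                 # S3124: preset iteration budget
        x, y = rng.choice(samples)         # one random training sample
        err = (w * x + b) - y              # forward pass
        dw, db = 2 * err * x, 2 * err      # S3121: gradient via chain rule
        w -= lr * dw                       # S3122/S3123: SGD weight update
        b -= lr * db
    return w, b
```

On data generated by y = 2x + 1, the returned (w, b) converges close to (2, 1); a deep network applies the same loop with gradients back-propagated through many layers.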
And S32, acquiring the video object to be inserted with the advertisement.
And S33, acquiring the audio information of the video object.
And S34, dividing the audio information into sub-audios.
Specifically, the audio information is divided into sub-audios at audio nodes (the pause points between words in the audio).
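The pause-point splitting can be sketched over a simplified amplitude sequence: a run of near-silent samples of sufficient length is treated as an audio node and closes the current sub-audio. The threshold, minimum pause length, and amplitude representation are illustrative assumptions; real systems would work on frame energies or use voice-activity detection.

```python
def split_at_pauses(samples, threshold=0.05, min_pause=3):
    """Split an amplitude sequence into sub-audio chunks at pause points,
    i.e. runs of at least `min_pause` consecutive near-silent samples.
    Silent samples themselves are dropped in this sketch."""
    chunks, current, silence_run = [], [], 0
    for s in samples:
        if abs(s) < threshold:              # near-silent sample
            silence_run += 1
            if silence_run == min_pause and current:
                chunks.append(current)      # pause found: close the chunk
                current = []
        else:                               # voiced sample
            silence_run = 0
            current.append(s)
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk corresponds to one sub-audio between two pause points.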
And S35, selecting the target sub-audio to form an audio set.
Optionally, all sub-audios obtained by segmentation form the target audio set; preferably, target sub-audios more favorable for advertisement matching are extracted by a clustering method and then form the target audio set. The clustering method is the same as in step S3113 and is not repeated here.
And S36, inputting the audio set to match the deep neural network model.
And S37, determining the advertisement video corresponding to the target sub-audio according to the model output.
Specifically, when the model is constructed, an advertisement ID list is preset and a correspondence between the model's output parameters and advertisements is established; that is, the model outputs an advertisement ID number, and the corresponding advertisement video is then determined by looking it up in the advertisement ID list.
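The ID-list lookup can be sketched as mapping the model's per-class scores to an advertisement video, with a no-match fallback; the score format, threshold, and function name are illustrative assumptions, not from the patent.

```python
def ad_from_model_output(scores, ad_id_list, ad_library, threshold=0.5):
    """Map per-class model scores to an advertisement video via the preset
    advertisement ID list; return None when no class clears the threshold."""
    best = max(range(len(scores)), key=scores.__getitem__)  # argmax class
    if scores[best] < threshold:
        return None                     # no sufficiently confident match
    return ad_library[ad_id_list[best]]
```

The `None` branch corresponds to playing points where no advertisement is inserted.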
And S38, searching the position of the target sub audio in the video, and inserting the advertisement video.
And storing the advertisement video and the corresponding sub audio, determining the position of the advertisement video in the video according to the sub audio, and inserting the advertisement at the position.
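A sketch of the position lookup and insertion in S38, under the assumption (consistent with the paragraph above) that each stored sub-audio remembers its start offset in the video timeline; names and the playlist representation are illustrative.

```python
# Hypothetical sketch: the stored offset of the target sub-audio gives the
# advertisement insertion position; the result is modelled as a playlist.

def insertion_position(sub_audio_offsets, target_sub_audio):
    """Look up where a target sub-audio starts in the video timeline."""
    return sub_audio_offsets[target_sub_audio]

def build_playlist(video_length, insert_at, ad_name):
    """Describe the video split around the advertisement insertion point."""
    return [("video", 0.0, insert_at),
            ("advertisement", ad_name),
            ("video", insert_at, video_length)]
```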
In one embodiment of the present invention, there is provided a smart advertisement insertion apparatus, whose module architecture, referring to fig. 7, includes the following modules:
an inter-cut object module 710, configured to obtain an inter-cut object, where the inter-cut object is a video;
an audio module 720, configured to obtain audio information of the inter-cut object;
the query module 730 is configured to query an advertisement matched with the audio information in a preset advertisement library;
and the inserting module 740 is configured to insert the advertisement in the video according to the matching result.
The advertisement insertion device in one embodiment further comprises: the advertisement library module 750 is used for presetting advertisement videos, and each advertisement video is correspondingly provided with a matching tag;
correspondingly, the query module 730 includes:
a conversion sub-module 731 for converting the audio information into text information;
the word segmentation sub-module 732 is configured to perform word segmentation processing on the text information to obtain a word segmentation text;
the matching sub-module 733 is used for matching the word segmentation text with the matching tags in the advertisement library;
the first obtaining sub-module 734 is configured to obtain a corresponding advertisement video according to the matching tag obtained by matching.
The advertisement insertion device in another embodiment further comprises: a pre-training module 760 for pre-training the matching deep neural network;
correspondingly, the query module 730 includes:
a division submodule 735 for dividing the audio information into sub-audios;
the audio set submodule 736 is configured to select a target sub-audio from the sub-audios to form an audio set;
a network model submodule 737, configured to input the audio set into a matching deep neural network model to search for a matching advertisement in an advertisement library;
a second obtaining submodule 738, configured to obtain, according to the matching result output by the model, an advertisement corresponding to the matching result;
wherein the pre-training module 760 comprises:
a sample sub-module 761 for obtaining audio set sample data, the audio set sample data being marked with a matching type;
a minimization submodule 762 for minimizing a loss function using a random gradient descent method;
and the learning submodule 763 is configured to train the matching deep neural network through the audio set sample data and the loss function that completes minimization, so as to obtain a model.
Wherein the minimize sub-module 762 comprises:
a gradient unit 7621, configured to obtain a gradient of the loss function by using a back propagation method according to all weights and loss functions of the neural network;
a weight updating unit 7622, configured to update the weight of the neural network by using a random gradient descent method according to the gradient;
an iteration unit 7623, configured to iterate the updated weights a preset number of times to minimize a loss function.
In one embodiment of the present invention, a smart advertisement insertion server is provided, which comprises one or more of the above-described smart advertisement insertion devices, see fig. 8.
It should be noted that: in the above-described embodiments, the advertisement insertion device is described by way of example only with the division of the functional modules; in practical applications, the above-described functions may be distributed to different functional modules as needed, that is, the internal structure of the advertisement insertion device may be divided into different functional modules to complete all or part of the functions described above. In addition, the embodiment of the advertisement insertion device provided here and the advertisement insertion method provided in the above embodiments belong to the same concept; their specific implementation processes are described in detail in the method embodiments and are not repeated here.
In an embodiment of the present invention, for example, when the method is executed in a mobile terminal, a computer terminal, or a similar computing device, the terminal may include an RF (Radio Frequency) circuit, a memory including one or more computer-readable storage media, an input unit, a display unit, a sensor, an audio circuit, a WiFi (wireless fidelity) module, a processor including one or more processing cores, and a power supply. Wherein:
the RF circuit may be used for receiving and transmitting signals during information transmission and reception or during a call, and in particular, receives downlink information from a base station and then sends the received downlink information to one or more processors for processing; in addition, data relating to uplink is transmitted to the base station. Typically, the RF circuitry includes, but is not limited to, an antenna, at least one Amplifier, a tuner, one or more oscillators, a Subscriber Identity Module (SIM) card, a transceiver, a coupler, an LNA (Low Noise Amplifier), a duplexer, and the like. In addition, the RF circuitry may also communicate with networks and other devices via wireless communications. The wireless communication may use any communication standard or protocol, including but not limited to GSM (Global System for mobile communications), GPRS (General Packet Radio Service), CDMA (Code Division Multiple Access), WCDMA (Wideband Code Division Multiple Access), LTE (Long Term Evolution), email, SMS (Short messaging Service), etc.
The memory may be used to store software programs and modules, and the processor may execute various functional applications and data processing by operating the software programs and modules stored in the memory. The memory may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, application programs required by functions (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as audio data, a phonebook, etc.) created according to the use of the terminal, etc. Further, the memory may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. Accordingly, the memory may further include a memory controller to provide access to the memory by the processor and the input unit.
The input unit may be used to receive input numeric or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control. In particular, the input unit may include a touch-sensitive surface as well as other input devices. The touch-sensitive surface, also referred to as a touch display screen or a touch pad, may collect touch operations by a user (e.g., operations by a user on or near the touch-sensitive surface using a finger, a stylus, or any other suitable object or attachment) thereon or nearby, and drive the corresponding connection device according to a predetermined program. Alternatively, the touch sensitive surface may comprise two parts, a touch detection means and a touch controller. The touch detection device detects the touch direction of a user, detects a signal brought by touch operation and transmits the signal to the touch controller; the touch controller receives touch information from the touch detection device, converts the touch information into touch point coordinates, sends the touch point coordinates to the processor, and can receive and execute commands sent by the processor. In addition, touch sensitive surfaces may be implemented using various types of resistive, capacitive, infrared, and surface acoustic waves. The input unit may comprise other input devices than a touch sensitive surface. In particular, other input devices may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and the like.
The display unit may be used to display information input by or provided to a user and various graphic user interfaces of the terminal, which may be configured by graphics, text, icons, video, and any combination thereof. The Display unit may include a Display panel, and optionally, the Display panel may be configured in the form of an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode), or the like. Further, the touch-sensitive surface may overlie the display panel, and when a touch operation is detected on or near the touch-sensitive surface, the touch operation is transmitted to the processor to determine the type of touch event, and the processor then provides a corresponding visual output on the display panel in accordance with the type of touch event. Although in this embodiment the touch sensitive surface and the display panel are implemented as two separate components for input and output functions, in some embodiments the touch sensitive surface may be integrated with the display panel for input and output functions.
The terminal may also include at least one sensor, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor that may adjust the brightness of the display panel according to the brightness of ambient light, and a proximity sensor that may turn off the display panel and/or the backlight when the terminal is moved to the ear. As one of the motion sensors, the gravity acceleration sensor can detect the magnitude of acceleration in each direction (generally, three axes), detect the magnitude and direction of gravity when the terminal is stationary, and can be used for applications of recognizing terminal gestures (such as horizontal and vertical screen switching, related games, magnetometer gesture calibration), vibration recognition related functions (such as pedometer and tapping), and the like; as for other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which can be configured in the terminal, detailed description is omitted here.
Audio circuitry, a speaker, and a microphone may provide an audio interface between the user and the terminal. The audio circuit can transmit the electric signal converted from the received audio data to the loudspeaker, and the electric signal is converted into a sound signal by the loudspeaker to be output; on the other hand, the microphone converts the collected sound signal into an electric signal, which is received by the audio circuit and converted into audio data, which is then output to the processor for processing, and then transmitted to, for example, another terminal via the RF circuit, or the audio data is output to the memory for further processing. The audio circuit may also include an earbud jack to provide communication of a peripheral headset with the terminal.
WiFi belongs to short-distance wireless transmission technology, and the terminal can help a user to receive and send e-mails, browse webpages, access streaming media and the like through a WiFi module, and provides wireless broadband internet access for the user. It is understood that the WiFi module does not belong to the essential constitution of the terminal and can be omitted entirely as required within the scope not changing the essence of the invention.
The processor is a control center of the terminal, connects various parts of the whole terminal by using various interfaces and lines, and executes various functions of the terminal and processes data by running or executing software programs and/or modules stored in the memory and calling data stored in the memory, thereby integrally monitoring the terminal. Optionally, the processor may include one or more processing cores; preferably, the processor may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor.
The terminal also includes a power supply (e.g., a battery) for powering the various components, and preferably, the power supply may be logically coupled to the processor via a power management system, such that functions of managing charging, discharging, and power consumption are performed via the power management system. The power supply may also include any component of one or more dc or ac power sources, recharging systems, power failure detection circuitry, power converters or inverters, power status indicators, and the like.
Although not shown, the terminal may further include a camera, a bluetooth module, and the like, which will not be described herein. Specifically, in this embodiment, the display unit of the terminal is a touch screen display, the terminal further includes a memory, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs include instructions for:
acquiring an inter-cut object, wherein the inter-cut object is a video;
acquiring audio information of a spot object;
inquiring advertisements matched with the audio information in a preset advertisement library;
and inserting the advertisement in the video according to the matching result.
Further, advertisement videos are stored in the preset advertisement library, and each advertisement video is correspondingly provided with a matching tag;
specifically, the processor of the terminal is further configured to execute the instructions of:
converting the audio information into text information;
performing word segmentation processing on the text information to obtain word segmentation texts;
matching the word segmentation text with matching labels in an advertisement library;
and acquiring the corresponding advertisement video according to the matching label obtained by matching.
Alternatively, the processor of the terminal is further configured to execute the following instructions: dividing the audio information into sub-audios;
selecting target sub-audio from the sub-audio to form an audio set;
inputting the audio set into a matching deep neural network model to search for matched advertisements in an advertisement library;
and acquiring advertisements corresponding to the matching results according to the matching results output by the model.
Specifically, the processor of the terminal is further configured to execute the instructions of: pre-training the matching deep neural network, comprising:
acquiring audio set sample data, wherein the audio set sample data is marked with a matching type;
minimizing a loss function by adopting a random gradient descent method;
and training the matched deep neural network through audio set sample data and a loss function which completes minimization to obtain a model.
Specifically, the processor of the terminal is further configured to execute the instructions of:
obtaining the gradient of the loss function by adopting a back propagation method according to all the weights and the loss functions of the neural network;
updating the weight of the neural network by adopting a random gradient descent method according to the gradient;
the updated weights are iterated a preset number of times to minimize the loss function.
Through the above description of the embodiments, those skilled in the art can clearly understand that the advertisement insertion technical solution provided by the present invention can be implemented by software plus a necessary general hardware platform, and can certainly also be implemented by hardware, but the former is the better embodiment in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
In one embodiment of the present invention, there is provided a computer-readable storage medium, which may be the computer-readable storage medium contained in the memory in the above-described embodiments; or it may be a separate computer-readable storage medium not incorporated in the terminal. A computer readable storage medium storing one or more programs, the one or more programs for use by one or more processors in performing a method for ad insertion, the method comprising:
acquiring an inter-cut object, wherein the inter-cut object is a video;
acquiring audio information of a spot object;
inquiring advertisements matched with the audio information in a preset advertisement library;
and inserting the advertisement in the video according to the matching result.
The following two ways are available for inquiring the advertisement matched with the audio information in a preset advertisement library:
The first way:
Advertisement videos are stored in the preset advertisement library, and each advertisement video is correspondingly provided with a matching tag;
the querying of the advertisement matched with the audio information in the preset advertisement library comprises:
converting the audio information into text information;
performing word segmentation processing on the text information to obtain word segmentation texts;
matching the word segmentation text with matching labels in an advertisement library;
and acquiring the corresponding advertisement video according to the matching label obtained by matching.
The second way:
Dividing the audio information into sub-audios;
selecting target sub-audio from the sub-audio to form an audio set;
inputting the audio set into a matching deep neural network model to search for matched advertisements in an advertisement library;
and acquiring advertisements corresponding to the matching results according to the matching results output by the model.
The intelligent advertisement insertion method in the second mode further includes pre-training the matching deep neural network, including:
acquiring audio set sample data, wherein the audio set sample data is marked with a matching type;
minimizing a loss function by adopting a random gradient descent method;
and training the matched deep neural network through audio set sample data and a loss function which completes minimization to obtain a model.
Preferably, the minimizing the loss function by using the random gradient descent method comprises:
obtaining the gradient of the loss function by adopting a back propagation method according to all the weights and the loss functions of the neural network;
updating the weight of the neural network by adopting a random gradient descent method according to the gradient;
the updated weights are iterated a preset number of times to minimize the loss function.
In an embodiment of the present invention, a matching deep neural network model is obtained by using a CNN (convolutional neural network) model architecture, and an input data processing flow of the CNN model includes:
firstly, defining an extraction condition for extracting a target sub-audio from an audio;
secondly, extracting target sub-audios that meet the extraction condition from the audio of the video into which the advertisement is to be inserted;
thirdly, for each audio set consisting of target sub-audio, arranging the sub-audio in a descending order according to the classification attribution degree, wherein the classification attribution degree is defined as:
classification attribution degree = (degree of the node in the circle / degree of the node in the original graph) × (degree of the node in the circle / maximum degree in the circle subgraph).
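The formula above is a direct product of two degree ratios; it can be transcribed as follows, assuming the three degree values for a node are already known:

```python
# Direct transcription of the classification attribution degree formula.

def classification_attribution(deg_in_circle, deg_in_graph, max_deg_in_circle):
    """(in-circle degree / original-graph degree) x (in-circle degree / circle max degree)."""
    return (deg_in_circle / deg_in_graph) * (deg_in_circle / max_deg_in_circle)
```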
The sample data are spliced into a three-dimensional array whose dimensions, from outside to inside, are circles, members, and data channels. The number of members per circle in the array must be equal; this number is determined as M. Circles with more than M members keep only the data of the top-M ranked members, and circles with fewer than M members are padded with zeros.
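The truncate-or-pad step can be sketched as below; M and the channel count are illustrative values, and nested lists stand in for a real array library.

```python
# Sketch: each circle's member list is cut or zero-padded to exactly M rows
# so the circles / members / channels array is regular.

def normalize_circle(members, m, channels=3):
    """Truncate to the top-M members, or pad with all-zero member rows."""
    rows = members[:m]                                        # keep top-ranked M
    rows += [[0.0] * channels for _ in range(m - len(rows))]  # pad short circles
    return rows

def build_input_array(circles, m, channels=3):
    """Outer dimension: circles; middle: members; inner: data channels."""
    return [normalize_circle(c, m, channels) for c in circles]
```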
The architecture design of the CNN model is as follows: it comprises two 2D convolutional layers (convolution2d_1, convolution2d_2) and two fully connected layers (dense_1, dense_2); convolution2d_input is used to feed the neural network, and convolution2d_input_1 (InputLayer) is the input layer of the neural network. No operation is performed in this layer; it only defines the size and type of the input data, so the output quantity is unchanged.
Convolution2D is a 2-dimensional convolutional layer, which reduces model parameters and data operations through parameter sharing. The main parameters of the convolutional layer include: a. the number of convolution kernels, where each kernel corresponds to a feature map, so the kernel count can be expressed by the feature-map count, which is 64 in this embodiment; b. the length and width of the convolution kernel, which is a rectangle whose length and width must be specified, 3x3 in this embodiment; c. the stride, the step of the kernel during translation; since the kernel is 2-dimensional, the stride is an array of length 2, such as (1, 1). The neurons of the convolutional layer share weights, and the number of weights per neuron equals the kernel length × the kernel width.
Activation is the activation function of a neuron. In a neural network, every neuron except those in the final output layer has an activation function; all neurons in one layer share the same activation function, while neurons in different layers may use different ones. Each input edge of a neuron has a weight, and each neuron has a bias. In this embodiment the activation function ReLU is used, defined as g(z) = max{0, z}.
MaxPooling2D is an operation on two-dimensional data that takes the maximum value within a rectangle. Its main parameters include: a. the pool size, a rectangle such as 3x3; b. the stride, the length of each move, such as (3, 3).
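A minimal sketch of this pooling operation on a plain nested list, with the pool size and stride as the two parameters just described:

```python
# Sketch of MaxPooling2D: slide a pool-sized window over a 2-D array by the
# given stride and keep each window's maximum value.

def max_pool_2d(data, pool=(2, 2), stride=(2, 2)):
    """Return the per-window maxima of a 2-D nested list."""
    rows, cols = len(data), len(data[0])
    out = []
    for r in range(0, rows - pool[0] + 1, stride[0]):
        out.append([max(data[r + i][c + j]
                        for i in range(pool[0]) for j in range(pool[1]))
                    for c in range(0, cols - pool[1] + 1, stride[1])])
    return out
```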
The purpose of Dropout is to prevent overfitting, one of the most common problems in machine learning, in which a model performs much better on the training set than on the test set: an over-fitted model performs well during training but much worse when predicting on new data. The main parameter of Dropout is p, a value between 0 and 1 representing a probability. When training the model, the layer's input (i.e., the previous layer's output) is randomly set to 0 with probability p; for example, with p = 0.2, 20% of the input node data are randomly set to 0. During the prediction phase, the layer does nothing.
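The behaviour just described can be sketched directly (note that many real implementations additionally rescale the surviving inputs, which the description above omits, so this sketch follows the text as written):

```python
import random

# Sketch of Dropout as described: zero each input with probability p during
# training; pass inputs through unchanged during prediction.

def dropout(inputs, p, training=True, rng=None):
    """Randomly zero inputs with probability p, training time only."""
    if not training:
        return list(inputs)
    rng = rng or random.Random()
    return [0.0 if rng.random() < p else x for x in inputs]
```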
Flatten flattens a two-dimensional array into one dimension, for example converting [[1, 2], [3, 4]] into [1, 2, 3, 4].
Dense is a fully connected layer; in general, a hidden layer is a fully connected layer. For a neuron as shown in FIG. 9, the operation formula is as follows:
output = g(z), where g(z) is the activation function defined above and not repeated here;
z = Σ_j w_j x_j + b, where x_j is the j-th input, w_j is the weight of the j-th input, and b is the bias.
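The neuron formula above transcribes directly, using the ReLU activation g(z) = max{0, z} defined earlier:

```python
# Transcription of the Dense-layer neuron: z is the weighted input sum plus
# the bias, and the output is the ReLU activation of z.

def relu(z):
    """g(z) = max{0, z}."""
    return max(0.0, z)

def neuron_output(inputs, weights, bias):
    """output = g(z), with z = sum_j w_j * x_j + b."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return relu(z)
```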
Because this is a multi-classification problem in which each video circle belongs to one category, the output layer is softmax, the loss function is categorical cross entropy, and the model parameters are learned by stochastic gradient descent (SGD); the learning process is as described above for training the matching neural network.
In another embodiment of the present invention, a matching deep neural network model is obtained using an RNN (recurrent neural network) model architecture. As with the CNN, the members of each circle are sorted in descending order of classification attribution degree. Unlike the CNN, the RNN takes a sequence of member data, each item of the sequence corresponding to the personal data of one user, and the sequences of different circles may have different lengths; that is, the number of circle members need not be uniform.
The architecture design of the RNN model is as follows: it contains three LSTM layers (lstm_1, lstm_2, lstm_3) and two fully connected layers (dense_1, dense_2).
The RNN neural network uses lstm_input to feed the network; lstm_input_1 (InputLayer) is the input layer of the RNN. No operation is performed in this layer; it only defines the size and type of the input data, so the output quantity is unchanged. FIG. 10 shows the structure of an LSTM memory unit.
The full-connection layer and the over-fitting prevention layer in the RNN neural network are defined the same as the full-connection layer and the over-fitting prevention layer of the CNN neural network, respectively, and are not described herein again.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (7)

1. A method of intelligent advertisement insertion, the method comprising:
acquiring an inter-cut object, wherein the inter-cut object is a video;
acquiring audio information of a spot object;
inquiring the advertisement matched with the audio information in a preset advertisement library, wherein the method comprises the following steps: dividing the audio information into sub-audios; selecting target sub-audio from the sub-audio to form an audio set; inputting the audio set into a matching deep neural network model to search for matched advertisements in an advertisement library; acquiring advertisements corresponding to the matching results according to the matching results output by the model, presetting an advertisement ID list when the model is constructed, setting the corresponding relation between model output parameters and the advertisements, namely the model output results are advertisement ID numbers, and determining corresponding advertisement videos by contrasting the advertisement ID list;
according to the matching result, inserting the advertisement in the video, including storing the advertisement video and the corresponding sub-audio, determining the position of the advertisement in the video according to the sub-audio, and inserting the advertisement in the position;
wherein selecting a target sub-audio from the sub-audios comprises: clustering the sub-audios to obtain a plurality of clusters, and selecting the sub-audio closest to the cluster center as a target sub-audio, wherein the method comprises the following steps: for an audio set X = { x1, x2, …, xn }, n being the number of sub-audios, the audio set is divided into K clusters V = { v1, v2, …, vK }; K objects are randomly selected as initial cluster centers, then the distance between each object and each seed cluster center is calculated, and each object is allocated to the cluster center closest to it; each cluster center and the objects allocated to it represent a cluster; once all the objects are allocated, the cluster center of each cluster is recalculated according to the objects currently in the cluster; this process is repeated continuously until the cluster centers no longer change, at which point the algorithm terminates; K audio classes with larger differences are obtained by clustering, and the one or more sub-audios closest to the cluster center in each audio class are used as target sub-audios; if a cluster center is itself a sub-audio, that sub-audio is the target sub-audio.
2. The method of claim 1, further comprising pre-training the matching deep neural network, comprising:
acquiring audio set sample data, wherein the audio set sample data is marked with a matching type;
minimizing a loss function by adopting a random gradient descent method;
and training the matched deep neural network through audio set sample data and a loss function which completes minimization to obtain a model.
3. The method of claim 2, wherein minimizing the loss function using a stochastic gradient descent method comprises:
obtaining the gradient of the loss function by adopting a back propagation method according to all the weights and the loss functions of the neural network;
updating the weight of the neural network by adopting a random gradient descent method according to the gradient;
the updated weights are iterated a preset number of times to minimize the loss function.
4. An intelligent advertisement insertion device, the device comprising:
the inter cut object module is used for acquiring inter cut objects, and the inter cut objects are videos;
the audio module is used for acquiring the audio information of the inter-cut object;
the query module is used for querying the advertisements matched with the audio information in a preset advertisement library, and comprises the following steps: a division submodule for dividing the audio information into sub-audios; the audio set sub-module is used for selecting target sub-audio from the sub-audio to form an audio set; the network model submodule is used for inputting the audio set into a matched deep neural network model so as to search matched advertisements in an advertisement library; the second obtaining submodule is used for obtaining advertisements corresponding to the matching result according to the matching result output by the model, presetting an advertisement ID list when the model is constructed, setting the corresponding relation between the model output parameter and the advertisements, namely the model output result is an advertisement ID number, and determining the corresponding advertisement video by contrasting the advertisement ID list;
the inserting module is used for inserting the advertisement in the video according to the matching result, and comprises a storage module, an inserting module and a playing module, wherein the storage module stores the advertisement video and the corresponding sub audio, determines the position of the advertisement video in the video according to the sub audio, and inserts the advertisement in the position;
wherein the audio-set submodule selecting the target sub-audios from the sub-audios comprises: clustering the sub-audios to obtain a plurality of clusters, and selecting the sub-audio closest to each cluster center as a target sub-audio, as follows: for an audio set X = {x1, x2, …, xn}, where n is the number of sub-audios, the audio set is divided into K clusters V = {v1, v2, …, vK}; K objects are randomly selected as initial cluster centers; the distance between each object and each cluster center is then calculated, and each object is assigned to the cluster center nearest to it; each cluster center together with the objects assigned to it represents a cluster; once all objects have been assigned, the center of each cluster is recalculated from the objects currently in that cluster; this process is repeated until the cluster centers no longer change, at which point the algorithm terminates; clustering thus yields K well-separated audio classes, and in each audio class the one or more sub-audios closest to the cluster center are taken as the target sub-audios; if a cluster center is itself a sub-audio, that sub-audio is the target sub-audio.
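The clustering step described above is plain k-means followed by a nearest-to-center selection; it can be sketched as below. The two-dimensional feature tuples are illustrative stand-ins for sub-audio feature vectors, since feature extraction is outside this claim, and the initial centers are taken as the first k points rather than at random so the sketch is deterministic.

```python
import math

def kmeans_targets(points, k, iterations=100):
    """Cluster points with k-means; return the point nearest each cluster center."""
    centers = list(points[:k])               # initial centers (the claim picks them at random)
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assign each object to its nearest cluster center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Recompute each cluster center from the objects assigned to it.
        new_centers = [
            tuple(sum(d) / len(cl) for d in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:           # centers unchanged: terminate
            break
        centers = new_centers
    # Target sub-audio per cluster: the object closest to the cluster center.
    return [min(cl, key=lambda p: math.dist(p, centers[i]))
            for i, cl in enumerate(clusters) if cl]

# Two well-separated groups yield one target "sub-audio" per cluster.
targets = kmeans_targets([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)
```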
5. The apparatus of claim 4, further comprising:
a pre-training module, configured to pre-train the matching deep neural network;
the pre-training module comprises:
a sample submodule, configured to acquire audio-set sample data, the audio-set sample data being labeled with a matching type;
a minimization submodule, configured to minimize the loss function using a stochastic gradient descent method; and
a learning submodule, configured to train the matching deep neural network with the audio-set sample data and the minimized loss function to obtain a model.
6. The apparatus of claim 5, wherein the minimization submodule comprises:
a gradient unit, configured to obtain the gradient of the loss function by a back-propagation method according to all the weights of the neural network and the loss function;
a weight-updating unit, configured to update the weights of the neural network by a stochastic gradient descent method according to the gradient; and
an iteration unit, configured to iterate the weight updates a preset number of times to minimize the loss function.
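The division of labor among these three units can be sketched as three small functions: a gradient unit, a weight-update unit, and an iteration unit that repeats the update a preset number of times. The quadratic loss is a stand-in for the network's real loss function, and the learning rate and iteration count are illustrative.

```python
def gradient_unit(w):
    """Gradient of the stand-in loss (w - 5)^2 (backprop would supply this)."""
    return 2 * (w - 5)

def weight_update_unit(w, grad, lr=0.1):
    """One stochastic-gradient-descent step on the weight."""
    return w - lr * grad

def iteration_unit(w, preset_times=200):
    """Repeat the update a preset number of times to minimize the loss."""
    for _ in range(preset_times):
        w = weight_update_unit(w, gradient_unit(w))
    return w

w = iteration_unit(0.0)
print(round(w, 4))  # converges toward 5, the minimizer of the stand-in loss
```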
7. An intelligent advertisement insertion server, comprising one or more intelligent advertisement insertion devices according to any one of claims 4-6.
CN201611224892.9A 2016-12-27 2016-12-27 Intelligent advertisement insertion method and device and server Expired - Fee Related CN106792003B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611224892.9A CN106792003B (en) 2016-12-27 2016-12-27 Intelligent advertisement insertion method and device and server


Publications (2)

Publication Number Publication Date
CN106792003A CN106792003A (en) 2017-05-31
CN106792003B true CN106792003B (en) 2020-04-14

Family

ID=58920982

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611224892.9A Expired - Fee Related CN106792003B (en) 2016-12-27 2016-12-27 Intelligent advertisement insertion method and device and server

Country Status (1)

Country Link
CN (1) CN106792003B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108184153A (en) * 2017-12-29 2018-06-19 伟乐视讯科技股份有限公司 Advertisement insertion system and method matched with video content
CN108364558A (en) * 2018-03-09 2018-08-03 南京信息职业技术学院 Agricultural sand table system
CN110401855B (en) * 2018-04-25 2022-03-04 腾讯科技(深圳)有限公司 Information display method and device, processing platform, computing equipment and storage medium
CN109978145B (en) * 2019-03-29 2021-09-14 联想(北京)有限公司 Processing method and device
CN110290404A (en) * 2019-07-31 2019-09-27 安徽抖范视频科技有限公司 Video advertisement insertion method and device
CN111050194B (en) * 2019-12-02 2022-05-17 北京奇艺世纪科技有限公司 Video sequence processing method, video sequence processing device, electronic equipment and computer readable storage medium
CN111104370A (en) * 2019-12-18 2020-05-05 北京大龙得天力广告传媒有限公司 Advertisement video storage system and method
CN113434717A (en) * 2020-03-20 2021-09-24 华为技术有限公司 Content recommendation method and device, electronic equipment and storage medium
CN111754267B (en) * 2020-06-29 2021-04-20 浙江德塔森特数据技术有限公司 Data processing method and system based on block chain
CN115552394A (en) * 2020-10-30 2022-12-30 谷歌有限责任公司 Converting data from streaming media
CN113434794A (en) * 2021-06-23 2021-09-24 平安国际智慧城市科技股份有限公司 Element carousel method, device, equipment and medium
CN115065837B (en) * 2022-05-13 2023-10-17 咪咕视讯科技有限公司 Video inter-cut method, device, equipment and computer readable storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101072340A (en) * 2007-06-25 2007-11-14 孟智平 Method and system for adding advertising information in flow media
WO2012167568A1 (en) * 2011-11-23 2012-12-13 华为技术有限公司 Video advertisement broadcasting method, device and system
CN104853223A (en) * 2015-04-29 2015-08-19 小米科技有限责任公司 Video stream intercutting method and terminal equipment
CN104951965A (en) * 2015-06-26 2015-09-30 深圳市腾讯计算机系统有限公司 Advertisement delivery method and device
CN104992347A (en) * 2015-06-17 2015-10-21 北京奇艺世纪科技有限公司 Video matching advertisement method and device
CN106162328A (en) * 2015-04-28 2016-11-23 天脉聚源(北京)科技有限公司 Video synchronization information display method and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103366740B (en) * 2012-03-27 2016-12-14 联想(北京)有限公司 Voice command identification method and device


Also Published As

Publication number Publication date
CN106792003A (en) 2017-05-31

Similar Documents

Publication Publication Date Title
CN106792003B (en) Intelligent advertisement insertion method and device and server
CN106779073B (en) Media information classification method and device based on deep neural network
CN107943860B (en) Model training method, text intention recognition method and text intention recognition device
CN108280458B (en) Group relation type identification method and device
JP6311194B2 (en) Contact grouping method and apparatus
CN108984731A Playlist recommendation method, device and storage medium
CN109783798A Method, apparatus, terminal and storage medium for adding pictures to text information
CN110162770A Word expansion method, device, equipment and medium
CN106845390A (en) Video title generation method and device
CN110245293B (en) Network content recall method and device
CN111209423B (en) Image management method and device based on electronic album and storage medium
CN110209810B (en) Similar text recognition method and device
CN111309940A (en) Information display method, system, device, electronic equipment and storage medium
CN113723378B (en) Model training method and device, computer equipment and storage medium
CN112214605A (en) Text classification method and related device
CN111738000B (en) Phrase recommendation method and related device
CN111259257A (en) Information display method, system, device, electronic equipment and storage medium
CN112862021B (en) Content labeling method and related device
CN115080840A (en) Content pushing method and device and storage medium
CN112270238A (en) Video content identification method and related device
CN111611369A (en) Interactive method based on artificial intelligence and related device
CN110929882A (en) Feature vector calculation method based on artificial intelligence and related device
CN112632222B (en) Terminal equipment and method for determining data belonging field
CN108897774B (en) Method, device and storage medium for acquiring news hotspots
CN113569043A (en) Text category determination method and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20200414