WO2012167568A1 - Video advertisement playing method, device and system - Google Patents

Video advertisement playing method, device and system

Info

Publication number
WO2012167568A1
WO2012167568A1 (PCT/CN2011/082747; CN2011082747W)
Authority
WO
WIPO (PCT)
Prior art keywords
text
video
audio
file
result vector
Prior art date
Application number
PCT/CN2011/082747
Other languages
English (en)
French (fr)
Inventor
Wang Wei (王玮)
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to CN201180002916.5A (CN103503463B)
Priority to EP11867149.4A (EP2785058A4)
Priority to PCT/CN2011/082747 (WO2012167568A1)
Publication of WO2012167568A1
Priority to US14/285,192 (US20140257995A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0251 Targeted advertisements
    • G06Q30/0269 Targeted advertisements based on user profile or attribute
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439 Processing of audio elementary streams
    • H04N21/4394 Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44016 Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/60 Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
    • H04N21/65 Transmission of management data between client and server
    • H04N21/658 Transmission by the client directed to the server
    • H04N21/6582 Data stored in the client, e.g. viewing habits, hardware capabilities, credit card number
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81 Monomedia components thereof
    • H04N21/812 Monomedia components thereof involving advertisement data

Definitions

  • the present invention relates to the field of information technology, and in particular, to a video advertisement playing method, device and system.
  • BACKGROUND OF THE INVENTION In recent years, online advertising has developed rapidly and has become an important means of publicity for businesses. However, today's Internet users command more network resources and are more sensitive and alert to advertising. It is therefore necessary to improve how well the content of an advertisement fits the target video file, so that the advertisement matches the scene of the video currently being played and achieves better results.
  • One way is to manually determine the video content, add tags to the video, and play the ads that match the video based on the tags during video playback.
  • However, this method consumes considerable labor, cannot track the progress and content of the video as it plays, and therefore cannot deliver an appropriate advertisement for the scene currently being played.
  • Embodiments of the present invention provide a video advertisement playing method, device, and system, so as to enable a client to deliver a suitable advertisement according to a scenario currently being played.
  • an embodiment of the present invention provides a video advertisement playing method, including:
  • the matching advertisement file is sent to the client.
  • the embodiment of the invention further provides another video advertisement playing method, including:
  • the embodiment of the present invention further provides a server, including:
  • a receiver configured to receive at least one of image feature data, subtitle text, and audio text of a video file sent by the client, where the image feature data, subtitle text, and audio text of the video file are obtained by the client by analyzing, respectively, the video picture, video subtitles, and audio content of the currently played video file;
  • a processor configured to obtain a feature fusion result vector of the video file according to at least one of the image feature data, subtitle text, and audio text of the video file, to perform a similarity matching calculation between the feature fusion result vector of each advertisement file to be delivered and the feature fusion result vector of the video file, and to determine the one or more advertisement files with the highest similarity as the matching advertisement file; and a sender configured to send the matched advertisement file to the client.
  • the embodiment of the invention further provides a client, including:
  • a processor configured to acquire, from the video picture and/or video subtitles and/or audio content of a currently played video file, at least one of the image feature data of the video picture, the subtitle text of the video subtitles, and the audio text of the audio content;
  • a transmitter configured to send at least one of the image feature data, subtitle text, and audio text of the video file to a server, so that the server determines a matching advertisement file according to at least one of the image feature data, the subtitle text, and the audio text;
  • an embodiment of the present invention further provides a video advertisement playing system, including a client and a server;
  • The client is configured to: obtain, from the video picture and/or video subtitles and/or audio content of a currently played video file, at least one of the image feature data of the video picture, the subtitle text of the video subtitles, and the audio text of the audio content; send at least one of the image feature data, subtitle text, and audio text to a server, so that the server determines a matching advertisement file according to at least one of them; and play the matching advertisement file sent by the server. The server is configured to: receive at least one of the image feature data, subtitle text, and audio text of the video file sent by the client, each obtained by the client by analyzing, respectively, the video picture, video subtitles, and audio content of the currently played video file; obtain a feature fusion result vector of the video file from at least one of the image feature data, subtitle text, and audio text; perform a similarity matching calculation between the feature fusion result vector of each advertisement file to be served and the feature fusion result vector of the video file; and determine the one or more advertisement files with the highest similarity as the matching advertisement file.
  • The client analyzes the currently played video picture to obtain at least one of image feature data, subtitle text, and audio text and sends it to the server, and the server determines the matching advertisement according to the feature data provided by the client.
  • FIG. 1 is a flowchart of an embodiment of a video advertisement playing method provided by the present invention.
  • FIG. 2 is a flowchart of still another embodiment of a video advertisement playing method according to the present invention
  • FIG. 3 is a flowchart of another embodiment of a video advertisement playing method provided by the present invention
  • FIG. 5 is a schematic structural diagram of an embodiment of a server provided by the present invention
  • FIG. 6 is a schematic structural diagram of still another embodiment of a server provided by the present invention.
  • FIG. 7 is a schematic structural diagram of an embodiment of a client provided by the present invention.
  • FIG. 8 is a schematic structural diagram of an embodiment of a video advertisement playing system provided by the present invention.
  • The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings.
  • The described embodiments are a part, rather than all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the scope of the present invention.
  • FIG. 1 is a flowchart of an embodiment of a video advertisement playing method according to the present invention. As shown in FIG. 1, the method includes:
  • S102 Send at least one of image feature data, subtitle text, and audio text of the video file to the server, so that the server determines the matched advertisement file according to at least one of image feature data, subtitle text, and audio text of the video file.
  • The steps above are performed by a client, which may specifically be any of various video players on a terminal device such as a personal computer or a mobile phone.
  • The client can capture the currently played picture at a specified position of the currently played video file and extract the image feature data of that picture.
  • The client can use various existing image feature extraction algorithms, for example the Scale-Invariant Feature Transform (SIFT) algorithm.
  • the image feature data extracted by the client may include:
  • Color features of the video picture: usually represented by color cumulative histogram data, which describe the statistical distribution of image color and are invariant to translation, scale, and rotation. Texture features of the video picture: usually represented by gray-level co-occurrence matrix data, whose various statistics can serve as measures of texture.
  • The gray-level co-occurrence matrix represents the joint probability distribution of pairs of gray pixels separated by a displacement (Δx, Δy) in the image; if the image has L gray levels, the co-occurrence matrix is an L × L matrix.
  • Shape features of the video picture: represented by the contour features of the image, or alternatively by its region features.
  • The contour features of an image describe the outer boundary of an object, while the region features cover the entire shape region; the shape parameters of the image are obtained by describing the boundary features.
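As an illustrative sketch of the two statistical features described above, the following pure-Python snippet computes a cumulative histogram and a normalized gray-level co-occurrence matrix on a tiny 4x4 image with L = 4 gray levels. All names and values are made up for illustration; real systems operate on full-size color frames.

```python
def cumulative_histogram(pixels, levels):
    """Cumulative histogram: fraction of pixels at or below each gray level."""
    counts = [0] * levels
    for p in pixels:
        counts[p] += 1
    total = len(pixels)
    cum, running = [], 0
    for c in counts:
        running += c
        cum.append(running / total)
    return cum

def cooccurrence_matrix(image, dx, dy, levels):
    """L x L gray-level co-occurrence matrix for displacement (dx, dy)."""
    h, w = len(image), len(image[0])
    m = [[0] * levels for _ in range(levels)]
    pairs = 0
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                m[image[y][x]][image[ny][nx]] += 1
                pairs += 1
    # Normalize counts into a joint probability distribution.
    return [[v / pairs for v in row] for row in m]

img = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 2, 2, 2],
       [2, 2, 3, 3]]
flat = [p for row in img for p in row]
print(cumulative_histogram(flat, 4))      # monotone non-decreasing, ends at 1.0
print(cooccurrence_matrix(img, 1, 0, 4))  # 4x4 joint distribution, sums to 1.0
```

Statistics of the co-occurrence matrix (contrast, energy, homogeneity, and so on) would then serve as the texture measures mentioned above.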
  • The client can also use existing speech recognition technology to convert the spoken vocabulary content of a video file into computer-readable input, such as keystrokes, binary codes, or character sequences, thereby obtaining the audio text.
  • The client may also extract subtitles from the currently played video file to obtain subtitle text, in which case the feature data sent by the client to the server also includes the subtitle text.
  • the client can extract the subtitle text by using various video text extraction methods in the prior art.
  • The process of extracting subtitle text may include: the client cuts the video clip into video images and processes them; it then determines whether a video image contains text information and where that text lies in the image, and cuts out the text area. The client can exploit the temporal redundancy of text information by finding multiple consecutive frames containing the same text and using multi-frame fusion to enhance the text area. The extracted text area is then processed further:
  • after grayscale conversion and binarization, the resulting black text on a white background (or white text on a black background) is recognized to obtain the subtitle text.
  • the recognition of the text image can be realized by using existing optical character recognition (OCR) technology.
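The grayscale-and-binarization step that precedes OCR can be sketched as below. The fixed threshold and all names are assumptions for illustration; production systems typically use adaptive thresholds such as Otsu's method, and the binary image would then be handed to an OCR engine such as Tesseract.

```python
def binarize(gray, threshold=128):
    """Map a 2D grayscale text region (values 0-255) to a binary image (0 or 255)."""
    return [[255 if p >= threshold else 0 for p in row] for row in gray]

# A tiny made-up crop of a subtitle text area.
region = [[ 30,  40, 200, 210],
          [ 35, 220, 225,  45],
          [ 25,  30, 215, 205]]
binary = binarize(region)
print(binary[0])  # -> [0, 0, 255, 255]
```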
  • The above is one way for the client to analyze the currently played video and obtain the image feature data of the video picture, the subtitle text of the video subtitles, and the audio text of the audio content.
  • The client may also adopt other methods to analyze the currently played video picture and obtain at least one of the image feature data of the video picture, the subtitle text of the video subtitles, and the audio text of the audio content.
  • The client may send at least one of the image feature data, audio text, and subtitle text obtained by this analysis to the server. Correspondingly, the server may match at least one of the received image feature data, audio text, and subtitle text against the locally stored advertisement files to determine the advertisement file that matches the video picture currently being played by the client. After the server determines the matching advertisement file, it can send the advertisement file or an advertisement link to the client for the client to play.
  • In this embodiment, the client analyzes the currently played video picture and sends at least one of image feature data, subtitle text, and audio text to the server. The server obtains the feature fusion result vector of the video file from the feature data provided by the client, performs a similarity matching calculation against the feature fusion result vector of each advertisement file to be served to determine the matching advertisement file, and then sends the matched advertisement to the client for playback, so that the advertisement played by the client better fits the video content currently being watched.
  • FIG. 2 is a flowchart of still another embodiment of a video advertisement playing method according to the present invention. As shown in FIG. 2, the method includes:
  • The steps above are performed by the server.
  • The client can capture the currently played picture at a specified position of the currently played video content and extract the image feature data of that picture, which may specifically include: color cumulative histogram data representing the color features of the video picture, gray-level co-occurrence matrix data representing its texture features, gray-gradient direction matrix data representing its shape features, and the like.
  • The client can also use existing speech recognition technology to convert the vocabulary content of human speech into computer-readable input such as keystrokes, binary codes, or character sequences.
  • The client may also extract subtitles from the currently played video file to obtain subtitle text; in that case, the feature data of the video file sent by the client to the server also includes the subtitle text of the video file.
  • The server may collect a number of pictures or video pictures in advance; these may be important pictures in a video, or specified video pictures where an advertisement needs to be inserted. The server can perform image feature extraction on these pictures or video pictures to obtain image feature data.
  • The image feature data may include: color cumulative histogram data representing the color features of the video picture, gray-level co-occurrence matrix data representing its texture features, gray-gradient direction matrix data representing its shape features, and the like.
  • The server can annotate the selected pictures, marking the content or category of each picture.
  • The server can establish the relationship between the image feature data and the annotations, and use a machine learning algorithm, for example the Support Vector Machine (SVM) algorithm, to train on the selected feature data and obtain an image feature data classification model.
  • The essence of the machine learning algorithm is that the machine learns "experience" from the image feature data and annotations of the training pictures, so that it can classify new data; the "experience" the machine acquires is the image feature data classification model.
  • The server may also select a number of subtitle files and audio files in advance and train on their feature data and annotations with a machine learning algorithm, for example the SVM algorithm, thereby obtaining a subtitle text classification model and an audio text classification model, respectively.
  • The server may input the image feature data into the image feature data classification model to obtain an image feature data result vector. The vector has multiple dimensions, each of which represents a category, such as sports, finance, or entertainment.
  • Each dimension of the vector indicates the likelihood that the input image feature data belongs to the corresponding category: the larger the value of a category's dimension, the more likely the input image features belong to that category. In other words, the process by which the server feeds the input image feature data into the image feature data classification model and outputs the image feature data result vector is, in effect, a classification of the image feature data.
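A small sketch of how such a result vector might be read. The category names come from the example in the text; the scores and names are invented for illustration.

```python
# Each dimension scores the likelihood that the input belongs to that category.
categories = ["sports", "finance", "entertainment"]
result_vector = [0.7, 0.1, 0.2]  # illustrative model output

# The largest component identifies the most likely category.
best = max(range(len(result_vector)), key=result_vector.__getitem__)
print(categories[best])  # -> sports
```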
  • The server can input the subtitle text into the subtitle text classification model to obtain a subtitle text classification result vector; the server can also input the audio text into the audio text classification model to obtain an audio text classification result vector.
  • The server may further perform a weighted fusion calculation over at least one of the image feature data classification result vector, the subtitle text classification result vector, and the audio text classification result vector. That is, the category of the image feature data represented by the image feature data result vector, and/or the category of the subtitle text represented by the subtitle text classification result vector, and/or the category represented by the audio text classification result vector, are weighted and fused to obtain the feature fusion result vector of the video file; this vector represents the category to which the video content currently played by the client belongs.
  • the process of performing weighted fusion by the server may use various weighted fusion algorithms provided by the prior art.
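The weighted fusion step can be sketched as below. Since the text leaves the fusion algorithm open, a simple element-wise weighted average is used here; the weights and vectors are illustrative assumptions.

```python
def weighted_fusion(vectors, weights):
    """Element-wise weighted average of same-dimension classification result vectors."""
    assert len(vectors) == len(weights)
    total_w = sum(weights)
    dims = len(vectors[0])
    return [sum(w * v[i] for v, w in zip(vectors, weights)) / total_w
            for i in range(dims)]

# Per-modality result vectors for one video file (made-up values).
image_vec    = [0.7, 0.1, 0.2]
subtitle_vec = [0.6, 0.3, 0.1]
audio_vec    = [0.5, 0.2, 0.3]

# Made-up modality weights; these could be tuned per deployment.
fusion = weighted_fusion([image_vec, subtitle_vec, audio_vec], [0.5, 0.3, 0.2])
print(fusion)  # still a distribution over the same three categories
```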
  • The server may also obtain in advance the image feature data and/or audio text corresponding to each advertisement file to be delivered (and, for advertisement files with subtitles, their subtitle text as well), and input the image feature data and/or audio text and/or subtitle text of each advertisement file into the image feature data classification model, audio text classification model, and subtitle text classification model respectively, obtaining for each advertisement file an image feature data result vector, an audio text classification result vector, and a subtitle text classification result vector. It then performs a fusion calculation on the image feature data result vector of the advertisement file, and/or its audio text classification result vector and/or its subtitle text classification result vector, to obtain the feature fusion result vector of the advertisement file.
  • The server may then perform a similarity matching calculation between the feature fusion result vector of the video file and the feature fusion result vector of each advertisement file to be served, and determine, according to the degree of similarity, the one or more advertisement files that most closely match the video content currently played by the client.
  • the process of performing similarity matching by the server may adopt various similarity matching algorithms provided by the prior art.
  • After the server determines the matching advertisement file, it can send the advertisement file or an advertisement link to the client for the client to play.
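The similarity matching step can be sketched with cosine similarity, one common choice; the text does not mandate a particular algorithm, and all vectors and advertisement names below are invented for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two result vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Feature fusion result vector of the currently played video (made-up).
video_fusion = [0.63, 0.18, 0.19]

# Feature fusion result vectors of candidate advertisement files (made-up).
ads = {
    "ad_sports":  [0.8, 0.1, 0.1],
    "ad_finance": [0.1, 0.8, 0.1],
}

# The advertisement with the highest similarity is the matching advertisement file.
best_ad = max(ads, key=lambda name: cosine_similarity(video_fusion, ads[name]))
print(best_ad)  # -> ad_sports
```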
  • In this embodiment, the client analyzes the currently played video picture and sends at least one of image feature data, subtitle text, and audio text to the server. The server obtains the feature fusion result vector of the video file from the feature data provided by the client, performs a similarity matching calculation against the feature fusion result vector of each advertisement file to be served to determine the matching advertisement file, and then sends the matched advertisement to the client for playback, so that the advertisement played by the client better fits the video content currently being watched.
  • FIG. 3 is a flowchart of another embodiment of a video advertisement playing method according to the present invention.
  • This embodiment describes a specific case in which the feature data of a video file provided by the client to the server includes at least one of image feature data, subtitle text, and audio text, and the server determines the matched advertisement according to at least one of them. The method comprises:
  • S301a: The server performs image feature extraction on the collected training video pictures to obtain their image feature data, performs text annotation on the training video pictures to obtain their annotation data, and performs support vector machine (SVM) training on the image feature data and annotation data to obtain an image feature data classification model.
  • The server can collect a number of pictures, which may be important pictures in a video or specified video pictures where an advertisement needs to be inserted; these are referred to herein as training video pictures.
  • the server performs image feature extraction on the training video image to obtain image feature data of the training video image.
  • The image feature data may include: color cumulative histogram data representing the color features of the video picture, gray-level co-occurrence matrix data representing its texture features, gray-gradient direction matrix data representing its shape features, and the like.
  • The server may further perform text annotation on the training video pictures, that is, classify them by category, for example sports, finance, or entertainment, thereby obtaining the annotation data of the training video pictures.
  • The server can use the image feature data and annotation data of the training video pictures as input to the SVM classification algorithm, and perform SVM training on the image feature data and annotation data to obtain an image feature data classification model. That is, the machine learns "experience" from the image feature data and annotation data of the training pictures, enabling it to classify new data.
  • The "experience" obtained by the machine through learning is the image feature data classification model.
  • S301b: The server extracts the subtitles of the collected training videos to obtain the subtitle text of the training videos, performs text annotation on the training videos to obtain their annotation data, and performs SVM training on the subtitle text and annotation data of the training videos to obtain a subtitle text classification model.
  • Specifically, the server can collect training videos containing subtitles and perform subtitle extraction on them to obtain their subtitle text. The server can also perform text annotation on the training videos to obtain their annotation data, and then use the subtitle text and annotation data of the training videos as input to the SVM classification algorithm, performing SVM training on them to obtain the subtitle text classification model.
  • S301c: The server extracts the collected training audio content to obtain the audio text of the training audio, performs text annotation on the training audio to obtain its annotation data, and performs SVM training on the audio text and annotation data of the training audio to obtain an audio text classification model.
  • The server can also collect training videos containing audio and perform audio extraction on the training audio content to obtain its audio text.
  • The server also needs to perform text annotation on the training audio to obtain its annotation data.
  • The audio text and annotation data of the training audio content are then used as input to the SVM classification algorithm, and SVM training is performed on them to obtain an audio text classification model.
  • S301a-S301c are the process by which the server obtains the image feature data classification model, the subtitle text classification model, and the audio text classification model through SVM training.
  • The above steps may be performed in any order.
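The train-then-classify workflow of S301a-S301c can be sketched as below. A trivial nearest-centroid classifier stands in for the SVM described in the text, and all feature vectors and labels are invented; the point is only the shape of the pipeline (annotated feature vectors in, a reusable classification model out).

```python
def train(samples):
    """samples: list of (feature_vector, label). Returns a 'model' (per-label centroids)."""
    sums, counts = {}, {}
    for vec, label in samples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

def classify(model, vec):
    """Assign vec to the label whose centroid is nearest (squared Euclidean distance)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda lab: dist2(model[lab], vec))

# Made-up annotated training data: (feature vector, category annotation).
training = [([1.0, 0.1], "sports"), ([0.9, 0.2], "sports"),
            ([0.1, 1.0], "finance"), ([0.2, 0.9], "finance")]
model = train(training)             # the learned "experience"
print(classify(model, [0.95, 0.15]))  # -> sports
```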
  • the server receives image feature data, subtitle text, and audio text of the video file sent by the client.
  • The server inputs the image feature data of the video file into the preset image feature data classification model to obtain the image feature data classification result vector of the video file; and/or the server inputs the subtitle text of the video file into the preset subtitle text classification model for classification to obtain the subtitle text classification result vector of the video file; and/or the server inputs the audio text of the video file into the preset audio text classification model to obtain the audio text classification result vector of the video file. The image feature data classification model, the subtitle text classification model, and the audio text classification model have the same classification dimensions.
  • the image feature data classification model, the subtitle text classification model, and the audio text classification model pre-established by the server are empirical models for classifying image feature data, subtitle text, and audio text, respectively.
  • The image feature data classification result vector, subtitle text classification result vector, and audio text classification result vector of the video file, output respectively by the image feature data classification model, the subtitle text classification model, and the audio text classification model, reflect the categories to which the image feature data, subtitle text, and audio text of the video file belong.
  • Because the three classification models have the same categories and dimensions, the image feature data classification result vector, subtitle text classification result vector, and audio text classification result vector of the video file can take a common default value: a vector whose every component is 1/N, where N is the number of dimensions of the classification categories.
  • The client can send any one or more of the image feature data, subtitle text, or audio text of the video file to the server. For example, if a video has no audio, the client can send only the image feature data and subtitle text; in this scenario, the server can take the default value as the audio text classification result vector. Other cases are not enumerated here.
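The default result vector for a missing modality can be sketched directly from the 1/N rule above (the function name is an illustrative assumption):

```python
def default_result_vector(n_categories):
    """Uniform vector: every component is 1/N, so no category is preferred."""
    return [1.0 / n_categories] * n_categories

# E.g. for a video with no audio, stand in for the audio text result vector.
audio_vec = default_result_vector(3)
print(audio_vec)
```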
  • In addition, the server may acquire the image feature data, subtitle text, and audio text of each advertisement file according to the video images, video subtitles, and audio content of each advertisement file to be served, and input the image feature data, subtitle text, and audio text of each advertisement file into the image feature data classification model, the subtitle text classification model, and the audio text classification model respectively, to obtain the image feature data classification result vector, the subtitle text classification result vector, or the audio text classification result vector of each advertisement file. That is, the server also needs to perform the operations of S303b1 and S303b2 for subsequent matching operations.
  • S303b1 The server acquires at least one of image feature data, subtitle text, and audio text of each advertisement file according to a video screen and/or video subtitle and/or audio content of each advertisement file to be served.
  • S303b2 The server inputs the image feature data of each advertisement file into the image feature data classification model for classification and obtains the image feature data classification result vector of each advertisement file; and/or inputs the subtitle text of each advertisement file into the subtitle text classification model for classification and obtains the subtitle text classification result vector of each advertisement file; and/or inputs the audio text of each advertisement file into the audio text classification model and obtains the audio text classification result vector of each advertisement file.
  • the image feature data classification result vector of the advertisement file, the subtitle text classification result vector of the advertisement file, and the audio text classification result vector of the advertisement file have the same classification dimension.
  • S303b1 and S303b2 may be performed before the server receives at least one of the image feature data, subtitle text, and audio text of the video file sent by the client, or may be performed after the server receives at least one of them.
  • S304a The server performs weighted fusion calculation on at least one of the image feature data classification result vector, the subtitle text classification result vector, and the audio text classification result vector of the video file, to obtain a feature fusion result vector of the video file.
  • This embodiment provides a method for the weighted fusion calculation. Assuming the classification dimension is n, the image feature data classification result vector of the video file obtained from the image feature data classification model is:

U = (u_1, u_2, ..., u_n)

where u_i is the score of the image feature data classification result vector in dimension i. The subtitle text classification result vector of the video file obtained from the subtitle text classification model is:

V = (v_1, v_2, ..., v_n)

where v_i is the score of the subtitle text classification result vector in dimension i. The audio text classification result vector of the video file obtained from the audio text classification model is:

W = (w_1, w_2, ..., w_n)

The server can use the following formula to perform weighted fusion on the image feature data classification result vector, the subtitle text classification result vector, and the audio text classification result vector of the video file:

R = α·U + β·V + γ·W

where R denotes the feature fusion result vector, i.e. the weighted sum of the image feature data result vector, the subtitle text result vector, and the audio text result vector of the video file, and α, β, γ are the weight parameters assigned to the image feature data result vector, the subtitle text result vector, and the audio text result vector, respectively. The values of α, β, γ can be calculated as:

α = (1 / cos(U, I)) / (1 / cos(U, I) + 1 / cos(V, I) + 1 / cos(W, I))

where cos(X, I) denotes the cosine of the angle between a vector X and the unit vector I. That is, the value of α equals the reciprocal of the cosine of the angle between the vector U and the unit vector, divided by the sum of the reciprocals of the cosines of the angles between each of the three vectors U, V, W and the unit vector; β and γ are calculated analogously.
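As a concrete illustration of the weighted fusion above, the following sketch computes the α, β, γ weights from the reciprocal-cosine formula and fuses three classification result vectors. The helper names `cos_to_unit` and `fuse` are illustrative, not from the patent.

```python
import math

def cos_to_unit(v):
    # Cosine of the angle between vector v and the unit vector I = (1, ..., 1):
    # cos(v, I) = sum(v_i) / (|v| * sqrt(n)).
    n = len(v)
    return sum(v) / (math.sqrt(sum(x * x for x in v)) * math.sqrt(n))

def fuse(u, v, w):
    # Each weight is the reciprocal of that vector's cosine to the unit
    # vector, normalized so that alpha + beta + gamma = 1.
    recips = [1.0 / cos_to_unit(x) for x in (u, v, w)]
    total = sum(recips)
    alpha, beta, gamma = (r / total for r in recips)
    # R = alpha*U + beta*V + gamma*W, computed per dimension.
    return [alpha * a + beta * b + gamma * c for a, b, c in zip(u, v, w)]
```

Because each classification result vector sums to 1 and the weights sum to 1, the fused vector again sums to 1, so it remains a distribution over the classification categories.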
  • The video file played by the client may have multiple corresponding labels stored on the server side, each label marking the content of a certain segment or picture of the video file. Therefore, if the server side stores multiple labels corresponding to the video file, the server may further correct the feature fusion result vector using those labels after obtaining the feature fusion result vector of the video file.
  • the details are as follows:
  • The server may pre-generate the score vector of the labels. Specifically, it may map the multiple labels to the classification dimensions of each classification model, count the number of labels corresponding to each classification dimension to obtain a vector, and normalize that vector as the label score vector of the video file.
  • The server can correct the feature fusion result vector of the video file according to the label score vector of the video file, which can be implemented by the following formula:

F = λ·A + μ·S

where F denotes the corrected final classification result vector, A denotes the feature fusion result vector of the video file, S denotes the label score vector, and λ and μ denote the weight parameters assigned to the feature fusion result vector of the video file and the label score vector, respectively; that is, F is the weighted sum of the feature fusion result vector of the video file and the label score vector. The values of λ and μ are calculated as:

λ = (1 / cos(A, I)) / (1 / cos(A, I) + 1 / cos(S, I))

where cos(A, I) denotes the cosine of the angle between the vector A and the unit vector I. That is, the value of λ equals the reciprocal of the cosine of the angle between the vector A and the unit vector, divided by the sum of the reciprocals of the cosines of the angles between each of the two vectors A and S and the unit vector; μ is calculated analogously.
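The label-based correction can be sketched in the same style. The function names `label_score_vector` and `correct` are illustrative, and the label-to-dimension mapping is assumed to be given.

```python
import math
from collections import Counter

def label_score_vector(labels, label_to_dim, n_dims):
    # Map each stored label to a classification dimension, count labels per
    # dimension, and normalize the counts into the label score vector S.
    counts = Counter(label_to_dim[l] for l in labels if l in label_to_dim)
    total = sum(counts.values()) or 1
    return [counts.get(i, 0) / total for i in range(n_dims)]

def correct(a, s):
    # F = lam*A + mu*S, with lam and mu taken as the reciprocal of each
    # vector's cosine to the unit vector, normalized to sum to 1.
    def inv_cos(v):
        n = len(v)
        return 1.0 / (sum(v) / (math.sqrt(sum(x * x for x in v)) * math.sqrt(n)))
    ia, is_ = inv_cos(a), inv_cos(s)
    lam, mu = ia / (ia + is_), is_ / (ia + is_)
    return [lam * x + mu * y for x, y in zip(a, s)]
```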
  • The present invention can also use other existing weighted fusion algorithms to determine the feature fusion result vector of a video file or an advertisement file.
  • S304b The server performs weighted fusion calculation on at least one of the image feature data classification result vector, the subtitle text classification result vector, and the audio text classification result vector of each advertisement file, to obtain a feature fusion result vector of each advertisement file.
  • For the specific process of the weighted fusion calculation, refer to S304a; it is not repeated here.
  • S304b may be performed before the server receives at least one of the image feature data, subtitle text, and audio text of the video file sent by the client, or may be performed after the server receives at least one of them.
  • the server performs a similarity matching calculation on the feature fusion result vector of each advertisement file and the feature fusion result vector of the video file, and determines one or more advertisement files with the largest similarity as the matched advertisement file.
  • This embodiment provides a method for the similarity matching calculation, which is specifically as follows:

sim(X, Y) = (Σ_{i=1..n} x_i·y_i) / (|X|·|Y|)

This formula calculates the cosine of the angle between the feature fusion result vector X of an advertisement file and the feature fusion result vector Y of the video file: the numerator is the sum of the products of the corresponding dimension scores of the two vectors, |X| represents the square root of the sum of the squares of the dimension scores of the vector X, and |Y| represents the square root of the sum of the squares of the dimension scores of the vector Y.
  • The one or more advertisement files with the highest similarity are determined to be the matching advertisement files.
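The similarity matching step above can be sketched as a cosine ranking over the advertisement files' fusion vectors. The function names are illustrative.

```python
import math

def cosine_sim(x, y):
    # sim(X, Y) = sum(x_i * y_i) / (|X| * |Y|)
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) *
                  math.sqrt(sum(b * b for b in y)))

def best_matches(video_vec, ad_vecs, k=1):
    # Rank advertisement files by similarity to the video's fusion vector
    # and return the ids of the top-k matches.
    ranked = sorted(ad_vecs.items(),
                    key=lambda kv: cosine_sim(video_vec, kv[1]),
                    reverse=True)
    return [ad_id for ad_id, _ in ranked[:k]]
```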
  • the server sends the matched advertisement file to the client.
  • After the server determines the advertisement file that matches the video file, the matching advertisement file or a link to the advertisement file can be sent to the client for the client to play.
  • The method for playing the video advertisement provided in this embodiment can be applied to a client on a terminal such as a personal computer or a mobile phone, for example a video player that inserts advertisements, and is particularly suitable for selecting and playing the advertisement that most closely matches the currently played video content when video playback is paused.
  • the method specifically includes:
  • the client obtains a video picture, a video subtitle, and an audio content of a currently played video file.
  • The client can obtain a screenshot of the currently playing video directly through the video playing software, as the video picture of the currently played video file.
  • The client can cut the video clip into frames and process each video image: determine whether the video image contains text information and where the text information is located in the video image, and cut out the text region. Finally, the extracted text region is converted to grayscale and binarized to obtain a subtitle text picture with black text on white or white text on black.
  • The client can also obtain the audio content of the currently played video file directly through the video player; the desired audio portion can be selected by specifying a start time and an end time within the video.
  • the client analyzes the image feature data of the video file, the subtitle text of the video subtitle, and the audio text of the audio content according to the video image, the video subtitle, and the audio content of the currently played video file.
  • the caption text of the video caption and the audio text of the audio content refer to the corresponding description of the embodiment shown in FIG. 1 , and details are not described herein again.
  • the client sends image feature data, subtitle text, and audio text of the video file to the server.
  • the server obtains a feature fusion result vector of the video file according to the image feature data of the video file, the subtitle text, and the audio text.
  • the server performs similarity matching calculation on the feature fusion result vector of each advertisement file to be served and the feature fusion result vector of the video file, and determines one or more advertisement files with the highest similarity as the matching advertisement file.
  • Before determining the matching advertisement file, the server needs to establish an image feature data classification model, a subtitle text classification model, and an audio text classification model.
  • the classification dimension set by the server for each classification model is 5 dimensions, for example: car, IT, real estate, food, entertainment.
  • Assume that the image feature data classification result vector of the video file, obtained by inputting the image feature data of the video file into the image feature data classification model, is U; the subtitle text classification result vector of the video file, obtained by inputting the subtitle text of the video file into the subtitle text classification model, is:

(0.05, 0.05, 0.10, 0, 0.80);

and the audio text classification result vector of the video file, obtained by inputting the audio text of the video file into the audio text classification model, is:

(0.07, 0.08, 0.10, 0, 0.75).

The feature fusion result vector R of the video file can then be obtained; for the calculation process, see S304a.
  • The feature fusion result vector of the video file obtained by the above process and the feature fusion result vector of each advertisement file may then be directly subjected to similarity matching calculation (this embodiment omits the calculation process of the feature fusion result vector of each advertisement file); the one or more advertisements with the highest similarity are the target advertisement files that most closely match the video file.
  • If labels are stored for the video file, the labels can be mapped to the classification dimensions of each classification model, and the number of labels mapped to each classification dimension counted to obtain a label score vector.
  • The label score vector S is used to correct the feature fusion result vector of the video file, obtaining the final feature fusion result vector of the video file.
  • The feature fusion result vector of the video file and the feature fusion result vector of each advertisement file are then used to perform similarity matching calculation, and an advertisement file matching the video file is determined.
  • the server sends the matched advertisement file to the client.
  • the storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).
  • FIG. 5 is a schematic structural diagram of an embodiment of a server provided by the present invention. As shown in FIG. 5, the server includes: a receiver 11, a processor 12, and a transmitter 13;
  • The receiver 11 is configured to receive at least one of the image feature data, subtitle text, and audio text of a video file sent by the client, where the image feature data, subtitle text, and audio text of the video file are obtained by the client through analysis of the video pictures, video subtitles, and audio content of the currently played video file, respectively;
  • The processor 12 is configured to obtain a feature fusion result vector of the video file according to at least one of the image feature data, subtitle text, and audio text of the video file; perform similarity matching calculation between the feature fusion result vector of each advertisement file to be served and the feature fusion result vector of the video file; and determine the one or more advertisement files with the highest similarity as the matching advertisement files;
  • the sender 13 is used to send the matching advertisement file to the client.
  • FIG. 6 is a schematic structural diagram of still another embodiment of a server provided by the present invention. As shown in FIG. 6, the server includes: a receiver 11, a processor 12, a transmitter 13, and a memory 14.
  • The processor 12 may be specifically configured to: input the image feature data of the video file into a preset image feature data classification model for classification to obtain an image feature data classification result vector of the video file; and/or input the subtitle text of the video file into a preset subtitle text classification model for classification to obtain a subtitle text classification result vector of the video file; and/or input the audio text of the video file into a preset audio text classification model for classification to obtain an audio text classification result vector of the video file, where the image feature data classification model, the subtitle text classification model, and the audio text classification model have the same classification dimensions; and perform weighted fusion calculation on at least one of the image feature data classification result vector, the subtitle text classification result vector, and the audio text classification result vector of the video file to obtain the feature fusion result vector of the video file.
  • the processor 12 is further configured to: perform image feature extraction on the collected training video image to obtain image feature data of the training video image; perform text annotation on the training video image to obtain annotation data of the training video image; The image feature data and the annotation data of the picture are trained by the support vector machine SVM to obtain an image feature data classification model;
  • The processor 12 is further configured to: perform subtitle extraction on the collected training video to obtain subtitle text of the training video; perform text annotation on the training video to obtain annotation data of the training video; and perform SVM training on the subtitle text and annotation data of the training video to obtain a subtitle text classification model;
  • the processor 12 may be further configured to: perform audio extraction on the collected training audio to obtain an audio text of the training audio; perform text annotation on the training audio to obtain annotation data of the training audio; SVM training is performed on the video text and annotation data of the training audio to obtain an audio text classification model.
  • In the weighted fusion calculation, U represents the image feature data classification result vector, V represents the subtitle text classification result vector, W represents the audio text classification result vector, and α, β, γ are respectively the weight parameters assigned to the image feature data result vector, the subtitle text result vector, and the audio text result vector.
  • The processor 12 is further configured to: obtain at least one of the image feature data, subtitle text, and audio text of each advertisement file according to the video images and/or video subtitles and/or audio content of each advertisement file to be served; input the image feature data of each advertisement file into the image feature data classification model for classification to obtain the image feature data classification result vector of each advertisement file; and/or input the subtitle text of each advertisement file into the subtitle text classification model for classification to obtain the subtitle text classification result vector of each advertisement file; and/or input the audio text of each advertisement file into the audio text classification model to obtain the audio text classification result vector of each advertisement file, where the image feature data classification result vector, the subtitle text classification result vector, and the audio text classification result vector of an advertisement file have the same classification dimensions; and perform weighted fusion calculation on at least one of the image feature data classification result vector, the subtitle text classification result vector, and the audio text classification result vector of each advertisement file to obtain the feature fusion result vector of each advertisement file.
  • the memory 14 can be used to: store a plurality of tags of the video file, the tag is used to mark a segment or picture content of the video file;
  • the processor 12 is further configured to: map multiple labels to the classification dimension respectively, separately count the number of labels corresponding to each classification dimension, and obtain a label score vector corresponding to the video file; and adopt a label score vector of the video file, The feature fusion result vector of the video file is corrected.
  • The server provided by this embodiment of the present invention corresponds to the video advertisement playing method provided by the present invention and is a functional device; for the specific process of executing the method, refer to the method embodiments, which are not repeated here.
  • In this embodiment, the client analyzes the currently played video pictures to obtain at least one of the image feature data, subtitle text, and audio text, and the server obtains the feature fusion result vector of the video file according to the feature data provided by the client, performs similarity matching calculation with the feature fusion result vector of each advertisement file to be served to determine the matching advertisement file, and then sends the matched advertisement to the client for playback, so that the advertisement played by the client better suits the scene currently being played by the client.
  • FIG. 7 is a schematic structural diagram of an embodiment of a client provided by the present invention. As shown in FIG. 7, the client includes: a processor 21, a transmitter 22, and a player 23;
  • the processor 21 is configured to analyze, according to the video picture and/or the video subtitle and/or the audio content of the currently played video file, the image feature data of the video picture, the subtitle text of the video subtitle, and the audio text of the audio content. ;
  • The transmitter 22 is configured to send at least one of the image feature data, subtitle text, and audio text of the video file to the server, so that the server determines the matching advertisement file according to at least one of the image feature data, subtitle text, and audio text of the video file;
  • the player 23 is configured to play a matching advertisement file sent by the server.
  • The client provided by this embodiment of the present invention corresponds to the video advertisement playing method provided by the present invention; for the specific process of executing the method, refer to the method embodiments, which are not repeated here.
  • With the client provided by this embodiment, at least one of the image feature data, subtitle text, and audio text obtained by analyzing the currently played video pictures is sent to the server; the server obtains the feature fusion result vector of the video file according to the feature data provided by the client, performs similarity matching calculation with the feature fusion result vector of each advertisement file to be served to determine the matching advertisement file, and sends the matched advertisement to the client for playback, so that the advertisement played by the client better suits the scene currently being played by the client.
  • FIG. 8 is a schematic structural diagram of an embodiment of a video advertisement playing system according to the present invention. As shown in FIG. 8, the system includes: a client 1 and a server 2;
  • the client 1 is configured to: analyze image feature data of the video picture, at least one of the caption text of the video caption and the audio text of the audio content according to the video picture and/or the video caption and/or the audio content of the currently played video file. Transmitting at least one of image feature data, subtitle text, and audio text of the video file to the server 2, so that the server 2 determines a matching advertisement file according to at least one of image feature data, subtitle text, and audio text of the video file; Playing the matching advertisement file sent by the server 2;
  • The server 2 is configured to: receive at least one of the image feature data, subtitle text, and audio text of a video file sent by the client 1, where the image feature data, subtitle text, and audio text of the video file are obtained by the client through analysis of the video pictures, video subtitles, and audio content of the currently played video file, respectively; obtain a feature fusion result vector of the video file according to at least one of the image feature data, subtitle text, and audio text of the video file; perform similarity matching calculation between the feature fusion result vector of each advertisement file to be served and the feature fusion result vector of the video file, and determine the one or more advertisement files with the highest similarity as the matching advertisement files; and send the matched advertisement file to the client 1.
  • The video advertisement playing system provided by this embodiment of the present invention corresponds to the video advertisement playing method provided by the present invention; for the specific process of executing the method, refer to the method embodiments, which are not repeated here.
  • In the video advertisement playing system provided by this embodiment, the client sends at least one of the image feature data, subtitle text, and audio text obtained by analyzing the currently played video pictures to the server; the server obtains the feature fusion result vector of the video file according to the feature data provided by the client, performs similarity matching calculation with the feature fusion result vector of each advertisement file to be served to determine the matching advertisement file, and then sends the matched advertisement to the client for playback, so that the advertisement played by the client better suits the scene currently being played by the client.


Abstract

Embodiments of the present invention provide a video advertisement playing method, device, and system. One method includes: receiving at least one of the image feature data, subtitle text, and audio text of a video file sent by a client, where the image feature data, subtitle text, and audio text of the video file are obtained by the client through analysis of the video pictures, video subtitles, and audio content of the currently played video file, respectively; obtaining a feature fusion result vector of the video file according to at least one of the image feature data, subtitle text, and audio text of the video file; performing similarity matching calculation between the feature fusion result vector of each advertisement file to be served and the feature fusion result vector of the video file, and determining the one or more advertisement files with the highest similarity as the matching advertisement files; and sending the matching advertisement files to the client. The embodiments of the present invention make the advertisement played by the client better suit the scene currently being played by the client.

Description

Video Advertisement Playing Method, Device, and System

Technical Field

The present invention relates to the field of information technology, and in particular to a video advertisement playing method, device, and system.

Background

In recent years, online advertising has developed rapidly and has become an important means of promotion for merchants. However, today's Internet users have access to more network resources and are more sensitive to and wary of advertising information. It is therefore necessary to improve the fit between the advertised content and the target video file, so that the advertised content suits the scene currently being played and the advertisement achieves a better delivery effect.

One approach is to determine the video content manually and add labels to the video; when the video is played, an advertisement matching the video is found according to the labels and played. However, this method incurs heavy labor costs, cannot learn the playback progress and content of the video, and therefore cannot serve a suitable advertisement according to the scene currently being played.

In another approach, the server sets an advertisement index in advance for the video file to be played on the client and sends the advertisement index to the client; when the client plays the video file, the client selects the advertisement to be played according to the playback order pre-arranged in the advertisement index and requests the server to play it. However, once the advertisement index file has been arranged, it is difficult to modify, and the server cannot learn the playback progress and content of the video or serve a suitable advertisement according to the scene currently being played.

Summary

Embodiments of the present invention provide a video advertisement playing method, device, and system, so that the client serves a suitable advertisement according to the scene currently being played.
In one aspect, an embodiment of the present invention provides a video advertisement playing method, including:

receiving at least one of the image feature data, subtitle text, and audio text of a video file sent by a client, where the image feature data, subtitle text, and audio text of the video file are obtained by the client through analysis of the video pictures, video subtitles, and audio content of the currently played video file, respectively;

obtaining a feature fusion result vector of the video file according to at least one of the image feature data, subtitle text, and audio text of the video file;

performing similarity matching calculation between the feature fusion result vector of each advertisement file to be served and the feature fusion result vector of the video file, and determining the one or more advertisement files with the highest similarity as the matching advertisement files; and

sending the matching advertisement files to the client.
An embodiment of the present invention further provides another video advertisement playing method, including:

analyzing and obtaining, according to the video pictures and/or video subtitles and/or audio content of the currently played video file, at least one of the image feature data of the video pictures, the subtitle text of the video subtitles, and the audio text of the audio content;

sending at least one of the image feature data of the video file, the subtitle text, and the audio text to a server, so that the server determines a matching advertisement file according to at least one of the image feature data, the subtitle text, and the audio text of the video file; and

playing the matching advertisement file sent by the server.
In another aspect, an embodiment of the present invention further provides a server, including:

a receiver, configured to receive at least one of the image feature data, subtitle text, and audio text of a video file sent by a client, where the image feature data, subtitle text, and audio text of the video file are obtained by the client through analysis of the video pictures, video subtitles, and audio content of the currently played video file, respectively;

a processor, configured to obtain a feature fusion result vector of the video file according to at least one of the image feature data, subtitle text, and audio text of the video file, perform similarity matching calculation between the feature fusion result vector of each advertisement file to be served and the feature fusion result vector of the video file, and determine the one or more advertisement files with the highest similarity as the matching advertisement files; and

a transmitter, configured to send the matching advertisement files to the client.
An embodiment of the present invention further provides a client, including:

a processor, configured to analyze and obtain, according to the video pictures and/or video subtitles and/or audio content of the currently played video file, at least one of the image feature data of the video pictures, the subtitle text of the video subtitles, and the audio text of the audio content;

a transmitter, configured to send at least one of the image feature data of the video file, the subtitle text, and the audio text to a server, so that the server determines a matching advertisement file according to at least one of the image feature data, the subtitle text, and the audio text of the video file; and

a player, configured to play the matching advertisement file sent by the server.

In still another aspect, an embodiment of the present invention further provides a video advertisement playing system, including a client and a server.

The client is configured to: analyze and obtain, according to the video pictures and/or video subtitles and/or audio content of the currently played video file, at least one of the image feature data of the video pictures, the subtitle text of the video subtitles, and the audio text of the audio content; send at least one of the image feature data of the video file, the subtitle text, and the audio text to the server, so that the server determines a matching advertisement file according to at least one of them; and play the matching advertisement file sent by the server.

The server is configured to: receive at least one of the image feature data, subtitle text, and audio text of the video file sent by the client, where the image feature data, subtitle text, and audio text of the video file are obtained by the client through analysis of the video pictures, video subtitles, and audio content of the currently played video file, respectively; obtain a feature fusion result vector of the video file according to at least one of the image feature data, subtitle text, and audio text of the video file; perform similarity matching calculation between the feature fusion result vector of each advertisement file to be served and the feature fusion result vector of the video file, and determine the one or more advertisement files with the highest similarity as the matching advertisement files; and send the matching advertisement files to the client.
In the video advertisement playing method, device, and system provided by the embodiments of the present invention, the client analyzes the currently played video pictures to obtain at least one of image feature data, subtitle text, and audio text and sends it to the server; the server obtains the feature fusion result vector of the video file according to these feature data, performs similarity matching calculation with the feature fusion result vector of each advertisement file to be served to determine the matching advertisement file, and then sends the matching advertisement to the client for playback, so that the advertisement played by the client better suits the scene currently being played by the client.

Brief Description of the Drawings

To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Apparently, the accompanying drawings in the following description show some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from these drawings without creative efforts.
FIG. 1 is a flowchart of an embodiment of the video advertisement playing method provided by the present invention;

FIG. 2 is a flowchart of still another embodiment of the video advertisement playing method provided by the present invention;

FIG. 3 is a flowchart of another embodiment of the video advertisement playing method provided by the present invention;

FIG. 4 is a flowchart of yet another embodiment of the video advertisement playing method provided by the present invention;

FIG. 5 is a schematic structural diagram of an embodiment of the server provided by the present invention;

FIG. 6 is a schematic structural diagram of still another embodiment of the server provided by the present invention;

FIG. 7 is a schematic structural diagram of an embodiment of the client provided by the present invention;

FIG. 8 is a schematic structural diagram of an embodiment of the video advertisement playing system provided by the present invention.

Detailed Description

To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings in the embodiments of the present invention. Apparently, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative efforts shall fall within the protection scope of the present invention.
FIG. 1 is a flowchart of an embodiment of the video advertisement playing method provided by the present invention. As shown in FIG. 1, the method includes:

S101. Analyze and obtain, according to the video pictures and/or video subtitles and/or audio content of the currently played video file, at least one of the image feature data of the video pictures, the subtitle text of the video subtitles, and the audio text of the audio content.

S102. Send at least one of the image feature data, subtitle text, and audio text of the video file to a server, so that the server determines a matching advertisement file according to at least one of the image feature data, subtitle text, and audio text of the video file.

S103. Play the matching advertisement file sent by the server.
The above steps are performed by the client, which may specifically be any of various video players on terminal devices such as personal computers and mobile phones.

The client may obtain the currently played picture at a specified position according to the video content of the currently played video file and extract the image feature data of the currently played video picture. The client may use any of various existing image feature extraction algorithms, for example, the scale-invariant feature transform (SIFT) algorithm. The image feature data extracted by the client may include:
Color features of the video picture: usually represented by color accumulated histogram data, which describes the statistical distribution of image colors and is invariant to translation, scale, and rotation. Texture features of the video picture: usually represented by gray-level co-occurrence matrix data; various statistics of the gray-level co-occurrence matrix can be used as texture measures. The gray-level co-occurrence matrix represents the joint probability distribution of two gray-level pixels at a distance of (Δx, Δy) in the image; if the image has L gray levels, the co-occurrence matrix is an L × L matrix. Shape features of the video picture: these can be represented by the contour features of the image or by its region features. Contour features mainly concern the outer boundary of an object, whereas region features concern the entire shape region; the shape parameters of the image are obtained by describing the boundary features.
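As a minimal sketch of the color accumulated histogram described above, the following pure-Python function quantizes each RGB channel into `bins` levels, builds a normalized color histogram, and accumulates it. The function name and the flat pixel-list input are illustrative assumptions, not the patent's implementation.

```python
def color_accumulated_histogram(pixels, bins=8):
    # pixels: iterable of (r, g, b) tuples with 0-255 channel values.
    # Quantize each channel into `bins` levels and count colors.
    hist = [0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in pixels:
        idx = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[idx] += 1
    # Normalize and accumulate: entry i holds the fraction of pixels whose
    # quantized color index is <= i, which is the cumulative histogram.
    total = len(pixels) or 1
    acc, cum = [], 0.0
    for h in hist:
        cum += h / total
        acc.append(cum)
    return acc
```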
The client may also use existing speech recognition technology to convert the lexical content of the speech in the video file into computer-readable input, for example, key codes, binary codes, or character sequences.

If the video currently played by the client has subtitles, the client may further extract the subtitles from the currently played video file to obtain the subtitle text. In this case, the feature data sent by the client to the server also includes the subtitle text.
The client may extract the subtitle text using any of various existing video text extraction methods. The process may include: the client cuts the video clip into video images and processes them; it then determines whether a video image contains text information and where the text information is located in the video image, and cuts out the text regions; using the temporal redundancy of text information, the client can find multiple consecutive frames containing the same text and enhance the text region by methods such as multi-frame fusion; finally, the extracted text region is converted to grayscale and binarized, and the resulting black-on-white or white-on-black text image is recognized to obtain the subtitle text. The text image can be recognized using existing techniques such as optical character recognition (OCR).
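The grayscale conversion and binarization step can be sketched as follows. The BT.601 luma weights and the fixed threshold are assumptions for illustration; a real subtitle pipeline would more likely use adaptive thresholding such as Otsu's method.

```python
def binarize_text_region(rgb_rows, threshold=128):
    # rgb_rows: the cut-out text region as rows of (r, g, b) tuples.
    # 1. Convert to grayscale using ITU-R BT.601 luma weights.
    gray = [[int(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in rgb_rows]
    # 2. Threshold into a two-level (white/black) image ready for OCR.
    return [[255 if px >= threshold else 0 for px in row] for row in gray]
```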
It should be noted that the above is only one implementation in which the client analyzes and obtains the image feature data of the video pictures, the subtitle text of the video subtitles, and the audio text of the audio content; in practice, the client may analyze the currently played video pictures in other ways to obtain at least one of the image feature data of the video pictures, the subtitle text of the video subtitles, and the audio text of the audio content.

The client may send at least one of the analyzed image feature data, audio text, and subtitle text to the server. Correspondingly, the server may match at least one of the received image feature data, audio text, and subtitle text against the various locally stored advertisement files to determine the advertisement file that matches the video picture currently played by the client. After determining the matching advertisement file, the server may send the matching advertisement file or an advertisement link to the client for playback.
In the video advertisement playing method provided by this embodiment, the client analyzes the currently played video pictures to obtain at least one of image feature data, subtitle text, and audio text and sends it to the server; the server obtains the feature fusion result vector of the video file according to these feature data, performs similarity matching calculation with the feature fusion result vector of each advertisement file to be served to determine the matching advertisement file, and then sends the matching advertisement to the client for playback, so that the advertisement played by the client better suits the scene currently being played by the client.

FIG. 2 is a flowchart of still another embodiment of the video advertisement playing method provided by the present invention. As shown in FIG. 2, the method includes:
S201. Receive at least one of the image feature data, subtitle text, and audio text of a video file sent by a client, where the image feature data, subtitle text, and audio text of the video file are obtained by the client through analysis of the video pictures, video subtitles, and audio content of the currently played video file, respectively.

S202. Obtain a feature fusion result vector of the video file according to at least one of the image feature data, subtitle text, and audio text of the video file.

S203. Perform similarity matching calculation between the feature fusion result vector of each advertisement file to be served and the feature fusion result vector of the video file, and determine the one or more advertisement files with the highest similarity as the matching advertisement files.

S204. Send the matching advertisement files to the client.

The above steps are performed by the server.
The client may obtain the currently played picture at a specified position according to the currently played video content and extract the image feature data of the currently played video picture, which may specifically include: color accumulated histogram data representing the color features of the video picture image, gray-level co-occurrence matrix data representing the texture features of the video picture image, gray-level gradient direction matrix data representing the shape features of the video picture image, and so on. The client may also use existing speech recognition technology to convert the lexical content of human speech into computer-readable input, for example, key codes, binary codes, or character sequences.

In another implementation scenario, if the video currently played by the client has subtitles, the client may further extract the subtitles from the currently played video file to obtain the subtitle text; in this scenario, the feature data of the video file sent by the client to the server also includes the subtitle text of the video file.
The server may collect a number of pictures or video frames in advance; these may be important frames in a video or video frames designated for advertisement insertion. The server may perform image feature extraction on these pictures or video frames to obtain image feature data, which may include: color accumulated histogram data representing the color features of the video picture image, gray-level co-occurrence matrix data representing the texture features, gray-level gradient direction matrix data representing the shape features, and so on. The server may annotate the selected pictures, labeling their content or the categories to which they belong. The server may establish the relationship between the image feature data and the annotations and train on the selected feature data using a machine learning algorithm, for example the support vector machine (SVM) algorithm, to obtain an image feature data classification model. The essence of a machine learning algorithm is that the machine can learn from the image feature data and annotations of the training pictures to gain "experience" and thus classify new data; the "experience" gained through learning is the image feature data classification model.

Similarly, the server may also select a number of subtitle files and audio files in advance and train on the feature data and annotations of these subtitle files and audio files using a machine learning algorithm, for example the SVM algorithm, to obtain a subtitle text classification model and an audio text classification model, respectively.
After the server receives at least one of the image feature data, subtitle text, and audio text sent by the client, on the one hand, the server may input the image feature data into the image feature data classification model for classification to obtain an image feature data result vector. This vector includes multiple dimensions, each of which may represent a category, for example, sports, finance, or entertainment. Each dimension of the vector indicates the likelihood that the input image feature data belongs to the corresponding category; the larger the value of the dimension corresponding to a category, the more likely the input image features belong to that category. That is, the process in which the server inputs the image feature data into the image feature data classification model and outputs the image feature data result vector is, in fact, a process of classifying the image feature data.

Likewise, the server may input the subtitle text into the subtitle text classification model to obtain a subtitle text classification result vector, and may input the audio text into the audio text classification model to obtain an audio text classification result vector.

After respectively obtaining at least one of the image feature data result vector, the subtitle text classification result vector, and the audio text classification result vector, the server may further perform weighted fusion calculation on at least one of them; that is, the categories indicated by the image feature data result vector, and/or by the subtitle text classification result vector, and/or by the audio text classification result vector are fused with weights to obtain the feature fusion result vector of the video file, which indicates the category to which the video content currently played by the client belongs. The server may use any of the various weighted fusion algorithms provided by the prior art.
另一方面, 服务器也可以预先获取需要投放的各种广告文件对应的图像 特征数据和 /或音频文本, 对于有字幕的广告文件, 服务器还可以进一步获取 需要投放的各广告文件的字幕文本, 并将每个广告文件对应的图像特征数据 和 /或音频文本和 /或字幕文本, 分别输入到图像特征数据分类模型、 音频文本 分类模型以及字幕文本分类模型中, 得到每个广告文件对应的图像特征数据 结果向量、 音频文本分类结果向量和字幕文本分类结果向量, 再对广告文件 的图像特征数据结果向量、和 /或音频文本分类结果向量和 /或包括的字幕文本 分类结果向量进行融合计算, 得到广告文件的特征融合结果向量。 服务器获取客户端正在播放的视频文件对应的特征融合结果向量和需要 投放的每个广告对应的特征融合结果向量之后, 可以进一步将该视频文件对 应的特征融合结果向量与各种需要投放的广告文件对应的特征融合结果向量 进行相似度匹配计算, 根据相似度的高低确定一个或多个与客户端当前播放 视频内容最为匹配的广告文件。 其中, 服务器进行相似度匹配的过程可以采 用现有技术提供的各种相似度匹配算法。
服务器确定了匹配的广告文件之后, 可以将匹配的广告文件或者是广告 链接发送到客户端, 以供客户端播放。
本实施例提供的视频广告播放方法, 客户端根据当前播放的视频画面分 析获取图像特征数据、 字幕文本和音频文本的至少一个发送给服务器, 服务 器根据客户端提供的这些特征数据得到视频文件的特征融合结果向量, 并与 待投放的各个广告文件的特征融合结果向量进行相似度匹配计算确定相匹配 的广告文件, 再将匹配的广告发送给客户端播放, 从而使客户端播放的广告 更适合客户端当前正在播放的场景。
图 3为本发明提供的视频广告播放方法另一个实施例的流程图, 如图 3所示, 本实施例给出了客户端向服务器提供的视频文件的特征数据包括图像特征数据、 字幕文本和音频文本中的至少一个, 服务器根据图像特征数据、 字幕文本和音频文本中的至少一个确定相匹配的广告的一个具体实施例, 该方法包括:
S301a、 服务器对收集的训练视频画面进行图像特征提取, 得到训练视 频画面的图像特征数据 , 对训练视频画面进行文本标注 , 得到训练视频画面 的标注数据, 对训练视频画面的图像特征数据和标注数据进行支持向量机 SVM训练, 得到图像特征数据分类模型。
服务器可以收集若干图片, 这些图片可以是视频中某些重要画面或指定 需要插播广告的视频画面, 这些图片在此命名为训练视频画面。 服务器对训 练视频画面进行图像特征提取, 得到训练视频画面的图像特征数据, 这些图 像特征数据可以包括: 用于表示视频画面图像颜色特征的颜色累积直方图数 据、 用于表示视频画面图像纹理特征的灰度共生矩阵数据、 用于表示视频画 面图像形状特征的灰度梯度方向矩阵数据等。
服务器还可以进一步对训练视频画面进行文本标注, 即, 对训练视频画 面按照所属类别进行分类, 例如: 可以分为体育类、 财经类、 娱乐类等, 从 而得到训练视频画面的标注数据。 服务器可以将训练视频画面的图像特征数据和标注数据作为 SVM 分类 算法的输入,对图像特征数据和标注数据进行支持向量机 SVM训练,得到图 像特征数据分类模型。 即, 机器可以通过对训练图片的图像特征数据和标注 数据进行学习, 得到一些 "经验 ", 从而能够对新数据进行分类。 而机器通过 学习得到的 "经验" 即为图像特征数据分类模型。
S301 b、 服务器对收集的训练视频进行字幕提取, 得到训练视频的字幕 文本, 对训练视频进行文本标注, 得到训练视频的标注数据, 对训练视频的 字幕文本和标注数据进行 SVM训练, 得到字幕文本分类模型。
与 S301 a类似的, 服务器可以收集包含字幕的训练视频, 并对这些训练 视频进行字幕提取, 得到训练视频的字幕文本。 并且, 服务器可以对训练视 频进行文本标注, 得到训练视频的标注数据, 然后将训练视频的字幕文本和 标注数据作为 SVM分类算法的输入,对训练视频的字幕文本和标注数据进行 S VM训练, 得到字幕文本分类模型。
S301c、 服务器对收集的训练音频内容进行音频提取, 得到训练音频的音频文本, 对训练音频进行文本标注, 得到训练音频的标注数据, 对训练音频的音频文本和标注数据进行 SVM训练, 得到音频文本分类模型。

与 S301a类似的, 服务器还可以收集包含音频的训练视频, 对这些训练音频内容进行音频提取, 得到训练音频内容的音频文本。 服务器还需要对训练音频进行文本标注, 得到训练音频的标注数据, 然后将训练音频内容的音频文本和标注数据作为 SVM分类算法的输入, 对训练音频内容的音频文本和标注数据进行 SVM训练, 得到音频文本分类模型。
S301a-S301c为服务器通过 SVM训练得到图像特征数据分类模型、 字 幕文本分类模型和音频文本分类模型的过程。 以上几个步骤之间的顺序不分 先后。
S302、 服务器接收客户端发送的视频文件的图像特征数据、 字幕文本和 音频文本。
S303a、 服务器将视频文件的图像特征数据输入预设的图像特征数据分 类模型进行分类, 得到视频文件的图形特征数据分类结果向量; 和 /或, 服务 器将视频文件的字幕文本输入预设的字幕文本分类模型进行分类, 得到视频 文件的字幕文本分类结果向量; 和 /或, 服务器将视频文件的音频文本输入预 设的音频文本分类模型进行分类, 得到视频文件的音频文本分类结果向量; 其中, 图像特征数据分类模型、 字幕文本分类模型和音频文本分类模型 具有相同的分类维度。
由于服务器预先建立的图像特征数据分类模型、 字幕文本分类模型和音 频文本分类模型分别为用于对图像特征数据、 字幕文本和音频文本进行分类 的经验模型, 因此, 从图像特征数据分类模型、 字幕文本分类模型和音频文 本分类模型输出的视频文件的图形特征数据分类结果向量、 字幕文本分类结 果向量和音频文本分类结果向量分别体现了视频文件的图像特征数据、 字幕 文本和音频文本所属的类别。
由于图像特征数据分类模型、 字幕文本分类模型和音频文本分类模型的类别和维度均相同, 因此, 视频文件的图像特征数据分类结果向量、 字幕文本分类结果向量和音频文本分类结果向量的默认值均可以取: (1/n, 1/n, …, 1/n), 其中包括 n个 1/n, n表示分类类别的维度数。
需要说明的是, 客户端可以向服务器发送视频文件的图像特征数据、 字幕文本或者音频文本中的一个或多个, 例如: 某个视频无音频, 则客户端可以向服务器发送图像特征数据和字幕文本; 这种场景下, 服务器可以取音频文本分类结果向量为默认值。 其他情况不一一列举。
与获得视频文件的图形特征数据分类结果向量、 字幕文本分类结果向量和音频文本分类结果向量相对应的, 服务器还可以根据各个待投放的广告文件的视频画面、 视频字幕和音频内容分别获取各广告文件的图像特征数据、 字幕文本和音频文本, 并将各广告文件的图像特征数据、 字幕文本和音频文本分别输入至图像特征数据分类模型、 字幕文本分类模型和音频文本分类模型, 得到各广告的图像特征数据分类结果向量、 字幕文本分类结果向量或音频文本分类结果向量, 即服务器还需要执行 S303b1和 S303b2的操作, 以便进行后续的匹配操作。
S303b1、 服务器根据待投放的各广告文件的视频画面和 /或视频字幕和 / 或音频内容, 分别获取各广告文件的图像特征数据、 字幕文本和音频文本中 的至少一个。
其中, 服务器获取各广告文件的图像特征数据、 字幕文本和音频文本的 操作可以参考前述实施例中客户端获取视频文件的图像特征数据、 字幕文本 和音频文本的具体过程, 在此不再赘述。
S303b2、服务器将各广告文件的图像特征数据输入图像特征数据分类模 型进行分类, 得到各广告文件的图像特征数据分类结果向量; 和 /或, 将各广 告文件的字幕文本输入字幕文本分类模型进行分类, 得到各广告文件的字幕 文本分类结果向量; 和 /或, 将各广告文件的音频文本输入音频文本分类模型 进行分类, 得到各广告文件的音频文本分类结果向量。
其中, 广告文件的图像特征数据分类结果向量、 广告文件的字幕文本分类结果向量和广告文件的音频文本分类结果向量具有相同的分类维度。
需要说明的是, S303b1和 S303b2的操作可以在服务器接收到客户端发 送的视频文件的图像特征数据、字幕文本和音频文本中的至少一个之前进行, 也可以在收到图像特征数据、 字幕文本和音频文本中的至少一个之后进行。
S304a、 服务器对视频文件的图形特征数据分类结果向量、 字幕文本分 类结果向量以及音频文本分类结果向量中的至少一个进行加权融合计算, 得 到视频文件的特征融合结果向量。
本实施例提供了加权融合计算的一种方法。 假设分类维度为 n维, 视频文件的图像特征数据输入图像特征数据分类模型得到的视频文件的图像特征数据分类结果向量为:

U = (u1, u2, …, un)

其中, U表示图形特征数据分类结果向量, 0≤ui≤1, i=1,2,…,n, ui为图像特征数据输入图像特征数据分类模型后得到的图像特征数据分类结果向量在维度 i的得分值。

字幕文本分类模型得到的视频文件的字幕文本分类结果向量为:

V = (v1, v2, …, vn)

其中, V表示字幕文本分类结果向量, 0≤vi≤1, i=1,2,…,n, vi为字幕文本输入字幕文本分类模型后得到的字幕文本分类结果向量在维度 i的得分值。

音频文本分类模型得到的视频文件的音频文本分类结果向量为:

W = (w1, w2, …, wn)

其中, W表示音频文本分类结果向量, 0≤wi≤1, i=1,2,…,n, wi为音频文本输入音频文本分类模型后得到的音频文本分类结果向量在维度 i的得分值。
服务器对视频文件的图形特征数据分类结果向量、 字幕文本分类结果向量以及音频文本分类结果向量进行加权融合可以采用以下公式:

R = α·U + β·V + γ·W

该公式表示特征融合结果向量为: 视频文件的图像特征数据结果向量、 字幕文本结果向量和音频文本结果向量三者加权之和。 其中, R表示特征融合结果向量, α, β, γ分别为图像特征数据结果向量、 字幕文本结果向量和音频文本结果向量赋予的权重参数。 α, β, γ的取值计算公式为:

cos(U, I) = (u1 + u2 + … + un) / (√n · √(u1² + u2² + … + un²))

该公式表示向量 U与单位向量 I的夹角余弦值。 其中, u1 + u2 + … + un表示向量 U的各维度得分值之和, √(u1² + u2² + … + un²)表示向量 U的各维度得分值平方之和的平方根。

cos(V, I) = (v1 + v2 + … + vn) / (√n · √(v1² + v2² + … + vn²))

该公式表示计算向量 V与单位向量 I的夹角余弦值。 其中, v1 + v2 + … + vn表示向量 V的各维度得分值之和, √(v1² + v2² + … + vn²)表示向量 V的各维度得分值平方之和的平方根。

cos(W, I) = (w1 + w2 + … + wn) / (√n · √(w1² + w2² + … + wn²))

该公式表示计算向量 W与单位向量 I的夹角余弦值。 其中, w1 + w2 + … + wn表示向量 W的各维度得分值之和, √(w1² + w2² + … + wn²)表示向量 W的各维度得分值平方之和的平方根。

I = (1, 1, …, 1), 其中包括 n个 1, 表示单位向量。

α = (1/cos(U, I)) / (1/cos(U, I) + 1/cos(V, I) + 1/cos(W, I))

该公式表示 α取值等于: 向量 U与单位向量的夹角余弦值的倒数除以三个向量 U, V, W分别与单位向量的夹角余弦值的倒数之和。

β = (1/cos(V, I)) / (1/cos(U, I) + 1/cos(V, I) + 1/cos(W, I))

该公式表示 β取值等于: 向量 V与单位向量的夹角余弦值的倒数除以三个向量 U, V, W分别与单位向量的夹角余弦值的倒数之和。

γ = (1/cos(W, I)) / (1/cos(U, I) + 1/cos(V, I) + 1/cos(W, I))

该公式表示 γ取值等于: 向量 W与单位向量的夹角余弦值的倒数除以三个向量 U, V, W分别与单位向量的夹角余弦值的倒数之和。
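上述"权重取各结果向量与单位向量夹角余弦值的倒数并归一化"的加权融合过程, 可以用纯 Python 草图示意如下(函数名为说明用的假设):

```python
import math

def cos_with_unit(vec):
    """计算向量与单位向量 I = (1, ..., 1) 的夹角余弦值。"""
    n = len(vec)
    return sum(vec) / (math.sqrt(n) * math.sqrt(sum(x * x for x in vec)))

def fuse(U, V, W):
    """按 R = a*U + b*V + c*W 加权融合, 权重为各余弦值倒数的归一化。"""
    inv = [1.0 / cos_with_unit(v) for v in (U, V, W)]
    total = sum(inv)
    a, b, c = (x / total for x in inv)
    return [a * u + b * v + c * w for u, v, w in zip(U, V, W)]

# 三个分类结果向量(各维得分, 取自后文示例的量级)
U = [0.10, 0.10, 0.05, 0.05, 0.70]
V = [0.05, 0.05, 0.10, 0.00, 0.80]
W = [0.07, 0.08, 0.10, 0.00, 0.75]
R = fuse(U, V, W)
```

由于权重之和为 1 且各结果向量的分量之和为 1, 融合结果向量 R 的分量之和仍为 1。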
由于目前, 客户端播放的视频文件可能在服务器侧存储有对应的多个标签, 每个标签用于标注视频文件某一片段或画面的内容, 因此, 可选的, 如果服务器侧具有视频文件对应的多个标签, 则服务器可以在得到视频文件特征融合结果向量之后, 进一步通过视频文件对应的标签, 对特征融合结果向量进行修正。 具体如下:

特征融合结果向量为: R = (r1, r2, …, rn), 其中, 0≤ri≤1, i=1,2,…,n, ri为特征融合结果向量在维度 i的值。
服务器可以预先生成标签的得分向量, 具体可以是将多个标签分别与各分类模型的分类维度进行映射, 然后分别统计每个分类维度对应的标签数量得到一个向量, 将该向量归一化作为视频文件对应的标签得分向量。 假设标签的得分向量为: S = (s1, s2, …, sn), 其中, 0≤si≤1, i=1,2,…,n, si为标签的得分向量在维度 i的值。
服务器可以根据视频文件的标签得分向量对视频文件的特征融合结果向 量进行修正, 可以采用以下公式实现:
T = λ·R + μ·S

其中, T表示修正后的最终的分类结果向量, R表示视频文件的特征融合结果向量, S表示标签得分向量, λ, μ分别为视频文件的特征融合结果向量、 标签得分向量赋予的权重参数。 T为视频文件的特征融合结果向量和标签的得分向量二者加权之和。
其中, 权重参数 λ, μ的取值计算公式为:

cos(R, I) = (r1 + r2 + … + rn) / (√n · √(r1² + r2² + … + rn²))

该公式表示计算向量 R与单位向量 I的夹角余弦值。 其中, r1 + r2 + … + rn表示向量 R的各维度得分值之和, √(r1² + r2² + … + rn²)表示向量 R的各维度得分值平方之和的平方根。

cos(S, I) = (s1 + s2 + … + sn) / (√n · √(s1² + s2² + … + sn²))

该公式表示计算向量 S与单位向量 I的夹角余弦值。 其中, s1 + s2 + … + sn表示向量 S的各维度得分值之和, √(s1² + s2² + … + sn²)表示向量 S的各维度得分值平方之和的平方根。

λ = (1/cos(R, I)) / (1/cos(R, I) + 1/cos(S, I))

该公式表示参数 λ取值等于: 向量 R与单位向量的夹角余弦值的倒数除以两个向量 R, S分别与单位向量的夹角余弦值的倒数之和。

μ = (1/cos(S, I)) / (1/cos(R, I) + 1/cos(S, I))

该公式表示参数 μ取值等于: 向量 S与单位向量的夹角余弦值的倒数除以两个向量 R, S分别与单位向量的夹角余弦值的倒数之和。
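标签得分向量的生成(映射到分类维度、 计数、 归一化)与 T = λ·R + μ·S 的修正步骤可以示意如下(纯 Python 草图; 维度名称、 标签内容均为说明用的假设):

```python
import math

def cos_with_unit(vec):
    """计算向量与单位向量 I = (1, ..., 1) 的夹角余弦值。"""
    n = len(vec)
    return sum(vec) / (math.sqrt(n) * math.sqrt(sum(x * x for x in vec)))

def tag_score_vector(tags, dim_names):
    """将标签映射到各分类维度, 统计每个维度的标签数量并归一化。"""
    counts = [sum(1 for t in tags if t == d) for d in dim_names]
    total = sum(counts)
    return [c / total for c in counts]

def correct(R, S):
    """T = lam*R + mu*S, 权重取 R、S 与单位向量夹角余弦值的倒数归一化。"""
    inv_r, inv_s = 1.0 / cos_with_unit(R), 1.0 / cos_with_unit(S)
    lam, mu = inv_r / (inv_r + inv_s), inv_s / (inv_r + inv_s)
    return [lam * r + mu * s for r, s in zip(R, S)]

dims = ["汽车", "IT", "房产", "美食", "娱乐"]
R = [0.074, 0.078, 0.082, 0.018, 0.748]   # 特征融合结果向量(示例量级)
tags = ["娱乐", "娱乐", "美食"]             # 视频文件的标签(假设)
S = tag_score_vector(tags, dims)
T = correct(R, S)
```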
以上仅是本实施例提供的加权融合算法的可行方式,但并不以此作为 对本发明的限制, 实际上, 本发明还可以采用现有的其他加权融合算法确定 视频文件或广告文件的特征融合结果向量。
S304b、 服务器分别对各广告文件的图形特征数据分类结果向量、 字幕 文本分类结果向量以及音频文本分类结果向量中的至少一个进行加权融合计 算, 得到各广告文件的特征融合结果向量。
与 S304a相类似的,服务器也可以对各广告文件的图形特征数据分类结 果向量、 字幕文本分类结果向量以及音频文本分类结果向量中的至少一个进 行加权融合计算, 加权融合计算的具体过程可参见 S304a, 在此不再赘述。
需要说明的是, S304b的操作可以在服务器接收到客户端发送的视频文 件的图像特征数据、 字幕文本和音频文本中的至少一个之前进行, 也可以在 收到图像特征数据、 字幕文本和音频文本中的至少一个之后进行。
S305、 服务器对各广告文件的特征融合结果向量与视频文件的特征融合 结果向量进行相似度匹配计算, 将相似度最大的一个或多个广告文件确定为 匹配的广告文件。
本实施例提供一种相似度匹配计算的方法, 具体如下:
假设任一广告文件的特征融合结果向量为:

X = (x1, x2, …, xn), 其中, 0≤xi≤1, i=1,2,…,n, xi为广告文件在维度 i的得分值。

假设视频文件的特征融合结果向量为:

Y = (y1, y2, …, yn), 其中, 0≤yi≤1, i=1,2,…,n, yi为视频文件在维度 i的得分值。

cos(X, Y) = (x1·y1 + x2·y2 + … + xn·yn) / (√(x1² + x2² + … + xn²) · √(y1² + y2² + … + yn²))

该公式表示计算广告文件的特征融合结果向量和视频文件的特征融合结果向量的夹角余弦值, 作为二者的相似度。 其中, x1·y1 + x2·y2 + … + xn·yn表示两向量对应维度值分别相乘之和, √(x1² + x2² + … + xn²)表示向量 X的各维度得分值的平方之和的平方根, √(y1² + y2² + … + yn²)表示向量 Y的各维度得分值的平方之和的平方根。 服务器将相似度值最大的一个或几个广告确定为匹配的广告。
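上述相似度匹配与排序过程可以示意如下(纯 Python 草图, 采用夹角余弦值作为相似度; 广告名称与向量数值均为虚构示例):

```python
import math

def cosine(x, y):
    """计算两个特征融合结果向量的夹角余弦值, 作为相似度。"""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def match_ads(video_vec, ad_vecs, top_k=1):
    """对各广告文件与视频文件做相似度匹配, 返回相似度最大的 top_k 个。"""
    scored = sorted(ad_vecs.items(),
                    key=lambda kv: cosine(video_vec, kv[1]),
                    reverse=True)
    return [name for name, _ in scored[:top_k]]

video = [0.07, 0.08, 0.08, 0.02, 0.75]   # 偏"娱乐"类的视频
ads = {
    "汽车广告": [0.80, 0.05, 0.05, 0.05, 0.05],
    "娱乐广告": [0.05, 0.05, 0.05, 0.05, 0.80],
}
best = match_ads(video, ads, top_k=1)
```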
以上仅是本实施例提供的相似度匹配算法的可行方式, 但并不以此作为 对本发明的限制, 实际上, 本发明还可以采用现有的其他相似度匹配算法确 定与视频文件相匹配的广告文件。
S306、 服务器将匹配的广告文件发送给客户端。
服务器确定与视频文件相匹配的广告文件后, 可以将匹配的广告文件或 广告文件的链接发给客户端, 以供客户端播放。
本实施例提供的视频广告的播放方法, 可以应用在个人计算机、 手机等 终端的客户端, 例如: 视频播放器中插播广告, 尤其适合在视频播放点击暂 停时, 选择与当前播放视频内容最相近匹配的广告进行播放。
以下再以一个具体的例子对本发明提供的视频广告的播放方法进行说 明。假设客户端需要在播放某视频文件点击暂停按钮时插播广告。如图 4所示, 该方法具体包括:
S401、 客户端获得当前播放视频文件的视频画面、 视频字幕和音频内容。 客户端可以利用视频播放软件直接获取当前播放视频的画面截图, 作为当前播放视频文件的视频画面。
客户端可以将视频片段切割成帧, 然后针对视频图像进行处理, 判断视 频图像中是否包含有文字信息, 以及文字信息在视频图像中的位置, 并将文 字区域切割出来形成文字区域。 最后将提取出来的文字区域进行灰度化和二 值化, 得到白底黑字或黑底白字的字幕文字图片。
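切割出的文字区域的灰度化与二值化可以示意如下(NumPy 草图, 采用常见的亮度加权系数与固定阈值的简单二值化; 阈值取值为说明用的假设, 并非本发明限定的实现):

```python
import numpy as np

def binarize_subtitle_region(region_rgb, threshold=128):
    """将字幕文字区域灰度化并二值化, 得到 0/255 的黑白文字图片。"""
    # 按常见的亮度加权系数灰度化
    gray = (0.299 * region_rgb[..., 0]
            + 0.587 * region_rgb[..., 1]
            + 0.114 * region_rgb[..., 2])
    # 简单阈值二值化: 高于阈值置白(255), 否则置黑(0)
    return np.where(gray >= threshold, 255, 0).astype(np.uint8)

# 构造一个黑底白字的合成字幕区域验证
region = np.zeros((4, 4, 3), dtype=np.uint8)
region[1:3, 1:3] = 255                      # 中间 2x2 为白色"文字"
binary = binarize_subtitle_region(region)
```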
客户端还可以通过视频播放器直接获得当前播放视频文件的音频内容, 还可以选择视频中截取的起始时间和结束时间之间的音频内容, 选择需要的 音频部分。
S402、 客户端根据当前播放的视频文件的视频画面、 视频字幕和音频内容, 分析获取视频文件的图像特征数据, 视频字幕的字幕文本和音频内容的音频文本。
客户端获取视频文件图像特征数据, 视频字幕的字幕文本和音频内容的 音频文本的过程可参见图 1所示实施例的对应描述, 在此不再赘述。
S403、 客户端将视频文件的图像特征数据、 字幕文本和音频文本发送给服务器。
S404、 服务器根据视频文件的图像特征数据、 字幕文本和音频文本, 得 到视频文件的特征融合结果向量。
S405、 服务器将待投放的各个广告文件的特征融合结果向量与视频文件 的特征融合结果向量进行相似度匹配计算, 将相似度最大的一个或多个广告 文件确定为匹配的广告文件。
服务器在确定匹配的广告文件之前, 需要建立图像特征数据分类模型、 字幕文本分类模型和音频文本分类模型 ,其建模的具体过程可参见图 3所示的 实施例。 本实施例中, 服务器为各分类模型设置的分类维度为 5维, 例如可以 是: 汽车、 IT、 房产、 美食、 娱乐。
假设视频文件的图像特征数据输入图像特征数据分类模型得到的视频文 件的图像特征数据分类结果向量为:
U =(0.10,0.10,0.05,0.05,0.70);
视频文件的字幕文本输入字幕文本分类模型得到的视频文件的字幕文本 分类结果向量为:
^=(0.05,0.05,0.10,0,0.80);
视频文件的音频文本输入音频文本分类模型得到的视频文件的音频文本分类结果向量为:

W = (0.07, 0.08, 0.10, 0, 0.75)

则视频文件的特征融合结果向量 R的计算过程可参见图 3所示实施例, 其中, 单位向量 I = (1, 1, 1, 1, 1), 包括 5个 1:

cos(U, I) = 1 / (√5 · √0.515) ≈ 0.625

cos(V, I) = 1 / (√5 · √0.655) ≈ 0.552

cos(W, I) = 1 / (√5 · √0.584) ≈ 0.585

α = 0.625 / (0.625 + 0.552 + 0.585) ≈ 0.355

β = 0.552 / (0.625 + 0.552 + 0.585) ≈ 0.313

γ = 0.585 / (0.625 + 0.552 + 0.585) ≈ 0.332

α·U = (0.0355, 0.0355, 0.0178, 0.0178, 0.2485)

β·V = (0.0156, 0.0156, 0.0313, 0, 0.2505)

γ·W = (0.0232, 0.0266, 0.0332, 0, 0.2490)

R = α·U + β·V + γ·W = (0.0743, 0.0777, 0.0823, 0.0178, 0.7480)
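上述数值示例可以用几行代码复核(按示例中实际给出的数值口径: 权重由三个余弦值直接归一化得到):

```python
# 示例中给出的三个余弦值与三个分类结果向量
cosU, cosV, cosW = 0.625, 0.552, 0.585
total = cosU + cosV + cosW
alpha, beta, gamma = cosU / total, cosV / total, cosW / total

U = [0.10, 0.10, 0.05, 0.05, 0.70]
V = [0.05, 0.05, 0.10, 0.00, 0.80]
W = [0.07, 0.08, 0.10, 0.00, 0.75]

# 逐维加权求和得到特征融合结果向量
R = [alpha * u + beta * v + gamma * w for u, v, w in zip(U, V, W)]
# R 约为 (0.0743, 0.0777, 0.0823, 0.0178, 0.7480)
```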
需要说明的是, 如果服务器侧不具有该视频文件的标签, 则可以直接将 上述过程得到的视频文件的特征融合结果向量 ^与各广告文件的特征融合结 果向量进行相似度匹配计算(本实施例中省略了各广告文件的特征融合结果 向量的计算过程) , 相似度最大的一个或多个广告, 即为与视频文件最为匹 配的目标广告文件。
如果服务器侧存储该视频文件的标签, 则可以将这些标签映射到各分类 模型的分类维度上, 统计映射到各分类维度的标签数量, 得到标签得分向量 s , 再采用标签得分向量 s对视频文件的特征融合结果向量 进行修正, 得 到最终的视频文件的特征融合结果向量 。 再将 Γ与各广告文件的特征融合 结果向量进行相似度匹配计算, 确定与视频文件相匹配的广告文件。
S406、 服务器将匹配的广告文件发送给客户端。
本领域普通技术人员可以理解, 实现上述实施例方法中的全部或部分流程, 是可以通过计算机程序来指令相关的硬件来完成, 所述的程序可存储于一计算机可读取存储介质中, 该程序在执行时, 可包括如上述各方法的实施例的流程。 其中, 所述的存储介质可为磁碟、 光盘、 只读存储记忆体 ( Read-Only Memory, ROM )或随机存储记忆体( Random Access Memory, RAM )等。
图 5为本发明提供的服务器一个实施例的结构示意图, 如图 5所示, 该 服务器包括: 接收器 11、 处理器 12和发送器 13; 其中:
接收器 11 , 用于接收客户端发送的视频文件的图像特征数据、 字幕文本 和音频文本中的至少一个, 视频文件的图像特征数据、 字幕文本和音频文本 由客户端分别根据当前播放的视频文件的视频画面、 视频字幕和音频内容分 析获取;
处理器 12, 用于根据视频文件的图像特征数据、 字幕文本和音频文本中 的至少一个, 得到视频文件的特征融合结果向量; 将待投放的各个广告文件 的特征融合结果向量与视频文件的特征融合结果向量进行相似度匹配计算, 将相似度最大的一个或多个广告文件确定为匹配的广告文件;
发送器 13, 用于将匹配的广告文件发送给客户端。
图 6为本发明提供的服务器又一个实施例的结构示意图, 如图 6所示, 该服务器包括: 接收器 11、 处理器 12、 发送器 13和存储器 14;
本实施例中, 处理器 12可以具体用于: 将视频文件的图像特征数据输入 预设的图像特征数据分类模型进行分类, 得到视频文件的图形特征数据分类 结果向量; 和 /或, 将视频文件的字幕文本输入预设的字幕文本分类模型进行 分类, 得到视频文件的字幕文本分类结果向量; 和 /或, 将视频文件的音频文 本输入预设的音频文本分类模型进行分类, 得到视频文件的音频文本分类结 果向量, 图像特征数据分类模型、 字幕文本分类模型和音频文本分类模型具 有相同的分类维度; 对视频文件的图形特征数据分类结果向量、 字幕文本分 类结果向量和音频文本分类结果向量中的至少一个进行加权融合计算, 得到 视频文件的特征融合结果向量。 进一步的, 处理器 12还可以用于: 对收集的训练视频画面进行图像特征 提取, 得到训练视频画面的图像特征数据; 对训练视频画面进行文本标注 , 得到训练视频画面的标注数据; 对训练视频画面的图像特征数据和标注数据 进行支持向量机 SVM训练 , 得到图像特征数据分类模型;
类似的, 处理器 12还可以用于: 对收集的训练视频进行字幕提取, 得到训练视频的字幕文本; 对训练视频进行文本标注, 得到训练视频的标注数据; 对训练视频的字幕文本和标注数据进行 SVM训练, 得到字幕文本分类模型; 同样, 处理器 12还可以用于: 对收集的训练音频进行音频提取, 得到训练音频的音频文本; 对训练音频进行文本标注, 得到训练音频的标注数据; 对训练音频的音频文本和标注数据进行 SVM训练, 得到音频文本分类模型。
作为一种可行的实施方式, 处理器 12还可以具体用于: 根据 R = α·U + β·V + γ·W 进行加权融合计算, 其中, R表示特征融合结果向量, I为单位向量, U表示图形特征数据分类结果向量, V表示字幕文本分类结果向量, W表示音频文本分类结果向量, α, β, γ分别为图像特征数据结果向量、 字幕文本结果向量和音频文本结果向量赋予的权重参数,

α = (1/cos(U, I)) / (1/cos(U, I) + 1/cos(V, I) + 1/cos(W, I)),

β = (1/cos(V, I)) / (1/cos(U, I) + 1/cos(V, I) + 1/cos(W, I)),

γ = (1/cos(W, I)) / (1/cos(U, I) + 1/cos(V, I) + 1/cos(W, I))。
进一步的, 处理器 12还可以用于: 根据待投放的各广告文件的视频画面 和 /或视频字幕和 /或音频内容, 分别获取各广告文件的图像特征数据、 字幕文 本和音频文本中的至少一个; 将各广告文件的图像特征数据输入图像特征数 据分类模型进行分类,得到各广告文件的图像特征数据分类结果向量;和 /或, 将各广告文件的字幕文本输入字幕文本分类模型进行分类, 得到各广告文件 的字幕文本分类结果向量; 和 /或, 将各广告文件的音频文本输入音频文本分 类模型进行分类, 得到各广告文件的音频文本分类结果向量, 广告文件的图 像特征数据分类结果向量、 广告文件的字幕文本分类结果向量和广告文件的 音频文本分类结果向量具有相同的分类维度; 分别对各广告文件的图形特征 数据分类结果向量、 字幕文本分类结果向量和音频文本分类结果向量中的至 少一个进行加权融合计算, 得到各广告文件的特征融合结果向量。
存储器 14可以用于: 存储视频文件的多个标签,标签用于标注视频文件 的片段或画面内容;
相应的, 处理器 12还可以用于: 将多个标签分别与分类维度进行映射, 分别统计每个分类维度对应的标签数量 ,得到视频文件对应的标签得分向量; 采用视频文件的标签得分向量, 对视频文件的特征融合结果向量进行修正。
本发明实施例提供的服务器, 与本发明提供的视频播放方法相对应, 为实现视频播放方法的功能设备, 其执行视频播放方法的具体过程可参见方法实施例, 在此不再赘述。
本发明实施例提供的服务器, 客户端根据当前播放的视频画面分析获取 图像特征数据、 字幕文本和音频文本的至少一个发送给服务器, 服务器根据 客户端提供的这些特征数据得到视频文件的特征融合结果向量, 并与待投放 的各个广告文件的特征融合结果向量进行相似度匹配计算确定相匹配的广告 文件, 再将匹配的广告发送给客户端播放, 从而使客户端播放的广告更适合 客户端当前正在播放的场景。
图 7为本发明提供的客户端一个实施例的结构示意图, 如图 7所示, 该 客户端包括: 处理器 21、 发送器 22和播放器 23;
处理器 21 , 用于根据当前播放的视频文件的视频画面和 /或视频字幕和 / 或音频内容, 分析获取视频画面的图像特征数据, 视频字幕的字幕文本和音 频内容的音频文本中的至少一个;
发送器 22, 用于将视频文件的图像特征数据、 字幕文本和音频文本中的 至少一个发送给服务器, 以使服务器根据视频文件的图像特征数据、 字幕文 本和音频文本中的至少一个确定匹配的广告文件;
播放器 23, 用于播放服务器发送的匹配的广告文件。
本发明实施例提供的客户端, 与本发明提供的视频播放方法相对应, 为 实现视频播放方法的功能设备, 其执行视频播放方法的具体过程可参见方法 实施例, 不再赘述。
本发明实施例提供的客户端, 根据当前播放的视频画面分析获取图像特 征数据、 字幕文本和音频文本的至少一个发送给服务器, 服务器根据客户端 提供的这些特征数据得到视频文件的特征融合结果向量, 并与待投放的各个 广告文件的特征融合结果向量进行相似度匹配计算确定相匹配的广告文件, 再将匹配的广告发送给客户端播放, 从而使客户端播放的广告更适合客户端 当前正在播放的场景。
图 8为本发明提供的视频广告播放系统一个实施例的结构示意图, 如图 8所示, 该系统包括: 客户端 1和服务器 2; 其中:
客户端 1 用于: 根据当前播放的视频文件的视频画面和 /或视频字幕和 / 或音频内容, 分析获取视频画面的图像特征数据, 视频字幕的字幕文本和音 频内容的音频文本中的至少一个; 将视频文件的图像特征数据、 字幕文本和 音频文本中的至少一个发送给服务器 2 , 以使服务器 2根据视频文件的图像 特征数据、 字幕文本和音频文本中的至少一个确定匹配的广告文件; 播放服 务器 2发送的匹配的广告文件;
服务器 2用于: 接收客户端 1发送的视频文件的图像特征数据、 字幕文 本和音频文本中的至少一个, 视频文件的图像特征数据、 字幕文本和音频文 本由客户端分别根据当前播放的视频文件的视频画面、 视频字幕和音频内容 分析获取; 根据视频文件的图像特征数据、 字幕文本和音频文本中的至少一 个, 得到视频文件的特征融合结果向量; 将待投放的各个广告文件的特征融 合结果向量与视频文件的特征融合结果向量进行相似度匹配计算, 将相似度 最大的一个或多个广告文件确定为匹配的广告文件; 将匹配的广告文件发送 给客户端 1。
本发明实施例提供的视频广告播放系统, 与本发明提供的视频播放方法 相对应, 为实现视频播放方法的系统, 其执行视频播放方法的具体过程可参 见方法实施例, 不再赘述。
本发明实施例提供的视频广告播放系统, 根据当前播放的视频画面分析 获取图像特征数据、 字幕文本和音频文本的至少一个发送给服务器, 服务器 根据客户端提供的这些特征数据得到视频文件的特征融合结果向量, 并与待 投放的各个广告文件的特征融合结果向量进行相似度匹配计算确定相匹配的 广告文件, 再将匹配的广告发送给客户端播放, 从而使客户端播放的广告更 适合客户端当前正在播放的场景。
最后应说明的是: 以上实施例仅用以说明本发明的技术方案, 而非对其 限制; 尽管参照前述实施例对本发明进行了详细的说明, 本领域的普通技术 人员应当理解: 其依然可以对前述各实施例所记载的技术方案进行修改, 或 者对其中部分技术特征进行等同替换; 而这些修改或者替换, 并不使相应技 术方案的本质脱离本发明各实施例技术方案的精神和范围。

Claims

权 利 要求
1、 一种视频广告播放方法, 其特征在于, 包括:
接收客户端发送的视频文件的图像特征数据、 字幕文本和音频文本中的 至少一个, 所述视频文件的图像特征数据、 字幕文本和音频文本由所述客户 端分别根据当前播放的所述视频文件的视频画面、 视频字幕和音频内容分析 获取;
根据所述视频文件的图像特征数据、字幕文本和音频文本中的至少一个, 得到所述视频文件的特征融合结果向量;
将待投放的各个广告文件的特征融合结果向量与所述视频文件的特征融 合结果向量进行相似度匹配计算, 将相似度最大的一个或多个广告文件确定 为匹配的广告文件;
将所述匹配的广告文件发送给所述客户端。
2、 根据权利要求 1所述的方法, 其特征在于, 所述根据所述视频文件的 图像特征数据、 字幕文本和音频文本中的至少一个, 得到所述视频文件的特 征融合结果向量, 包括:
将所述视频文件的图像特征数据输入预设的图像特征数据分类模型进行 分类, 得到所述视频文件的图形特征数据分类结果向量; 和 /或, 将所述视频 文件的字幕文本输入预设的字幕文本分类模型进行分类, 得到所述视频文件 的字幕文本分类结果向量; 和 /或, 将所述视频文件的音频文本输入预设的音 频文本分类模型进行分类, 得到所述视频文件的音频文本分类结果向量, 所 述图像特征数据分类模型、 所述字幕文本分类模型和所述音频文本分类模型 具有相同的分类维度;
对所述视频文件的图形特征数据分类结果向量、 字幕文本分类结果向量 和音频文本分类结果向量中的至少一个进行加权融合计算, 得到所述视频文 件的特征融合结果向量。
3、 根据权利要求 2所述的方法, 其特征在于, 所述接收客户端发送的视 频文件的图像特征数据、 字幕文本和音频文本中的至少一个之前, 还包括: 对收集的训练视频画面进行图像特征提取, 得到所述训练视频画面的图 像特征数据;
对所述训练视频画面进行文本标注 ,得到所述训练视频画面的标注数据; 对所述训练视频画面的图像特征数据和标注数据进行支持向量机 SVM 训练, 得到所述图像特征数据分类模型。
4、 根据权利要求 2所述的方法, 其特征在于, 所述接收客户端发送的视频文件的图像特征数据、 字幕文本和音频文本中的至少一个之前, 还包括: 对收集的训练音频进行音频提取, 得到所述训练音频的音频文本; 对所述训练音频进行文本标注, 得到所述训练音频的标注数据; 对所述训练音频的音频文本和标注数据进行 SVM训练, 得到所述音频文本分类模型。
5、 根据权利要求 2所述的方法, 其特征在于, 所述接收客户端发送的视 频文件的图像特征数据、 字幕文本和音频文本中的至少一个之前, 还包括: 对收集的训练视频进行字幕提取, 得到所述训练视频的字幕文本; 对所述训练视频进行文本标注, 得到所述训练视频的标注数据; 对所述训练视频的字幕文本和标注数据进行 SVM训练,得到所述字幕文 本分类模型。
6、 根据权利要求 2-5任一项所述的方法, 其特征在于,
根据 = . + . f + 进行所述加权融合计算, 其中, 表示特征融 合结果向量, /为单位向量, 表示图形特征数据分类结果向量, 表示字 幕文本分类结果向量, 表示音频文本分类结果向量, a , β , 分别为图 像特征数据结果向量、 字幕文本结果向量和音频文本结果向量赋予的权重参
1 1
_ cos(t ,/) β = cos(V,I)
a = 1 1 1 P— , , i—
数, cos(t7,/) + cos( , /) + cos(^, /) , cos(0,I) cos(f,7) cos(W,I) ,
1 1 | 1 | 1
cos 0,1) cos( , /) cos(f , /)
7、 根据权利要求 1-5任一项所述的方法, 其特征在于, 所述将待投放的各个广告文件的特征融合结果向量与所述视频文件的特征融合结果向量进行相似度匹配计算之前, 还包括: 根据所述待投放的各广告文件的视频画面和 /或视频字幕和 /或音频内容, 分别获取各广告文件的图像特征数据、 字幕文本和音频文本中的至少一个; 将各广告文件的图像特征数据输入所述图像特征数据分类模型进行分类, 得到各广告文件的图像特征数据分类结果向量; 和 /或, 将各广告文件的字幕文本输入所述字幕文本分类模型进行分类, 得到各广告文件的字幕文本分类结果向量; 和 /或, 将各广告文件的音频文本输入所述音频文本分类模型进行分类, 得到各广告文件的音频文本分类结果向量, 所述广告文件的图像特征数据分类结果向量、 所述广告文件的字幕文本分类结果向量和所述广告文件的音频文本分类结果向量具有相同的分类维度;
分别对各广告文件的图形特征数据分类结果向量、 字幕文本分类结果向 量和音频文本分类结果向量中的至少一个进行加权融合计算, 得到各广告文 件的特征融合结果向量。
8、 根据权利要求 2-5任一项所述的方法, 其特征在于, 若服务器存储有 所述视频文件的多个标签, 所述标签用于标注所述视频文件的片段或画面内 容, 则所述得到所述视频文件的特征融合结果向量之后, 还包括:
将所述多个标签分别与所述分类维度进行映射, 分别统计每个所述分类 维度对应的标签数量, 得到所述视频文件对应的标签得分向量;
采用所述视频文件的标签得分向量, 对所述视频文件的特征融合结果向 量进行修正。
9、 一种视频广告播放方法, 其特征在于, 包括:
根据当前播放的视频文件的视频画面和 /或视频字幕和 /或音频内容,分析 获取所述视频画面的图像特征数据, 所述视频字幕的字幕文本和所述音频内 容的音频文本中的至少一个;
将所述视频文件的图像特征数据、 所述字幕文本和所述音频文本中的至 少一个发送给服务器, 以使所述服务器根据所述视频文件的图像特征数据、 所述字幕文本和所述音频文本中的至少一个确定匹配的广告文件;
播放所述服务器发送的匹配的广告文件。
10、 一种服务器, 其特征在于, 包括:
接收器, 用于接收客户端发送的视频文件的图像特征数据、 字幕文本和 音频文本中的至少一个, 所述视频文件的图像特征数据、 字幕文本和音频文 本由所述客户端分别根据当前播放的所述视频文件的视频画面、 视频字幕和 音频内容分析获取;
处理器, 用于根据所述视频文件的图像特征数据、 字幕文本和音频文本 中的至少一个, 得到所述视频文件的特征融合结果向量; 将待投放的各个广 告文件的特征融合结果向量与所述视频文件的特征融合结果向量进行相似度 匹配计算, 将相似度最大的一个或多个广告文件确定为匹配的广告文件; 发送器, 用于将所述匹配的广告文件发送给所述客户端。
11、根据权利要求 10所述的服务器,其特征在于,所述处理器具体用于: 将所述视频文件的图像特征数据输入预设的图像特征数据分类模型进行分 类, 得到所述视频文件的图形特征数据分类结果向量; 和 /或, 将所述视频文 件的字幕文本输入预设的字幕文本分类模型进行分类, 得到所述视频文件的 字幕文本分类结果向量; 和 /或, 将所述视频文件的音频文本输入预设的音频 文本分类模型进行分类, 得到所述视频文件的音频文本分类结果向量, 所述 图像特征数据分类模型、 所述字幕文本分类模型和所述音频文本分类模型具 有相同的分类维度; 对所述视频文件的图形特征数据分类结果向量、 字幕文 本分类结果向量和音频文本分类结果向量中的至少一个进行加权融合计算, 得到所述视频文件的特征融合结果向量。
12、 根据权利要求 11所述的服务器, 其特征在于, 所述处理器还用于: 对收集的训练视频画面进行图像特征提取, 得到所述训练视频画面的图像特 征数据; 对所述训练视频画面进行文本标注 , 得到所述训练视频画面的标注 数据; 对所述训练视频画面的图像特征数据和标注数据进行支持向量机 SVM 训练, 得到所述图像特征数据分类模型;
所述处理器还用于: 对收集的训练视频进行字幕提取, 得到所述训练视 频的字幕文本; 对所述训练视频进行文本标注, 得到所述训练视频的标注数 据;对所述训练视频的字幕文本和标注数据进行 SVM训练,得到所述字幕文 本分类模型;
所述处理器还用于: 对收集的训练音频进行音频提取, 得到所述训练音频的音频文本; 对所述训练音频进行文本标注, 得到所述训练音频的标注数据; 对所述训练音频的音频文本和标注数据进行 SVM训练, 得到所述音频文本分类模型。
13、 根据权利要求 11或 12所述的服务器, 其特征在于, 所述处理器具体用于: 根据 R = α·U + β·V + γ·W 进行所述加权融合计算, 其中, R表示特征融合结果向量, I为单位向量, U表示图形特征数据分类结果向量, V表示字幕文本分类结果向量, W表示音频文本分类结果向量, α, β, γ分别为图像特征数据结果向量、 字幕文本结果向量和音频文本结果向量赋予的权重参数,

α = (1/cos(U, I)) / (1/cos(U, I) + 1/cos(V, I) + 1/cos(W, I)),

β = (1/cos(V, I)) / (1/cos(U, I) + 1/cos(V, I) + 1/cos(W, I)),

γ = (1/cos(W, I)) / (1/cos(U, I) + 1/cos(V, I) + 1/cos(W, I))。
14、 根据权利要求 10-12任一项所述的服务器, 其特征在于, 所述处理 器还用于:根据所述待投放的各广告文件的视频画面和 /或视频字幕和 /或音频 内容, 分别获取各广告文件的图像特征数据、 字幕文本和音频文本中的至少 一个; 将各广告文件的图像特征数据输入所述图像特征数据分类模型进行分 类, 得到各广告文件的图像特征数据分类结果向量; 和 /或, 将各广告文件的 字幕文本输入所述字幕文本分类模型进行分类, 得到各广告文件的字幕文本 分类结果向量; 和 /或, 将各广告文件的音频文本输入所述音频文本分类模型 进行分类, 得到各广告文件的音频文本分类结果向量, 所述广告文件的图像 特征数据分类结果向量、 所述广告文件的字幕文本分类结果向量和所述广告 文件的音频文本分类结果向量具有相同的分类维度; 分别对各广告文件的图 形特征数据分类结果向量、 字幕文本分类结果向量和音频文本分类结果向量 中的至少一个进行加权融合计算, 得到各广告文件的特征融合结果向量。
15、 根据权利要求 10-12任一项所述的服务器, 其特征在于, 还包括: 存储器, 用于存储所述视频文件的多个标签, 所述标签用于标注所述视 频文件的片段或画面内容;
所述处理器还用于: 将所述多个标签分别与所述分类维度进行映射, 分 别统计每个所述分类维度对应的标签数量, 得到所述视频文件对应的标签得 分向量; 采用所述视频文件的标签得分向量, 对所述视频文件的特征融合结 果向量进行修正。
16、 一种客户端, 其特征在于, 包括:
处理器,用于根据当前播放的视频文件的视频画面和 /或视频字幕和 /或音 频内容, 分析获取所述视频画面的图像特征数据, 所述视频字幕的字幕文本 和所述音频内容的音频文本中的至少一个;
发送器, 用于将所述视频文件的图像特征数据、 所述字幕文本和所述音 频文本中的至少一个发送给服务器, 以使所述服务器根据所述视频文件的图 像特征数据、 所述字幕文本和所述音频文本中的至少一个确定匹配的广告文 件;
播放器, 用于播放所述服务器发送的匹配的广告文件。
17、 一种视频广告播放系统, 其特征在于, 包括如权利要求 10-15任一 项所述的服务器和如权利要求 16所述的客户端。
PCT/CN2011/082747 2011-11-23 2011-11-23 视频广告播放方法、设备和系统 WO2012167568A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CN201180002916.5A CN103503463B (zh) 2011-11-23 视频广告播放方法、设备和系统
EP11867149.4A EP2785058A4 (en) 2011-11-23 2011-11-23 METHOD, DEVICE AND SYSTEM FOR BROADCASTING VIDEO TURNING
PCT/CN2011/082747 WO2012167568A1 (zh) 2011-11-23 2011-11-23 视频广告播放方法、设备和系统
US14/285,192 US20140257995A1 (en) 2011-11-23 2014-05-22 Method, device, and system for playing video advertisement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2011/082747 WO2012167568A1 (zh) 2011-11-23 2011-11-23 视频广告播放方法、设备和系统

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US14/285,192 Continuation US20140257995A1 (en) 2011-11-23 2014-05-22 Method, device, and system for playing video advertisement

Publications (1)

Publication Number Publication Date
WO2012167568A1 true WO2012167568A1 (zh) 2012-12-13

Family

ID=47295411

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2011/082747 WO2012167568A1 (zh) 2011-11-23 2011-11-23 视频广告播放方法、设备和系统

Country Status (3)

Country Link
US (1) US20140257995A1 (zh)
EP (1) EP2785058A4 (zh)
WO (1) WO2012167568A1 (zh)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103024479A (zh) * 2012-12-17 2013-04-03 深圳先进技术研究院 视频内容中自适应投放广告的方法及系统
CN104244098A (zh) * 2014-10-08 2014-12-24 三星电子(中国)研发中心 提供内容的方法、终端、服务器和系统
WO2015010265A1 (en) * 2013-07-24 2015-01-29 Thomson Licensing Method, apparatus and system for covert advertising
WO2015161758A1 (en) * 2014-04-22 2015-10-29 Tencent Technology (Shenzhen) Company Limited Method for controlling network media information publication, apparatus, and server
CN105260368A (zh) * 2014-07-15 2016-01-20 阿里巴巴集团控股有限公司 一种视频数据的编辑、业务对象的推送方法、装置和系统
CN106792003A (zh) * 2016-12-27 2017-05-31 西安石油大学 一种智能广告插播方法、装置及服务器
CN107659545A (zh) * 2016-09-28 2018-02-02 腾讯科技(北京)有限公司 一种媒体信息处理方法及媒体信息处理系统、电子设备
CN109408639A (zh) * 2018-10-31 2019-03-01 广州虎牙科技有限公司 一种弹幕分类方法、装置、设备和存储介质
CN110472002A (zh) * 2019-08-14 2019-11-19 腾讯科技(深圳)有限公司 一种文本相似度获取方法和装置
CN111767726A (zh) * 2020-06-24 2020-10-13 北京奇艺世纪科技有限公司 数据处理方法及装置
CN113473179A (zh) * 2021-06-30 2021-10-01 北京百度网讯科技有限公司 视频处理方法、装置、电子设备和介质
CN115545020A (zh) * 2022-12-01 2022-12-30 浙江出海云技术有限公司 一种基于大数据的广告引流效果分析方法

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI569648B (zh) * 2012-09-27 2017-02-01 晨星半導體股份有限公司 顯示方法與顯示裝置
EP3745284A1 (en) * 2015-11-16 2020-12-02 Huawei Technologies Co., Ltd. Model parameter fusion method and apparatus
CN105912615A (zh) * 2016-04-05 2016-08-31 重庆大学 一种基于人类语音内容索引的音频和视频文件管理方法
CN107257338B (zh) * 2017-06-16 2018-09-28 腾讯科技(深圳)有限公司 媒体数据处理方法、装置及存储介质
CN108184153A (zh) * 2017-12-29 2018-06-19 伟乐视讯科技股份有限公司 一种与视频内容相匹配的广告插播系统及方法
CN110620946B (zh) * 2018-06-20 2022-03-18 阿里巴巴(中国)有限公司 字幕显示方法及装置
KR102005112B1 (ko) * 2018-10-16 2019-07-29 (주) 씨이랩 콘텐츠 스트리밍 내 광고 서비스 제공 방법
US11379519B2 (en) * 2018-12-07 2022-07-05 Seoul National University R&Db Foundation Query response device and method
CN111629273B (zh) * 2020-04-14 2022-02-11 北京奇艺世纪科技有限公司 一种视频管理方法、装置、系统及存储介质
CN112203122B (zh) * 2020-10-10 2024-01-26 腾讯科技(深圳)有限公司 基于人工智能的相似视频处理方法、装置及电子设备
CN112822513A (zh) * 2020-12-30 2021-05-18 百视通网络电视技术发展有限责任公司 基于视频内容的广告投放展示方法、设备及存储介质
CN113158875B (zh) * 2021-04-16 2022-07-01 重庆邮电大学 基于多模态交互融合网络的图文情感分析方法及系统
US11842367B1 (en) * 2021-07-01 2023-12-12 Alphonso Inc. Apparatus and method for identifying candidate brand names for an ad clip of a query video advertisement using OCR data
CN116524394A (zh) * 2023-03-30 2023-08-01 北京百度网讯科技有限公司 视频检测方法、装置、设备以及存储介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1582444A (zh) * 1999-12-30 2005-02-16 诺基亚有限公司 选择性媒体流广告技术
CN101046871A (zh) * 2006-03-28 2007-10-03 中兴通讯股份有限公司 一种流媒体服务器
CN101072340A (zh) * 2007-06-25 2007-11-14 孟智平 流媒体中加入广告信息的方法与系统
CN101179739A (zh) * 2007-01-11 2008-05-14 腾讯科技(深圳)有限公司 一种插入广告的方法及装置

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6044376A (en) * 1997-04-24 2000-03-28 Imgis, Inc. Content stream analysis
US20090089830A1 (en) * 2007-10-02 2009-04-02 Blinkx Uk Ltd Various methods and apparatuses for pairing advertisements with video files
US20110251896A1 (en) * 2010-04-09 2011-10-13 Affine Systems, Inc. Systems and methods for matching an advertisement to a video

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1582444A (zh) * 1999-12-30 2005-02-16 诺基亚有限公司 选择性媒体流广告技术
CN101046871A (zh) * 2006-03-28 2007-10-03 中兴通讯股份有限公司 一种流媒体服务器
CN101179739A (zh) * 2007-01-11 2008-05-14 腾讯科技(深圳)有限公司 一种插入广告的方法及装置
CN101072340A (zh) * 2007-06-25 2007-11-14 孟智平 流媒体中加入广告信息的方法与系统

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103024479A (zh) * 2012-12-17 2013-04-03 深圳先进技术研究院 视频内容中自适应投放广告的方法及系统
WO2014094497A1 (zh) * 2012-12-17 2014-06-26 深圳先进技术研究院 视频内容中自适应投放广告的方法及系统
WO2015010265A1 (en) * 2013-07-24 2015-01-29 Thomson Licensing Method, apparatus and system for covert advertising
WO2015161758A1 (en) * 2014-04-22 2015-10-29 Tencent Technology (Shenzhen) Company Limited Method for controlling network media information publication, apparatus, and server
US10028019B2 (en) 2014-04-22 2018-07-17 Tencent Technology (Shenzhen) Company Limited Method for controlling network media information publication, apparatus, and server
CN105260368A (zh) * 2014-07-15 2016-01-20 阿里巴巴集团控股有限公司 一种视频数据的编辑、业务对象的推送方法、装置和系统
CN104244098A (zh) * 2014-10-08 2014-12-24 三星电子(中国)研发中心 提供内容的方法、终端、服务器和系统
CN104244098B (zh) * 2014-10-08 2018-07-10 三星电子(中国)研发中心 提供内容的方法、终端、服务器和系统
WO2018059333A1 (zh) * 2016-09-28 2018-04-05 腾讯科技(深圳)有限公司 一种媒体信息处理方法、系统、电子设备及存储介质
CN107659545A (zh) * 2016-09-28 2018-02-02 腾讯科技(北京)有限公司 一种媒体信息处理方法及媒体信息处理系统、电子设备
CN106792003A (zh) * 2016-12-27 2017-05-31 西安石油大学 一种智能广告插播方法、装置及服务器
CN106792003B (zh) * 2016-12-27 2020-04-14 西安石油大学 一种智能广告插播方法、装置及服务器
CN109408639A (zh) * 2018-10-31 2019-03-01 广州虎牙科技有限公司 一种弹幕分类方法、装置、设备和存储介质
CN110472002A (zh) * 2019-08-14 2019-11-19 腾讯科技(深圳)有限公司 一种文本相似度获取方法和装置
CN111767726A (zh) * 2020-06-24 2020-10-13 北京奇艺世纪科技有限公司 数据处理方法及装置
CN111767726B (zh) * 2020-06-24 2024-02-06 北京奇艺世纪科技有限公司 数据处理方法及装置
CN113473179A (zh) * 2021-06-30 2021-10-01 北京百度网讯科技有限公司 视频处理方法、装置、电子设备和介质
CN113473179B (zh) * 2021-06-30 2022-12-02 北京百度网讯科技有限公司 视频处理方法、装置、电子设备和介质
CN115545020A (zh) * 2022-12-01 2022-12-30 浙江出海云技术有限公司 一种基于大数据的广告引流效果分析方法

Also Published As

Publication number Publication date
US20140257995A1 (en) 2014-09-11
EP2785058A1 (en) 2014-10-01
CN103503463A (zh) 2014-01-08
EP2785058A4 (en) 2014-12-03

Similar Documents

Publication Publication Date Title
WO2012167568A1 (zh) 视频广告播放方法、设备和系统
CN109145784B (zh) 用于处理视频的方法和装置
CN110166827B (zh) 视频片段的确定方法、装置、存储介质及电子装置
US8750602B2 (en) Method and system for personalized advertisement push based on user interest learning
WO2018033154A1 (zh) 手势控制方法、装置和电子设备
CN112533051B (zh) 弹幕信息显示方法、装置、计算机设备和存储介质
CN110232340B (zh) 建立视频分类模型以及视频分类的方法、装置
CN108513139B (zh) 视频直播中的虚拟对象识别方法、装置、存储介质和设备
US20220147735A1 (en) Face-aware person re-identification system
CN110879974B (zh) 一种视频分类方法和装置
CN111836118B (zh) 视频处理方法、装置、服务器及存储介质
CN112132030B (zh) 视频处理方法及装置、存储介质及电子设备
CN110796089A (zh) 用于训练换脸模型的方法和设备
TW202042172A (zh) 智慧教學顧問生成方法、系統、設備及儲存介質
CN111160134A (zh) 一种以人为主体的视频景别分析方法和装置
WO2023045635A1 (zh) 多媒体文件的字幕处理方法、装置、电子设备、计算机可读存储介质及计算机程序产品
CN112102157A (zh) 视频换脸方法、电子设备和计算机可读存储介质
CN113642536A (zh) 数据处理方法、计算机设备以及可读存储介质
KR102460595B1 (ko) 게임 방송에서의 실시간 채팅 서비스 제공 방법 및 장치
JP2016015019A (ja) サービス提供装置、方法、及びプログラム
CN113761281A (zh) 虚拟资源处理方法、装置、介质及电子设备
CN113762056A (zh) 演唱视频识别方法、装置、设备及存储介质
CN113221690A (zh) 视频分类方法及装置
CN112132026A (zh) 动物识别方法及装置
CN112836732B (zh) 数据标注的校验方法、装置、电子设备及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11867149

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

REEP Request for entry into the european phase

Ref document number: 2011867149

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 2011867149

Country of ref document: EP