WO2018166288A1 - Information presentation method and apparatus - Google Patents

Information presentation method and apparatus

Info

Publication number
WO2018166288A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
presented
information
target item
key frame
Prior art date
Application number
PCT/CN2018/072285
Other languages
English (en)
French (fr)
Inventor
李川
游正朋
Original Assignee
北京京东尚科信息技术有限公司
北京京东世纪贸易有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京京东尚科信息技术有限公司, 北京京东世纪贸易有限公司 filed Critical 北京京东尚科信息技术有限公司
Publication of WO2018166288A1 publication Critical patent/WO2018166288A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155 Bayesian classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/25 Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N21/266 Channel or content management, e.g. generation and management of keys and entitlement messages in a conditional access system, merging a VOD unicast channel into a multicast channel
    • H04N21/2668 Creating a channel for a dedicated end-user group, e.g. insertion of targeted commercials based on end-user profiles
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431 Generation of visual interfaces for content selection or interaction; Content or additional data rendering

Definitions

  • The present application relates to the field of computer technology, specifically to the field of video technology, and in particular to an information presentation method and apparatus.
  • Analyzing video content and combining it with personalized user information to form a personalized advertisement recommendation system helps to improve the click-through rate and conversion rate of advertisements; moreover, personalized advertisement recommendation can effectively reduce the discomfort of audiences who can only passively accept predetermined advertisements. Therefore, analyzing the content of various online videos and making personalized recommendations of related advertising service information, such as online shopping, has great research significance and practical value.
  • the purpose of the present application is to propose an improved information presentation method and apparatus to solve the technical problems mentioned in the background section above.
  • In a first aspect, an embodiment of the present application provides an information presentation method. The method includes: detecting a key frame in a target video, where a key frame is a frame whose image entropy in the target video is greater than a preset image entropy threshold; in response to detecting the key frame, detecting an image of a target item from the key frame; in response to detecting the image of the target item from the key frame, determining whether the number of frames continuously presenting the image of the target item after the key frame is greater than a predetermined number of frames; and if it is greater than the predetermined number of frames, acquiring information to be presented that matches the image of the target item, and presenting the information to be presented in the frames in which the image of the target item is continuously presented.
  • In some embodiments, detecting a key frame in the target video includes: acquiring a frame whose image entropy is greater than the preset image entropy threshold as a key frame; acquiring, according to the play order of the target video, the first frame after the key frame whose image entropy is greater than the preset image entropy threshold; determining whether the similarity between the first frame and the key frame is less than a preset similarity threshold; and if it is less than the preset similarity threshold, determining that the first frame is also a key frame.
  • In some embodiments, detecting an image of the target item from the key frame comprises: detecting the image of the target item from the key frame based on a pre-trained convolutional neural network, where the convolutional neural network is used to identify image features of the target item and to determine the image of the target item based on those image features.
  • In some embodiments, determining whether the number of frames continuously presenting the image of the target item after the key frame is greater than a predetermined number of frames comprises: determining, using a compressive tracking algorithm, whether the image of the target item is continuously presented in successive frames after the key frame; and if it is continuously presented, accumulating the number of frames continuously presenting the image of the target item and determining whether that number is greater than the predetermined number of frames.
  • In some embodiments, presenting the information to be presented in the frames continuously presenting the image of the target item comprises: determining location information of the image of the target item within those frames; determining a presentation location for the information to be presented based on the location information; and presenting the information to be presented at the presentation location.
  • In some embodiments, acquiring information to be presented that matches the image of the target item includes: acquiring a set of information to be presented, where each piece of information to be presented includes a picture; determining the similarity between the picture in each piece of information in the set and the image of the target item; and selecting at least one piece of information to be presented from the set in descending order of similarity.
  • the information to be presented includes text information; and obtaining information to be presented that matches the image of the target item includes acquiring text information that matches the category of the image of the target item.
  • In some embodiments, acquiring information to be presented that matches the image of the target item includes: acquiring a category label of the user who views the target video through a terminal, where the user's category label is obtained by performing big data analysis on the user's behavior data; and acquiring, from the set of information to be presented, at least one piece of information to be presented that matches the user's category label.
  • In a second aspect, an embodiment of the present application provides an information presentation apparatus, including: a key frame detecting unit configured to detect a key frame in a target video, where a key frame is a frame whose image entropy in the target video is greater than a preset image entropy threshold; an image detecting unit configured to detect an image of a target item from the key frame in response to detecting the key frame; a determining unit configured to determine, in response to detecting the image of the target item from the key frame, whether the number of frames continuously presenting the image of the target item after the key frame is greater than a predetermined number of frames; and a presenting unit configured to, if it is greater than the predetermined number of frames, acquire information to be presented that matches the image of the target item and present the information to be presented in the frames continuously presenting the image of the target item.
  • In some embodiments, the key frame detecting unit is further configured to: acquire a frame whose image entropy is greater than the preset image entropy threshold as a key frame; acquire, according to the play order of the target video, the first frame after the key frame whose image entropy is greater than the preset image entropy threshold; determine whether the similarity between the first frame and the key frame is less than a preset similarity threshold; and if it is less than the preset similarity threshold, determine that the first frame is also a key frame.
  • In some embodiments, the image detecting unit is further configured to: detect an image of the target item from the key frame based on the pre-trained convolutional neural network, where the convolutional neural network is used to identify image features of the target item and to determine the image of the target item based on those features.
  • In some embodiments, the determining unit is further configured to: determine, using a compressive tracking algorithm, whether the image of the target item is continuously presented in successive frames after the key frame; and if so, accumulate the number of frames continuously presenting the image of the target item and determine whether that number is greater than the predetermined number of frames.
  • In some embodiments, the presenting unit is further configured to: determine location information of the image of the target item within the frames continuously presenting it; determine a presentation location for the information to be presented based on the location information; and present the information to be presented at the presentation location.
  • In some embodiments, the presenting unit is further configured to: acquire a set of information to be presented, where each piece of information to be presented includes a picture; determine the similarity between the picture in each piece of information in the set and the image of the target item; and select at least one piece of information to be presented from the set in descending order of similarity.
  • the information to be presented includes text information; and the rendering unit is further configured to: acquire text information that matches a category of the image of the target item.
  • In some embodiments, the presenting unit is further configured to: acquire a category label of the user who views the target video through a terminal, where the user's category label is obtained by performing big data analysis on the user's behavior data; and acquire, from the set of information to be presented, at least one piece of information to be presented that matches the user's category label.
  • In a third aspect, an embodiment of the present application provides a device, including: one or more processors; and a storage apparatus for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of any embodiment of the first aspect.
  • In a fourth aspect, an embodiment of the present application provides a computer readable storage medium on which a computer program is stored; when the program is executed by a processor, the method of any embodiment of the first aspect is implemented.
  • The information presentation method and apparatus provided by the embodiments of the present application detect the image of a target item in the key frames of a target video and present the information to be presented on the frames that continuously present that image. The application performs targeted information presentation based on the content of the target video, which improves the precision of information presentation, thereby reducing costs and increasing the user click-through rate.
  • FIG. 1 is an exemplary system architecture diagram to which the present application can be applied;
  • FIG. 2 is a flowchart of one embodiment of an information presentation method according to the present application;
  • FIG. 3a is a schematic diagram of the construction process of a compressed vector in the information presentation method according to the present application;
  • FIG. 3b is a schematic diagram of an information presentation process of the information presentation method according to the present application;
  • FIG. 4 is a flowchart of another embodiment of the information presentation method according to the present application;
  • FIG. 5 is a schematic structural diagram of one embodiment of an information presentation apparatus according to the present application;
  • FIG. 6 is a block diagram of a computer system suitable for implementing the device of the embodiments of the present application.
  • FIG. 1 illustrates an exemplary system architecture 100 in which an embodiment of an information presentation method or information presentation apparatus of the present application may be applied.
  • system architecture 100 can include terminal devices 101, 102, 103, network 104, and server 105.
  • the network 104 is used to provide a medium for communication links between the terminal devices 101, 102, 103 and the server 105.
  • Network 104 may include various connection types, such as wired or wireless communication links, fiber optic cables, and the like.
  • the user can interact with the server 105 over the network 104 using the terminal devices 101, 102, 103 to receive or transmit messages and the like.
  • Various client applications that support video playback can be installed on the terminal devices 101, 102, and 103, such as web browser applications, shopping applications, search applications, instant messaging tools, social platform software, and the like.
  • The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting video playback, including but not limited to smart phones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop portable computers, desktop computers, and the like.
  • the server 105 may be a server that provides various services, such as a background video server that provides support for video displayed on the terminal devices 101, 102, 103.
  • the background video server can analyze and process data such as the received video playback request, and feed back the processing result (for example, video data) to the terminal device.
  • the information presentation method provided by the embodiment of the present application is generally performed by the server 105. Accordingly, the information presentation device is generally disposed in the server 105.
  • It should be understood that the numbers of terminal devices, networks, and servers in Figure 1 are merely illustrative. Depending on implementation needs, there can be any number of terminal devices, networks, and servers.
  • With continued reference to FIG. 2, a flow 200 of one embodiment of an information presentation method according to the present application is shown. The information presentation method includes the following steps:
  • Step 201: Detect key frames in the target video.
  • In this embodiment, the electronic device on which the information presentation method runs (for example, the server shown in FIG. 1) may receive, through a wired or wireless connection, a video play request from the terminal with which the user plays video, acquire the target video according to the video play request, and detect key frames in the target video. A key frame is a frame whose image entropy in the target video is greater than a preset image entropy threshold.
  • Image entropy is expressed as the average number of bits of the set of gray levels of an image, in bits/pixel; it also describes the average amount of information of the image source. Image entropy is defined as:

H = -∑_i p_i · log2(p_i)    (Formula 1)

where H is the image entropy and p_i is the probability that a pixel with gray level i appears in the image. Acquiring the frames of the target video whose image entropy is greater than the preset image entropy threshold removes blank frames from the video, further reducing the complexity of the algorithm.
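  • For concreteness, a minimal sketch of this entropy computation in Python, using NumPy; the threshold value is a hypothetical placeholder, since the patent only specifies that it is preset:

```python
import numpy as np

def image_entropy(gray):
    """Entropy of an 8-bit grayscale image, in bits/pixel (Formula 1)."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    p = hist / hist.sum()          # p_i: probability of each gray level i
    p = p[p > 0]                   # drop empty bins (0 * log 0 is taken as 0)
    return float(-np.sum(p * np.log2(p)))

# A frame whose entropy exceeds the preset threshold is a key-frame candidate;
# the threshold value here is a hypothetical placeholder.
ENTROPY_THRESHOLD = 4.0
```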
  • In some optional implementations of this embodiment, detecting key frames in the target video includes: acquiring a frame whose image entropy is greater than the preset image entropy threshold as a key frame; acquiring, according to the play order of the target video, the first frame after the key frame whose image entropy is greater than the preset image entropy threshold; determining whether the similarity between the first frame and the key frame is less than a preset similarity threshold; and if it is less than the preset similarity threshold, determining that the first frame is also a key frame. In general, the target video contains multiple independent scenes; extracting key frames containing images of the target item in each independent scene helps to reduce repeated detection, thereby lowering the complexity of the algorithm.
  • The present application uses the event information of consecutive frames in the video to detect key frames. A so-called event divides the video into independent frame units: within each unit the continuity between frames is strong and the difference in image information is small, while the image difference between different units is large.
  • The similarity of images is characterized by the pixel difference between the images, as shown below:

sim = -abs(curFrame - preFrame)    (Formula 2)

where sim is the similarity, curFrame and preFrame are the pixel values of the same pixel position in two consecutive frames, and abs is the absolute value. According to the play order of the video, the first acquired frame whose image entropy is greater than the preset image entropy threshold is taken as a key frame; the pixel value of any pixel on this key frame is preFrame, and the pixel value of the pixel at the same position in a frame after the key frame is curFrame. If the value of sim calculated according to Formula 2 is less than the preset similarity threshold, the frame after the key frame is also determined to be a key frame.
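  • A sketch of this key-frame selection logic, assuming grayscale frames and reusing image_entropy from the sketch above; here the per-pixel Formula 2 similarity is averaged over the whole frame, and both thresholds are illustrative placeholders rather than values from the patent:

```python
import numpy as np

SIM_THRESHOLD = -10.0   # hypothetical; Formula 2 yields values <= 0

def frame_similarity(cur_frame, pre_frame):
    """Formula 2 similarity, averaged over all pixel positions."""
    diff = np.abs(cur_frame.astype(np.int16) - pre_frame.astype(np.int16))
    return float(-diff.mean())

def detect_key_frames(frames, entropy_threshold=4.0, sim_threshold=SIM_THRESHOLD):
    """Return indices of key frames: high-entropy frames that open a new scene."""
    key_frames = []
    for i, frame in enumerate(frames):
        if image_entropy(frame) <= entropy_threshold:
            continue                         # skip blank / low-information frames
        if not key_frames:
            key_frames.append(i)             # first high-entropy frame is a key frame
        elif frame_similarity(frame, frames[key_frames[-1]]) < sim_threshold:
            key_frames.append(i)             # low similarity => new scene => key frame
    return key_frames
```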
  • Step 202: In response to detecting a key frame, detect an image of the target item from the key frame.
  • In this embodiment, a key frame may contain images of multiple items, for example images of T-shirts, hats, shoes, drinks, and the like. The image of the target item can be detected among these images for targeted information presentation, rather than presenting information related to the images of all items contained in the key frame. For example, when information related to a T-shirt needs to be presented, the T-shirt is taken as the target item and the image of the T-shirt is detected.
  • In some optional implementations of this embodiment, detecting an image of the target item from the key frame includes: detecting the image of the target item from the key frame based on a pre-trained convolutional neural network, where the convolutional neural network is used to identify image features of the target item and to determine the image of the target item based on those features. Extracting the target item with a convolutional neural network can effectively identify the position of the image of the target item in the key frame as well as its category information, which facilitates subsequent target tracking and item recommendation. For a picture input to the convolutional neural network, candidate regions are first extracted (1000 candidate regions per picture); each candidate region is then normalized in size; the convolutional neural network extracts high-dimensional features of the candidate regions; and finally the candidate regions are classified by a fully connected layer. By classifying each region, the image of the target item in the key frame is extracted and its position can also be determined.
  • The targets detected by the pre-trained network of the present application may include clothing such as shoes, tops, shorts, skirts, dresses, and the like. This information is important for subsequent item recommendation, and the location information of the target item facilitates initializing the position for subsequent target tracking.
  • A convolutional neural network (CNN) is a type of artificial neural network. It is a feedforward neural network whose artificial neurons can respond to surrounding units within part of their coverage, and it performs well for large-scale image processing.
  • In general, the basic structure of a CNN includes two kinds of layers. The first is the feature extraction layer: the input of each neuron is connected to the local receptive field of the previous layer, and local features are extracted; once a local feature is extracted, its positional relationship with other features is also determined. The second is the computing layer: each computing layer of the network is composed of multiple feature maps, each feature map is a plane, and the weights of all neurons on the plane are equal. The feature mapping structure uses a sigmoid function with a small influence-function kernel as the activation function of the convolutional network, so that the feature maps are shift-invariant. In addition, since the neurons on one mapping plane share weights, the number of free parameters of the network is reduced. Each feature extraction layer in the convolutional neural network is followed by a computing layer for local averaging and secondary extraction; this two-stage feature extraction structure reduces the feature resolution.
  • Convolutional neural networks form more abstract high-level representations of attribute categories or features by combining low-level features, so as to discover distributed feature representations of data. The essence of deep learning is to learn more useful features by constructing machine learning models with many hidden layers and massive training data, thereby improving the accuracy of classification or prediction. Here, the convolutional neural network can be used to identify features of the target item in the key frame, where the features of the target item may include the color, texture, shadow, directional variation, material, and other characteristics of the target item.
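  • The patent describes an R-CNN-style pipeline (candidate regions, CNN features, fully connected classification) trained on its own clothing categories. As an illustrative stand-in only, a pretrained torchvision Faster R-CNN can show what the detection interface looks like; the actual network, categories, and weights in the patent are its own, and COCO labels do not include clothing:

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.transforms.functional import to_tensor

# Stand-in detector; the patent trains its own region-based CNN on clothing.
model = fasterrcnn_resnet50_fpn(pretrained=True).eval()

def detect_target_item(frame_rgb, target_label, score_threshold=0.7):
    """Return (box, score) of the best detection for `target_label`, or None."""
    with torch.no_grad():
        pred = model([to_tensor(frame_rgb)])[0]
    best = None
    for box, label, score in zip(pred["boxes"], pred["labels"], pred["scores"]):
        if int(label) == target_label and float(score) >= score_threshold:
            if best is None or float(score) > best[1]:
                best = (box.tolist(), float(score))
    return best   # the box initializes the tracker's position in step 203
```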
  • Step 203: In response to detecting an image of the target item from the key frame, determine whether the number of frames continuously presenting the image of the target item after the key frame is greater than a predetermined number of frames.
  • In this embodiment, a variety of tracking algorithms may be employed to track, in successive frames, the image of the target item detected in step 202. Presenting information for the target item is only meaningful when its image appears in a number of consecutive frames. Frames in which the image of the target item persists longer than a certain threshold are selected for placement: on the one hand, the user then has enough time to click on the information to be presented, such as an advertisement; on the other hand, the amount of information to be presented can be effectively reduced so as not to affect the user's viewing experience. By clicking an information item, the user enters the web page of the item corresponding to the information to be presented. Tracking algorithms such as tracking-learning-detection (TLD) can be used to track the image of the target item.
  • In some optional implementations of this embodiment, determining whether the number of frames continuously presenting the image of the target item after the key frame is greater than a predetermined number of frames includes: determining, using a compressive tracking algorithm, whether the image of the target item is continuously presented in successive frames after the key frame; and if it is continuously presented, accumulating the number of frames continuously presenting the image of the target item and determining whether that number is greater than the predetermined number of frames.
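  • A minimal sketch of this frame-counting decision, with the tracker abstracted behind a callback; the predetermined frame count is a hypothetical value, since the patent only states that it is predetermined:

```python
PREDETERMINED_FRAMES = 75          # hypothetical: roughly 3 seconds at 25 fps

def count_consecutive_presence(frames, key_index, tracker_step):
    """Count how many frames after the key frame still contain the target.

    `tracker_step(frame) -> bool` abstracts one update of the tracker
    (compressive tracking, TLD, ...): True while the target is still found.
    """
    count = 0
    for frame in frames[key_index + 1:]:
        if not tracker_step(frame):
            break                  # the presentation is no longer continuous
        count += 1
    return count

# Information is fetched and presented only if the run is long enough:
# if count_consecutive_presence(frames, k, step) > PREDETERMINED_FRAMES: ...
```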
  • Compressive tracking is a simple and efficient tracking algorithm based on compressed sensing. First, multi-scale image features are reduced in dimension with a random measurement matrix that satisfies the restricted isometry property (RIP) condition; then a simple naive Bayes classifier is applied to the reduced features for classification. As in a typical pattern classification architecture, the features of the image are extracted first and then classified by a classifier; the difference is that feature extraction here uses compressed sensing and the classifier is naive Bayes. The classifier is then updated in each frame by online learning.
  • The compressive tracking algorithm flow is as follows: (1) At frame t, several image patches of the target (positive samples) and of the background (negative samples) are sampled and transformed at multiple scales; a sparse measurement matrix is then used to reduce the dimension of the multi-scale image features, and the reduced features (covering both target and background, a binary classification problem) are used to train a naive Bayes classifier. (2) At frame t+1, n scan windows are sampled around the target position tracked in the previous frame (avoiding a scan of the whole image) and reduced in dimension with the same sparse measurement matrix to extract their features; the naive Bayes classifier trained at frame t then classifies them, and the window with the highest classification score is taken as the target window. This achieves target tracking from frame t to frame t+1.
  • The construction process of the compressed vector is shown in Figure 3a, which depicts an n×m sparse matrix that transforms a vector x of the high-dimensional image space (m-dimensional) into a low-dimensional space v (n-dimensional); mathematically, v = Rx. In the matrix R, 301, 303, and 302 represent negative, positive, and zero matrix elements, respectively. An arrow indicates that one non-zero element of a row of the measurement matrix R perceives one element of x, which is equivalent to convolving a square window filter with the gray levels at a fixed position of the input image. The random matrix R only needs to be computed once at program startup and then remains unchanged during tracking; using integral images, v can be computed efficiently.
  • The construction process of the classifier is as follows: for each sample z (an m-dimensional vector), its low-dimensional representation is v (an n-dimensional vector, with n much smaller than m). Assuming the elements in v are independently distributed, they can be modeled with a naive Bayes classifier:

H(v) = ∑_{i=1}^{n} log( (p(v_i | y = 1) p(y = 1)) / (p(v_i | y = 0) p(y = 0)) )    (Formula 3)

where H(v) is the classifier, y ∈ {0, 1} represents the sample label (y = 0 for negative samples, y = 1 for positive samples), and the prior probabilities of the two classes are assumed equal, p(y = 1) = p(y = 0) = 0.5. The conditional probabilities p(v_i | y = 1) and p(v_i | y = 0) in H(v) are also assumed to follow Gaussian distributions, with means and standard deviations (μ_i^1, σ_i^1) and (μ_i^0, σ_i^0), respectively. To adapt to long-term tracking, the model needs to be updated continuously; that is, the means and variances of the positive and negative samples are recalculated from newly detected samples:

μ_i^1 ← λ μ_i^1 + (1 - λ) μ^1    (Formula 4)

σ_i^1 ← sqrt( λ (σ_i^1)^2 + (1 - λ) (σ^1)^2 + λ (1 - λ) (μ_i^1 - μ^1)^2 )    (Formula 5)

where μ^1 and σ^1 are the mean and standard deviation of the newly detected positive samples, and λ > 0 is a learning factor.
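  • A compact sketch of this classifier, assuming a precomputed sparse measurement matrix R and following Formulas 3 to 5 (Gaussian log-ratio scoring with equal priors, and the λ-weighted online update of the positive-class parameters; the negative class would be updated analogously):

```python
import numpy as np

class CompressiveNaiveBayes:
    """Minimal sketch of the compressive-tracking classifier (Formulas 3-5)."""

    def __init__(self, R, lam=0.85):
        self.R = R                      # n x m sparse random measurement matrix
        n = R.shape[0]
        self.mu1 = np.zeros(n); self.sig1 = np.ones(n)   # positive class
        self.mu0 = np.zeros(n); self.sig0 = np.ones(n)   # negative class
        self.lam = lam

    def project(self, z):
        return self.R @ z               # v = Rz: m-dimensional -> n-dimensional

    def score(self, v):
        # H(v) = sum_i log(p(v_i|y=1) / p(v_i|y=0)), equal priors assumed
        def log_gauss(v, mu, sig):
            return -0.5 * ((v - mu) / sig) ** 2 - np.log(sig)
        return float(np.sum(log_gauss(v, self.mu1, self.sig1)
                            - log_gauss(v, self.mu0, self.sig0)))

    def update_positive(self, V):
        """Online update (Formulas 4-5) from new positive samples V (rows = v)."""
        mu, sig = V.mean(axis=0), V.std(axis=0) + 1e-6
        lam = self.lam
        new_sig = np.sqrt(lam * self.sig1**2 + (1 - lam) * sig**2
                          + lam * (1 - lam) * (self.mu1 - mu)**2)
        self.mu1 = lam * self.mu1 + (1 - lam) * mu
        self.sig1 = new_sig
```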
  • Step 204: If greater than the predetermined number of frames, acquire information to be presented that matches the image of the target item, and present the information to be presented in the frames continuously presenting the image of the target item.
  • In this embodiment, based on the detection of the target item image in step 202 and its tracking in step 203, the type, trajectory, number of frames of appearance, duration, and the like of the target item can be extracted from the target video. This information helps to implement personalized recommendation of information for the user. The information to be presented is matched from a preset library of information to be presented, and is combined with the frames presenting the image of the target item into new frames, by modifying the frame data or by superimposition, so that the information to be presented is presented in the newly generated frames.
  • The information to be presented may be text or pictures linked to web pages. As shown in FIG. 3b, the target item "T-shirt" 304 is detected in a key frame of the target video; a picture 305 associated with the "T-shirt" and linkable to a web page is matched from the preset library of information to be presented and rendered in the key frame. After clicking the picture 305, the user can enter the relevant web page to browse information associated with the "T-shirt". Likewise, the target item "shoes" 306 is detected in key frames of the target video, and a picture 307 associated with the "shoes" and linkable to a web page is matched from the preset library and presented in the key frames. After clicking the picture 307, the user can enter the relevant web page to browse information associated with the "shoes".
  • In some optional implementations of this embodiment, presenting the information to be presented in the frames continuously presenting the image of the target item includes: determining location information of the image of the target item within those frames; determining a presentation location for the information to be presented based on the location information; and presenting the information to be presented at the presentation location.
  • The presentation position of the information to be presented may be near the image of the target item, or at another position that does not occlude the image of the target item. The presentation position may be determined according to the size of the image of the target item. For example, if the target item is a pair of shoes and the information to be presented is a shoe advertisement that occupies more space than the shoe image itself, it is not appropriate to place the advertisement on the shoe image; the advertisement should instead be placed next to the shoe image. If the target item is a wardrobe, the wardrobe image is relatively large, so it is more suitable to superimpose the information to be presented directly on the wardrobe image.
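  • A sketch of this placement rule, assuming axis-aligned pixel boxes; the overlay-versus-beside decision follows the shoe and wardrobe examples above:

```python
def choose_presentation_position(item_box, ad_size, frame_size):
    """Overlay the ad on large items; place it beside small items."""
    (x1, y1, x2, y2), (ad_w, ad_h) = item_box, ad_size
    frame_w, frame_h = frame_size
    item_w, item_h = x2 - x1, y2 - y1
    if ad_w <= item_w and ad_h <= item_h:
        return (x1, y1)                     # superimpose on the item image
    # Otherwise place the ad next to the item, clamped inside the frame.
    x = x2 if x2 + ad_w <= frame_w else max(0, x1 - ad_w)
    y = min(y1, frame_h - ad_h)
    return (x, y)
```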
  • The method provided in the above embodiment of the present application achieves targeted information presentation by associating the content of the target video with the information to be presented, improving the hit rate of the information to be presented.
  • With further reference to FIG. 4, a flow 400 of another embodiment of the information presentation method is shown. The flow 400 of the information presentation method includes the following steps:
  • Step 401: Detect key frames in the target video.
  • Step 402: In response to detecting a key frame, detect an image of the target item from the key frame.
  • Step 403: In response to detecting an image of the target item from the key frame, determine whether the number of frames continuously presenting the image of the target item after the key frame is greater than a predetermined number of frames.
  • Steps 401-403 are substantially the same as steps 201-203, and therefore are not described again.
  • Step 404: If greater than the predetermined number of frames, acquire a set of information to be presented.
  • In this embodiment, when the number of frames determined in step 403 is greater than the predetermined number, information to be presented that has a high similarity to the image of the target item is matched from the preset library of information to be presented. The information to be presented may include a picture.
  • Step 405: Determine the similarity between the picture in each piece of information in the set of information to be presented and the image of the target item.
  • In this embodiment, if the information to be presented includes a picture, the similarity between the histogram of the picture and the histogram of the image of the target item may be determined. Histogram data is first generated from the pixel data of the target item image and of the picture in the information to be presented; each histogram is normalized, and the Bhattacharyya coefficient algorithm is then used to compute an image similarity value in the range [0, 1], where 0 means extremely different and 1 means extremely similar (identical).
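  • A minimal sketch of this histogram comparison using the Bhattacharyya coefficient on normalized grayscale histograms (the bin count is an illustrative choice); candidates can then be sorted in descending order of this value for step 406:

```python
import numpy as np

def bhattacharyya_similarity(img_a, img_b, bins=64):
    """Histogram similarity in [0, 1]: 0 = extremely different, 1 = identical."""
    ha, _ = np.histogram(img_a, bins=bins, range=(0, 256))
    hb, _ = np.histogram(img_b, bins=bins, range=(0, 256))
    pa = ha / ha.sum()                        # normalize to probability distributions
    pb = hb / hb.sum()
    return float(np.sum(np.sqrt(pa * pb)))   # Bhattacharyya coefficient

# Step 406 selection: sort candidates by similarity, descending, and keep the top k.
# top_k = sorted(candidates, key=lambda c: bhattacharyya_similarity(target, c), reverse=True)[:k]
```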
  • In some optional implementations of this embodiment, if the information to be presented includes text information, text information matching the category of the image of the target item is acquired. A category is determined from the keywords in the text information and matched against the category of the image of the target item to obtain a similarity. For example, for the text information "XX sneakers priced at 299 yuan", the similarity between this text and the image of the target item "sneakers" may reach 90%; the similarity between the image of "sneakers" and the text "XX leather shoes priced at 299 yuan" may reach 70%; and the similarity between the image of the target item "sneakers" and the text "XX basketball priced at 299 yuan" may be only 10%.
  • Step 406: Select at least one piece of information to be presented from the set of information to be presented in descending order of similarity. In this embodiment, at least one piece of information to be presented is selected based on the similarity determined in step 405. The number of pieces selected may be proportional to the area of the image of the target item: for an image with a larger area, a few more pieces of information can be displayed, while for a smaller image it is better to display only one piece, to avoid the presented information overshadowing the content.
  • In some optional implementations of this embodiment, acquiring information to be presented that matches the image of the target item includes: acquiring a category label of the user who views the target video through the terminal, where the user's category label is obtained by performing big data analysis on the user's behavior data; and acquiring, from the set of information to be presented, at least one piece of information to be presented that matches the user's category label. That is, the information to be presented is further filtered based on the personal characteristics of the user, and is selected in a targeted manner for that user. For example, if big data analysis determines that the user viewing the target video is female, information related to products for women can be selected as the information to be presented.
  • By building a recommendation model over combinations of the user, the information to be presented, and the image of the target item, the click-through rate (CTR) of the information to be presented can be effectively predicted, and the information with the highest estimated click-through rate is pushed, thereby improving the conversion rate of information placement. The features of the recommendation model mainly comprise three kinds: user features, features of the items involved in the information to be presented, and features of the image of the target item detected from the target video. The user features mainly include the user's age, gender, region, occupation, platform, and other information obtainable through a big data portrait of the user. The features of the items involved in the information to be presented mainly include the type of the target item, its price, the item's place of origin (or the seller's location), and the overall click-through rate of the information to be presented. The features of the image of the target item mainly include the similarity between the image of the target item detected in the target video and the item involved in the information to be presented, and the duration for which the image of the target item appears in the target video.
  • the processing of the features of the items involved in presenting the information mainly includes discretization and feature crossing.
  • the features of the information recommendation model to be presented mainly include the three categories discussed above.
  • The initial features include discrete features (such as user gender and user region) and continuous features (such as item price, user age, the similarity between the image of the target item and the items involved in the information to be presented, and the click-through rate of the information to be presented). Although these are all continuous values, their meanings differ: comparing the magnitudes of age and similarity is meaningless for information recommendation, whereas the magnitude of the click-through rate is meaningful, so the above features need to be discretized.
  • The processed features can be stretched into a vector as the final feature. However, this yields a linear model and ignores interactions between features; for example, the combination of gender and item type has a direct impact on the information click-through rate. Crossing features can therefore effectively improve the accuracy of model prediction. Feature crossing combines two features into new discrete features; for example, combining gender with item category (m classes) produces 2m discrete features, as in the sketch below.
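  • A small sketch of the discretization (bucketing) and feature-crossing operations described here; the bucket boundaries and segment sizes are illustrative and not the patent's actual 113-dimension layout:

```python
def one_hot(index, size):
    v = [0] * size
    v[index] = 1
    return v

def bucketize(value, boundaries):
    """Discretize a continuous value into len(boundaries)+1 buckets."""
    return sum(value > b for b in boundaries)

# Continuous "age" -> discrete segment (illustrative boundaries).
age_vec = one_hot(bucketize(27, [18, 25, 35, 45, 55]), 6)

# Feature crossing: gender (2 values) x item category (m values) -> 2m features.
def cross(gender_idx, category_idx, m):
    return one_hot(gender_idx * m + category_idx, 2 * m)

crossed = cross(gender_idx=1, category_idx=3, m=8)   # 16 slots, like the x76~x91 segment
```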
  • Let the discrete feature vector formed by the present application be x, with feature dimension 113: x1~x10 are the user age feature segment; x11~x18 the user region feature segment; x19~x25 the user occupation feature segment; x26~x30 the viewing-platform feature segment; x31~x38 the item category feature segment; x39~x50 the item price feature segment; x51~x58 the item region feature segment; x59~x60 the item click-through-rate feature segment; x61~x65 the detected-target appearance-duration feature segment; x66~x75 the detected-target/advertised-item similarity feature segment; x76~x91 the item category/user gender combination feature segment; and x92~x113 the user gender/item price combination feature segment.
  • Logistic Regression is an algorithm widely used in advertising recommendations.
  • Given a training set D = {(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)}, where x_i is the constructed feature vector and y_i indicates whether the advertisement was clicked, with 1 for clicked and -1 for not clicked. The model is

h_θ(x) = g(θ^T x) = 1 / (1 + e^(-θ^T x))

where g(θ^T x) is the sigmoid function mentioned above, x is the feature vector, and θ is the parameter vector. The corresponding decision function predicts a click (y = 1) when h_θ(x) > 0.5 and no click (y = -1) otherwise. The parameters of the model are solved next. Maximum likelihood estimation is used; that is, a set of parameters is sought under which the likelihood (probability) of the observed data is largest. The likelihood L(θ) can be expressed as

L(θ) = ∏_{i=1}^{N} P(y_i | x_i; θ)

and the optimal parameters are obtained by maximizing this likelihood function. Gradient descent is used to solve for the parameters: at each step, the parameter values are adjusted in the direction in which the objective function changes fastest, gradually approaching the optimum.
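  • A minimal sketch of this training procedure, using the {1, -1} label convention above; gradient ascent on the log-likelihood is equivalent to gradient descent on its negative, and the learning rate and epoch count are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_ctr_lr(X, y, lr=0.1, epochs=100):
    """Fit logistic regression by maximizing the log-likelihood.

    X: (N, d) feature matrix; y: (N,) labels in {1, -1} (clicked / not clicked).
    """
    theta = np.zeros(X.shape[1])
    for _ in range(epochs):
        # With y in {1, -1}: P(y_i | x_i; theta) = sigmoid(y_i * theta^T x_i)
        grad = X.T @ (y * (1.0 - sigmoid(y * (X @ theta))))
        theta += lr * grad / len(y)
    return theta

def predict_ctr(theta, x):
    return sigmoid(theta @ x)       # estimated click-through rate for one candidate
```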
  • In this way, a recommendation system that recommends information to be presented is obtained. Click-through-rate estimation is performed on a predetermined number of candidates retrieved from the information to be presented, and the information to be presented with the highest estimated click-through rate is selected for presentation.
  • As can be seen, compared with the embodiment corresponding to FIG. 2, the flow 400 of the information presentation method in this embodiment highlights the step of selecting the information to be presented. The information to be presented can therefore be selected accurately, the hit rate of the information to be presented is improved, the most effective information is presented as far as possible, and the cost of placing the information to be presented is reduced.
  • With further reference to FIG. 5, as an implementation of the methods shown in the above figures, the present application provides an embodiment of an information presentation apparatus. The apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus can be applied to various electronic devices.
  • the information presentation apparatus 500 of the present embodiment includes a key frame detecting unit 501, an image detecting unit 502, a determining unit 503, and a presenting unit 504.
  • The key frame detecting unit 501 is configured to detect key frames in the target video, where a key frame is a frame whose image entropy in the target video is greater than a preset image entropy threshold. The image detecting unit 502 is configured to detect an image of the target item from the key frame in response to detecting the key frame. The determining unit 503 is configured to determine, in response to detecting the image of the target item from the key frame, whether the number of frames continuously presenting the image of the target item after the key frame is greater than a predetermined number of frames. The presenting unit 504 is configured to, if greater than the predetermined number of frames, acquire information to be presented that matches the image of the target item and present the information to be presented in the frames continuously presenting the image of the target item.
  • The specific processing of the key frame detecting unit 501, the image detecting unit 502, the determining unit 503, and the presenting unit 504 of the information presentation apparatus 500 may refer to steps 201, 202, 203, and 204 in the embodiment corresponding to FIG. 2, respectively.
  • In some optional implementations of this embodiment, the key frame detecting unit 501 is further configured to: acquire a frame whose image entropy is greater than the preset image entropy threshold as a key frame; acquire, according to the play order of the target video, the first frame after the key frame whose image entropy is greater than the preset image entropy threshold; determine whether the similarity between the first frame and the key frame is less than a preset similarity threshold; and if it is less than the preset similarity threshold, determine that the first frame is also a key frame.
  • In some optional implementations of this embodiment, the image detecting unit 502 is further configured to: detect an image of the target item from the key frame based on the pre-trained convolutional neural network, where the convolutional neural network is used to identify image features of the target item and to determine the image of the target item based on those features.
  • In some optional implementations of this embodiment, the determining unit 503 is further configured to: determine, using a compressive tracking algorithm, whether the image of the target item is continuously presented in successive frames after the key frame; and if so, accumulate the number of frames continuously presenting the image of the target item and determine whether that number is greater than the predetermined number of frames.
  • In some optional implementations of this embodiment, the presenting unit 504 is further configured to: determine location information of the image of the target item within the frames continuously presenting it; determine a presentation location for the information to be presented according to the location information; and present the information to be presented at the presentation location.
  • In some optional implementations of this embodiment, the presenting unit 504 is further configured to: acquire a set of information to be presented, where each piece of information to be presented includes a picture; determine the similarity between the picture in each piece of information in the set and the image of the target item; and select at least one piece of information to be presented from the set in descending order of similarity.
  • the to-be-presented information includes text information; and the presentation unit 504 is further configured to: acquire text information that matches a category of the image of the target item.
  • In some optional implementations of this embodiment, the presenting unit 504 is further configured to: acquire a category label of the user who views the target video through the terminal, where the user's category label is obtained by performing big data analysis on the user's behavior data; and acquire, from the set of information to be presented, at least one piece of information to be presented that matches the user's category label.
  • Referring now to FIG. 6, a block diagram of a computer system 600 suitable for implementing the device of the embodiments of the present application is shown.
  • the device shown in FIG. 6 is merely an example and should not impose any limitation on the function and scope of use of the embodiments of the present application.
  • As shown in FIG. 6, the computer system 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processes according to a program stored in read-only memory (ROM) 602 or a program loaded from the storage portion 608 into random access memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the system 600 are also stored.
  • the CPU 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604.
  • An input/output (I/O) interface 605 is also coupled to bus 604.
  • The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including, for example, a cathode ray tube (CRT), a liquid crystal display (LCD), and the like; a storage portion 608 including a hard disk and the like; and a communication portion 609 including a network interface card such as a LAN card or a modem. The communication portion 609 performs communication processing via a network such as the Internet.
  • A drive 610 is also coupled to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it is installed into the storage portion 608 as needed.
  • An embodiment of the present disclosure includes a computer program product comprising a computer program carried on a computer readable medium, the computer program containing program code for executing the method illustrated in the flowchart. In such an embodiment, the computer program can be downloaded and installed from a network via the communication portion 609, and/or installed from the removable medium 611. When the computer program is executed by the central processing unit (CPU) 601, the above-described functions defined in the method of the present application are performed.
  • the computer readable medium described herein may be a computer readable signal medium or a computer readable storage medium or any combination of the two.
  • The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • a computer readable storage medium may be any tangible medium that can contain or store a program, which can be used by or in connection with an instruction execution system, apparatus or device.
  • a computer readable signal medium may include a data signal that is propagated in the baseband or as part of a carrier, carrying computer readable program code. Such propagated data signals can take a variety of forms including, but not limited to, electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the computer readable signal medium can also be any computer readable medium other than a computer readable storage medium, which can transmit, propagate, or transport a program for use by or in connection with the instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium can be transmitted by any suitable medium, including but not limited to wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
  • Each block of the flowcharts or block diagrams may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that illustrated in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • Each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments of the present application may be implemented by software or by hardware.
  • The described units may also be provided in a processor; for example, it may be described as: a processor including a key frame detecting unit, an image detecting unit, a determining unit, and a presenting unit. The names of these units do not, in certain circumstances, constitute a limitation on the units themselves; for example, the key frame detecting unit may also be described as "a unit that detects key frames in the target video".
  • In another aspect, the present application also provides a computer readable medium, which may be included in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer readable medium carries one or more programs that, when executed by the apparatus, cause the apparatus to: detect key frames in the target video, where a key frame is a frame whose image entropy in the target video is greater than a preset image entropy threshold; in response to detecting a key frame, detect an image of the target item from the key frame; in response to detecting the image of the target item from the key frame, determine whether the number of frames continuously presenting the image of the target item after the key frame is greater than a predetermined number of frames; and if greater than the predetermined number of frames, acquire information to be presented that matches the image of the target item and present the information to be presented in the frames continuously presenting the image of the target item.

Abstract

The present application discloses an information presentation method and apparatus. One specific embodiment of the method includes: detecting key frames in a target video, where a key frame is a frame whose image entropy in the target video is greater than a preset image entropy threshold; in response to detecting a key frame, detecting an image of a target item from the key frame; in response to detecting the image of the target item from the key frame, determining whether the number of frames continuously presenting the image of the target item after the key frame is greater than a predetermined number of frames; and if greater than the predetermined number of frames, acquiring information to be presented that matches the image of the target item and presenting the information to be presented in the frames continuously presenting the image of the target item. This embodiment can present information to be presented in a targeted manner for target items in the target video, improving the accuracy of information pushing.

Description

Information presentation method and apparatus
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to Chinese Patent Application No. 201710152564.0, filed on March 15, 2017, the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD
The present application relates to the field of computer technology, specifically to the field of video technology, and in particular to an information presentation method and apparatus.
BACKGROUND
With the rapid spread of the Internet and the development of digital image acquisition and processing technology, the online video industry has risen rapidly and plays an increasingly important role in people's daily lives. As a comprehensive medium containing images, sound, text, and other information, video has a powerful capacity for carrying and spreading information, so the semantic analysis and understanding of video has long been an important research direction in the field of multimedia information processing. On the other hand, with the rapid growth of e-commerce platforms, online shopping has gradually become one of the shopping methods people choose most often, which brings business opportunities for the organic combination of the online video industry and e-commerce.
Analyzing video content and combining it with personalized user information to form a personalized advertisement recommendation system helps to improve the click-through rate and conversion rate of advertisements; moreover, personalized advertisement recommendation can effectively reduce the discomfort of audiences who can only passively accept predetermined advertisements. Therefore, analyzing the content of various online videos and making personalized recommendations of related advertising service information, such as online shopping, has important research significance and practical value.
SUMMARY
The purpose of the present application is to propose an improved information presentation method and apparatus to solve the technical problems mentioned in the background section above.
In a first aspect, an embodiment of the present application provides an information presentation method, the method including: detecting key frames in a target video, where a key frame is a frame whose image entropy in the target video is greater than a preset image entropy threshold; in response to detecting a key frame, detecting an image of a target item from the key frame; in response to detecting the image of the target item from the key frame, determining whether the number of frames continuously presenting the image of the target item after the key frame is greater than a predetermined number of frames; and if greater than the predetermined number of frames, acquiring information to be presented that matches the image of the target item and presenting the information to be presented in the frames continuously presenting the image of the target item.
In some embodiments, detecting key frames in the target video includes: acquiring a frame whose image entropy is greater than the preset image entropy threshold as a key frame; acquiring, according to the play order of the target video, the first frame after the key frame whose image entropy is greater than the preset image entropy threshold; determining whether the similarity between the first frame and the key frame is less than a preset similarity threshold; and if less than the preset similarity threshold, determining that the first frame is also a key frame.
In some embodiments, detecting an image of the target item from the key frame includes: detecting the image of the target item from the key frame based on a pre-trained convolutional neural network, where the convolutional neural network is used to identify image features of the target item and to determine the image of the target item based on the image features.
In some embodiments, determining whether the number of frames continuously presenting the image of the target item after the key frame is greater than a predetermined number of frames includes: determining, using a compressive tracking algorithm, whether the image of the target item is continuously presented in successive frames after the key frame; and if continuously presented, accumulating the number of frames continuously presenting the image of the target item and determining whether that number is greater than the predetermined number of frames.
In some embodiments, presenting the information to be presented in the frames continuously presenting the image of the target item includes: determining location information of the image of the target item within those frames; determining a presentation location for the information to be presented based on the location information; and presenting the information to be presented at the presentation location.
In some embodiments, acquiring information to be presented that matches the image of the target item includes: acquiring a set of information to be presented, where the information to be presented includes pictures; determining the similarity between the picture in each piece of information in the set and the image of the target item; and selecting at least one piece of information to be presented from the set in descending order of similarity.
In some embodiments, the information to be presented includes text information; and acquiring information to be presented that matches the image of the target item includes: acquiring text information that matches the category of the image of the target item.
In some embodiments, acquiring information to be presented that matches the image of the target item includes: acquiring a category label of the user who views the target video through a terminal, where the user's category label is obtained by performing big data analysis on the user's behavior data; and acquiring, from the set of information to be presented, at least one piece of information to be presented that matches the user's category label.
In a second aspect, an embodiment of the present application provides an information presentation apparatus, the apparatus including: a key frame detecting unit for detecting key frames in a target video, where a key frame is a frame whose image entropy in the target video is greater than a preset image entropy threshold; an image detecting unit for detecting an image of a target item from the key frame in response to detecting the key frame; a determining unit for determining, in response to detecting the image of the target item from the key frame, whether the number of frames continuously presenting the image of the target item after the key frame is greater than a predetermined number of frames; and a presenting unit for, if greater than the predetermined number of frames, acquiring information to be presented that matches the image of the target item and presenting the information to be presented in the frames continuously presenting the image of the target item.
In some embodiments, the key frame detecting unit is further configured to: acquire a frame whose image entropy is greater than the preset image entropy threshold as a key frame; acquire, according to the play order of the target video, the first frame after the key frame whose image entropy is greater than the preset image entropy threshold; determine whether the similarity between the first frame and the key frame is less than a preset similarity threshold; and if less than the preset similarity threshold, determine that the first frame is also a key frame.
In some embodiments, the image detecting unit is further configured to: detect an image of the target item from the key frame based on a pre-trained convolutional neural network, where the convolutional neural network is used to identify image features of the target item and to determine the image of the target item based on the image features.
In some embodiments, the determining unit is further configured to: determine, using a compressive tracking algorithm, whether the image of the target item is continuously presented in successive frames after the key frame; and if continuously presented, accumulate the number of frames continuously presenting the image of the target item and determine whether that number is greater than the predetermined number of frames.
In some embodiments, the presenting unit is further configured to: determine location information of the image of the target item within the frames continuously presenting it; determine a presentation location for the information to be presented based on the location information; and present the information to be presented at the presentation location.
In some embodiments, the presenting unit is further configured to: acquire a set of information to be presented, where the information to be presented includes pictures; determine the similarity between the picture in each piece of information in the set and the image of the target item; and select at least one piece of information to be presented from the set in descending order of similarity.
In some embodiments, the information to be presented includes text information; and the presenting unit is further configured to: acquire text information that matches the category of the image of the target item.
In some embodiments, the presenting unit is further configured to: acquire a category label of the user who views the target video through a terminal, where the user's category label is obtained by performing big data analysis on the user's behavior data; and acquire, from the set of information to be presented, at least one piece of information to be presented that matches the user's category label.
In a third aspect, an embodiment of the present application provides a device, including: one or more processors; and a storage apparatus for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of any embodiment of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer readable storage medium on which a computer program is stored; when the program is executed by a processor, the method of any embodiment of the first aspect is implemented.
The information presentation method and apparatus provided by the embodiments of the present application detect the image of a target item in the key frames of a target video and present the information to be presented on the frames continuously presenting that image. The present application performs targeted information presentation based on the content of the target video, improving the precision of information presentation, thereby reducing costs and increasing the user click-through rate.
BRIEF DESCRIPTION OF THE DRAWINGS
Other features, objects, and advantages of the present application will become more apparent by reading the detailed description of non-limiting embodiments made with reference to the following drawings:
FIG. 1 is an exemplary system architecture diagram to which the present application can be applied;
FIG. 2 is a flowchart of one embodiment of an information presentation method according to the present application;
FIG. 3a is a schematic diagram of the construction process of a compressed vector in the information presentation method according to the present application;
FIG. 3b is a schematic diagram of an information presentation process of the information presentation method according to the present application;
FIG. 4 is a flowchart of another embodiment of the information presentation method according to the present application;
FIG. 5 is a schematic structural diagram of one embodiment of an information presentation apparatus according to the present application;
FIG. 6 is a block diagram of a computer system suitable for implementing the device of the embodiments of the present application.
DETAILED DESCRIPTION
The present application is further described in detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the related invention and do not limit the invention. It should also be noted that, for ease of description, only the parts related to the invention are shown in the drawings.
It should be noted that, in the case of no conflict, the embodiments of the present application and the features in the embodiments may be combined with each other. The present application will be described in detail below with reference to the drawings and in combination with the embodiments.
FIG. 1 shows an exemplary system architecture 100 to which embodiments of the information presentation method or information presentation apparatus of the present application can be applied.
As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is the medium used to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages and the like. Various client applications supporting video playback may be installed on the terminal devices 101, 102, 103, such as web browser applications, shopping applications, search applications, instant messaging tools, and social platform software.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting video playback, including but not limited to smart phones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, for example a background video server providing support for the video displayed on the terminal devices 101, 102, 103. The background video server may analyze and otherwise process data such as received video play requests, and feed the processing results (for example, video data) back to the terminal devices.
It should be noted that the information presentation method provided by the embodiments of the present application is generally executed by the server 105; accordingly, the information presentation apparatus is generally disposed in the server 105.
It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. Depending on implementation needs, there may be any number of terminal devices, networks, and servers.
With continued reference to FIG. 2, a flow 200 of one embodiment of an information presentation method according to the present application is shown. The information presentation method includes the following steps:
Step 201: Detect key frames in the target video.
In this embodiment, the electronic device on which the information presentation method runs (for example, the server shown in FIG. 1) may receive, through a wired or wireless connection, a video play request from the terminal with which the user plays video, acquire the target video according to the video play request, and detect key frames in the target video. A key frame is a frame whose image entropy in the target video is greater than a preset image entropy threshold. Image entropy is expressed as the average number of bits of the set of gray levels of an image, in bits/pixel; it also describes the average amount of information of the image source. Image entropy is defined as:

H = -∑_i p_i · log2(p_i)    (Formula 1)

where H is the image entropy and p_i is the probability that a pixel with gray level i appears in the image. Acquiring the frames of the target video whose image entropy is greater than the preset image entropy threshold removes blank frames from the video, further reducing the complexity of the algorithm.
In some optional implementations of this embodiment, detecting key frames in the target video includes: acquiring a frame whose image entropy is greater than the preset image entropy threshold as a key frame; acquiring, according to the play order of the target video, the first frame after the key frame whose image entropy is greater than the preset image entropy threshold; determining whether the similarity between the first frame and the key frame is less than a preset similarity threshold; and if less than the preset similarity threshold, determining that the first frame is also a key frame. In general, the target video contains multiple independent scenes; extracting key frames containing images of the target item in each independent scene helps to reduce repeated detection, thereby lowering the complexity of the algorithm. The present application uses the event information of consecutive frames in the video to detect key frames. A so-called event divides the video into independent frame units: within each unit the continuity between frames is strong and the difference in image information is small, while the image difference between different units is large. The similarity of images is characterized by the pixel difference between the images, as shown below:

sim = -abs(curFrame - preFrame)    (Formula 2)

where sim is the similarity, curFrame and preFrame are the pixel values of the same pixel position in two consecutive frames, and abs is the absolute value. According to the play order of the video, the first acquired frame whose image entropy is greater than the preset image entropy threshold is taken as a key frame; the pixel value of any pixel on this key frame is preFrame, and the pixel value of the pixel at the same position in a frame after the key frame is curFrame. If the value of sim calculated according to Formula 2 is less than the preset similarity threshold, the frame after the key frame is also determined to be a key frame.
Step 202: In response to detecting a key frame, detect an image of the target item from the key frame.
In this embodiment, images of multiple items may exist in the key frame, for example images of T-shirts, hats, shoes, drinks, and the like. The image of the target item can be detected among these images for targeted information presentation, rather than presenting information related to the images of all items contained in the key frame. For example, when information related to a T-shirt needs to be presented, the T-shirt is taken as the target item and the image of the T-shirt is detected.
In some optional implementations of this embodiment, detecting an image of the target item from the key frame includes: detecting the image of the target item from the key frame based on a pre-trained convolutional neural network, where the convolutional neural network is used to identify image features of the target item and to determine the image of the target item based on the image features. Extracting the target item with a convolutional neural network can effectively identify the position of the image of the target item in the key frame as well as its category information, which facilitates subsequent target tracking and item recommendation. For a picture input to the convolutional neural network, candidate regions are first extracted, 1000 candidate regions per picture; the picture size of each candidate region is then normalized; the convolutional neural network is used to extract high-dimensional features of the candidate regions; and finally the candidate regions are classified through a fully connected layer. By classifying each region, the image of the target item in the key frame is extracted, and its position can also be determined. The targets detected by the pre-trained network of the present application may include clothing, such as shoes, tops, shorts, skirts, and dresses. This information is important for subsequent item recommendation. The location information of the target item facilitates initializing the position for subsequent target tracking.
A convolutional neural network (CNN) is a type of artificial neural network. It is a feedforward neural network whose artificial neurons can respond to surrounding units within part of their coverage, and it performs well for large-scale image processing. In general, the basic structure of a CNN includes two kinds of layers. The first is the feature extraction layer: the input of each neuron is connected to the local receptive field of the previous layer, and local features are extracted; once a local feature is extracted, its positional relationship with other features is also determined. The second is the computing layer: each computing layer of the network consists of multiple feature maps, each feature map is a plane, and the weights of all neurons on the plane are equal. The feature mapping structure uses a sigmoid function with a small influence-function kernel as the activation function of the convolutional network, so that the feature maps are shift-invariant. In addition, since the neurons on one mapping plane share weights, the number of free parameters of the network is reduced. Each feature extraction layer in the convolutional neural network is followed by a computing layer for local averaging and secondary extraction; this two-stage feature extraction structure reduces the feature resolution. Convolutional neural networks form more abstract high-level representations of attribute categories or features by combining low-level features, so as to discover distributed feature representations of data. The essence of deep learning is to learn more useful features by constructing machine learning models with many hidden layers and massive training data, thereby improving the accuracy of classification or prediction. The convolutional neural network can be used to identify features of the target item in the key frame, where the features of the target item may include the color, texture, shadow, directional variation, material, and other characteristics of the target item.
步骤203,响应于从关键帧中检测到目标物品的图像,确定在关键帧之后连续呈现目标物品的图像的帧的数目是否大于预定的帧数。
在本实施例中,可采用多种跟踪算法在连续帧中跟踪步骤202中检测到的目标物品的图像。只有在连续多个帧中都出现了目标物品的图像,再呈现待呈现信息才有意义。选取目标物品的图像存在时间超过一定阈值的帧进行投放,一方面用户有足够的时间去点击待呈现信息,例如广告,一方面也可以有效降低待呈现信息数量,从而不影响用户的观影体验。用户点击信息条目即可进入待呈现信息对应的物品网页。可采用诸如跟踪学习和检测(TLD,tracking learning and detection)等跟踪算法来进行目标物品的图像的跟踪。
In some optional implementations of this embodiment, determining whether the number of frames continuously presenting the image of the target item after the key frame is greater than the predetermined number of frames includes: using a compressive tracking algorithm to determine whether the image of the target item is continuously presented in different frames after the key frame; and if it is continuously presented, accumulating the number of frames continuously presenting the image of the target item and determining whether this number is greater than the predetermined number of frames. Compressive tracking is a simple and efficient tracking algorithm based on compressed sensing. First, a random measurement matrix satisfying the restricted isometry property (RIP) condition of compressed sensing is used to reduce the dimensionality of multi-scale image features; a simple naive Bayes classifier is then applied to the dimension-reduced features for classification. The architecture is the same as general pattern classification — first extract image features, then classify them with a classifier — the differences being that feature extraction here uses compressed sensing and the classifier is naive Bayes. The classifier is then updated at every frame by online learning.
The flow of the compressive tracking algorithm is as follows (a sketch of one tracking step is given after the list):
(1) At frame t, several image patches of the target (positive samples) and of the background (negative samples) are sampled and subjected to multi-scale transformation; a sparse measurement matrix is then used to reduce the dimensionality of the multi-scale image features, and the dimension-reduced features (covering both target and background — a binary classification problem) are used to train a naive Bayes classifier.
(2) At frame t+1, n scan windows are sampled around the target position tracked in the previous frame (avoiding a scan of the whole image); their features are reduced in dimension with the same sparse measurement matrix, and the naive Bayes classifier trained at frame t classifies them; the window with the largest classification score is taken to be the target window. This accomplishes target tracking from frame t to frame t+1.
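A sketch of the t → t+1 step above, where clf is any classifier exposing a score method that computes H(v) (one concrete version appears after Formula 5 below); sample_windows and extract_features are hypothetical helpers for sampling scan windows around the previous position and computing the high-dimensional feature vector x of a window:

```python
import numpy as np

def track_step(extract_features, sample_windows, prev_box, R, clf):
    """One t -> t+1 compressive-tracking step: score every scan window
    and keep the best one as the new target window."""
    best_score, best_box = -np.inf, prev_box
    for box in sample_windows(prev_box):
        v = R @ extract_features(box)     # v = Rx: project to low-dim features
        score = clf.score(v)              # naive Bayes score H(v), Formula 3
        if score > best_score:
            best_score, best_box = score, box
    return best_box
```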
The construction of the compressed vector is shown in Fig. 3a, which depicts an n×m sparse matrix that transforms x in an m-dimensional high-dimensional image space into an n-dimensional low-dimensional space v; mathematically, v = Rx. In the matrix R, 301, 303 and 302 denote matrix elements that are negative, positive and zero, respectively. An arrow indicates that one nonzero element of a row of the measurement matrix R senses one element of x, which is equivalent to convolving a rectangular window filter with the gray level at a fixed position of the input image.
x is projected into the low-dimensional space v using the above sparse random matrix R. This random matrix R only needs to be computed once at program startup and then remains unchanged throughout tracking. Using integral images, v can be computed efficiently.
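One common way to build such an R, illustrated below, is a very sparse random projection whose entries are positive, negative or zero, as in Fig. 3a; the particular distribution (±√s with probability 1/(2s) each, 0 otherwise) is an assumption of this sketch, not a prescription of the text:

```python
import numpy as np

def sparse_measurement_matrix(n, m, s=None, seed=0):
    """Sparse random matrix R with entries in {+sqrt(s), 0, -sqrt(s)}.
    Computed once at startup and reused for all frames."""
    rng = np.random.default_rng(seed)
    s = s or max(m // 4, 1)               # sparsity level (assumption)
    u = rng.random((n, m))
    R = np.zeros((n, m))
    R[u < 1 / (2 * s)] = np.sqrt(s)       # negative/positive/zero entries
    R[u > 1 - 1 / (2 * s)] = -np.sqrt(s)  # mirror Fig. 3a's 301/302/303
    return R
```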
The classifier is constructed as follows. For each sample z (an m-dimensional vector), its low-dimensional representation is v (an n-dimensional vector, with n much smaller than m). Assuming the elements of v are independently distributed, they can be modeled with a naive Bayes classifier:
H(v) = log( ∏_{i=1..n} p(v_i | y=1) p(y=1) / ∏_{i=1..n} p(v_i | y=0) p(y=0) ) = ∑_{i=1..n} log( p(v_i | y=1) / p(v_i | y=0) )        (Formula 3)
where H(v) is the classifier and y ∈ {0, 1} is the sample label, y=0 denoting a negative sample and y=1 a positive sample; the prior probabilities of the two classes are assumed equal, p(y=1) = p(y=0) = 0.5. The conditional probabilities p(v_i | y=1) and p(v_i | y=0) in the classifier H(v) are also assumed to be Gaussian, with means and standard deviations
p(v_i | y=1) ~ N(μ_i^1, σ_i^1),    p(v_i | y=0) ~ N(μ_i^0, σ_i^0)
To adapt to long-term tracking, the model must be updated continuously, i.e. the means and variances of the positive and negative samples are recomputed from newly detected samples. The update is as follows (shown for the positive-sample parameters; the negative-sample parameters are updated in the same way, with μ^1 and σ^1 being the mean and standard deviation estimated from the positive samples of the current frame):
μ_i^1 ← λ μ_i^1 + (1 - λ) μ^1        (Formula 4)
σ_i^1 ← sqrt( λ (σ_i^1)^2 + (1 - λ) (σ^1)^2 + λ(1 - λ) (μ_i^1 - μ^1)^2 )        (Formula 5)
In Formulas 4 and 5, λ > 0 is the learning factor; in practical applications, to avoid the accumulation of errors, the present application takes λ = 0.85.
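Putting Formulas 3–5 together, a minimal naive Bayes classifier over the compressed features might look as follows; the per-class bookkeeping and the small variance floor are implementation assumptions:

```python
import numpy as np

class CompressiveNB:
    """Naive Bayes over compressed features v: Gaussian conditionals per
    dimension (Formula 3) with the running update of Formulas 4-5."""
    def __init__(self, n, lam=0.85):          # lam = 0.85 as in the text
        self.lam = lam
        self.mu = np.zeros((2, n))            # row 0: negatives, row 1: positives
        self.sig = np.ones((2, n))

    def update(self, V, y):
        """V: (k, n) array of samples of class y from the new frame."""
        mu_new = V.mean(axis=0)
        sig_new = V.std(axis=0) + 1e-8        # variance floor (assumption)
        lam, mu_old = self.lam, self.mu[y].copy()
        self.mu[y] = lam * mu_old + (1 - lam) * mu_new                    # Formula 4
        self.sig[y] = np.sqrt(lam * self.sig[y] ** 2                      # Formula 5
                              + (1 - lam) * sig_new ** 2
                              + lam * (1 - lam) * (mu_old - mu_new) ** 2)

    def score(self, v):
        """H(v) of Formula 3; the Gaussian normalizing constant cancels."""
        def log_gauss(v, mu, sig):
            return -np.log(sig) - 0.5 * ((v - mu) / sig) ** 2
        return float((log_gauss(v, self.mu[1], self.sig[1])
                      - log_gauss(v, self.mu[0], self.sig[0])).sum())
```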
Step 204: if the number is greater than the predetermined number of frames, acquire information to be presented that matches the image of the target item, and present the information to be presented in the frames continuously presenting the image of the target item.
In this embodiment, based on the target item image detection of step 202 and the target item image tracking of step 203, the category and trajectory of the target item, the number of frames in which it appears, its duration and so on can be extracted from the target video. This information helps achieve personalized recommendation targeted at user information. Information to be presented is matched from a preset library of information to be presented, and is combined — by modifying frame data or by overlaying — with the frames presenting the image of the target item into new frames, so that the information to be presented appears in the newly generated frames. The information to be presented may be text or a picture linking to a web page. As shown in Fig. 3b, the target item "T-shirt" 304 is detected in a key frame of the target video, and a picture 305 associated with "T-shirt" and able to link to a web page is matched from the preset library of information to be presented and presented in the key frame; after the user clicks the picture 305, the user can enter the relevant web page to browse information associated with "T-shirt". Likewise, the target item "shoes" 306 is detected in a key frame of the target video, and a picture 307 associated with "shoes" and able to link to a web page is matched from the preset library and presented in the key frame; after the user clicks the picture 307, the user can enter the relevant web page to browse information associated with "shoes".
In some optional implementations of this embodiment, presenting the information to be presented in the frames continuously presenting the image of the target item includes: determining position information of the image of the target item in the frames continuously presenting the image of the target item; determining a presentation position for the information to be presented according to the position information; and presenting the information to be presented at that presentation position. The presentation position may be near the image of the target item, or at another position that does not occlude it. The presentation position may be determined according to the size of the image of the target item. For example, if the target item is a pair of shoes and the information to be presented is a shoe advertisement occupying more space than the shoe image itself, it is not appropriate to paste the advertisement over the shoe image; the advertisement should instead be placed beside it. If the target item is a wardrobe, the wardrobe image is relatively large, so it is suitable to overlay the information to be presented directly on the wardrobe image.
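The size-based placement rule in this paragraph can be made concrete as below; the coordinate convention and the 10-pixel margin are assumptions of the illustration:

```python
def choose_overlay_position(item_box, ad_w, ad_h):
    """Overlay the information on a large item image, otherwise place it
    beside the item so the item is not occluded. Boxes are (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = item_box
    if (x2 - x1) >= ad_w and (y2 - y1) >= ad_h:
        return x1, y1                  # large item (e.g. a wardrobe): overlay on it
    return x2 + 10, y1                 # small item (e.g. shoes): place beside it
```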
The method provided by the above embodiment of the present application associates the content of the target video with the information to be presented, achieving highly targeted information presentation and improving the hit rate of the information to be presented.
With further reference to Fig. 4, a flow 400 of another embodiment of the information presentation method is shown. The flow 400 of the information presentation method includes the following steps:
Step 401: detect a key frame in a target video.
Step 402: in response to detecting the key frame, detect an image of a target item in the key frame.
Step 403: in response to detecting the image of the target item in the key frame, determine whether the number of frames continuously presenting the image of the target item after the key frame is greater than a predetermined number of frames.
Steps 401-403 are substantially the same as steps 201-203 and are not repeated here.
Step 404: if the number is greater than the predetermined number of frames, acquire a set of information to be presented.
In this embodiment, when the number of frames determined in step 403 is greater than the predetermined number of frames, information to be presented with relatively high similarity to the image of the target item is matched from the preset library of information to be presented. The information to be presented may include pictures.
Step 405: determine the similarity between the picture in each piece of information to be presented in the set and the image of the target item.
In this embodiment, if the information to be presented includes a picture, the similarity between the histogram of the picture and the histogram of the image of the target item can be determined. First, histogram data are generated from the pixel data of the target item image and of the picture in the information to be presented; the respective image histograms are normalized; the Bhattacharyya coefficient algorithm is then applied to the histogram data, finally yielding an image similarity value in the range [0, 1], where 0 means extremely different and 1 means extremely similar (identical).
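A minimal version of this histogram comparison, assuming grayscale pixel data (color images would use per-channel or multi-dimensional histograms):

```python
import numpy as np

def bhattacharyya_similarity(img_a, img_b, bins=256):
    """Normalize the two gray-level histograms to sum to 1, then return the
    Bhattacharyya coefficient sum_i sqrt(p_i * q_i), which lies in [0, 1]."""
    p = np.bincount(img_a.ravel(), minlength=bins).astype(np.float64)
    q = np.bincount(img_b.ravel(), minlength=bins).astype(np.float64)
    return float(np.sqrt((p / p.sum()) * (q / q.sum())).sum())
```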
In some optional implementations of this embodiment, if the information to be presented includes text information, text information matching the category of the image of the target item is acquired: a category is determined from keywords in the text information and matched against the category of the image of the target item to obtain a similarity. For example, for the text "XX sneakers, priced at 299 yuan", the similarity between this text and the image of the target item "sneakers" may reach 90%; the similarity between the image of the target item "sneakers" and the text "XX leather shoes, priced at 299 yuan" may reach 70%; and the similarity between the image of the target item "sneakers" and the text "XX basketball, priced at 299 yuan" may be only 10%.
Step 406: select at least one piece of information to be presented from the set of information to be presented, in descending order of similarity.
In this embodiment, at least one piece of information to be presented is selected based on the similarity determined in step 405. The number of pieces selected may be proportional to the area of the image of the target item. For example, several pieces of information may be displayed for a relatively large image, while a relatively small image should preferably display only one piece, to avoid overwhelming the main content.
In some optional implementations of this embodiment, acquiring information to be presented matching the image of the target item includes: acquiring the category label of the user watching the target video through a terminal, the category label being obtained by big-data analysis of the user's behavior data; and acquiring, from the set of information to be presented, at least one piece of information matching the user's category label. That is, the information to be presented is further filtered based on the personal characteristics of the user, selecting information targeted at the user. For example, if big-data analysis determines that the user watching the target video is female, information related to women's products can be selected as the information to be presented.
By building a recommendation model that combines the user, the information to be presented and the image of the target item, the click-through rate (CTR) of the information to be presented can be effectively predicted, and the information with the highest predicted click-through rate is pushed, thereby raising the conversion rate of information placement. The features of this recommendation model are mainly of three kinds: user features, features of the item involved in the information to be presented, and features of the image of the target item detected in the target video. The user features mainly include the user's age, gender, region, occupation, platform and other information obtainable from big-data user profiling. The features of the item involved in the information to be presented mainly include the item's category, price, place of origin (or seller location), and the overall click-through rate of the information to be presented. The features of the image of the target item mainly include the similarity between the image of the target item detected in the target video and the item involved in the information to be presented, and the duration for which the image of the target item appears in the target video.
The processing of the features of the item involved in the information to be presented mainly comprises two operations: discretization and feature crossing.
(1) Discretization
The features of the recommendation model mainly comprise the three kinds discussed above. The initial features include discrete features (such as user gender and user region) and continuous features (such as item price, user age, similarity between the image of the target item and the item involved in the information to be presented, and click-through rate of the information to be presented). Although click-through rate and age are both continuous values, their meanings differ: comparing ages is meaningless for recommending information to be presented, while comparing click-through rates is meaningful. The above features therefore need to be discretized.
Discretized features are handled as follows: continuous features are divided into segments. For example, the click-through rate ctr is divided into 10 segments; if ctr = 0.05, the corresponding feature position is set to 1. Other feature types are handled similarly.
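Reading "the corresponding feature position is set to 1" as one-hot encoding, segmentation can be sketched as below; the [0, 1] value range is an assumption for ctr-like features:

```python
import numpy as np

def one_hot_bin(value, n_bins=10, lo=0.0, hi=1.0):
    """Discretize a continuous feature into n_bins equal segments and set the
    corresponding position to 1; ctr = 0.05 with 10 bins activates the first."""
    idx = min(int((value - lo) / (hi - lo) * n_bins), n_bins - 1)
    v = np.zeros(n_bins)
    v[idx] = 1.0
    return v
```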
(2) Feature crossing
After feature discretization, the processed features can be flattened into a vector as the final feature. But this is a linear model and ignores interactions between features; for example, the combination of gender and item category has a direct effect on the click-through rate of the information to be presented. Crossing features therefore effectively improves the prediction accuracy of the model. Feature crossing combines two features into a new feature: for instance, combining gender with item category (m classes) produces 2m discrete features.
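Crossing two one-hot segments can be sketched as an outer product, which yields exactly the 2m positions of the gender × item-category example:

```python
import numpy as np

def cross_features(a_onehot, b_onehot):
    """Combine two one-hot features into one of length len(a)*len(b);
    e.g. gender (2) x item category (m) -> 2m cross positions."""
    return np.outer(a_onehot, b_onehot).ravel()
```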
Let the discrete feature vector formed in the present application be x, with feature dimension 113, where x1-x10 are the user age segment; x11-x18 the user region segment; x19-x25 the user occupation segment; x26-x30 the video-viewing platform segment; x31-x38 the item category segment; x39-x50 the item price segment; x51-x58 the item region segment; x59-x60 the item click-through-rate segment; x61-x65 the detected-target duration segment; x66-x75 the detected-target/advertised-item similarity segment; x76-x91 the item-category/user-gender cross segment; and x92-x113 the user-gender/item-price cross segment.
The information to be presented is recommended based on a logistic regression model. Logistic regression (LR) is an algorithm widely applied in advertisement recommendation. Let the training data set be D = (x_1, y_1), (x_2, y_2), ..., (x_N, y_N), where x_i ∈ R^113 is the constructed feature vector and y_i indicates whether the advertisement was clicked, 1 for clicked and -1 for not clicked.
The basic assumption of LR is that the conditional probability P(y=1|x; θ) satisfies the following expression:
P(y=1|x; θ) = g(θᵀx) = 1 / (1 + e^(-θᵀx))        (Formula 6)
where g(θᵀx) is the aforementioned sigmoid function, x is the feature vector and θ is the parameter vector; the corresponding decision function is:
y* = 1, if P(y=1|x) > 0.5        (Formula 7)
Once the mathematical form of the model is determined, the next step is to solve for its parameters. Maximum likelihood estimation is adopted: find a set of parameters under which the likelihood (probability) of the data is as large as possible. In the logistic regression model, the likelihood L(θ) can be expressed as:
L(θ) = P(D|θ) = ∏ P(y|x; θ) = ∏ g(θᵀx)^y (1 - g(θᵀx))^(1-y)        (Formula 8)
Taking the logarithm gives the log-likelihood l(θ):
l(θ) = ∑ [ y log g(θᵀx) + (1 - y) log(1 - g(θᵀx)) ]        (Formula 9)
In the LR model, maximizing the above likelihood function yields the optimal parameters. The present application solves for the parameters by gradient-descent iteration, approaching the optimum by adjusting the parameter values at each step along the direction in which the objective function changes fastest.
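A compact sketch of this training loop (gradient ascent on the log-likelihood of Formula 9, i.e. gradient descent on its negative), assuming labels mapped into {0, 1} (the text's -1 mapped to 0) and illustrative learning-rate and iteration settings; predict_ctr then supports the selection step described next:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_lr(X, y, lr=0.1, n_iters=1000):
    """X: (N, 113) discrete feature matrix, y: clicks in {0, 1}."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        grad = X.T @ (y - sigmoid(X @ theta))   # gradient of l(theta), Formula 9
        theta += lr / len(y) * grad
    return theta

def predict_ctr(theta, X_candidates):
    """Formula 6: predicted click probability for each candidate; present
    the candidate with the highest predicted CTR."""
    return sigmoid(X_candidates @ theta)
```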
After the model is trained, a recommendation system for the information to be presented is obtained. Click-through rates are predicted for a predetermined number of pieces of information retrieved from the library of information to be presented, and the piece with the highest predicted click-through rate is selected for presentation.
As can be seen from Fig. 4, compared with the embodiment corresponding to Fig. 2, the flow 400 of the information presentation method in this embodiment highlights the step of selecting the information to be presented. The information to be presented can thus be chosen accurately, its hit rate improved, effective information presented as far as possible, and the cost of placing the information reduced.
With further reference to Fig. 5, as an implementation of the methods shown in the above figures, the present application provides an embodiment of an information presentation apparatus. This apparatus embodiment corresponds to the method embodiment shown in Fig. 2, and the apparatus can be applied in various electronic devices.
As shown in Fig. 5, the information presentation apparatus 500 of this embodiment includes a key frame detection unit 501, an image detection unit 502, a determination unit 503 and a presentation unit 504. The key frame detection unit 501 is configured to detect a key frame in a target video, the key frame being a frame of the target video whose image entropy is greater than a preset image entropy threshold; the image detection unit 502 is configured to detect, in response to detecting the key frame, an image of a target item in the key frame; the determination unit 503 is configured to determine, in response to detecting the image of the target item in the key frame, whether the number of frames continuously presenting the image of the target item after the key frame is greater than a predetermined number of frames; and the presentation unit 504 is configured to acquire, if the number is greater than the predetermined number of frames, information to be presented matching the image of the target item, and present the information to be presented in the frames continuously presenting the image of the target item.
In this embodiment, for the specific processing of the key frame detection unit 501, image detection unit 502, determination unit 503 and presentation unit 504 of the information presentation apparatus 500, reference may be made to steps 201, 202, 203 and 204 of the embodiment corresponding to Fig. 2.
In some optional implementations of this embodiment, the key frame detection unit 501 is further configured to: take a frame whose image entropy is greater than the preset image entropy threshold as a key frame; acquire, in the playback order of the target video, the first frame after the key frame whose image entropy is greater than the preset image entropy threshold; determine whether the similarity between the first frame and the key frame is less than a preset similarity threshold; and if it is less than the preset similarity threshold, determine that the first frame is a key frame.
In some optional implementations of this embodiment, the image detection unit 502 is further configured to detect the image of the target item in the key frame based on a pre-trained convolutional neural network, where the convolutional neural network is used to recognize image features of the target item and determine the image of the target item according to the image features.
In some optional implementations of this embodiment, the determination unit 503 is further configured to: use a compressive tracking algorithm to determine whether the image of the target item is continuously presented in different frames after the key frame; and if it is continuously presented, accumulate the number of frames continuously presenting the image of the target item and determine whether this number is greater than the predetermined number of frames.
In some optional implementations of this embodiment, the presentation unit 504 is further configured to: determine position information of the image of the target item in the frames continuously presenting the image of the target item; determine a presentation position for the information to be presented according to the position information; and present the information to be presented at the presentation position.
In some optional implementations of this embodiment, the presentation unit 504 is further configured to: acquire a set of information to be presented, the information to be presented including pictures; determine the similarity between the picture in each piece of information to be presented in the set and the image of the target item; and select at least one piece of information to be presented from the set in descending order of similarity.
In some optional implementations of this embodiment, the information to be presented includes text information; and the presentation unit 504 is further configured to acquire text information matching the category of the image of the target item.
In some optional implementations of this embodiment, the presentation unit 504 is further configured to: acquire the category label of the user watching the target video through a terminal, the category label being obtained by big-data analysis of the user's behavior data; and acquire, from the set of information to be presented, at least one piece of information matching the user's category label.
Referring now to Fig. 6, a schematic structural diagram of a computer system 600 adapted to implement a device of embodiments of the present application is shown. The device shown in Fig. 6 is merely an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.
As shown in Fig. 6, the computer system 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage portion 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the system 600. The CPU 601, ROM 602 and RAM 603 are connected to one another through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse and the like; an output portion 607 including a cathode ray tube (CRT), a liquid crystal display (LCD) and the like, as well as a speaker; a storage portion 608 including a hard disk and the like; and a communication portion 609 including a network interface card such as a LAN card or a modem. The communication portion 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it can be installed into the storage portion 608 as needed.
In particular, according to embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 609, and/or installed from the removable medium 611. When the computer program is executed by the central processing unit (CPU) 601, the above-described functions defined in the method of the present application are performed. It should be noted that the computer-readable medium described in the present application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but not limited to, an electric, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present application, a computer-readable storage medium may be any tangible medium containing or storing a program that can be used by, or in combination with, an instruction execution system, apparatus or device. In the present application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, capable of sending, propagating or transmitting a program for use by, or in combination with, an instruction execution system, apparatus or device. The program code contained on the computer-readable medium may be transmitted by any appropriate medium, including but not limited to wireless, wire, optical cable, RF and the like, or any suitable combination of the above.
The flowcharts and block diagrams in the drawings illustrate the possible architectures, functions and operations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment or a portion of code, containing one or more executable instructions for implementing the specified logical functions. It should also be noted that in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings; for example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should further be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in a processor; for example, a processor may be described as including a key frame detection unit, an image detection unit, a determination unit and a presentation unit, where the names of these units do not in some cases limit the units themselves; for example, the key frame detection unit may also be described as "a unit that detects a key frame in a target video".
As another aspect, the present application further provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: detect a key frame in a target video, the key frame being a frame of the target video whose image entropy is greater than a preset image entropy threshold; in response to detecting the key frame, detect an image of a target item in the key frame; in response to detecting the image of the target item in the key frame, determine whether the number of frames continuously presenting the image of the target item after the key frame is greater than a predetermined number of frames; and if it is greater than the predetermined number of frames, acquire information to be presented matching the image of the target item, and present the information to be presented in the frames continuously presenting the image of the target item.
The above description is merely a preferred embodiment of the present application and an explanation of the technical principles employed. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example technical solutions formed by substituting the above features with technical features of similar functions disclosed in (but not limited to) the present application.

Claims (18)

  1. An information presentation method, characterized in that the method comprises:
    detecting a key frame in a target video, wherein the key frame is a frame of the target video whose image entropy is greater than a preset image entropy threshold;
    in response to detecting the key frame, detecting an image of a target item in the key frame;
    in response to detecting the image of the target item in the key frame, determining whether the number of frames continuously presenting the image of the target item after the key frame is greater than a predetermined number of frames;
    if greater than the predetermined number of frames, acquiring information to be presented matching the image of the target item, and presenting the information to be presented in the frames continuously presenting the image of the target item.
  2. The method according to claim 1, characterized in that the detecting a key frame in a target video comprises:
    taking a frame whose image entropy is greater than the preset image entropy threshold as a key frame;
    acquiring, in the playback order of the target video, a first frame after the key frame whose image entropy is greater than the preset image entropy threshold;
    determining whether the similarity between the first frame and the key frame is less than a preset similarity threshold;
    if less than the preset similarity threshold, determining that the first frame is a key frame.
  3. The method according to claim 1, characterized in that detecting an image of a target item in the key frame comprises:
    detecting the image of the target item in the key frame based on a pre-trained convolutional neural network, wherein the convolutional neural network is used to recognize image features of the target item and determine the image of the target item according to the image features.
  4. The method according to claim 1, characterized in that the determining whether the number of frames continuously presenting the image of the target item after the key frame is greater than a predetermined number of frames comprises:
    using a compressive tracking algorithm to determine whether the image of the target item is continuously presented in different frames after the key frame;
    if continuously presented, accumulating the number of frames continuously presenting the image of the target item, and determining whether the number of frames is greater than the predetermined number of frames.
  5. The method according to claim 1, characterized in that the presenting the information to be presented in the frames continuously presenting the image of the target item comprises:
    determining position information of the image of the target item in the frames continuously presenting the image of the target item;
    determining a presentation position of the information to be presented according to the position information;
    presenting the information to be presented at the presentation position.
  6. The method according to any one of claims 1-5, characterized in that the acquiring information to be presented matching the image of the target item comprises:
    acquiring a set of information to be presented, wherein the information to be presented comprises pictures;
    determining the similarity between the picture in each piece of information to be presented in the set and the image of the target item;
    selecting at least one piece of information to be presented from the set in descending order of similarity.
  7. The method according to claim 1, characterized in that the information to be presented comprises text information; and
    the acquiring information to be presented matching the image of the target item comprises:
    acquiring text information matching the category of the image of the target item.
  8. The method according to claim 1, characterized in that the acquiring information to be presented matching the image of the target item comprises:
    acquiring a category label of a user watching the target video through a terminal, wherein the category label of the user is obtained by big-data analysis of behavior data of the user;
    acquiring, from a set of information to be presented, at least one piece of information to be presented matching the category label of the user.
  9. An information presentation apparatus, characterized in that the apparatus comprises:
    a key frame detection unit, configured to detect a key frame in a target video, wherein the key frame is a frame of the target video whose image entropy is greater than a preset image entropy threshold;
    an image detection unit, configured to detect, in response to detecting the key frame, an image of a target item in the key frame;
    a determination unit, configured to determine, in response to detecting the image of the target item in the key frame, whether the number of frames continuously presenting the image of the target item after the key frame is greater than a predetermined number of frames;
    a presentation unit, configured to acquire, if greater than the predetermined number of frames, information to be presented matching the image of the target item, and present the information to be presented in the frames continuously presenting the image of the target item.
  10. The apparatus according to claim 9, characterized in that the key frame detection unit is further configured to:
    take a frame whose image entropy is greater than the preset image entropy threshold as a key frame;
    acquire, in the playback order of the target video, a first frame after the key frame whose image entropy is greater than the preset image entropy threshold;
    determine whether the similarity between the first frame and the key frame is less than a preset similarity threshold;
    if less than the preset similarity threshold, determine that the first frame is a key frame.
  11. The apparatus according to claim 9, characterized in that the image detection unit is further configured to:
    detect the image of the target item in the key frame based on a pre-trained convolutional neural network, wherein the convolutional neural network is used to recognize image features of the target item and determine the image of the target item according to the image features.
  12. The apparatus according to claim 9, characterized in that the determination unit is further configured to:
    use a compressive tracking algorithm to determine whether the image of the target item is continuously presented in different frames after the key frame;
    if continuously presented, accumulate the number of frames continuously presenting the image of the target item, and determine whether the number of frames is greater than the predetermined number of frames.
  13. The apparatus according to claim 9, characterized in that the presentation unit is further configured to:
    determine position information of the image of the target item in the frames continuously presenting the image of the target item;
    determine a presentation position of the information to be presented according to the position information;
    present the information to be presented at the presentation position.
  14. The apparatus according to any one of claims 9-13, characterized in that the presentation unit is further configured to:
    acquire a set of information to be presented, wherein the information to be presented comprises pictures;
    determine the similarity between the picture in each piece of information to be presented in the set and the image of the target item;
    select at least one piece of information to be presented from the set in descending order of similarity.
  15. The apparatus according to claim 9, characterized in that the information to be presented comprises text information; and
    the presentation unit is further configured to:
    acquire text information matching the category of the image of the target item.
  16. The apparatus according to claim 9, characterized in that the presentation unit is further configured to:
    acquire a category label of a user watching the target video through a terminal, wherein the category label of the user is obtained by big-data analysis of behavior data of the user;
    acquire, from a set of information to be presented, at least one piece of information to be presented matching the category label of the user.
  17. A device, comprising:
    one or more processors;
    a storage apparatus, configured to store one or more programs,
    which, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-8.
  18. A computer-readable storage medium storing a computer program, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1-8.
PCT/CN2018/072285 2017-03-15 2018-01-11 Information presentation method and apparatus WO2018166288A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710152564.0 2017-03-15
CN201710152564.0A CN108629224B (zh) Information presentation method and apparatus

Publications (1)

Publication Number Publication Date
WO2018166288A1 true WO2018166288A1 (zh) 2018-09-20

Family

ID=63522608

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/072285 WO2018166288A1 (zh) Information presentation method and apparatus

Country Status (2)

Country Link
CN (1) CN108629224B (zh)
WO (1) WO2018166288A1 (zh)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109495784A (zh) 2018-11-29 2019-03-19 北京微播视界科技有限公司 Information pushing method and apparatus, electronic device and computer-readable storage medium
CN111683267A (zh) 2019-03-11 2020-09-18 阿里巴巴集团控股有限公司 Media information processing method, system, device and storage medium
CN110177250A (zh) 2019-04-30 2019-08-27 上海掌门科技有限公司 Method and device for providing purchase information during a video call
CN110311945B (zh) 2019-04-30 2022-11-08 上海掌门科技有限公司 Method and device for presenting resource push information in a real-time video stream
CN110610510B (zh) 2019-08-29 2022-12-16 Oppo广东移动通信有限公司 Target tracking method and apparatus, electronic device and storage medium
CN110853124B (zh) 2019-09-17 2023-09-08 Oppo广东移动通信有限公司 Method and apparatus for generating animated GIFs, electronic device and medium
CN110764726B (zh) 2019-10-18 2023-08-22 网易(杭州)网络有限公司 Method and apparatus for determining a target object, terminal device and storage medium
CN113766330A (zh) 2021-05-26 2021-12-07 腾讯科技(深圳)有限公司 Method and apparatus for generating recommendation information based on video
CN114640863A (zh) 2022-03-04 2022-06-17 广州方硅信息技术有限公司 Method, system and apparatus for displaying person information in a live-streaming room, and computer device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020097893A1 (en) * 2001-01-20 2002-07-25 Lee Seong-Deok Apparatus and method for generating object-labeled image in video sequence
CN103810711A (zh) * 2014-03-03 2014-05-21 郑州日兴电子科技有限公司 Key frame extraction method and system for surveillance system video
CN104715023A (zh) * 2015-03-02 2015-06-17 北京奇艺世纪科技有限公司 Product recommendation method and system based on video content
CN105282573A (zh) * 2014-07-24 2016-01-27 腾讯科技(北京)有限公司 Embedded information processing method, client and server
CN105679017A (zh) * 2016-01-27 2016-06-15 福建工程学院 Method and system for assisting evidence collection in minor traffic accidents
CN105872588A (zh) * 2015-12-09 2016-08-17 乐视网信息技术(北京)股份有限公司 Method and apparatus for loading advertisements in a video

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111125501A (zh) * 2018-10-31 2020-05-08 北京字节跳动网络技术有限公司 Method and apparatus for processing information
CN111125501B (zh) * 2018-10-31 2023-07-25 北京字节跳动网络技术有限公司 Method and apparatus for processing information
CN110570318B (zh) * 2019-04-18 2023-01-31 创新先进技术有限公司 Computer-implemented vehicle damage assessment method and apparatus based on a video stream
CN110570318A (zh) * 2019-04-18 2019-12-13 阿里巴巴集团控股有限公司 Computer-implemented vehicle damage assessment method and apparatus based on a video stream
CN110189242A (zh) * 2019-05-06 2019-08-30 百度在线网络技术(北京)有限公司 Image processing method and apparatus
CN110189242B (zh) * 2019-05-06 2023-04-11 阿波罗智联(北京)科技有限公司 Image processing method and apparatus
CN112749326B (zh) * 2019-11-15 2023-10-03 腾讯科技(深圳)有限公司 Information processing method and apparatus, computer device and storage medium
CN112749326A (zh) * 2019-11-15 2021-05-04 腾讯科技(深圳)有限公司 Information processing method and apparatus, computer device and storage medium
CN110941594A (zh) * 2019-12-16 2020-03-31 北京奇艺世纪科技有限公司 Video file splitting method and apparatus, electronic device and storage medium
CN110941594B (zh) * 2019-12-16 2023-04-18 北京奇艺世纪科技有限公司 Video file splitting method and apparatus, electronic device and storage medium
CN111079864A (zh) * 2019-12-31 2020-04-28 杭州趣维科技有限公司 Short video classification method and system based on optimized video key frame extraction
CN111611417B (zh) * 2020-06-02 2023-09-01 Oppo广东移动通信有限公司 Image deduplication method and apparatus, terminal device and storage medium
CN111611417A (zh) * 2020-06-02 2020-09-01 Oppo广东移动通信有限公司 Image deduplication method and apparatus, terminal device and storage medium
CN112085120A (zh) * 2020-09-17 2020-12-15 腾讯科技(深圳)有限公司 Multimedia data processing method and apparatus, electronic device and storage medium
CN112085120B (zh) * 2020-09-17 2024-01-02 腾讯科技(深圳)有限公司 Multimedia data processing method and apparatus, electronic device and storage medium
CN113312951A (zh) * 2020-10-30 2021-08-27 阿里巴巴集团控股有限公司 Dynamic video target tracking system, and related method, apparatus and device
CN113312951B (zh) * 2020-10-30 2023-11-07 阿里巴巴集团控股有限公司 Dynamic video target tracking system, and related method, apparatus and device
CN113763098A (zh) * 2020-12-21 2021-12-07 北京沃东天骏信息技术有限公司 Method and apparatus for determining an item
CN113033475A (zh) * 2021-04-19 2021-06-25 北京百度网讯科技有限公司 Target object tracking method, related apparatus and computer program product
CN113033475B (zh) * 2021-04-19 2024-01-12 北京百度网讯科技有限公司 Target object tracking method, related apparatus and computer program product

Also Published As

Publication number Publication date
CN108629224A (zh) 2018-10-09
CN108629224B (zh) 2019-11-05

Similar Documents

Publication Publication Date Title
WO2018166288A1 (zh) Information presentation method and apparatus
CN108446390B (zh) Method and apparatus for pushing information
EP3267362B1 (en) Machine learning image processing
US20220309762A1 (en) Generating scene graphs from digital images using external knowledge and image reconstruction
US10360623B2 (en) Visually generated consumer product presentation
JP7130560B2 (ja) Dynamic creative optimization for effective content delivery
WO2020108396A1 (zh) Video classification method and server
CN107305557A (zh) Content recommendation method and apparatus
CN104715023A (zh) Product recommendation method and system based on video content
CN110737783A (zh) Method, apparatus and computing device for recommending multimedia content
WO2012071696A1 (zh) Personalized advertisement push method and system based on user interest learning
US20210073890A1 (en) Catalog-based image recommendations
WO2020192013A1 (zh) Targeted advertisement delivery method and apparatus, device and storage medium
CN108959323B (zh) Video classification method and apparatus
JP6527275B1 (ja) Harmony-based search method based on the harmony of multiple objects in an image, computer device and computer program
CN112364204A (zh) Video search method and apparatus, computer device and storage medium
CN113766330A (zh) Method and apparatus for generating recommendation information based on video
US20150131967A1 (en) Computerized systems and methods for generating models for identifying thumbnail images to promote videos
Savchenko et al. Preference prediction based on a photo gallery analysis with scene recognition and object detection
WO2022247666A1 (zh) Content processing method and apparatus, computer device and storage medium
Uddin et al. An indoor human activity recognition system for smart home using local binary pattern features with hidden markov models
US9286623B2 (en) Method for determining an area within a multimedia content element over which an advertisement can be displayed
US11823217B2 (en) Advanced segmentation with superior conversion potential
CN111967924A (zh) Product recommendation method and apparatus, computer device and medium
CN114330519A (zh) Data determination method and apparatus, electronic device and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 18766942; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established (Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 06.12.2019))
122 Ep: pct application non-entry in european phase (Ref document number: 18766942; Country of ref document: EP; Kind code of ref document: A1)