CN109388721B - Method and device for determining cover video frame

Method and device for determining cover video frame

Info

Publication number: CN109388721B
Application number: CN201811217665.2A
Authority: CN (China)
Prior art keywords: video frame, article, text, cover, determining
Inventors: 赵翔, 李鑫, 刘霄, 李旭斌, 孙昊, 文石磊, 丁二锐
Assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority/filing date: 2018-10-18
Publication of CN109388721A: 2019-02-26
Publication of CN109388721B (grant): 2021-05-28
Other languages: Chinese (zh)
Legal status: Active
Abstract

The invention provides a method and a device for determining a cover video frame. The method comprises the following steps: extracting keywords of an article text and acquiring a first vector corresponding to each keyword; extracting a subject word of each video frame within a preset time period in the article video and acquiring a second vector corresponding to each subject word; calculating the similarity between each video frame and the article text according to the second vectors corresponding to the subject words and the first vectors corresponding to the keywords; and determining a target video frame as the cover video frame according to the similarity between each video frame and the article text. The video frame used as the cover is thus consistent with the article content, automatic adaptation between the cover video frame and the article content is achieved, and the efficiency of cover determination as well as the user's click-through rate and browsing experience are improved.

Description

Method and device for determining cover video frame
Technical Field
The invention relates to the technical field of multimedia information, in particular to a method and a device for determining a cover video frame.
Background
With the rapid development of the mobile Internet, more and more videos appear in articles. For example, a pushed article on a social network may contain many video clips to make the article more engaging, and a video inserted into an article is displayed with a video cover so that users can better understand the video content. In the related art, however, the video frame used as the cover is a default frame or is selected at random. As a result, the cover frame often does not match the article content, fails to arouse the user's interest in clicking, and leads to a low click-through rate and browsing rate for the video.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.
To this end, a first object of the present invention is to propose a method for determining a cover video frame, so as to automatically adapt the video frame used as a cover to the content of the article.
A second object of the present invention is to propose a device for determining a cover video frame.
A third object of the invention is to propose a computer program product.
A fourth object of the invention is to propose a non-transitory computer-readable storage medium.
In order to achieve the above objects, an embodiment of a first aspect of the present invention provides a method for determining a cover video frame, comprising the following steps: extracting keywords of an article text, and acquiring a first vector corresponding to each keyword; extracting a subject word of each video frame within a preset time period in the article video, and acquiring a second vector corresponding to each subject word; calculating the similarity between each video frame and the article text according to the second vector corresponding to each subject word and the first vector corresponding to each keyword; and determining a target video frame as the cover video frame according to the similarity between each video frame and the article text.
In addition, the method for determining a cover video frame according to the embodiment of the present invention has the following additional technical features:
Optionally, the cover video frame is determined to be an article cover frame, and/or the cover video frame is determined to be a video cover frame.
Optionally, the extracting a subject word of each video frame within a preset time period in the article video comprises: detecting whether each video frame contains a face, and if a video frame contains a face, extracting face features; and querying a preset face database to obtain the subject word corresponding to the face features.
Optionally, the extracting a subject word of each video frame within a preset time period in the article video comprises: detecting whether each video frame contains an object of a preset category, and if so, extracting object features; and querying a preset object database to obtain the subject word corresponding to the object features.
Optionally, the calculating the similarity between each video frame and the article text according to the second vector corresponding to each subject word and the first vector corresponding to each keyword comprises: calculating a sub-distance between the second vector corresponding to each subject word in each video frame and the first vector corresponding to each keyword; adding all the sub-distances corresponding to each video frame to obtain a corresponding total distance; and taking the reciprocal of the total distance of each video frame as the similarity between that video frame and the article text.
Optionally, the determining, according to the similarity between each video frame and the article text, a target video frame as the cover video frame comprises: comparing the similarities between the video frames and the article text, and taking the target video frame corresponding to the maximum similarity as the cover video frame.
Optionally, the method further comprises: acquiring one or more image quality indexes of each video frame; and the determining a target video frame as the cover video frame according to the similarity between each video frame and the article text comprises: acquiring a weight corresponding to each image quality index and a weight corresponding to the similarity; calculating score data of each video frame according to each image quality index of the video frame and its corresponding weight, together with the similarity between the video frame and the article text and its corresponding weight; and determining, according to the score data of the video frames, the target video frame corresponding to the maximum score data as the cover video frame.
An embodiment of a second aspect of the present invention provides a device for determining a cover video frame, comprising: a first acquisition module, configured to extract keywords of the article text and acquire a first vector corresponding to each keyword; a second acquisition module, configured to extract a subject word of each video frame within a preset time period in the article video and acquire a second vector corresponding to each subject word; a calculation module, configured to calculate the similarity between each video frame and the article text according to the second vector corresponding to each subject word and the first vector corresponding to each keyword; and a cover determination module, configured to determine a target video frame as the cover video frame according to the similarity between each video frame and the article text.
An embodiment of a third aspect of the present invention provides a computer program product which, when its instructions are executed by a processor, implements the method for determining a cover video frame according to the foregoing method embodiments.
An embodiment of a fourth aspect of the present invention provides a non-transitory computer-readable storage medium having a computer program stored thereon, the computer program, when executed by a processor, implementing the method for determining a cover video frame according to the foregoing method embodiments.
The technical solutions provided by the embodiments of the present invention have the following beneficial effects:
Keywords of an article text are extracted and a first vector corresponding to each keyword is acquired; a subject word of each video frame within a preset time period in the article video is extracted and a second vector corresponding to each subject word is acquired; the similarity between each video frame and the article text is calculated according to the second vectors corresponding to the subject words and the first vectors corresponding to the keywords; and a target video frame is then determined as the cover video frame according to the similarity between each video frame and the article text. The video frame used as the cover is thus consistent with the article content, automatic adaptation between the cover video frame and the article content is achieved, and the efficiency of cover determination as well as the user's click-through rate and browsing experience are improved.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1-1 is a schematic diagram of a scene of cover video frame determination results according to an embodiment of the present invention;
FIG. 1-2 is a schematic diagram of a scene of cover video frame determination results according to another embodiment of the present invention;
FIG. 2 is a flowchart of a method for determining a cover video frame according to an embodiment of the present invention;
FIG. 3 is a flowchart of a method for determining a cover video frame according to another embodiment of the present invention;
FIG. 4 is a flowchart of a method for determining a cover video frame according to yet another embodiment of the present invention;
FIG. 5 is a flowchart of a method for determining a cover video frame according to yet another embodiment of the present invention;
FIG. 6 is a schematic diagram of an application scenario of the method for determining a cover video frame according to another embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a device for determining a cover video frame according to an embodiment of the present invention; and
FIG. 8 is a schematic structural diagram of a device for determining a cover video frame according to another embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
A method and device for determining a cover video frame according to embodiments of the present invention will now be described with reference to the drawings. As noted in the background, the related art lacks a method for selecting the cover video frame and does not consider the gain in user traffic, such as click-through rate, that matching imagery to text can bring. In the embodiments of the present invention, different video covers are displayed for the same inserted video depending on the content of the article, so that the cover adapts to the article in which the video is located, improving the click-through rate of the video and the reading experience of the article.
The cover video frame can serve as the cover of a video inserted into an article, where the video can be inserted at any position of the article as required. It can also serve as the cover of the article itself, for example as the cover of an article pushed by a WeChat official account, or as the link thumbnail when the article link is shared on a social platform such as WeChat Moments.
For example, when the cover video frame is used as a video cover: for the same inserted video A, when it is inserted into article 1, which describes a star, the displayed video cover is a video frame containing the star, as shown in the left diagram of FIG. 1-1; when it is inserted into article 2, which describes a building, the displayed video cover is a video frame showing the building, as shown in the right diagram of FIG. 1-1. In this example the video is inserted in the middle of the article.
For another example, when the cover video frame is used as the cover of an article pushed by an official account: for the same inserted video A, when it is inserted into article 1, which describes a star, the displayed article cover is a video frame containing the star, as shown in the left diagram of FIG. 1-2; when it is inserted into article 2, which describes a building, the displayed article cover is a video frame containing the building, as shown in the right diagram of FIG. 1-2.
FIG. 2 is a flowchart of a method for determining a cover video frame according to an embodiment of the present invention. As shown in FIG. 2, the method comprises:
Step 101: keywords of the article text are extracted, and a first vector corresponding to each keyword is acquired.
The first vector represents the features of a keyword as a multi-dimensional embedding reflecting the distributional statistics of word sequences. Methods for generating the first vector include neural networks, dimensionality reduction of a word co-occurrence matrix, probabilistic models, and the like.
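As an illustration of how such first vectors might be generated, the sketch below trains a small neural word-embedding model and looks up the vector of a keyword. The patent does not prescribe a library or dimensionality; gensim's Word2Vec, the toy corpus, and the 100-dimensional size are assumptions made purely for illustration.

```python
# Minimal sketch: neural word embeddings as "first vectors" (gensim assumed).
from gensim.models import Word2Vec

# Hypothetical pre-segmented corpus of article sentences (lists of tokens).
corpus = [
    ["star", "attends", "movie", "premiere"],
    ["building", "design", "wins", "architecture", "award"],
]

# Train a small skip-gram model; in practice a model pre-trained on a large
# corpus would be used so that every keyword has a reliable vector.
model = Word2Vec(corpus, vector_size=100, window=5, min_count=1, sg=1)

def first_vector(keyword):
    """Look up the first vector corresponding to a keyword."""
    return model.wv[keyword]  # a 100-dimensional numpy array

print(first_vector("star").shape)  # -> (100,)
```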
Specifically, in the embodiment of the present invention, keywords of the article text are extracted; the keywords represent the main ideas embodied by the article and are processed into first vectors so that the main ideas of the article can later be compared by similarity.
It should be noted that the way of extracting keywords from the article text differs across application scenarios. As one possible implementation, after part-of-speech analysis and word segmentation are performed on the article text, the frequency of occurrence of each word segment is counted, and words with a higher frequency of occurrence are taken as keywords. As another possible implementation, the article text is input into a preset learning model whose input is the article text and whose output is the main idea of the text; after the main idea output by the model is obtained, part-of-speech analysis and word segmentation are performed on the article text, the correlation between each word segment and the main idea is calculated, and words whose correlation is greater than a certain value are taken as keywords.
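The first implementation above might be sketched as follows: segment the text, analyze parts of speech, count word frequencies, and keep the most frequent content words as keywords. The jieba segmenter, the noun/verb filter, and the top_k parameter are assumptions; the patent fixes neither the segmenter nor a frequency threshold.

```python
# Sketch of frequency-based keyword extraction (jieba assumed for Chinese).
from collections import Counter
import jieba.posseg as pseg

def extract_keywords(article_text, top_k=5):
    # Part-of-speech analysis and word segmentation; keep nouns and verbs
    # (flags starting with 'n' or 'v'), dropping short function words.
    words = [w.word for w in pseg.cut(article_text)
             if w.flag[:1] in ("n", "v") and len(w.word) > 1]
    # Words with a higher frequency of occurrence are taken as keywords.
    return [word for word, _ in Counter(words).most_common(top_k)]
```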
Step 102: a subject word of each video frame within a preset time period in the article video is extracted, and a second vector corresponding to each subject word is acquired.
The article video refers to the video inserted into the article.
Specifically, to facilitate identifying video frames consistent with the article keywords, each video frame is processed into a second vector having the same dimensionality as the first vector; the second vector represents the main content of the video frame. Of course, when the inserted video is a complete video, the video within a preset time period is selected in order to improve processing efficiency. As one possible implementation, considering that the climax of a video, that is, the part that best reflects its content, is located in the middle of the video, the second vectors are obtained based on the video frames between thirty percent and seventy percent of the video duration; in this implementation, the preset time period corresponds to the presumed climax. As another possible implementation, based on labels previously attached to different video segments according to their content, a user screens out the rough time period in which the segments likely to be related to the article are located, and uses it as the preset time period.
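The first implementation, sampling candidate frames from the presumed climax between thirty and seventy percent of the duration, might look like the sketch below. OpenCV is assumed for decoding, and the sampling stride is an arbitrary illustrative choice.

```python
# Sketch: sample frames from 30%-70% of the video (OpenCV assumed).
import cv2

def candidate_frames(video_path, start_ratio=0.3, end_ratio=0.7, stride=30):
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    frames = []
    # Jump to every stride-th frame inside the preset time period.
    for idx in range(int(total * start_ratio), int(total * end_ratio), stride):
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = cap.read()
        if ok:
            frames.append((idx, frame))
    cap.release()
    return frames
```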
Specifically, a subject word of each video frame within the preset time period in the article video is extracted. The subject word represents the main content of the current video frame; it may be the main idea embodied by the bullet-screen comments of the current video frame, the main idea embodied by its subtitle content, or the character content and general object content (such as buildings, daily necessities, cosmetics, or representative elements of a scene) contained in the video frame.
It should be noted that the way of extracting the subject word of each video frame within the preset time period differs across application scenarios, as exemplified below:
the first example:
In this example, the subject words cover character content, such as stars, scholars, and animated characters. As shown in FIG. 3, extracting the subject word of each video frame within the preset time period in the article video comprises:
Step 201: whether each video frame contains a human face is detected, and if a video frame contains a human face, face features are extracted.
Specifically, whether each video frame contains a human face may be detected based on whether the frame contains facial features such as eyes and a nose. If a face is found, then, in order to determine the specific person to whom the face belongs, features that uniquely identify the face, such as the shape and size of the facial features, may be extracted.
Step 202: a preset face database is queried to obtain the subject word corresponding to the face features.
It can be understood that a face database containing the correspondence between face features and the subject words of the corresponding persons is preset; after the face features are obtained, the preset face database is queried to obtain the subject word corresponding to the face features.
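Steps 201 and 202 might be sketched as below. The face_recognition library, its 128-dimensional face encodings, and the preset database mapping known encodings to subject words are assumptions; the 0.6 tolerance is that library's conventional default, not a value fixed by the patent.

```python
# Sketch of steps 201-202: face detection, feature extraction, database query.
import face_recognition
import numpy as np

# Hypothetical preset face database: subject word -> known feature vector.
FACE_DB = {
    "star_a": np.load("star_a_encoding.npy"),  # hypothetical 128-d encoding
}

def face_subject_words(frame_rgb, tolerance=0.6):
    subject_words = []
    # One 128-d encoding per face detected in the frame.
    for enc in face_recognition.face_encodings(frame_rgb):
        for word, known in FACE_DB.items():
            # Match when the feature distance is within the tolerance.
            if np.linalg.norm(known - enc) <= tolerance:
                subject_words.append(word)
    return subject_words
```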
The second example:
In this example, the subject word covers object content. As shown in FIG. 4, extracting the subject word of each video frame within the preset time period in the article video comprises:
Step 301: whether each video frame contains an object of a preset category is detected, and if so, object features are extracted.
Specifically, whether each video frame contains objects of a preset category is detected based on the color, shape, and the like of the connected components in the frame. The preset categories may cover general objects such as cosmetics and daily necessities, or a specific object category may be set according to the article content; for example, if the main content of the current article is an introduction to cosmetics, the preset categories may correspond to finer-grained categories under cosmetics, such as lipstick, blush, and mascara. Once a video frame is found to contain an object of a preset category, features that show the object's uniqueness, such as its color and shape, are extracted.
Step 302: a preset object database is queried to obtain the subject word corresponding to the object features.
It can be understood that an object database containing the correspondence between object features and the subject words of the corresponding objects is preset; after the object features are obtained, the preset object database is queried to obtain the subject word corresponding to the object features.
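Steps 301 and 302 could be sketched with a pretrained detector standing in for the color/shape-based detection described above. torchvision's Faster R-CNN, the COCO label ids, and the score threshold are all illustrative assumptions.

```python
# Sketch of steps 301-302 with a pretrained detector (torchvision assumed).
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Hypothetical preset object database: detector class id -> subject word.
OBJECT_DB = {44: "bottle", 64: "potted plant", 84: "book"}

model = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

def object_subject_words(frame_tensor, score_threshold=0.8):
    """frame_tensor: float image tensor of shape (3, H, W), values in [0, 1]."""
    with torch.no_grad():
        detections = model([frame_tensor])[0]
    words = []
    for label, score in zip(detections["labels"], detections["scores"]):
        if score >= score_threshold and int(label) in OBJECT_DB:
            words.append(OBJECT_DB[int(label)])
    return words
```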
Step 103: the similarity between each video frame and the article text is calculated according to the second vector corresponding to each subject word and the first vector corresponding to each keyword.
Step 104: a target video frame is determined as the cover video frame according to the similarity between each video frame and the article text.
Specifically, in order to determine a cover video frame that is more consistent with the article text, the similarity between each video frame and the article text is calculated according to the second vector corresponding to each subject word and the first vector corresponding to each keyword, and the most similar video frame is determined as the video cover, thereby achieving consistency between imagery and text.
In an embodiment of the present invention, the similarity is measured by the distance between vectors. In this embodiment, as shown in FIG. 5, calculating the similarity between each video frame and the article text according to the second vector corresponding to each subject word and the first vector corresponding to each keyword comprises:
Step 401: a sub-distance between the second vector corresponding to each subject word in each video frame and the first vector corresponding to each keyword is calculated.
Specifically, the sub-distance between the second vector corresponding to each subject word in a video frame and the first vector corresponding to each keyword is calculated so as to measure the similarity of each subject word to each keyword of the article text.
Step 402: all the sub-distances corresponding to each video frame are added to obtain a corresponding total distance.
Specifically, in this embodiment, the sub-distances corresponding to each video frame are added to obtain a corresponding total distance, which characterizes how far, overall, the subject words in the video frame are from the keywords of the article.
Step 403: the reciprocal of the total distance of each video frame is taken as the similarity between the video frame and the article text.
It can be understood that, by the nature of vector distances, a greater distance means a lower similarity between vectors. Therefore, in this embodiment, the reciprocal of the total distance of each video frame is taken as the similarity between the video frame and the article text. The similarities of the video frames are then compared, and the target video frame corresponding to the maximum similarity can be taken as the cover video frame.
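Steps 401 to 403 reduce to a few lines of arithmetic, as in the sketch below. Euclidean distance is used for the sub-distances as one natural reading, since the patent does not name a distance metric.

```python
# Sketch of steps 401-403: similarity = 1 / (sum of pairwise distances).
import numpy as np

def frame_similarity(subject_vectors, keyword_vectors):
    # Steps 401/402: sum the sub-distances between every subject-word vector
    # of the frame and every keyword vector of the article text.
    total = sum(np.linalg.norm(s - k)
                for s in subject_vectors for k in keyword_vectors)
    # Step 403: the reciprocal of the total distance is the similarity
    # (a zero distance means a perfect match).
    return 1.0 / total if total > 0 else float("inf")

def pick_cover(frames_subject_vectors, keyword_vectors):
    sims = [frame_similarity(sv, keyword_vectors)
            for sv in frames_subject_vectors]
    # The target video frame is the one with the maximum similarity.
    return int(np.argmax(sims))
```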
In actual implementation, as mentioned above, the cover video frame may be determined as an article cover, where the article cover may be the cover of a pushed article as shown in FIG. 1-2 or the link thumbnail of a pushed article as shown in FIG. 6; of course, the cover video frame may also be determined as the video cover of the inserted video as shown in FIG. 1-2.
In an embodiment of the present invention, in order to further improve the user's click-through rate, the video frame serving as the cover may additionally be selected based on image quality, such as sharpness and aesthetic quality.
Specifically, in this embodiment, one or more image quality indexes of each video frame are acquired, for example the sharpness and the aesthetic measure of the image (the aesthetic measure may be obtained from a pre-established deep learning model or the like). When determining the cover video frame, a weight is preset for each image quality index and for the similarity. The weights may be set according to the attributes of the article: for example, when the article is an entertainment article, the weight of the similarity is greater than the weight of each image quality index; when the article is a national defense article, the weight of the similarity is smaller than the weight of each image quality index.
Furthermore, the score data of each video frame is calculated according to each image quality index of the video frame and its corresponding weight, together with the similarity between the video frame and the article text and its corresponding weight. For example, each image quality index is normalized, each normalized value is multiplied by its corresponding weight, the product of the similarity and its corresponding weight is calculated, and the sum of these products is taken as the score data of the video frame. The target video frame corresponding to the maximum score data is then determined as the cover. The score data may be normalized to a ten-point or five-point scale, among others, without limitation.
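A minimal sketch of this weighted scoring follows. The min-max normalization of the quality indexes is one possible choice; the patent only requires that each index and the similarity be weighted and combined.

```python
# Sketch: weighted combination of image quality indexes and similarity.
import numpy as np

def score_frames(quality_indexes, similarities, quality_weights, sim_weight):
    """quality_indexes: array of shape (num_frames, num_indexes)."""
    q = np.asarray(quality_indexes, dtype=float)
    # Normalize each quality index column to [0, 1] across all frames.
    q = (q - q.min(axis=0)) / (np.ptp(q, axis=0) + 1e-9)
    # Score data: weighted quality indexes plus weighted similarity.
    scores = (q @ np.asarray(quality_weights, dtype=float)
              + sim_weight * np.asarray(similarities, dtype=float))
    # The frame with the maximum score data becomes the cover video frame.
    return int(np.argmax(scores))
```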
To sum up, the method for determining a cover video frame according to the embodiment of the present invention extracts keywords of an article text, acquires the first vector corresponding to each keyword, extracts the subject word of each video frame within a preset time period in the article video, acquires the second vector corresponding to each subject word, calculates the similarity between each video frame and the article text according to the second vectors corresponding to the subject words and the first vectors corresponding to the keywords, and then determines a target video frame as the cover video frame according to the similarity between each video frame and the article text. The video frame used as the cover is thus consistent with the article content, automatic adaptation between the cover video frame and the article content is achieved, and the efficiency of cover determination as well as the user's click-through rate and browsing experience are improved.
In order to implement the foregoing embodiments, the present invention further provides a device for determining a cover video frame. FIG. 7 is a schematic structural diagram of the device according to an embodiment of the present invention. As shown in FIG. 7, the device comprises: a first acquisition module 10, a second acquisition module 20, a calculation module 30, and a cover determination module 40.
The first acquisition module 10 is configured to extract keywords of the article text and acquire a first vector corresponding to each keyword.
The second acquisition module 20 is configured to extract a subject word of each video frame within a preset time period in the article video and acquire a second vector corresponding to each subject word.
The calculation module 30 is configured to calculate the similarity between each video frame and the article text according to the second vector corresponding to each subject word and the first vector corresponding to each keyword.
The cover determination module 40 is configured to determine a target video frame as the cover video frame according to the similarity between each video frame and the article text.
In an embodiment of the present invention, as shown in FIG. 8, on the basis of the embodiment shown in FIG. 7, the second acquisition module 20 comprises an extracting unit 11 and an obtaining unit 12, where the extracting unit 11 is configured to detect whether each video frame contains a face and to extract face features when a face is found.
The obtaining unit 12 is configured to query a preset face database to obtain the subject word corresponding to the face features.
It should be noted that the foregoing explanation of the embodiment of the method for determining a cover video frame is also applicable to the device for determining a cover video frame of this embodiment, and is not repeated here.
To sum up, the device for determining a cover video frame according to the embodiment of the present invention extracts keywords of an article text, acquires the first vector corresponding to each keyword, extracts the subject word of each video frame within a preset time period in the article video, acquires the second vector corresponding to each subject word, calculates the similarity between each video frame and the article text according to the second vectors corresponding to the subject words and the first vectors corresponding to the keywords, and then determines a target video frame as the cover video frame according to the similarity between each video frame and the article text. The video frame used as the cover is thus consistent with the article content, automatic adaptation between the cover video frame and the article content is achieved, and the efficiency of cover determination as well as the user's click-through rate and browsing experience are improved.
In order to implement the above embodiments, the present invention further provides a computer program product which, when its instructions are executed by a processor, implements the method for determining a cover video frame as described in the foregoing method embodiments.
In order to implement the above embodiments, the present invention also proposes a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of determining a cover video frame as described in the aforementioned method embodiments.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, or the like.
Although embodiments of the present invention have been shown and described above, it can be understood that the above embodiments are exemplary and shall not be construed as limiting the present invention; those of ordinary skill in the art can make variations, modifications, substitutions, and alterations to the above embodiments within the scope of the present invention.

Claims (10)

1. A method for determining a cover video frame, comprising the following steps:
inputting an article text in an article into a pre-constructed learning model to acquire topic information of the text;
performing part-of-speech analysis and word segmentation on the article text to obtain a plurality of text word segments;
calculating the correlation between each of the plurality of text word segments and the topic information, and determining the text word segments whose correlation is greater than a preset threshold as keywords;
acquiring a first vector corresponding to each keyword;
extracting a subject word of each video frame within a preset time period in a video inserted into the article, and acquiring a second vector corresponding to each subject word, wherein the subject word represents a main idea embodied by the subtitle content of each video frame, the character content contained therein, and general object content;
calculating the similarity between each video frame and the article text according to the second vector corresponding to each subject word and the first vector corresponding to each keyword; and
determining a target video frame as the cover video frame according to the similarity between each video frame and the article text.
2. The method of claim 1, further comprising:
and determining the cover video frame as an article cover frame, and/or determining the cover video frame as a video cover frame.
3. The method of claim 1, wherein the extracting a subject word of each video frame within a preset time period in the article video comprises:
detecting whether each video frame contains a face, and if a video frame contains a face, extracting face features; and
querying a preset face database to obtain the subject word corresponding to the face features.
4. The method of claim 1, wherein the extracting a subject word of each video frame within a preset time period in the article video comprises:
detecting whether each video frame contains an object of a preset category, and if so, extracting object features; and
querying a preset object database to obtain the subject word corresponding to the object features.
5. The method of claim 1, wherein the calculating the similarity between each video frame and the article text according to the second vector corresponding to each subject word and the first vector corresponding to each keyword comprises:
calculating a sub-distance between the second vector corresponding to each subject word in each video frame and the first vector corresponding to each keyword;
adding all the sub-distances corresponding to each video frame to obtain a corresponding total distance; and
taking the reciprocal of the total distance of each video frame as the similarity between that video frame and the article text.
6. The method of claim 5, wherein the determining a target video frame as the cover video frame according to the similarity between each video frame and the article text comprises:
comparing the similarities between the video frames and the article text, and taking the target video frame corresponding to the maximum similarity as the cover video frame.
7. The method of any of claims 1-6, further comprising:
acquiring one or more image quality indexes of each video frame;
wherein the determining a target video frame as the cover video frame according to the similarity between each video frame and the article text comprises:
acquiring a weight corresponding to each image quality index and a weight corresponding to the similarity;
calculating score data of each video frame according to each image quality index of the video frame and its corresponding weight, together with the similarity between the video frame and the article text and its corresponding weight; and
determining, according to the score data of the video frames, the target video frame corresponding to the maximum score data as the cover video frame.
8. A cover video frame determination apparatus, comprising:
a third acquisition module, configured to input an article text in an article into a pre-constructed learning model and acquire topic information of the text;
a fourth acquisition module, configured to perform part-of-speech analysis and word segmentation on the article text to obtain a plurality of text word segments;
a determination module, configured to calculate the correlation between each of the plurality of text word segments and the topic information, and determine the text word segments whose correlation is greater than a preset threshold as keywords;
a first acquisition module, configured to acquire a first vector corresponding to each of the keywords;
a second acquisition module, configured to extract a subject word of each video frame within a preset time period in the article video and acquire a second vector corresponding to each subject word, wherein the subject word represents a main idea embodied by the subtitle content of each video frame, the character content contained therein, and general object content;
a calculation module, configured to calculate the similarity between each video frame and the article text according to the second vector corresponding to each subject word and the first vector corresponding to each keyword; and
a cover determination module, configured to determine a target video frame as the cover video frame according to the similarity between each video frame and the article text.
9. A computer program product, wherein the instructions in the computer program product, when executed by a processor, implement the method for determining a cover video frame of any one of claims 1-7.
10. A non-transitory computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the method for determining a cover video frame of any one of claims 1-7.
CN201811217665.2A 2018-10-18 2018-10-18 Method and device for determining cover video frame Active CN109388721B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811217665.2A CN109388721B (en) 2018-10-18 2018-10-18 Method and device for determining cover video frame


Publications (2)

Publication Number Publication Date
CN109388721A CN109388721A (en) 2019-02-26
CN109388721B true CN109388721B (en) 2021-05-28

Family

ID=65426810

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811217665.2A Active CN109388721B (en) 2018-10-18 2018-10-18 Method and device for determining cover video frame

Country Status (1)

Country Link
CN (1) CN109388721B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111949819A (en) * 2019-05-15 2020-11-17 北京字节跳动网络技术有限公司 Method and device for pushing video
CN110191357A (en) * 2019-06-28 2019-08-30 北京奇艺世纪科技有限公司 The excellent degree assessment of video clip, dynamic seal face generate method and device
CN110572711B (en) * 2019-09-27 2023-03-24 北京达佳互联信息技术有限公司 Video cover generation method and device, computer equipment and storage medium
CN110956037B (en) * 2019-10-16 2022-07-08 厦门美柚股份有限公司 Multimedia content repeated judgment method and device
CN110909205B (en) * 2019-11-22 2023-04-07 北京金山云网络技术有限公司 Video cover determination method and device, electronic equipment and readable storage medium
CN111581510B (en) * 2020-05-07 2024-02-09 腾讯科技(深圳)有限公司 Shared content processing method, device, computer equipment and storage medium
CN112752121B (en) * 2020-05-26 2023-06-09 腾讯科技(深圳)有限公司 Video cover generation method and device
CN114915831A (en) * 2022-04-19 2022-08-16 秦皇岛泰和安科技有限公司 Preview determination method, device, terminal equipment and storage medium
CN116777914B (en) * 2023-08-22 2023-11-07 腾讯科技(深圳)有限公司 Data processing method, device, equipment and computer readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102521293A (en) * 2011-11-30 2012-06-27 江苏奇异点网络有限公司 Video reconstruction method facing video frame content
CN106161873A (en) * 2015-04-28 2016-11-23 天脉聚源(北京)科技有限公司 A kind of video information extracts method for pushing and system
CN107025312A (en) * 2017-05-19 2017-08-08 北京金山安全软件有限公司 Information providing method and device based on video content
CN107918656A (en) * 2017-11-17 2018-04-17 北京奇虎科技有限公司 Video front cover extracting method and device based on video title

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160335493A1 (en) * 2015-05-15 2016-11-17 Jichuan Zheng Method, apparatus, and non-transitory computer-readable storage medium for matching text to images


Also Published As

Publication number Publication date
CN109388721A (en) 2019-02-26


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant