CN111767814A - Video determination method and device - Google Patents
Video determination method and device
- Publication number
- CN111767814A (application CN202010567117.3A)
- Authority
- CN
- China
- Prior art keywords
- target
- video
- long
- information
- videos
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/74—Browsing; Visualisation therefor
- G06F16/748—Hypervideo
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
Abstract
A video determination method and device are provided. The method includes: acquiring target video information of a target short video; determining candidate long videos associated with the target short video according to the target video information; acquiring a plurality of video frames of the target short video and determining a plurality of feature vectors corresponding to those frames; determining, from the candidate long videos, a target long video whose feature vectors satisfy a preset matching condition with the plurality of feature vectors corresponding to the target short video; determining, in the target long video, a target video frame whose feature vector matches a target feature vector of a preset frame of the target short video, and taking the playing time corresponding to the target video frame as the initial playing time of the target long video; and, when a play trigger request for the target long video is acquired, playing the target long video from its initial playing time. With the method and device, a user can jump directly from the target short video to the target long video and watch it from the initial playing time, avoiding a cumbersome search process.
Description
Technical Field
The present application relates to the field of video technologies, and in particular, to a method and an apparatus for determining a video.
Background
In recent years, short-video information streams have risen rapidly. If a user becomes interested in a short video while watching it, the user may wish to watch the long video from which the short video is taken.
In current video information streams, short videos and long videos are separate: a user who wants to watch the long video corresponding to a short video must search for it manually, which makes the search process cumbersome.
Disclosure of Invention
To solve the above technical problem or at least partially solve the above technical problem, the present application provides a video determination method and apparatus.
In a first aspect, the present application provides a video determination method, including:
acquiring target video information of a target short video;
determining candidate long videos associated with the target short video according to the target video information of the target short video;
acquiring a plurality of video frames of the target short video, and determining a plurality of feature vectors corresponding to the plurality of video frames;
determining, from the candidate long videos, a target long video whose feature vectors satisfy a preset matching condition with the plurality of feature vectors corresponding to the target short video;
determining, in the target long video, a target video frame whose feature vector matches a target feature vector of a preset frame of the target short video, and taking the playing time corresponding to the target video frame as the initial playing time of the target long video;
and when the play triggering request of the target long video is acquired, playing the target long video from the initial play time of the target long video.
Optionally, the determining, from the candidate long videos, a target long video whose feature vectors satisfy a preset matching condition with the plurality of feature vectors corresponding to the target short video includes:
determining, in a preset database, a plurality of target feature vectors that satisfy a preset matching relation with the plurality of feature vectors corresponding to the target short video;
determining the candidate long videos corresponding to the target feature vectors according to a preset correspondence between feature vectors and candidate long videos, wherein the preset database includes a plurality of candidate long videos and the feature vectors of the candidate long videos;
and taking the candidate long video containing the largest number of target feature vectors as the target long video.
Optionally, the preset database includes label vectors of all long videos;
the determining, according to the target video information of the target short video, the candidate long video associated with the target short video comprises:
acquiring a target label vector of the target short video according to target video information of the target short video;
inputting the target label vector and the label vectors of all the long videos into a semantic similarity model, and outputting a plurality of similarity values;
selecting a similarity value higher than a preset threshold value from the similarity values as a target similarity value, and taking a long video corresponding to the target similarity value as a candidate long video.
Optionally, the target video information includes text information and cover picture information;
the obtaining of the target label vector of the target short video according to the target video information of the target short video comprises:
performing word vectorization processing on text information of a target short video to obtain a text label vector corresponding to the text information;
inputting cover picture information of the target short video into a cover picture model, and outputting a cover picture label vector;
and inputting the cover picture label vector and the text label vector into a label identification model, and outputting the target label vector of the target short video.
Optionally, before inputting the cover picture information of the target short video into the cover picture model, the method further includes:
acquiring an image recognition dataset, sample cover picture information and a sample cover picture label vector;
and training an initial cover picture model through the image recognition dataset, the sample cover picture information, the sample cover picture label vector and a preset training algorithm to obtain a trained cover picture model.
In a second aspect, an embodiment of the present application provides a video determining apparatus, including:
the first acquisition module is used for acquiring target video information of a target short video;
the first determination module is used for determining candidate long videos related to the target short videos according to target video information of the target short videos;
the second acquisition module is used for acquiring a plurality of video frames of the target short video and determining a plurality of feature vectors corresponding to the plurality of video frames;
the second determining module is used for determining, from the candidate long videos, a target long video whose feature vectors satisfy a preset matching condition with the plurality of feature vectors corresponding to the target short video;
a third determining module, configured to determine, in the target long video, a target video frame whose feature vector matches a target feature vector of a preset frame of the target short video, and use a play time corresponding to the target video frame as an initial play time of the target long video;
and the playing module is used for playing the target long video from the initial playing time of the target long video when the playing triggering request of the target long video is obtained.
Optionally, the second determining module is specifically configured to:
determining, in the preset database, a plurality of target feature vectors that satisfy a preset matching relationship with the plurality of feature vectors corresponding to the target short video;
determining the candidate long videos corresponding to the target feature vectors according to the preset correspondence between feature vectors and candidate long videos, wherein the preset database includes a plurality of candidate long videos and the feature vectors of the candidate long videos;
and taking the candidate long video containing the largest number of target feature vectors as the target long video.
Optionally, the preset database includes label vectors of all long videos;
the first determining module is specifically configured to:
acquiring a target label vector of the target short video according to target video information of the target short video;
inputting the target label vector and the label vectors of all the long videos into a semantic similarity model, and outputting a plurality of similarity values;
selecting a similarity value higher than a preset threshold value from the similarity values as a target similarity value, and taking a long video corresponding to the target similarity value as a candidate long video.
Optionally, the target video information includes text information and cover picture information;
the first determining module is specifically configured to:
performing word vectorization processing on text information of a target short video to obtain a text label vector corresponding to the text information;
inputting cover picture information of the target short video into a cover picture model, and outputting a cover picture label vector;
and inputting the cover picture label vector and the text label vector into a label identification model, and outputting the target label vector of the target short video.
Optionally, the apparatus further comprises:
the third acquisition module is used for acquiring an image recognition dataset, sample cover picture information and a sample cover picture label vector;
and the training module is used for training an initial cover picture model through the image recognition dataset, the sample cover picture information, the sample cover picture label vector and a preset training algorithm to obtain a trained cover picture model.
Compared with the prior art, the technical scheme provided by the embodiment of the application has the following advantages:
according to the method provided by the embodiment of the application, a server determines a candidate long video associated with a target short video according to target video information of the target short video, then obtains a plurality of video frames of the target short video, determines a plurality of feature vectors corresponding to the plurality of video frames, determines a target long video with the feature vectors meeting preset matching conditions with the plurality of feature vectors corresponding to the target short video from the candidate long video, determines a target video frame with the feature vectors matching with the target feature vectors of the preset frames of the target short video in the target long video, takes the playing time corresponding to the target video frame as the initial playing time of the target long video, and plays the target long video from the initial playing time of the target long video when a playing trigger request of the target long video is obtained. According to the method and the device, the user can jump to the corresponding target long video from the target short video directly, and watch the target long video from the determined initial playing moment, so that the complex process of finding the long video and the playing moment is avoided, and convenience is brought to the user.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below; other drawings can be derived from these drawings by those skilled in the art without inventive effort.
Fig. 1 is a flowchart of a method for determining a video according to an embodiment of the present disclosure;
fig. 2 is a flowchart of a method for determining a target long video according to an embodiment of the present disclosure;
fig. 3 is a flowchart for determining candidate long videos according to an embodiment of the present disclosure;
fig. 4 is a flowchart of a video determining method according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an apparatus for video determination according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The embodiment of the application provides a video determining method, which can be applied to a server and used for determining the initial playing time of a short video in a corresponding long video.
A detailed description will be given below of a video determining method provided in an embodiment of the present application with reference to a specific implementation manner, as shown in fig. 1, the specific steps are as follows:
step 101: and acquiring target video information of the target short video.
In the embodiment of the application, the server acquires target video information of the target short video, where the target video information may be text information and cover picture information of the target short video. Specifically, the text information is represented as a text label vector and the cover picture information as a cover picture label vector. The server acquires these vectors as follows: it inputs the text information of the target short video, such as the title, plot synopsis and recommendation information, into a text model for word vectorization to obtain the text label vector corresponding to the text information, and it inputs the cover picture information of the target short video into a cover picture model, which outputs the cover picture label vector.
The text model may be a bag-of-words model and the cover picture model may be an Xception model; the application does not specifically limit either model. In addition, the text label vector and the cover picture label vector may be acquired in any order.
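The word-vectorization step described above can be sketched with a toy bag-of-words vectorizer. The vocabulary, tokenizer, and function name below are illustrative assumptions; the patent does not fix a concrete text model.

```python
def text_label_vector(text, vocabulary):
    """Map title/synopsis text to a bag-of-words count vector."""
    tokens = text.lower().split()
    return [tokens.count(word) for word in vocabulary]

# Hypothetical vocabulary; a real system would derive it from its corpus.
vocab = ["variety", "show", "season", "highlight"]
print(text_label_vector("Variety show season highlight show", vocab))  # → [1, 2, 1, 1]
```

A production system would replace this with the learned text model (e.g. BERT embeddings) described in the text; only the vector-per-text interface matters here.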
In this application, a BERT (Bidirectional Encoder Representations from Transformers) model adapted to the video domain can be adopted as the text model. A general-purpose text model that is trained mainly on movie, TV-series or entertainment text lacks training on text from other fields and is therefore deficient in semantic understanding of general-domain text.
Step 102: and determining candidate long videos associated with the target short video according to the target video information of the target short video.
In the embodiment of the application, the preset database includes a plurality of long videos and the label vectors of those long videos. The server inputs the text label vector and the cover picture label vector of the target short video into the label identification model and outputs the target label vector of the target short video. The server then inputs the target label vector and the label vectors of the plurality of long videos into a semantic similarity model and outputs a plurality of similarity values; it selects the similarity values higher than a preset threshold as target similarity values and takes the long videos corresponding to the target similarity values as the candidate long videos.
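A minimal sketch of this candidate-selection step, with cosine similarity standing in for the semantic similarity model; the 0.8 threshold and the video ids are assumptions for illustration, not values from the patent.

```python
import math

def cosine(u, v):
    """Cosine similarity between two label vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def candidate_long_videos(target_vec, long_video_vecs, threshold=0.8):
    """Keep long videos whose label-vector similarity exceeds the threshold."""
    return [vid for vid, vec in long_video_vecs.items()
            if cosine(target_vec, vec) > threshold]

tag_vecs = {"s5e3": [1.0, 0.9, 0.0], "other": [0.0, 0.1, 1.0]}
print(candidate_long_videos([1.0, 1.0, 0.0], tag_vecs))  # → ['s5e3']
```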
Step 103: the method comprises the steps of obtaining a plurality of video frames of a target short video, and determining a plurality of feature vectors corresponding to the video frames.
In the embodiment of the application, the server splits the target short video into frames to obtain a plurality of video frames, inputs the video frames into a preset neural network, and outputs the feature vectors corresponding to those frames. The frames input to the preset neural network may be all video frames of the target short video or only a subset of them. Specifically, the preset neural network may be an Xception model; the preset neural network is not specifically limited in the present application.
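The framing and feature-extraction step can be sketched as follows. Here `embed` stands in for the Xception network and is replaced by a trivial mean-pooling function purely for illustration; frame sampling with a fixed step is also an assumption, since the patent allows using all frames or a subset.

```python
def sample_frames(frames, step):
    """Take every `step`-th frame; the application may also use all frames."""
    return frames[::step]

def extract_feature_vectors(frames, embed):
    """Apply a frame-embedding function (a CNN in the real system) to each frame."""
    return [embed(f) for f in frames]

def mean_pool(frame):
    # Toy embedding: a 1-dimensional feature, the mean pixel value.
    return [sum(frame) / len(frame)]

frames = [[0, 0, 0], [3, 3, 3], [6, 6, 6], [9, 9, 9]]
feats = extract_feature_vectors(sample_frames(frames, 2), mean_pool)
print(feats)  # → [[0.0], [6.0]]
```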
Step 104: and determining the target long video with the characteristic vectors corresponding to the target short video and meeting the preset matching conditions from the candidate long video.
In the embodiment of the present application, the preset database includes a plurality of candidate long videos and feature vectors of the candidate long videos, and an obtaining process of the feature vectors of the candidate long videos is the same as an obtaining process of the feature vectors of the target short videos, which is not described in detail herein.
After the server determines the feature vectors of the target short video, it determines, in the preset database through a vector retrieval technique, the feature vectors that satisfy the preset matching relation with the feature vectors of the target short video, determines the candidate long video corresponding to those matching feature vectors, and takes that candidate long video as the target long video.
Step 105: in the target long video, determining a target video frame with a characteristic vector matched with a target characteristic vector of a preset frame of the target short video, and taking the playing time corresponding to the target video frame as the initial playing time of the target long video.
In the embodiment of the application, a server determines a preset frame in a plurality of video frames of a target short video, obtains a target feature vector corresponding to the preset frame, determines a feature vector matched with the target feature vector in a target long video, and takes the video frame corresponding to the matched feature vector as the target video frame.
The preset frame may be the starting frame, the ending frame, or the nth-from-last frame of the target short video, and the playing time corresponding to the target video frame is both the playing time of the preset frame of the target short video and the initial playing time of the target long video. If the preset frame is the starting frame of the target short video, the target long video starts playing from the position where the short video begins, so the user watches the content of the target short video again inside the target long video. If the preset frame is the ending frame, the target long video starts playing from the last frame of the short video, so the user watches the video content that follows the short video in the target long video. If the preset frame is the nth-from-last frame, the target long video starts playing from that frame, so the user watches the ending of the target short video and then continues with the subsequent content of the target long video.
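The mapping from the matched frame to the initial playing time can be sketched as below. The squared-distance matcher and the frame-rate parameter are illustrative assumptions; the patent only requires that the matching feature vector be found and its frame's timestamp used.

```python
def initial_play_time(preset_vec, long_video_feats, fps=25.0):
    """Return the timestamp (seconds) of the long-video frame whose feature
    vector is closest to the preset frame's target feature vector."""
    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    best = min(range(len(long_video_feats)),
               key=lambda i: sq_dist(preset_vec, long_video_feats[i]))
    return best / fps

# One feature per frame; frame 2 is the closest match at 1 frame/second.
long_feats = [[0.0], [1.0], [2.0], [3.0]]
print(initial_play_time([2.1], long_feats, fps=1.0))  # → 2.0
```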
Step 106: and when the play trigger request of the target long video is acquired, playing the target long video from the initial play time of the target long video.
In the embodiment of the application, after determining the target long video corresponding to the target short video and the playing time of the preset frame of the target short video in the target long video, when the client detects the playing triggering request of the target long video, the server automatically plays the target long video from the initial playing time of the target long video.
Specifically, the play trigger request of the target long video detected by the client may arise in two ways: the target long video may be played automatically after the target short video finishes playing, or, after the target short video finishes, a user who wants to watch the corresponding target long video may click the play button of the target long video in the video playing interface, which triggers its playback.
Specifically, when the client detects a playing instruction for the target short video, the server obtains the text label vector and the cover picture label vector of the target short video and determines the candidate long videos associated with the target short video according to these vectors. The server then obtains a plurality of video frames of the target short video and determines the feature vectors corresponding to those frames. Next, it determines, in the preset database, the feature vectors that satisfy the preset matching condition with the feature vectors of the target short video and takes the video containing them as the target long video. In the target long video it determines the feature vector matching the target feature vector of the preset frame of the target short video, takes the corresponding video frame as the target video frame, and takes the playing time of that frame as the initial playing time of the target long video. When the client obtains a play trigger request for the target long video, it plays the target long video from that initial playing time.
The following gives an example of the video determination method. Suppose the target short video is a clip of the variety show AAA. The server determines from the target video information of the short video that the candidate long videos are the season 1 through season 5 videos of AAA, and the preset database contains all the complete long videos of AAA from season 1 to season 5 together with their feature vectors. The server first splits the short video into frames and determines the feature vectors corresponding to those frames; it then finds, in the preset database, the feature vectors satisfying the preset matching condition according to the feature vectors of the short video and determines that the long video corresponding to the short video is episode 3 of season 5 of AAA. The server then determines the starting frame of the short video and finds that its playing time in that long video is 28 minutes 20 seconds; when the user jumps from the short video to the long video, playback starts from 28 minutes 20 seconds of episode 3 of season 5 of AAA.
The method and the device enable the user to jump directly from the target short video to the initial playing time of the corresponding long video, avoiding the cumbersome process of finding, within the long video, the position where the short video starts, which brings convenience to the user.
Optionally, as shown in fig. 2, determining, from the candidate long videos, a target long video whose feature vectors satisfy a preset matching condition with the plurality of feature vectors corresponding to the target short video includes:
Step 201: determining, in a preset database, a plurality of target feature vectors that satisfy a preset matching relationship with the plurality of feature vectors corresponding to the target short video.
In the embodiment of the application, the preset database includes the feature vectors of a plurality of candidate long videos, and the server determines in it the target feature vectors that satisfy the preset matching relationship with the feature vectors of the target short video. The preset matching relationship is that the matching degree between feature vectors is higher than a matching threshold.
Step 202: and determining the candidate long video corresponding to the target characteristic vector according to the corresponding relation between the preset characteristic vector and the candidate long video.
In the embodiment of the application, after the server determines the target feature vectors, it determines the candidate long videos corresponding to them according to the preset correspondence between feature vectors and candidate long videos. The server then checks the number of candidate long videos corresponding to the target feature vectors: if there is only one, that candidate long video is taken as the target long video; if there are multiple candidate long videos, step 203 is executed.
Step 203: and taking the candidate long video containing the largest number of target feature vectors as the target long video.
In this embodiment of the application, multiple candidate long videos may correspond to the target feature vectors, and different candidate long videos may contain different numbers of target feature vectors. The server therefore takes the candidate long video containing the largest number of target feature vectors as the target long video, which makes the selection of the target long video more accurate.
For example, suppose the server determines that there are n feature vectors corresponding to the target short video, of which a feature vectors satisfy the preset matching relationship with feature vectors in the first candidate long video (that is, the number of target feature vectors in the first candidate long video is a) and b feature vectors satisfy the preset matching relationship with feature vectors in the second candidate long video (that is, the number of target feature vectors in the second candidate long video is b), with a > b. The server then takes the candidate long video containing the largest number of target feature vectors, namely the first candidate long video, as the target long video. Here a < n and b < n.
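The counting rule in the example above can be sketched as a simple vote over the matched target feature vectors; the candidate names and match counts below are illustrative, not taken from the patent.

```python
from collections import Counter

def select_target_long_video(match_lists):
    """Each element lists the candidate long videos containing one matched
    target feature vector; the candidate containing the most matches wins."""
    counts = Counter(vid for candidates in match_lists for vid in candidates)
    return counts.most_common(1)[0][0]

# a = 3 matches for the first candidate, b = 1 for the second (a > b).
matches = [["first"], ["first"], ["first", "second"]]
print(select_target_long_video(matches))  # → first
```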
The method and the device for determining the target long video facilitate determination of the playing time of the target short video in the target long video.
Optionally, as shown in fig. 3, the preset database includes label vectors of all long videos;
determining the candidate long video associated with the target short video according to the target video information of the target short video comprises:
step 301: and acquiring a target label vector of the target short video according to the target video information of the target short video.
In the embodiment of the application, the server obtains the target label vector of the target short video through the label identification model.
Optionally, obtaining the target label vector of the target short video according to the target video information of the target short video includes: performing word vectorization processing on the text information of the target short video to obtain a text label vector corresponding to the text information; inputting the cover picture information of the target short video into a cover picture model and outputting a cover picture label vector; and inputting the cover picture label vector and the text label vector into a label identification model and outputting the target label vector of the target short video.
In this embodiment of the application, the server inputs the text information of the target short video into a BERT model for word vectorization processing, obtaining the text label vector corresponding to the text information. The server then inputs the cover picture information of the target short video into the cover picture model and outputs the cover picture label vector. Finally, the server inputs the cover picture label vector and the text label vector into the label identification model and outputs the target label vector of the target short video.
The cover picture model may be an Xception model, and the label identification model may be a Transformer model; neither is specifically limited in this application. In addition, the word vectorization processing and the cover picture label vector extraction may be performed in either order.
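The shape of this tag-vector pipeline can be illustrated with a self-contained sketch. A real system would use BERT, Xception, and a Transformer as described above; here deterministic hash-based encoders stand in for all three models so the example runs anywhere, and every name and dimension is our own assumption.

```python
import hashlib

DIM = 8  # toy embedding width; real models produce much larger vectors

def _hash_embed(data: bytes) -> list:
    # Deterministic stand-in embedding: hash the bytes into DIM floats in [0, 1].
    digest = hashlib.sha256(data).digest()
    return [b / 255.0 for b in digest[:DIM]]

def text_label_vector(text: str) -> list:
    # Stand-in for BERT word vectorization of the short video's text information.
    return _hash_embed(text.encode("utf-8"))

def cover_label_vector(cover_bytes: bytes) -> list:
    # Stand-in for the Xception cover picture model.
    return _hash_embed(cover_bytes)

def target_label_vector(text: str, cover_bytes: bytes) -> list:
    # Stand-in for the Transformer label identification model, which here
    # simply fuses the two vectors by element-wise averaging.
    t = text_label_vector(text)
    c = cover_label_vector(cover_bytes)
    return [(a + b) / 2 for a, b in zip(t, c)]
```

Because the two encoders are independent, the text and cover steps can indeed run in either order, matching the note above.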
Step 302: input the target label vector and the label vectors of all the long videos into a semantic similarity model, and output a plurality of similarity values.
In this embodiment of the application, the server obtains the label vectors of all the long videos contained in the preset database; these are obtained in the same way as the target label vector, which is not described again here. The server then inputs the target label vector and the label vectors of all the long videos into the semantic similarity model and outputs a plurality of similarity values.
Step 303: select, from the similarity values, those higher than a preset threshold as target similarity values, and take the long videos corresponding to the target similarity values as candidate long videos.
In this embodiment of the application, the server selects the similarity values higher than the preset threshold as target similarity values and takes the long videos corresponding to them as candidate long videos. Selecting candidate long videos in this way reduces the number of long videos drawn from the preset database, which narrows the retrieval range for the target short video, speeds up retrieval, and improves retrieval precision.
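Steps 302-303 can be sketched as a score-and-filter pass. Plain cosine similarity stands in for the patent's semantic similarity model, and the 0.8 threshold is an arbitrary illustrative value.

```python
import math

def cosine(u, v):
    # Cosine similarity between two label vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def candidate_long_videos(target_vec, long_label_vectors, threshold=0.8):
    """long_label_vectors: dict of long-video id -> label vector.
    Returns the ids whose similarity to target_vec exceeds the threshold."""
    return [
        vid for vid, vec in long_label_vectors.items()
        if cosine(target_vec, vec) > threshold
    ]
```

Only the shortlisted ids proceed to the frame-level matching of the later steps, which is what shrinks the retrieval range.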
Optionally, before the cover picture information of the target short video is input into the cover picture model, the method further includes: acquiring an image recognition data set, sample cover picture information, and a sample cover picture label vector; and training an initial cover picture model with the image recognition data set, the sample cover picture information, the sample cover picture label vector, and a preset training algorithm to obtain a trained cover picture model.
In this embodiment of the application, the image recognition data set may be video picture data from ImageNet (an image recognition database); the sample cover picture information may be cover pictures of the long videos in the preset database; and the sample cover picture label vector may be the feature vector corresponding to a cover picture. The server acquires the image recognition data set, the sample cover picture information, and the sample cover picture label vector, and trains the initial cover picture model with them and a preset training algorithm to obtain the trained cover picture model.
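The training step can be illustrated with a deliberately tiny stand-in: a single linear layer fitted by stochastic gradient descent on (cover feature, label vector) pairs. A real implementation would fine-tune an Xception network pretrained on ImageNet; everything below, including the learning rate and epoch count, is our own sketch of the fit-to-samples idea.

```python
import random

def train_cover_model(samples, dim_in, dim_out, lr=0.1, epochs=200):
    """samples: list of (cover_feature_vector, cover_label_vector) pairs.
    Returns a weight matrix W such that W @ x approximates the label vector."""
    random.seed(0)  # reproducible toy initialization
    W = [[random.uniform(-0.1, 0.1) for _ in range(dim_in)]
         for _ in range(dim_out)]
    for _ in range(epochs):
        for x, y in samples:
            # Forward pass: pred_i = sum_j W[i][j] * x[j]
            pred = [sum(W[i][j] * x[j] for j in range(dim_in))
                    for i in range(dim_out)]
            # SGD step on the squared error of each output component.
            for i in range(dim_out):
                err = pred[i] - y[i]
                for j in range(dim_in):
                    W[i][j] -= lr * err * x[j]
    return W
```

After training, the returned weights map a sample cover feature to (approximately) its sample label vector, which is the role the trained cover picture model plays in step 301.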
Based on the same technical concept, an embodiment of the present application further provides a video determination method, the flowchart of which is shown in Fig. 4. The flow includes:
Step 401: acquire the text information and cover picture information of the target short video.
Step 402: acquire the target label vector of the target short video according to the text information and the cover picture information.
Step 403: determine the candidate long videos associated with the target short video according to the target label vector of the target short video and the label vectors of all the long videos.
Step 404: acquire a plurality of video frames of the target short video, and determine a plurality of feature vectors corresponding to the plurality of video frames.
Step 405: determine, from the candidate long videos, the target long video whose feature vectors satisfy the preset matching condition with the plurality of feature vectors corresponding to the target short video.
Step 406: determine the playing time of the preset frame of the target short video in the target long video.
Step 407: play the target long video from its initial playing time.
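The seven steps of Fig. 4 can be strung together in one end-to-end sketch. Exact vector equality stands in for the learned similarity and matching models, and all data shapes and helper names are our own illustration, not the patent's.

```python
def run_pipeline(short_video, catalog):
    """short_video: {"tag": label_vector, "frames": [(time, feature_vector), ...]}.
    catalog: {long_video_id: {"tag": ..., "frames": [(time, vec), ...]}}.
    Returns (target long video id, initial playing time)."""
    # Steps 401-403: shortlist candidates by label-vector similarity
    # (exact equality stands in for the semantic similarity model).
    candidates = {vid: v for vid, v in catalog.items()
                  if v["tag"] == short_video["tag"]}
    # Steps 404-405: pick the candidate containing the most short-video
    # frame feature vectors.
    def matches(video):
        long_vecs = [vec for _, vec in video["frames"]]
        return sum(1 for _, sv in short_video["frames"] if sv in long_vecs)
    target_id = max(candidates, key=lambda vid: matches(candidates[vid]))
    # Step 406: the play time of the frame matching the short video's
    # preset (here: first) frame becomes the initial playing time.
    preset_vec = short_video["frames"][0][1]
    start = next(t for t, vec in candidates[target_id]["frames"]
                 if vec == preset_vec)
    # Step 407 would then start playback of target_id at `start`.
    return target_id, start
```

Usage: given a short clip whose frames reappear at 10 s and 11 s of one catalog entry, the sketch returns that entry's id and 10 s as the jump-in point.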
Based on the same technical concept, an embodiment of the present application further provides an apparatus for video determination, as shown in fig. 5, the apparatus includes:
a first obtaining module 501, configured to obtain target video information of a target short video;
a first determining module 502, configured to determine, according to target video information of a target short video, a candidate long video associated with the target short video;
a second obtaining module 503, configured to obtain multiple video frames of the target short video, and determine multiple feature vectors corresponding to the multiple video frames;
a second determining module 504, configured to determine, from the candidate long videos, a target long video whose feature vectors satisfy a preset matching condition with the plurality of feature vectors corresponding to the target short video;
a third determining module 505, configured to determine, in the target long video, a target video frame whose feature vector is matched with a target feature vector of a preset frame of the target short video, and use a playing time corresponding to the target video frame as an initial playing time of the target long video;
the playing module 506 is configured to play the target long video from the initial playing time of the target long video when the play trigger request of the target long video is obtained.
Optionally, the second determining module 504 is specifically configured to:
determining, in a preset database, a plurality of target feature vectors that satisfy a preset matching relationship with the plurality of feature vectors corresponding to the target short video;
determining candidate long videos corresponding to the target feature vectors according to the corresponding relation between the preset feature vectors and the candidate long videos, wherein the preset database comprises a plurality of candidate long videos and the feature vectors of the candidate long videos;
and taking the candidate long video containing the largest number of target feature vectors as the target long video.
Optionally, the preset database includes label vectors of all long videos;
the first determining module 502 is specifically configured to:
acquiring a target label vector of a target short video according to target video information of the target short video;
inputting the target label vector and the label vectors of all the long videos into a semantic similarity model, and outputting a plurality of similarity values;
and selecting the similarity value higher than a preset threshold value from the similarity values as a target similarity value, and taking the long video corresponding to the target similarity value as a candidate long video.
Optionally, the target video information includes text information and cover page information;
the first determining module 502 is specifically configured to:
performing word vectorization processing on text information of a target short video to obtain a text label vector corresponding to the text information;
inputting cover picture information of the target short video into a cover picture model, and outputting a cover picture label vector;
and inputting the label vector of the cover map and the text label vector into a label identification model, and outputting a target label vector of the target short video.
Optionally, the apparatus further comprises:
the third acquisition module is used for acquiring an image identification data set, sample cover picture information and a sample cover picture label vector;
and the training module is used for training the initial cover map model through the image recognition data set, the sample cover map information, the sample cover map label vector and a preset training algorithm to obtain a trained cover map model.
According to the method provided by the embodiments of the application, the server determines the candidate long videos associated with a target short video according to the target video information of the target short video. It then acquires a plurality of video frames of the target short video, determines the plurality of feature vectors corresponding to those frames, and determines, from the candidate long videos, the target long video whose feature vectors satisfy the preset matching condition with those feature vectors. In the target long video, the server determines the target video frame whose feature vector matches the target feature vector of the preset frame of the target short video and takes the playing time corresponding to that frame as the initial playing time of the target long video; when a play trigger request for the target long video is obtained, the target long video is played from that initial playing time. In this way, the user can jump directly from the target short video to the corresponding target long video and watch it from the determined initial playing time, avoiding the cumbersome process of manually finding the long video and the playing position.
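The frame lookup that produces the initial playing time can be sketched as a nearest-neighbor search over the target long video's timestamped frame vectors. Squared Euclidean distance is our illustrative choice of match score; the patent does not fix a particular metric.

```python
def initial_play_time(preset_vec, long_frames):
    """long_frames: list of (timestamp_seconds, feature_vector) pairs for
    the target long video. Returns the timestamp whose frame feature
    vector is closest to the preset frame's vector."""
    def sqdist(u, v):
        # Squared Euclidean distance between two feature vectors.
        return sum((a - b) ** 2 for a, b in zip(u, v))
    best_time, _ = min(long_frames, key=lambda tv: sqdist(preset_vec, tv[1]))
    return best_time
```

Playback then starts at the returned timestamp, which is the jump-to-long-video behavior the summary above describes.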
Based on the same technical concept, an embodiment of the present invention further provides an electronic device, as shown in fig. 6, including a processor 601, a communication interface 602, a memory 603, and a communication bus 604, where the processor 601, the communication interface 602, and the memory 603 complete mutual communication through the communication bus 604,
a memory 603 for storing a computer program;
the processor 601 is configured to implement the above steps when executing the program stored in the memory 603.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk storage device. Optionally, the memory may also be at least one storage device located remotely from the processor.
The Processor may be a general-purpose Processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; but may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component.
In a further embodiment provided by the present invention, there is also provided a computer readable storage medium having stored therein a computer program which, when executed by a processor, implements the steps of any of the methods described above.
In a further embodiment provided by the present invention, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform any of the methods of the above embodiments.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, cause the processes or functions described in accordance with the embodiments of the invention to occur, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website site, computer, server, or data center to another website site, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The foregoing are merely exemplary embodiments of the present invention, which enable those skilled in the art to understand or practice the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. A method for video determination, the method comprising:
acquiring target video information of a target short video;
determining candidate long videos associated with the target short videos according to target video information of the target short videos;
acquiring a plurality of video frames of the target short video, and determining a plurality of feature vectors corresponding to the plurality of video frames;
determining, from the candidate long videos, a target long video whose feature vectors satisfy a preset matching condition with the plurality of feature vectors corresponding to the target short video;
determining a target video frame with a characteristic vector matched with a target characteristic vector of a preset frame of the target short video in the target long video, and taking the playing time corresponding to the target video frame as the initial playing time of the target long video;
and when the play triggering request of the target long video is acquired, playing the target long video from the initial play time of the target long video.
2. The method according to claim 1, wherein the determining, from the candidate long videos, a target long video whose feature vectors satisfy a preset matching condition with the plurality of feature vectors corresponding to the target short video comprises:
determining, in a preset database, a plurality of target feature vectors that satisfy a preset matching relationship with the plurality of feature vectors corresponding to the target short video;
determining candidate long videos corresponding to the target feature vector according to a corresponding relation between a preset feature vector and the candidate long videos, wherein the preset database comprises a plurality of candidate long videos and feature vectors of the candidate long videos;
and taking the candidate long video containing the largest number of target feature vectors as the target long video.
3. The method according to claim 1, wherein the preset database contains label vectors of all long videos;
the determining, according to the target video information of the target short video, the candidate long video associated with the target short video comprises:
acquiring a target label vector of the target short video according to target video information of the target short video;
inputting the target label vector and the label vectors of all the long videos into a semantic similarity model, and outputting a plurality of similarity values;
selecting a similarity value higher than a preset threshold value from the similarity values as a target similarity value, and taking a long video corresponding to the target similarity value as a candidate long video.
4. The method of claim 3, wherein the target video information comprises text information and cover art information;
the obtaining of the target label vector of the target short video according to the target video information of the target short video comprises:
performing word vectorization processing on text information of a target short video to obtain a text label vector corresponding to the text information;
inputting cover picture information of the target short video into a cover picture model, and outputting a cover picture label vector;
and inputting the cover map label vector and the text label vector into a label identification model, and outputting a target label vector of the target short video.
5. The method of claim 4, wherein prior to entering the cover art information of the target short video into the cover art model, the method further comprises:
acquiring an image identification dataset, sample cover picture information and a sample cover picture label vector;
and training an initial cover map model through the image recognition data set, the sample cover map information, the sample cover map label vector and a preset training algorithm to obtain a trained cover map model.
6. A video determination apparatus, characterized in that the apparatus comprises:
the first acquisition module is used for acquiring target video information of a target short video;
the first determination module is used for determining candidate long videos related to the target short videos according to target video information of the target short videos;
the second acquisition module is used for acquiring a plurality of video frames of the target short video and determining a plurality of feature vectors corresponding to the plurality of video frames;
the second determining module is used for determining, from the candidate long videos, a target long video whose feature vectors satisfy a preset matching condition with the plurality of feature vectors corresponding to the target short video;
a third determining module, configured to determine, in the target long video, a target video frame whose feature vector matches a target feature vector of a preset frame of the target short video, and use a play time corresponding to the target video frame as an initial play time of the target long video;
and the playing module is used for playing the target long video from the initial playing time of the target long video when the playing triggering request of the target long video is obtained.
7. The apparatus of claim 6, wherein the second determining module is specifically configured to:
determining, in the preset database, a plurality of target feature vectors that satisfy a preset matching relationship with the plurality of feature vectors corresponding to the target short video;
determining candidate long videos corresponding to the target feature vectors according to the corresponding relation between preset feature vectors and the candidate long videos, wherein the preset database comprises a plurality of candidate long videos and the feature vectors of the candidate long videos;
and taking the candidate long video containing the largest number of target feature vectors as the target long video.
8. The apparatus according to claim 6, wherein the preset database contains label vectors of all long videos;
the first determining module is specifically configured to:
acquiring a target label vector of the target short video according to target video information of the target short video;
inputting the target label vector and the label vectors of all the long videos into a semantic similarity model, and outputting a plurality of similarity values;
selecting a similarity value higher than a preset threshold value from the similarity values as a target similarity value, and taking a long video corresponding to the target similarity value as a candidate long video.
9. The apparatus of claim 8, wherein the target video information comprises text information and cover art information;
the first determining module is specifically configured to:
performing word vectorization processing on text information of a target short video to obtain a text label vector corresponding to the text information;
inputting cover picture information of the target short video into a cover picture model, and outputting a cover picture label vector;
and inputting the cover map label vector and the text label vector into a label identification model, and outputting a target label vector of the target short video.
10. The apparatus of claim 9, further comprising:
the third acquisition module is used for acquiring an image identification data set, sample cover picture information and a sample cover picture label vector;
and the training module is used for training an initial cover map model through the image recognition data set, the sample cover map information, the sample cover map label vector and a preset training algorithm to obtain a trained cover map model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010567117.3A CN111767814A (en) | 2020-06-19 | 2020-06-19 | Video determination method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111767814A true CN111767814A (en) | 2020-10-13 |
Family
ID=72721430
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010567117.3A Pending CN111767814A (en) | 2020-06-19 | 2020-06-19 | Video determination method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111767814A (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112149568A (en) * | 2020-09-23 | 2020-12-29 | 创新奇智(合肥)科技有限公司 | Short video positioning method and device, electronic equipment and computer readable storage medium |
CN112235625A (en) * | 2020-10-14 | 2021-01-15 | 广州欢网科技有限责任公司 | Method and system for tracing source of short video feature film of television terminal and television terminal |
CN112612918A (en) * | 2020-12-16 | 2021-04-06 | 北京字节跳动网络技术有限公司 | Video resource mapping method, device, equipment and medium |
CN112632323A (en) * | 2020-12-16 | 2021-04-09 | 北京字节跳动网络技术有限公司 | Video playing method, device, equipment and medium |
CN112702624A (en) * | 2020-12-22 | 2021-04-23 | 山东鲁能软件技术有限公司 | Method, system, medium and device for optimizing short video playing efficiency |
CN112929692A (en) * | 2021-01-26 | 2021-06-08 | 广州欢网科技有限责任公司 | Video tracing method and device suitable for short video |
CN113407781A (en) * | 2021-06-18 | 2021-09-17 | 湖南快乐阳光互动娱乐传媒有限公司 | Video searching method, system, server and client |
CN113722542A (en) * | 2021-08-31 | 2021-11-30 | 青岛聚看云科技有限公司 | Video recommendation method and display device |
CN114040216A (en) * | 2021-11-03 | 2022-02-11 | 杭州网易云音乐科技有限公司 | Live broadcast room recommendation method, medium, device and computing equipment |
CN114679621A (en) * | 2021-05-07 | 2022-06-28 | 腾讯云计算(北京)有限责任公司 | Video display method and device and terminal equipment |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108024145A (en) * | 2017-12-07 | 2018-05-11 | 北京百度网讯科技有限公司 | Video recommendation method, device, computer equipment and storage medium |
CN109040775A (en) * | 2018-08-24 | 2018-12-18 | 深圳创维-Rgb电子有限公司 | Video correlating method, device and computer readable storage medium |
CN109982106A (en) * | 2019-04-29 | 2019-07-05 | 百度在线网络技术(北京)有限公司 | A kind of video recommendation method, server, client and electronic equipment |
CN110020093A (en) * | 2019-04-08 | 2019-07-16 | 深圳市网心科技有限公司 | Video retrieval method, edge device, video frequency searching device and storage medium |
CN110278449A (en) * | 2019-06-26 | 2019-09-24 | 腾讯科技(深圳)有限公司 | A kind of video detecting method, device, equipment and medium |
CN110290419A (en) * | 2019-06-25 | 2019-09-27 | 北京奇艺世纪科技有限公司 | Video broadcasting method, device and electronic equipment |
CN110413837A (en) * | 2019-05-30 | 2019-11-05 | 腾讯科技(深圳)有限公司 | Video recommendation method and device |
CN111191078A (en) * | 2020-01-08 | 2020-05-22 | 腾讯科技(深圳)有限公司 | Video information processing method and device based on video information processing model |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111767814A (en) | Video determination method and device | |
CN108810642B (en) | Bullet screen display method and device and electronic equipment | |
CN110913241B (en) | Video retrieval method and device, electronic equipment and storage medium | |
CN110991187A (en) | Entity linking method, device, electronic equipment and medium | |
US7904452B2 (en) | Information providing server, information providing method, and information providing system | |
US20180004760A1 (en) | Content-based video recommendation | |
CN113297891A (en) | Video information processing method and device and electronic equipment | |
CN110337011A (en) | Method for processing video frequency, device and equipment | |
CN111522996A (en) | Video clip retrieval method and device | |
CN112507163B (en) | Duration prediction model training method, recommendation method, device, equipment and medium | |
CN113806588B (en) | Method and device for searching video | |
US20150189384A1 (en) | Presenting information based on a video | |
CN110347866B (en) | Information processing method, information processing device, storage medium and electronic equipment | |
CN108197336B (en) | Video searching method and device | |
CN107592572B (en) | Video recommendation method, device and equipment | |
CN109948057B (en) | Interested content pushing method and device, electronic equipment and medium | |
CN110674345A (en) | Video searching method and device and server | |
CN112487300B (en) | Video recommendation method and device, electronic equipment and storage medium | |
TW200834355A (en) | Information processing apparatus and method, and program | |
WO2005067295A1 (en) | Method and apparatus for content recommendation | |
CN115687690A (en) | Video recommendation method and device, electronic equipment and storage medium | |
CN115834959A (en) | Video recommendation information determination method and device, electronic equipment and medium | |
CN107688587B (en) | Media information display method and device | |
CN113472834A (en) | Object pushing method and device | |
CN113127686B (en) | Video searching method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20201013 |