CN111767814A - Video determination method and device - Google Patents

Video determination method and device

Info

Publication number
CN111767814A
CN111767814A (application CN202010567117.3A)
Authority
CN
China
Prior art keywords
target, video, long, information, videos
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010567117.3A
Other languages
Chinese (zh)
Inventor
陈博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd filed Critical Beijing QIYI Century Science and Technology Co Ltd
Priority to CN202010567117.3A priority Critical patent/CN111767814A/en
Publication of CN111767814A publication Critical patent/CN111767814A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F 16/74 Browsing; Visualisation therefor
    • G06F 16/748 Hypervideo
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F 16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/783 Retrieval characterised by using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Library & Information Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A video determination method and device are provided. The method comprises the following steps: acquiring target video information of a target short video; determining candidate long videos associated with the target short video according to the target video information; acquiring a plurality of video frames of the target short video, and determining a plurality of feature vectors corresponding to the plurality of video frames; determining, from the candidate long videos, a target long video whose feature vectors satisfy a preset matching condition with the plurality of feature vectors corresponding to the target short video; determining, in the target long video, a target video frame whose feature vector matches a target feature vector of a preset frame of the target short video, and taking the playing time corresponding to the target video frame as the initial playing time of the target long video; and when a play trigger request for the target long video is acquired, playing the target long video from its initial playing time. With the method and device, a user can jump directly from the target short video to the target long video and watch it from the initial playing time, so that a cumbersome searching process is avoided.

Description

Video determination method and device
Technical Field
The present application relates to the field of video technologies, and in particular, to a method and an apparatus for determining a video.
Background
In recent years, short-video information streams have risen rapidly. If a user becomes interested in a short video while watching it, the user may wish to watch the long video to which the short video corresponds.
In current video information streams, short videos and long videos are separate. If a user wants to watch the long video corresponding to a short video, the user has to search for the long video manually, which makes the searching process cumbersome.
Disclosure of Invention
To solve the above technical problem or at least partially solve the above technical problem, the present application provides a video determination method and apparatus.
In a first aspect, the present application provides a video determination method, including:
acquiring target video information of a target short video;
determining candidate long videos associated with the target short video according to the target video information of the target short video;
acquiring a plurality of video frames of the target short video, and determining a plurality of feature vectors corresponding to the plurality of video frames;
determining, from the candidate long videos, a target long video whose feature vectors satisfy a preset matching condition with the plurality of feature vectors corresponding to the target short video;
determining, in the target long video, a target video frame whose feature vector matches a target feature vector of a preset frame of the target short video, and taking the playing time corresponding to the target video frame as the initial playing time of the target long video;
and when the play triggering request of the target long video is acquired, playing the target long video from the initial play time of the target long video.
Optionally, the determining, from the candidate long videos, of a target long video whose feature vectors satisfy a preset matching condition with the plurality of feature vectors corresponding to the target short video includes:
determining, in a preset database, a plurality of target feature vectors that satisfy a preset matching relationship with the plurality of feature vectors corresponding to the target short video;
determining candidate long videos corresponding to the target feature vectors according to a preset correspondence between feature vectors and candidate long videos, wherein the preset database includes a plurality of candidate long videos and the feature vectors of the candidate long videos;
and taking the candidate long video containing the largest number of target feature vectors as the target long video.
Optionally, the preset database includes label vectors of all long videos;
the determining, according to the target video information of the target short video, the candidate long video associated with the target short video comprises:
acquiring a target label vector of the target short video according to target video information of the target short video;
inputting the target label vector and the label vectors of all the long videos into a semantic similarity model, and outputting a plurality of similarity values;
selecting a similarity value higher than a preset threshold value from the similarity values as a target similarity value, and taking a long video corresponding to the target similarity value as a candidate long video.
Optionally, the target video information includes text information and cover picture information;
the obtaining of the target label vector of the target short video according to the target video information of the target short video comprises:
performing word vectorization processing on text information of a target short video to obtain a text label vector corresponding to the text information;
inputting cover picture information of the target short video into a cover picture model, and outputting a cover picture label vector;
and inputting the cover picture label vector and the text label vector into a label identification model, and outputting the target label vector of the target short video.
Optionally, before inputting the cover picture information of the target short video into the cover picture model, the method further includes:
acquiring an image recognition data set, sample cover picture information and a sample cover picture label vector;
and training an initial cover picture model through the image recognition data set, the sample cover picture information, the sample cover picture label vector and a preset training algorithm to obtain a trained cover picture model.
In a second aspect, an embodiment of the present application provides a video determining apparatus, including:
the first acquisition module is used for acquiring target video information of a target short video;
the first determination module is used for determining candidate long videos related to the target short videos according to target video information of the target short videos;
the second acquisition module is used for acquiring a plurality of video frames of the target short video and determining a plurality of feature vectors corresponding to the plurality of video frames;
the second determining module is used for determining, from the candidate long videos, a target long video whose feature vectors satisfy a preset matching condition with the plurality of feature vectors corresponding to the target short video;
a third determining module, configured to determine, in the target long video, a target video frame whose feature vector matches a target feature vector of a preset frame of the target short video, and to use the playing time corresponding to the target video frame as the initial playing time of the target long video;
and the playing module is used for playing the target long video from the initial playing time of the target long video when the playing triggering request of the target long video is obtained.
Optionally, the second determining module is specifically configured to:
determining, in the preset database, a plurality of target feature vectors that satisfy a preset matching relationship with the plurality of feature vectors corresponding to the target short video;
determining candidate long videos corresponding to the target feature vectors according to the preset correspondence between feature vectors and candidate long videos, wherein the preset database includes a plurality of candidate long videos and the feature vectors of the candidate long videos;
and taking the candidate long video containing the largest number of target feature vectors as the target long video.
Optionally, the preset database includes label vectors of all long videos;
the first determining module is specifically configured to:
acquiring a target label vector of the target short video according to target video information of the target short video;
inputting the target label vector and the label vectors of all the long videos into a semantic similarity model, and outputting a plurality of similarity values;
selecting a similarity value higher than a preset threshold value from the similarity values as a target similarity value, and taking a long video corresponding to the target similarity value as a candidate long video.
Optionally, the target video information includes text information and cover picture information;
the first determining module is specifically configured to:
performing word vectorization processing on text information of a target short video to obtain a text label vector corresponding to the text information;
inputting cover picture information of the target short video into a cover picture model, and outputting a cover picture label vector;
and inputting the cover picture label vector and the text label vector into a label identification model, and outputting the target label vector of the target short video.
Optionally, the apparatus further comprises:
the third acquisition module is used for acquiring an image recognition data set, sample cover picture information and a sample cover picture label vector;
and the training module is used for training an initial cover picture model through the image recognition data set, the sample cover picture information, the sample cover picture label vector and a preset training algorithm to obtain a trained cover picture model.
Compared with the prior art, the technical scheme provided by the embodiment of the application has the following advantages:
according to the method provided by the embodiment of the application, a server determines a candidate long video associated with a target short video according to target video information of the target short video, then obtains a plurality of video frames of the target short video, determines a plurality of feature vectors corresponding to the plurality of video frames, determines a target long video with the feature vectors meeting preset matching conditions with the plurality of feature vectors corresponding to the target short video from the candidate long video, determines a target video frame with the feature vectors matching with the target feature vectors of the preset frames of the target short video in the target long video, takes the playing time corresponding to the target video frame as the initial playing time of the target long video, and plays the target long video from the initial playing time of the target long video when a playing trigger request of the target long video is obtained. According to the method and the device, the user can jump to the corresponding target long video from the target short video directly, and watch the target long video from the determined initial playing moment, so that the complex process of finding the long video and the playing moment is avoided, and convenience is brought to the user.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive exercise.
Fig. 1 is a flowchart of a method for determining a video according to an embodiment of the present disclosure;
fig. 2 is a flowchart of a method for determining a target long video according to an embodiment of the present disclosure;
fig. 3 is a flowchart for determining candidate long videos according to an embodiment of the present disclosure;
fig. 4 is a flowchart of a video determining method according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an apparatus for video determination according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The embodiment of the application provides a video determining method, which can be applied to a server and used for determining the initial playing time of a short video in a corresponding long video.
A detailed description will be given below of a video determining method provided in an embodiment of the present application with reference to a specific implementation manner, as shown in fig. 1, the specific steps are as follows:
step 101: and acquiring target video information of the target short video.
In the embodiment of the application, the server acquires target video information of the target short video, where the target video information may include text information and cover picture information of the target short video. Specifically, the text information is processed into a text label vector and the cover picture information into a cover picture label vector. The server acquires these vectors as follows: it inputs the text information of the target short video, such as the title, plot synopsis and recommendation information, into a text model for word vectorization to obtain the text label vector corresponding to the text information, and it inputs the cover picture information of the target short video into a cover picture model, which outputs the cover picture label vector.
The text model may be a bag-of-words model and the cover picture model may be an Xception model; the application does not specifically limit either model. In addition, the text label vector and the cover picture label vector may be acquired in either order.
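As a rough illustration of the word-vectorization step, the bag-of-words variant mentioned above can be sketched as follows. The vocabulary, tokenization and function names are illustrative assumptions, not part of the patent:

```python
from collections import Counter

def bag_of_words_vector(text, vocabulary):
    """Map a text (e.g. a short video's title) to a count vector over a
    fixed vocabulary -- a minimal stand-in for the patent's text model."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

# Hypothetical vocabulary; a real system would derive it from a corpus.
vocab = ["drama", "season", "episode", "variety"]
text_label_vector = bag_of_words_vector("Variety show AAA season 5 episode 3", vocab)
```

A production system would replace this with the BERT-style model discussed below, but the output plays the same role: a fixed-length text label vector.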
In this application, a BERT (Bidirectional Encoder Representations from Transformers) model may be adopted as the text model for the video domain. When a general text model is trained on text content it is mainly applied to the film, television or entertainment field, and text from other fields lacks training data, so such a text model is deficient in semantic understanding of general-domain text.
Step 102: determining candidate long videos associated with the target short video according to the target video information of the target short video.
In the embodiment of the application, the preset database comprises a plurality of long videos and label vectors of the long videos, the server inputs text label vectors and cover icon label vectors of target short videos into the label identification model, and outputs the target label vectors of the target short videos. The server inputs a target label vector of a target short video and label vectors of a plurality of long videos into a semantic similarity model and outputs a plurality of similarity values, then selects a similarity value higher than a preset threshold value from the similarity values, takes the similarity value as a target similarity value, and takes the long video corresponding to the target similarity value as a candidate long video.
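The similarity selection of step 102 can be sketched with cosine similarity standing in for the semantic similarity model; this is a simplifying assumption, since the patent does not specify the model's internals, and the video ids and threshold below are hypothetical:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def select_candidate_long_videos(target_tag_vector, long_video_tags, threshold):
    """Keep long videos whose tag-vector similarity to the short video's
    tag vector exceeds the preset threshold (step 102's selection rule)."""
    return [video_id
            for video_id, tag_vector in long_video_tags.items()
            if cosine_similarity(target_tag_vector, tag_vector) > threshold]

tags = {"AAA_s5e3": [1.0, 0.0, 1.0], "BBB_s1e1": [0.0, 1.0, 0.0]}
candidates = select_candidate_long_videos([1.0, 0.0, 1.0], tags, threshold=0.5)
```

Only the videos above the threshold survive as candidates, which keeps the later frame-level matching restricted to a small set.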
Step 103: the method comprises the steps of obtaining a plurality of video frames of a target short video, and determining a plurality of feature vectors corresponding to the video frames.
In the embodiment of the application, the server splits the target short video into frames to obtain a plurality of video frames, inputs the video frames of the target short video into a preset neural network, and obtains the feature vectors corresponding to those video frames. The video frames input to the preset neural network may be all of the video frames of the target short video or only some of them. Specifically, the preset neural network may be an Xception model; the preset neural network is not specifically limited in the present application.
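Framing and per-frame feature extraction (step 103) might look like the sketch below; a toy intensity histogram stands in for the Xception-style network, whose real output would be a learned embedding. The sampling policy and feature are assumptions for illustration only:

```python
def sample_frames(video_frames, step):
    """Take every `step`-th frame; the patent allows using all frames or
    only a subset, so this sampling policy is an assumption."""
    return video_frames[::step]

def frame_feature(frame, bins=4):
    """Toy feature vector for one frame: a normalized histogram of pixel
    intensities in [0, 1).  A real system would use a neural network
    such as Xception instead."""
    pixels = [p for row in frame for p in row]
    hist = [0] * bins
    for p in pixels:
        hist[min(int(p * bins), bins - 1)] += 1
    return [count / len(pixels) for count in hist]

frame = [[0.1, 0.9], [0.3, 0.6]]   # a tiny 2x2 "frame" of intensities
feature_vector = frame_feature(frame)
```

Whatever the feature extractor, the key property is that the same extractor is applied to short-video frames and long-video frames, so their vectors live in the same space and can be matched.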
Step 104: determining, from the candidate long videos, the target long video whose feature vectors satisfy the preset matching condition with the plurality of feature vectors corresponding to the target short video.
In the embodiment of the present application, the preset database includes a plurality of candidate long videos and feature vectors of the candidate long videos, and an obtaining process of the feature vectors of the candidate long videos is the same as an obtaining process of the feature vectors of the target short videos, which is not described in detail herein.
After the server determines the plurality of feature vectors of the target short video, it determines, in the preset database through a quantization-based retrieval technique, a plurality of feature vectors that satisfy a preset matching relationship with the plurality of feature vectors of the target short video, determines the candidate long video corresponding to those matching feature vectors, and takes that candidate long video as the target long video.
Step 105: in the target long video, determining a target video frame with a characteristic vector matched with a target characteristic vector of a preset frame of the target short video, and taking the playing time corresponding to the target video frame as the initial playing time of the target long video.
In the embodiment of the application, a server determines a preset frame in a plurality of video frames of a target short video, obtains a target feature vector corresponding to the preset frame, determines a feature vector matched with the target feature vector in a target long video, and takes the video frame corresponding to the matched feature vector as the target video frame.
The preset frame may be the starting frame or the ending frame of the target short video, or the n-th frame from the end, and the playing time corresponding to the target video frame is the playing time of the preset frame of the target short video in the target long video, which is also the initial playing time of the target long video. If the preset frame is the starting frame of the target short video, the target long video starts playing from the position at which the target short video begins, and the user watches the content of the target short video again while watching the target long video. If the preset frame is the ending frame of the target short video, the target long video starts playing from the last frame of the target short video, and the user watches the video content that follows the target short video within the target long video. If the preset frame is the n-th frame from the end of the target short video, the target long video starts playing from that frame, and the user watches the ending of the target short video and then continues with the subsequent video content in the target long video.
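Locating the initial playing time (step 105) amounts to a nearest-neighbour search over the long video's frame vectors followed by an index-to-time conversion. A minimal sketch, assuming squared Euclidean distance and a fixed frame rate, neither of which is specified by the patent:

```python
def find_initial_play_time(preset_frame_vector, long_video_vectors, fps):
    """Return the play time (in seconds) of the long-video frame whose
    feature vector is closest to the preset frame's vector."""
    def squared_distance(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    best_index = min(range(len(long_video_vectors)),
                     key=lambda i: squared_distance(preset_frame_vector,
                                                    long_video_vectors[i]))
    return best_index / fps

# The preset frame of the short video matches frame 2 of the long video.
start = find_initial_play_time([1.0, 1.0],
                               [[0.0, 0.0], [0.4, 0.4], [0.9, 1.1]],
                               fps=25)
```

Choosing the ending frame of the short video as the preset frame makes this start time the natural "continue watching" point in the long video.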
Step 106: when the play trigger request for the target long video is acquired, playing the target long video from its initial playing time.
In the embodiment of the application, after determining the target long video corresponding to the target short video and the playing time of the preset frame of the target short video in the target long video, when the client detects the playing triggering request of the target long video, the server automatically plays the target long video from the initial playing time of the target long video.
Specifically, the client may detect the play trigger request for the target long video in two ways: the target long video may be played automatically after the target short video finishes, or, after the target short video finishes, a user who wants to watch the corresponding target long video clicks the play button of the target long video in the video playing interface, which triggers playback of the target long video.
Specifically, when the client detects a play instruction for the target short video, the server acquires the text label vector and the cover picture label vector of the target short video and determines the candidate long videos associated with the target short video according to those vectors. The server then acquires a plurality of video frames of the target short video and determines the plurality of feature vectors corresponding to those video frames. Next, the server determines, in the preset database, feature vectors that satisfy the preset matching condition with the plurality of feature vectors corresponding to the target short video, and takes the video corresponding to those feature vectors as the target long video. In the target long video, the server determines the feature vector that matches the target feature vector of the preset frame of the target short video, takes the video frame corresponding to that feature vector as the target video frame, and takes the playing time corresponding to the target video frame as the initial playing time of the target long video. When the client acquires the play trigger request for the target long video, the client plays the target long video from the initial playing time.
The following is an example of the video determination method. Suppose the target short video is a clip of the variety show AAA. The server determines, according to the target video information of the short video, that the candidate long videos are the videos of seasons one through five of AAA, and the preset database contains all complete long videos of AAA from season one to season five together with their feature vectors. The server first splits the short video into frames and determines the plurality of feature vectors corresponding to the plurality of video frames, then determines, in the preset database, the feature vectors that satisfy the preset matching condition according to the plurality of feature vectors of the short video, and determines that the long video corresponding to the short video is episode three of season five of AAA. The server then determines the starting frame of the short video and determines that the playing time of that frame in the episode-three long video of AAA season five is 28 minutes 20 seconds; when jumping from the short video to the long video, playback starts from 28 minutes 20 seconds into that long video.
The method and the device enable the user to jump directly from the target short video to the initial playing time of the corresponding long video, avoiding the cumbersome process of locating, within the long video, the position where the short video begins, and bringing convenience to the user.
Optionally, as shown in fig. 2, the determining, from the candidate long videos, of a target long video whose feature vectors satisfy a preset matching condition with the plurality of feature vectors corresponding to the target short video includes:
Step 201: determining, in a preset database, a plurality of target feature vectors that satisfy a preset matching relationship with the plurality of feature vectors corresponding to the target short video.
In the embodiment of the application, the preset database includes the feature vectors of a plurality of candidate long videos, and the server determines, in the preset database, a plurality of target feature vectors that satisfy the preset matching relationship with the plurality of feature vectors corresponding to the target short video. The preset matching relationship is that the matching degree between feature vectors is higher than a matching threshold.
Step 202: determining the candidate long videos corresponding to the target feature vectors according to the preset correspondence between feature vectors and candidate long videos.
In the embodiment of the application, after the server determines the target feature vectors, it determines the candidate long videos corresponding to them according to the preset correspondence between feature vectors and candidate long videos. The server then checks how many candidate long videos correspond to the target feature vectors: if only one candidate long video corresponds to them, the server takes that candidate long video as the target long video; if multiple candidate long videos correspond to them, step 203 is executed.
Step 203: and taking the candidate long video containing the largest number of target feature vectors as the target long video.
In this embodiment of the application, there may be a plurality of candidate long videos corresponding to the target feature vector, and the number of target feature vectors included in different candidate long videos may be different, and then the server takes the candidate long video including the largest number of target feature vectors as the target long video, so that the selection of the target long video is more accurate.
For example, suppose the server determines that there are n feature vectors corresponding to the target short video, of which a feature vectors satisfy the preset matching relationship with feature vectors in the first candidate long video, i.e., the number of target feature vectors in the first candidate long video is a, and b feature vectors satisfy the preset matching relationship with feature vectors in the second candidate long video, i.e., the number of target feature vectors in the second candidate long video is b. If a > b, the server takes the candidate long video containing the largest number of target feature vectors, i.e., the first candidate long video, as the target long video. Here a < n and b < n.
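The selection rule of steps 202 and 203, which attributes each matched target feature vector to its candidate long video and then keeps the candidate with the most matches, can be sketched as follows (the video ids are hypothetical):

```python
from collections import Counter

def pick_target_long_video(matched_vector_owners):
    """Given, for every matched target feature vector, the id of the
    candidate long video it belongs to, return the candidate containing
    the largest number of target feature vectors (the rule of step 203)."""
    votes = Counter(matched_vector_owners)
    return votes.most_common(1)[0][0]

# a = 5 matches in the first candidate, b = 2 in the second (a > b).
target = pick_target_long_video(["first"] * 5 + ["second"] * 2)
```

Majority voting over frame-level matches makes the choice robust to a few spurious matches against the wrong candidate.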
Determining the target long video in this way facilitates determining the playing time of the target short video within the target long video.
Optionally, as shown in fig. 3, the preset database includes label vectors of all long videos;
determining the candidate long video associated with the target short video according to the target video information of the target short video comprises:
step 301: and acquiring a target label vector of the target short video according to the target video information of the target short video.
In the embodiment of the application, the server obtains the target label vector of the target short video through the label identification model.
Optionally, obtaining the target label vector of the target short video according to the target video information of the target short video includes: performing word vectorization on the text information of the target short video to obtain a text label vector corresponding to the text information; inputting the cover map information of the target short video into a cover map model, and outputting a cover map label vector; and inputting the cover map label vector and the text label vector into a label identification model, and outputting the target label vector of the target short video.
In this embodiment of the application, the server inputs the text information of the target short video into a BERT model for word vectorization to obtain a text label vector corresponding to the text information; inputs the cover map information of the target short video into the cover map model and outputs the cover map label vector; and finally inputs the cover map label vector and the text label vector into the label identification model and outputs the target label vector of the target short video.
The cover map model may be an Xception model, and the label identification model may be a Transformer model; neither is specifically limited in this application. In addition, there is no required order between the word vectorization and the extraction of the cover map label vector; they may be performed in either order.
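A structural sketch of this tag-extraction pipeline follows, with trivial stand-in functions in place of the BERT, Xception-style, and Transformer-style models; all function names, vector sizes, and data here are illustrative, not the actual models:

```python
# Illustrative stand-ins for the three models named above. Each stub just
# returns a small vector; real models would be loaded from a deep-learning
# framework such as PyTorch or TensorFlow.

def text_to_tag_vector(text):
    # stand-in for BERT word vectorization of the title/description
    return [float(len(text) % 7), float(text.count(" "))]

def cover_to_tag_vector(cover_pixels):
    # stand-in for the Xception-style cover map model
    return [sum(cover_pixels) / max(len(cover_pixels), 1)]

def fuse_tag_vectors(text_vec, cover_vec):
    # stand-in for the Transformer-style label identification model:
    # here, simple concatenation of the two label vectors
    return text_vec + cover_vec

text_vec = text_to_tag_vector("funny cat clip")
cover_vec = cover_to_tag_vector([0.2, 0.4, 0.6])
target_tag_vector = fuse_tag_vectors(text_vec, cover_vec)
print(len(target_tag_vector))  # 3 components in this toy setup
```

The two extraction steps are independent, which matches the statement above that word vectorization and cover map label extraction may run in either order.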
Step 302: and inputting the target label vector and the label vectors of all the long videos into a semantic similarity model, and outputting a plurality of similarity values.
In this embodiment of the application, the server obtains the label vectors of all the long videos contained in the preset database; they are obtained in the same way as the target label vector, so the process is not repeated here. The server then inputs the target label vector and the label vectors of all the long videos into a semantic similarity model and outputs a plurality of similarity values.
Step 303: and selecting the similarity value higher than a preset threshold value from the similarity values as a target similarity value, and taking the long video corresponding to the target similarity value as a candidate long video.
In this embodiment of the application, the server selects the similarity values higher than a preset threshold from the plurality of similarity values as target similarity values and takes the long videos corresponding to the target similarity values as candidate long videos. Selecting candidate long videos in this way reduces the number of long videos drawn from the preset database, narrows the retrieval range for the target short video, speeds up retrieval, and improves retrieval precision.
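One way to realize this similarity-and-threshold selection is cosine similarity over the label vectors. The sketch below assumes cosine similarity, invented video ids, and an invented threshold, since the patent does not fix a particular similarity model:

```python
import math

def cosine_similarity(a, b):
    # standard cosine similarity between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def select_candidates(target_tag_vec, long_video_tags, threshold):
    """Return ids of long videos whose tag-vector similarity clears the threshold."""
    return [vid for vid, tag in long_video_tags.items()
            if cosine_similarity(target_tag_vec, tag) > threshold]

tags = {
    "drama_ep1": [1.0, 0.0, 1.0],
    "cooking":   [0.0, 1.0, 0.0],
    "drama_ep2": [0.9, 0.1, 1.1],
}
print(select_candidates([1.0, 0.0, 1.0], tags, threshold=0.8))
# -> ['drama_ep1', 'drama_ep2']
```

Only the two drama episodes exceed the threshold, so the frame-level matching that follows runs against two long videos instead of the whole database.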
Optionally, before inputting the cover picture information of the target short video into the cover picture model, the method further includes the following steps: acquiring an image identification data set, sample cover picture information and a sample cover picture label vector; and training the initial cover map model through an image recognition data set, sample cover map information, a sample cover map label vector and a preset training algorithm to obtain a trained cover map model.
In this embodiment of the application, the image recognition data set may be video picture data from ImageNet (an image recognition database); the sample cover map information may be cover maps of the long videos in the preset database; and the sample cover map label vectors may be the feature vectors corresponding to those cover maps. The server obtains the image recognition data set, the sample cover map information, and the sample cover map label vectors, and trains the initial cover map model with them and a preset training algorithm to obtain the trained cover map model.
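As a toy stand-in for this training step, the sketch below "trains" by memorizing (cover, label-vector) pairs and predicts by nearest stored cover. A real cover map model (e.g. Xception pretrained on ImageNet-style data) would instead be fine-tuned with a deep-learning framework; all names and data here are illustrative:

```python
# Toy stand-in for "training the initial cover map model": the model is the
# memorized training set, and prediction picks the label of the nearest cover.

def train_cover_model(samples):
    """samples: list of (cover_pixels, label_vector) pairs."""
    return list(samples)  # the "trained model" is just the stored samples

def predict_cover_label(model, cover_pixels):
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(model, key=lambda sample: distance(sample[0], cover_pixels))
    return label

model = train_cover_model([
    ([0.1, 0.2], [1.0, 0.0]),   # e.g. a "drama" cover
    ([0.9, 0.8], [0.0, 1.0]),   # e.g. a "cooking" cover
])
print(predict_cover_label(model, [0.15, 0.25]))  # -> [1.0, 0.0]
```

The query cover is closest to the first sample, so the drama label vector is returned; the real model would generalize rather than memorize.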
Based on the same technical concept, an embodiment of the present application further provides a flowchart of a video determination method, as shown in fig. 4, where the flowchart includes:
step 401: and acquiring text information and cover picture information of the target short video.
Step 402: and obtaining a target label vector of the target short video according to the text information and the cover picture information.
Step 403: and determining candidate long videos associated with the target short video according to the target label vectors of the target short video and the label vectors of all the long videos.
Step 404: the method comprises the steps of obtaining a plurality of video frames of a target short video, and determining a plurality of feature vectors corresponding to the video frames.
Step 405: and determining, from the candidate long videos, a target long video whose feature vectors satisfy a preset matching condition with the plurality of feature vectors corresponding to the target short video.
Step 406: and determining the playing time of the preset frame of the target short video in the target long video.
Step 407: and playing the target long video from the initial playing time of the target long video.
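The frame-matching and playback steps 404 through 407 can be sketched as follows; the frame feature vectors are toy tuples, and exact equality stands in for the feature-vector matching used to locate the preset frame:

```python
# Minimal sketch of locating the initial play time: match the preset frame of
# the short video against (timestamp, feature_vector) pairs of the chosen long
# video, then start playback from the matched frame's timestamp.

def find_initial_play_time(preset_frame_vector, long_video_frames):
    """long_video_frames: list of (timestamp_seconds, feature_vector) pairs."""
    for timestamp, vector in long_video_frames:
        if vector == preset_frame_vector:
            return timestamp
    return None  # no matching frame found

long_frames = [(0.0, (0, 0)), (12.5, (3, 1)), (47.0, (5, 2))]
start = find_initial_play_time((3, 1), long_frames)
print(start)  # 12.5 -> the long video would be played from 12.5 seconds
```

When a play trigger request arrives, the player would seek to the returned timestamp instead of the beginning of the long video.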
Based on the same technical concept, an embodiment of the present application further provides an apparatus for video determination, as shown in fig. 5, the apparatus includes:
a first obtaining module 501, configured to obtain target video information of a target short video;
a first determining module 502, configured to determine, according to target video information of a target short video, a candidate long video associated with the target short video;
a second obtaining module 503, configured to obtain multiple video frames of the target short video, and determine multiple feature vectors corresponding to the multiple video frames;
a second determining module 504, configured to determine, from the candidate long videos, a target long video whose feature vectors satisfy a preset matching condition with the plurality of feature vectors corresponding to the target short video;
a third determining module 505, configured to determine, in the target long video, a target video frame whose feature vector is matched with a target feature vector of a preset frame of the target short video, and use a playing time corresponding to the target video frame as an initial playing time of the target long video;
the playing module 506 is configured to play the target long video from the initial playing time of the target long video when the play trigger request of the target long video is obtained.
Optionally, the second determining module 504 is specifically configured to:
determining a plurality of target characteristic vectors of which a plurality of characteristic vectors corresponding to the target short video meet a preset matching relation in a preset database;
determining candidate long videos corresponding to the target feature vectors according to the corresponding relation between the preset feature vectors and the candidate long videos, wherein the preset database comprises a plurality of candidate long videos and the feature vectors of the candidate long videos;
and taking the candidate long video containing the largest number of target feature vectors as the target long video.
Optionally, the preset database includes label vectors of all long videos;
the first determining module 502 is specifically configured to:
acquiring a target label vector of a target short video according to target video information of the target short video;
inputting the target label vector and the label vectors of all the long videos into a semantic similarity model, and outputting a plurality of similarity values;
and selecting the similarity value higher than a preset threshold value from the similarity values as a target similarity value, and taking the long video corresponding to the target similarity value as a candidate long video.
Optionally, the target video information includes text information and cover page information;
the first determining module 502 is specifically configured to:
performing word vectorization processing on text information of a target short video to obtain a text label vector corresponding to the text information;
inputting the cover map information of the target short video into a cover map model, and outputting a cover map label vector;
and inputting the cover map label vector and the text label vector into a label identification model, and outputting the target label vector of the target short video.
Optionally, the apparatus further comprises:
the third acquisition module is used for acquiring an image identification data set, sample cover picture information and a sample cover picture label vector;
and the training module is used for training the initial cover map model through the image recognition data set, the sample cover map information, the sample cover map label vector and a preset training algorithm to obtain a trained cover map model.
According to the method provided by this embodiment of the application, the server determines candidate long videos associated with a target short video according to target video information of the target short video; acquires a plurality of video frames of the target short video and determines a plurality of feature vectors corresponding to those video frames; determines, from the candidate long videos, a target long video whose feature vectors satisfy a preset matching condition with the plurality of feature vectors corresponding to the target short video; determines, in the target long video, a target video frame whose feature vector matches the target feature vector of a preset frame of the target short video and takes the playing time corresponding to the target video frame as the initial playing time of the target long video; and, when a play trigger request for the target long video is acquired, plays the target long video from that initial playing time. In this way, the user can jump directly from the target short video to the corresponding target long video and watch it from the determined initial playing time, which avoids the cumbersome process of finding the long video and the playing position manually and brings convenience to the user.
Based on the same technical concept, an embodiment of the present invention further provides an electronic device, as shown in fig. 6, including a processor 601, a communication interface 602, a memory 603, and a communication bus 604, where the processor 601, the communication interface 602, and the memory 603 communicate with one another through the communication bus 604:
a memory 603 for storing a computer program;
the processor 601 is configured to implement the above steps when executing the program stored in the memory 603.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The Processor may be a general-purpose Processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; but may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component.
In a further embodiment provided by the present invention, there is also provided a computer readable storage medium having stored therein a computer program which, when executed by a processor, implements the steps of any of the methods described above.
In a further embodiment provided by the present invention, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform any of the methods of the above embodiments.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., a Solid State Disk (SSD)), among others.
It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The foregoing are merely exemplary embodiments of the present invention, which enable those skilled in the art to understand or practice the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method for video determination, the method comprising:
acquiring target video information of a target short video;
determining candidate long videos associated with the target short videos according to target video information of the target short videos;
acquiring a plurality of video frames of the target short video, and determining a plurality of feature vectors corresponding to the plurality of video frames;
determining, from the candidate long videos, a target long video whose feature vectors satisfy a preset matching condition with the plurality of feature vectors corresponding to the target short video;
determining a target video frame with a characteristic vector matched with a target characteristic vector of a preset frame of the target short video in the target long video, and taking the playing time corresponding to the target video frame as the initial playing time of the target long video;
and when the play triggering request of the target long video is acquired, playing the target long video from the initial play time of the target long video.
2. The method according to claim 1, wherein the determining, from the candidate long videos, a target long video whose feature vectors satisfy a preset matching condition with the plurality of feature vectors corresponding to the target short video comprises:
determining a plurality of target feature vectors of which a plurality of feature vectors corresponding to the target short video meet a preset matching relation in a preset database;
determining candidate long videos corresponding to the target feature vector according to a corresponding relation between a preset feature vector and the candidate long videos, wherein the preset database comprises a plurality of candidate long videos and feature vectors of the candidate long videos;
and taking the candidate long video containing the largest number of target feature vectors as the target long video.
3. The method according to claim 1, wherein the preset database contains label vectors of all long videos;
the determining, according to the target video information of the target short video, the candidate long video associated with the target short video comprises:
acquiring a target label vector of the target short video according to target video information of the target short video;
inputting the target label vector and the label vectors of all the long videos into a semantic similarity model, and outputting a plurality of similarity values;
selecting a similarity value higher than a preset threshold value from the similarity values as a target similarity value, and taking a long video corresponding to the target similarity value as a candidate long video.
4. The method of claim 3, wherein the target video information comprises text information and cover art information;
the obtaining of the target label vector of the target short video according to the target video information of the target short video comprises:
performing word vectorization processing on text information of a target short video to obtain a text label vector corresponding to the text information;
inputting cover picture information of the target short video into a cover picture model, and outputting a cover picture label vector;
and inputting the cover picture label vector and the text label vector into a label identification model, and outputting a target label vector of the target short video.
5. The method of claim 4, wherein prior to entering the cover art information of the target short video into the cover art model, the method further comprises:
acquiring an image identification dataset, sample cover picture information and a sample cover picture label vector;
and training an initial cover map model through the image recognition data set, the sample cover map information, the sample cover map label vector and a preset training algorithm to obtain a trained cover map model.
6. A video determination apparatus, characterized in that the apparatus comprises:
the first acquisition module is used for acquiring target video information of a target short video;
the first determination module is used for determining candidate long videos related to the target short videos according to target video information of the target short videos;
the second acquisition module is used for acquiring a plurality of video frames of the target short video and determining a plurality of feature vectors corresponding to the plurality of video frames;
the second determining module is used for determining, from the candidate long videos, a target long video whose feature vectors satisfy a preset matching condition with the plurality of feature vectors corresponding to the target short video;
a third determining module, configured to determine, in the target long video, a target video frame whose feature vector matches a target feature vector of a preset frame of the target short video, and use a play time corresponding to the target video frame as an initial play time of the target long video;
and the playing module is used for playing the target long video from the initial playing time of the target long video when the playing triggering request of the target long video is obtained.
7. The apparatus of claim 6, wherein the second determining module is specifically configured to:
determining a plurality of target feature vectors of which a plurality of feature vectors corresponding to the target short video meet a preset matching relationship in the preset database;
determining candidate long videos corresponding to the target feature vectors according to the corresponding relation between preset feature vectors and the candidate long videos, wherein the preset database comprises a plurality of candidate long videos and the feature vectors of the candidate long videos;
and taking the candidate long video containing the largest number of target feature vectors as the target long video.
8. The apparatus according to claim 6, wherein the preset database contains label vectors of all long videos;
the first determining module is specifically configured to:
acquiring a target label vector of the target short video according to target video information of the target short video;
inputting the target label vector and the label vectors of all the long videos into a semantic similarity model, and outputting a plurality of similarity values;
selecting a similarity value higher than a preset threshold value from the similarity values as a target similarity value, and taking a long video corresponding to the target similarity value as a candidate long video.
9. The apparatus of claim 8, wherein the target video information comprises text information and cover art information;
the first determining module is specifically configured to:
performing word vectorization processing on text information of a target short video to obtain a text label vector corresponding to the text information;
inputting cover picture information of the target short video into a cover picture model, and outputting a cover picture label vector;
and inputting the cover picture label vector and the text label vector into a label identification model, and outputting a target label vector of the target short video.
10. The apparatus of claim 9, further comprising:
the third acquisition module is used for acquiring an image identification data set, sample cover picture information and a sample cover picture label vector;
and the training module is used for training an initial cover map model through the image recognition data set, the sample cover map information, the sample cover map label vector and a preset training algorithm to obtain a trained cover map model.
CN202010567117.3A 2020-06-19 2020-06-19 Video determination method and device Pending CN111767814A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010567117.3A CN111767814A (en) 2020-06-19 2020-06-19 Video determination method and device


Publications (1)

Publication Number Publication Date
CN111767814A true CN111767814A (en) 2020-10-13

Family

ID=72721430


Country Status (1)

Country Link
CN (1) CN111767814A (en)


Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108024145A (en) * 2017-12-07 2018-05-11 北京百度网讯科技有限公司 Video recommendation method, device, computer equipment and storage medium
CN109040775A (en) * 2018-08-24 2018-12-18 深圳创维-Rgb电子有限公司 Video correlating method, device and computer readable storage medium
CN109982106A (en) * 2019-04-29 2019-07-05 百度在线网络技术(北京)有限公司 A kind of video recommendation method, server, client and electronic equipment
CN110020093A (en) * 2019-04-08 2019-07-16 深圳市网心科技有限公司 Video retrieval method, edge device, video frequency searching device and storage medium
CN110278449A (en) * 2019-06-26 2019-09-24 腾讯科技(深圳)有限公司 A kind of video detecting method, device, equipment and medium
CN110290419A (en) * 2019-06-25 2019-09-27 北京奇艺世纪科技有限公司 Video broadcasting method, device and electronic equipment
CN110413837A (en) * 2019-05-30 2019-11-05 腾讯科技(深圳)有限公司 Video recommendation method and device
CN111191078A (en) * 2020-01-08 2020-05-22 腾讯科技(深圳)有限公司 Video information processing method and device based on video information processing model


Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112149568A (en) * 2020-09-23 2020-12-29 创新奇智(合肥)科技有限公司 Short video positioning method and device, electronic equipment and computer readable storage medium
CN112235625A (en) * 2020-10-14 2021-01-15 广州欢网科技有限责任公司 Method and system for tracing source of short video feature film of television terminal and television terminal
WO2022127523A1 (en) * 2020-12-16 2022-06-23 北京字节跳动网络技术有限公司 Video playback method and apparatus, device, and medium
CN112632323A (en) * 2020-12-16 2021-04-09 北京字节跳动网络技术有限公司 Video playing method, device, equipment and medium
CN112612918A (en) * 2020-12-16 2021-04-06 北京字节跳动网络技术有限公司 Video resource mapping method, device, equipment and medium
CN112702624A (en) * 2020-12-22 2021-04-23 山东鲁能软件技术有限公司 Method, system, medium and device for optimizing short video playing efficiency
CN112702624B (en) * 2020-12-22 2023-04-07 山东鲁软数字科技有限公司 Method, system, medium and device for optimizing short video playing efficiency
CN112929692A (en) * 2021-01-26 2021-06-08 广州欢网科技有限责任公司 Video tracing method and device suitable for short video
CN114679621A (en) * 2021-05-07 2022-06-28 腾讯云计算(北京)有限责任公司 Video display method and device and terminal equipment
CN113407781A (en) * 2021-06-18 2021-09-17 湖南快乐阳光互动娱乐传媒有限公司 Video searching method, system, server and client
CN113722542A (en) * 2021-08-31 2021-11-30 青岛聚看云科技有限公司 Video recommendation method and display device
CN114040216A (en) * 2021-11-03 2022-02-11 杭州网易云音乐科技有限公司 Live broadcast room recommendation method, medium, device and computing equipment
CN114040216B (en) * 2021-11-03 2023-07-11 杭州网易云音乐科技有限公司 Live broadcast room recommendation method, medium, device and computing equipment


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201013