CN115412765B - Video highlight determination method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN115412765B
CN115412765B
Authority
CN
China
Prior art keywords
segment
video
highlight
alternative
pair
Prior art date
Legal status
Active
Application number
CN202211054859.1A
Other languages
Chinese (zh)
Other versions
CN115412765A (en)
Inventor
侯佳芸
Current Assignee
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd filed Critical Beijing QIYI Century Science and Technology Co Ltd
Priority to CN202211054859.1A priority Critical patent/CN115412765B/en
Publication of CN115412765A publication Critical patent/CN115412765A/en
Application granted granted Critical
Publication of CN115412765B publication Critical patent/CN115412765B/en


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/44: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N 21/44008: Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N 21/45: Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N 21/466: Learning process for intelligent management, e.g. learning user preferences for recommending movies
    • H04N 21/4668: Learning process for intelligent management for recommending content, e.g. movies

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Television Signal Processing For Recording (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

An embodiment of the present application provides a video highlight determination method and apparatus, an electronic device, and a storage medium. The video highlight determination method includes: obtaining a first candidate segment and a second candidate segment of a target video, wherein the first candidate segment is a video segment having a set action tag and the second candidate segment is a video segment whose highlight degree satisfies a highlight degree condition; and determining a highlight segment of the target video based on the first candidate segment and the second candidate segment. By applying the technical solution provided by this embodiment, the highlight segment determined based on the first candidate segment and the second candidate segment fuses the features of the set action tag and the highlight degree, so the determined highlight segment is more accurate and can help users locate videos of interest.

Description

Video highlight determination method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of computer application technologies, and in particular, to a method and apparatus for determining a video highlight, an electronic device, and a storage medium.
Background
Watching video has become a common form of entertainment. Video providers often extract highlight clips from a video for video distribution or information delivery, so that users can learn about the highlight portions of the corresponding video through those clips.
How to accurately determine the highlight segments in a video, and thereby help users locate videos of interest, is a technical problem to be solved by those skilled in the art.
Disclosure of Invention
Embodiments of the present application aim to provide a video highlight determination method and apparatus, an electronic device, and a storage medium, so as to accurately determine highlight segments in a video and help users locate videos of interest. The specific technical solutions are as follows:
in a first aspect, a method for determining a video highlight is provided, including:
obtaining a first candidate segment and a second candidate segment of a target video, wherein the first candidate segment is a video segment having a set action tag, and the second candidate segment is a video segment whose highlight degree satisfies a highlight degree condition;
determining a highlight segment of the target video based on the first candidate segment and the second candidate segment.
In a specific embodiment of the present application, the determining a highlight segment of the target video based on the first candidate segment and the second candidate segment includes:
determining whether the first candidate segment and the second candidate segment have an overlapping portion on the time axis of the target video;
if an overlapping portion exists, determining the highlight segment of the target video according to the first candidate segment and the second candidate segment that overlap on the time axis.
In a specific embodiment of the present application, the determining the highlight segment of the target video according to the first candidate segment and the second candidate segment that overlap on the time axis includes:
taking each second candidate segment that overlaps on the time axis, together with the corresponding first candidate segment, as a segment pair;
determining the highlight segment of the target video according to the times of the second candidate segment and the first candidate segment of each segment pair on the time axis.
In a specific embodiment of the present application, the determining the highlight segment of the target video according to the times of the second candidate segment and the first candidate segment of each segment pair on the time axis includes:
for each segment pair, if the start time of the first candidate segment in the current segment pair on the time axis is before the start time of the second candidate segment in the current segment pair on the time axis, acquiring a first standby segment in the target video, wherein the start time of the first standby segment is the start time of the second candidate segment in the current segment pair on the time axis, and the duration of the first standby segment is a first duration;
determining whether the first standby segment has the action tag;
determining the segment pair corresponding to a first standby segment that has the action tag as a standby segment pair;
for each standby segment pair, if the highlight degree of the second candidate segment in the current standby segment pair is above a first highlight degree threshold, or is one of the top N1 highest highlight degrees, determining the second candidate segment in the current standby segment pair as a highlight segment of the target video, N1 being a positive integer.
In a specific embodiment of the present application, the determining the highlight segment of the target video according to the times of the second candidate segment and the first candidate segment of each segment pair on the time axis includes:
for each segment pair, if the start time of the first candidate segment in the current segment pair on the time axis is after the start time of the second candidate segment in the current segment pair on the time axis, acquiring a second standby segment in the target video, wherein the start time of the second standby segment is the start time of the first candidate segment in the current segment pair on the time axis, and the end time of the second standby segment is the end time of the second candidate segment in the current segment pair on the time axis;
if the highlight degree of the second candidate segment in the current segment pair is higher than a second highlight degree threshold, or is one of the top N2 highest highlight degrees, determining the second standby segment as a highlight segment of the target video, N2 being a positive integer.
In a specific embodiment of the present application, in the case where it is determined that the first candidate segment and the second candidate segment have no overlapping portion on the time axis, the method further includes:
determining the probability that each first candidate segment has the action tag, and determining the first candidate segments whose probability is greater than a probability threshold, or the first candidate segments corresponding to the top N3 highest probabilities, as highlight segments of the target video, N3 being a positive integer;
or,
determining the second candidate segments whose highlight degree is higher than a third highlight degree threshold, or the second candidate segments corresponding to the top N4 highest highlight degrees, as highlight segments of the target video, N4 being a positive integer.
In a specific embodiment of the present application, before the determining the highlight segment of the target video based on the first candidate segment and the second candidate segment, the method further includes:
determining the dynamic rate of each second candidate segment, wherein the dynamic rate characterizes how much the image changes between video frames;
removing the second candidate segments whose dynamic rate is smaller than the average dynamic rate of the target video.
In a specific embodiment of the present application, after the determining the highlight segment of the target video, the method further includes:
detecting whether a target object is present within the first second-duration of the highlight segment;
if the target object is not present, searching forward along the time axis of the target video, from the start time of the highlight segment, for the time at which the target object appears;
updating the start time of the highlight segment to the time at which the target object appears.
In a specific embodiment of the present application, after the determining the highlight segment of the target video, the method further includes:
determining whether the duration of the highlight segment meets a first duration requirement;
if the duration of the highlight segment does not meet the first duration requirement, performing a truncation operation or a supplementation operation on the highlight segment based on the first duration requirement.
In a specific embodiment of the present application, the first candidate segment includes one or more first video segments in a first video segment set of the target video, the first video segment set being obtained by:
dividing the target video by shots to obtain a plurality of shot segments;
merging the shot segments according to a second duration requirement to obtain the first video segment set, wherein the duration of each first video segment in the first video segment set meets the second duration requirement.
In a specific embodiment of the present application, the second candidate segment includes one or more second video segments in a second video segment set of the target video, and the highlight degree of each second video segment in the second video segment set is determined by:
for each second video segment in the second video segment set, inputting the current second video segment into a pre-trained classification model to obtain the probability, output by the classification model, that the current second video segment is a highlight positive example;
determining the highlight degree of the current second video segment according to the probability that the current second video segment is a highlight positive example.
In a specific embodiment of the present application, the classification model is pre-trained by the following steps:
obtaining a training sample set, wherein the training sample set comprises a plurality of sample pairs, and each sample pair comprises a highlight positive example and a negative example;
training a pre-constructed initial model with the training sample set until a set training termination condition is reached, so as to obtain the classification model;
wherein the training of the pre-constructed initial model with the training sample set comprises:
for each sample pair, inputting the current sample pair into the pre-constructed initial model;
determining the model loss according to the current sample pair and the output result of the initial model for the current sample pair;
adjusting the parameters of the initial model according to the model loss.
In a specific embodiment of the present application, the method further comprises:
obtaining second viewing data of the target video;
and the determining the highlight degree of the current second video segment according to the probability that it is a highlight positive example comprises:
determining the highlight degree of the current second video segment according to the second viewing data and the probability that the current second video segment is a highlight positive example.
In a second aspect, a video highlight determination apparatus is provided, comprising:
an obtaining module, configured to obtain a first candidate segment and a second candidate segment of a target video, wherein the first candidate segment is a video segment having a set action tag, and the second candidate segment is a video segment whose highlight degree satisfies a highlight degree condition;
a determining module, configured to determine a highlight segment of the target video based on the first candidate segment and the second candidate segment.
In a third aspect, an electronic device is provided, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;
a memory for storing a computer program;
and the processor is used for realizing the steps of the video highlight determination method in the first aspect when executing the program stored in the memory.
In a fourth aspect, there is provided a computer readable storage medium having stored thereon a computer program, characterized in that the program when executed by a processor implements the steps of the video highlight determining method according to the first aspect.
In a fifth aspect, a computer program product is provided, the computer program product comprising computer instructions stored in a computer readable storage medium and adapted to be read and executed by a processor to cause an electronic device having the processor to perform the steps of the video highlight determining method of the first aspect.
With the technical solution provided by the embodiments of the present application, after the first candidate segment and the second candidate segment of the target video are obtained, the highlight segment of the target video is determined based on them. Since the first candidate segment is a video segment having a set action tag and the second candidate segment is determined according to the highlight degree, the highlight segment determined from both fuses the features of the set action tag and the highlight degree. The determined highlight segment is therefore more accurate; using it for video distribution or information delivery helps users locate videos of interest and improves the click-through rate of the video and the probability that users watch the target video.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
FIG. 1 is a flowchart of an implementation of a video highlight determination method according to an embodiment of the present application;
FIG. 2 is a flowchart illustrating another implementation of a video highlight determination method according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a video highlight determining apparatus according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an electronic device in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
The core of the present application is to provide a video highlight determination method, which can be applied to scenarios such as video recommendation and distribution and information delivery. For example, when a target video is to be recommended to a user, a highlight segment of the target video can be accurately determined based on the technical solution provided by the embodiments of the present application, and the highlight segment is then used for video distribution. The user can thus learn about the highlight portion of the target video through the highlight segment, which helps the user locate videos of interest and improves the video's click-through rate and the probability that the user watches it.
The technical solution provided by the embodiments of the present application is particularly suitable for determining highlight segments of cartoon and children's videos. Compared with film and television, variety-show, and everyday-life videos, cartoon and children's videos differ considerably in imagery, shooting style, and subject matter; because the determined highlight segment fuses the set action tag with the highlight degree feature, the accuracy of highlight segment determination for such videos can be improved.
Referring to FIG. 1, an implementation flowchart of a video highlight determination method according to an embodiment of the present application may include the following steps:
S110: obtain a first candidate segment and a second candidate segment of the target video, where the first candidate segment is a video segment having a set action tag, and the second candidate segment is a video segment whose highlight degree satisfies a preset highlight degree condition.
In the embodiments of the present application, the target video may be any video, such as any video to be recommended or distributed. After the target video is determined, its first candidate segment and second candidate segment may be obtained. Specifically, a video segment of the target video having a set action tag may be taken as a first candidate segment, and a video segment of the target video whose highlight degree satisfies the highlight degree condition may be taken as a second candidate segment. The highlight degree condition may be preset; for example, when the highlight degree of a video segment is higher than a highlight degree threshold, or is among the top N highest highlight degrees, the video segment may be considered to satisfy the highlight degree condition. In other words, the second candidate segment has a relatively high highlight degree.
In a specific embodiment, the target video may be segmented to obtain a first video segment set and a second video segment set of the target video.
The first video segment set and the second video segment set may be obtained with the same segmentation method or with different ones; for example, the first video segment set may be obtained by shot segmentation and the second video segment set by sliding-window segmentation. The first video segment set comprises at least one first video segment, and the second video segment set comprises at least one second video segment.
The first candidate segments may be determined based on the first video segments in the first video segment set that have a set action tag.
The action tags set in the embodiments of the present application may include various tag types such as special effects, hugging, eating, running, holding hands, dancing, clapping, and emotional agitation, so as to suit videos of various subjects. Emotional agitation covers, for example, crying, laughing, and surprise.
It may first be determined whether each first video segment in the first video segment set has a set action tag. Specifically, the probability that each first video segment has each type of action tag may be obtained through a pre-trained spatiotemporal self-attention video model (such as a TimeSformer-style model), and the action tag of each first video segment may be identified from these probabilities. For each first video segment, the action tag with the highest probability may be taken as the tag that segment has. For example, first video segment A in the set may have a holding-hands action tag while first video segment B has a running action tag. Different first video segments may have the same or different tag types; some first video segments have a set action tag and some do not.
The first candidate segments may be determined from the first video segments that have a set action tag. Each such first video segment may be determined as a first candidate segment; when several of them are temporally consecutive, the consecutive segments may first be merged, and the merged video segment is then determined as one first candidate segment.
The second candidate segments may be determined based on the highlight degree of each second video segment in the second video segment set.
The highlight degree of each second video segment in the second video segment set may be determined first, for example according to users' viewing behavior or a preset highlight scoring algorithm.
The second candidate segments may then be determined based on these highlight degrees. Specifically, the second video segments above a set highlight degree threshold may be determined as second candidate segments, or the top N second video segments ordered from high to low by highlight degree may be determined as second candidate segments. Each second video segment satisfying the highlight degree condition may be determined as a second candidate segment; when consecutive segments exist among them, the consecutive second video segments may first be merged, and the merged video segment is then determined as one second candidate segment.
The highlight degree threshold may be set and adjusted according to the actual situation, for example as the average or the median of the highlight degrees of all second video segments.
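As a non-limiting illustration (not part of the original disclosure), the selection and merging of second candidate segments described above can be sketched in Python; the Segment type, the mean-as-threshold choice, and the top_n cap are assumptions of this sketch:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Segment:
    start: float      # start time on the target video's time axis, in seconds
    end: float        # end time, in seconds
    highlight: float  # highlight degree of the segment

def select_second_candidates(segments: List[Segment], top_n: int = 5) -> List[Segment]:
    """Keep segments whose highlight degree meets the highlight degree
    condition (here: above the mean, one of the threshold choices named
    above), cap at the top_n highest, then merge temporally consecutive
    survivors into single candidate segments."""
    threshold = sum(s.highlight for s in segments) / len(segments)
    kept = sorted([s for s in segments if s.highlight > threshold],
                  key=lambda s: s.highlight, reverse=True)[:top_n]
    kept.sort(key=lambda s: s.start)
    merged: List[Segment] = []
    for seg in kept:
        if merged and abs(merged[-1].end - seg.start) < 1e-6:  # consecutive on the time axis
            prev = merged.pop()
            merged.append(Segment(prev.start, seg.end, max(prev.highlight, seg.highlight)))
        else:
            merged.append(seg)
    return merged
```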
S120: a highlight segment of the target video is determined based on the first alternative segment and the second alternative segment.
After obtaining the first and second alternative segments of the target video, a highlight segment of the target video may be determined based on the first and second alternative segments. Because the first alternative segment is a video segment with a set action label, and the second alternative segment is a video segment with a highlight degree meeting the highlight degree condition, namely a video segment with higher highlight degree, the determined highlight segment fuses the action label and the feature of the highlight degree, and the determination accuracy is improved.
After the method provided by the embodiment of the application is applied to obtain the first alternative segment and the second alternative segment of the target video, the highlight segment of the target video is determined based on the first alternative segment and the second alternative segment. The first alternative segment is a video segment with a set action tag, the second alternative segment is determined according to the highlight, so that the highlight determined based on the first alternative segment and the second alternative segment combines the characteristics of the set action tag and the highlight, the determined highlight is more accurate, the determined highlight is further used for video distribution or information delivery, a user can be helped to locate a video of interest, the user click rate of the video is improved, and the viewing probability of the user on the video is improved.
In an embodiment of the present application, determining the highlight segment of the target video based on the first candidate segment and the second candidate segment may include the following steps:
Step 1: determine whether the first candidate segment and the second candidate segment have an overlapping portion on the time axis of the target video; if so, perform Step 2;
Step 2: determine the highlight segment of the target video according to the first candidate segment and the second candidate segment that overlap on the time axis.
For convenience of description, the two steps are described together.
In the embodiments of the present application, after the first candidate segment and the second candidate segment of the target video are obtained, it may be determined whether they have an overlapping portion on the time axis of the target video, i.e., whether they share any time. For example, if a second candidate segment spans 2 min 5 s to 2 min 10 s on the time axis of the target video and a first candidate segment spans 1 min 50 s to 2 min 8 s, the two segments overlap on the time axis, namely between 2 min 5 s and 2 min 8 s.
If the first candidate segment and the second candidate segment overlap on the time axis, the highlight segment of the target video can be determined from the overlapping first and second candidate segments, which further ensures that the determined highlight segment fuses the action tag and highlight degree features and improves the accuracy of highlight segment determination.
In an embodiment of the present application, determining the highlight segment of the target video according to the first candidate segment and the second candidate segment overlapping on the time axis may include the following steps:
Step 1: take each second candidate segment that has an overlapping portion on the time axis, together with the corresponding first candidate segment, as a segment pair;
Step 2: determine the highlight segment of the target video according to the times of the second candidate segment and the first candidate segment of each segment pair on the time axis.
For convenience of description, the two steps are described together.
In the embodiments of the present application, when the first candidate segment and the second candidate segment have an overlapping portion on the time axis, each overlapping second candidate segment and its corresponding first candidate segment may be taken as one segment pair. If a second candidate segment overlaps one first candidate segment on the time axis, the two form one segment pair. If a second candidate segment overlaps several first candidate segments, the second candidate segment forms a separate segment pair with each of them.
The highlight segment of the target video is then determined according to the times of the second candidate segment and the first candidate segment of each segment pair on the time axis. The determined highlight segment may contain part or all of the time of the overlapping second and first candidate segments, fuses the action tag and highlight degree features, and improves the accuracy of highlight segment determination.
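A minimal sketch (reusing the Segment type from the earlier sketch) of the overlap test and pair construction; converting the worked example above to seconds, 125 s to 130 s against 110 s to 128 s overlaps between 125 s and 128 s:

```python
def overlaps(a: Segment, b: Segment) -> bool:
    """True if the two segments share time on the target video's time axis."""
    return max(a.start, b.start) < min(a.end, b.end)

def build_segment_pairs(first_candidates, second_candidates):
    """Pair every second candidate segment with each first candidate segment
    it overlaps; a second candidate overlapping several first candidates
    yields one pair per first candidate, as described above."""
    return [(second, first)
            for second in second_candidates
            for first in first_candidates
            if overlaps(first, second)]

# Example from the text: Segment(125, 130, 0.9) and Segment(110, 128, 0.0)
# overlap, namely on the interval from 125 s to 128 s.
```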
In an embodiment of the present application, determining the highlight segment of the target video according to the times of the second candidate segment and the first candidate segment of each segment pair on the time axis may include the following steps:
Step 1: for each segment pair, if the start time of the first candidate segment in the current segment pair on the time axis is before the start time of the second candidate segment in the current segment pair, acquire a first standby segment in the target video, where the start time of the first standby segment is the start time of the second candidate segment in the current segment pair, and the duration of the first standby segment is a first duration;
Step 2: determine whether the first standby segment has an action tag;
Step 3: determine the segment pair corresponding to a first standby segment that has an action tag as a standby segment pair;
Step 4: for each standby segment pair, if the highlight degree of the second candidate segment in the current standby segment pair is above the first highlight degree threshold, or is one of the top N1 highest highlight degrees, determine the second candidate segment in the current standby segment pair as a highlight segment of the target video, N1 being a positive integer.
For convenience of description, the four steps are described together.
In the embodiments of the present application, each second candidate segment having an overlapping portion on the time axis and its corresponding first candidate segment are taken as one segment pair, so several segment pairs can be obtained, each comprising one first candidate segment and one second candidate segment.
For each segment pair, it may be determined whether the start time of the first candidate segment on the time axis is before the start time of the second candidate segment. The current segment pair refers to the segment pair the current operation is directed at.
If the start time of the first candidate segment in the current segment pair is before that of the second candidate segment, the part of the first candidate segment preceding the second candidate segment can be considered not highlight enough. The start time of the second candidate segment on the time axis may therefore be taken as the starting point, and a first standby segment of the first duration may be acquired from the target video. The start time of the first standby segment is thus the start time of the second candidate segment in the current segment pair, and the leading part of the first candidate segment is discarded. The first duration may be preset, for example to 5 seconds. Because the first and second candidate segments of the current segment pair overlap on the time axis, if the first candidate segment starts before the second candidate segment, the end time of the first candidate segment may be before or after the end time of the second candidate segment, but is necessarily after the start time of the second candidate segment.
After the first standby segment is acquired, it may be determined whether it has a set action tag. If it does, the first standby segment can be considered to still satisfy the requirements for determining a highlight segment. If it does not, the first standby segment can be considered not to satisfy those requirements and may be ignored.
The segment pair corresponding to a first standby segment that has the action tag may be determined as a standby segment pair.
For each standby segment pair, it may be determined whether the highlight degree of its second candidate segment is above the first highlight degree threshold, or is one of the top N1 highest highlight degrees. If so, the second candidate segment can be considered highly highlight; and since the current standby segment pair corresponds to a first standby segment with the action tag, the second candidate segment also carries the action tag, so it may be determined as a highlight segment of the target video. The determined highlight segment thus has an action tag and a high highlight degree. The first highlight degree threshold and N1 may be set and adjusted according to the actual situation.
The current standby segment pair refers to the standby segment pair the current operation is directed at.
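A sketch of this branch under stated assumptions: has_action_tag is a hypothetical helper that runs the action-tag model on a (start, end) window of the target video, and the top-N1 rule is used rather than the threshold variant; Segment is reused from the earlier sketch.

```python
FIRST_DURATION = 5.0  # the "first duration"; 5 seconds is the example above

def highlights_when_first_starts_earlier(pairs, has_action_tag, n1=3):
    """pairs: (second_candidate, first_candidate) segment pairs.
    For pairs whose first candidate starts earlier, re-check the action tag
    on the first standby segment (a FIRST_DURATION window starting at the
    second candidate's start time); among the resulting standby segment
    pairs, keep the second candidates with the top-N1 highlight degrees."""
    standby_pairs = []
    for second, first in pairs:
        if first.start < second.start:
            window = (second.start, second.start + FIRST_DURATION)  # first standby segment
            if has_action_tag(window):
                standby_pairs.append((second, first))
    standby_pairs.sort(key=lambda p: p[0].highlight, reverse=True)
    return [second for second, _ in standby_pairs[:n1]]
```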
In an embodiment of the present application, determining the highlight segment of the target video according to the times of the second candidate segment and the first candidate segment of each segment pair on the time axis may include the following steps:
Step 1: for each segment pair, if the start time of the first candidate segment in the current segment pair on the time axis is after the start time of the second candidate segment, acquire a second standby segment in the target video, where the start time of the second standby segment is the start time of the first candidate segment in the current segment pair, and the end time of the second standby segment is the end time of the second candidate segment in the current segment pair;
Step 2: if the highlight degree of the second candidate segment in the current segment pair is higher than the second highlight degree threshold, or is one of the top N2 highest highlight degrees, determine the second standby segment as a highlight segment of the target video, N2 being a positive integer.
For convenience of description, the two steps are described together.
In the embodiments of the present application, each second candidate segment having an overlapping portion on the time axis and its corresponding first candidate segment are taken as one segment pair, so several segment pairs can be obtained, each comprising one first candidate segment and one second candidate segment.
For each segment pair, it may be determined whether the start time of the first candidate segment on the time axis is after the start time of the second candidate segment. If so, the portion of the video from the start time of the second candidate segment to the start time of the first candidate segment can be considered to have no action tag. The start time of the first candidate segment may therefore be taken as the starting point and the end time of the second candidate segment as the end point, yielding the second standby segment of the target video; this ensures that the second standby segment has an action tag. The current segment pair refers to the segment pair the current operation is directed at. Because the first and second candidate segments of the current segment pair overlap on the time axis, if the first candidate segment starts after the second candidate segment, its start time is necessarily before the end time of the second candidate segment, while its end time may be before or after the end time of the second candidate segment.
The highlight degree of the second candidate segment in the current segment pair may then be compared with the second highlight degree threshold, or with the highlight degrees of the second candidate segments in the other segment pairs; if it is higher than the second highlight degree threshold, or is one of the top N2 highest highlight degrees, the second standby segment may be determined as a highlight segment of the target video. The determined highlight segment thus has an action tag and a high highlight degree.
The second highlight degree threshold and N2 may be set and adjusted according to the actual situation. The first and second highlight degree thresholds may be the same or different, and N1 and N2 may be the same or different.
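A matching sketch for this branch, under the same assumptions as above; the threshold variant is shown, with the top-N2 rule noted in the docstring:

```python
def highlights_when_first_starts_later(pairs, threshold2):
    """For pairs whose first candidate starts after the second candidate's
    start, the second standby segment runs from the first candidate's start
    time to the second candidate's end time, so it begins on the action tag.
    It is kept when the second candidate's highlight degree exceeds the
    second highlight degree threshold (a top-N2 rule could be used instead)."""
    results = []
    for second, first in pairs:
        if second.start < first.start < second.end and second.highlight > threshold2:
            results.append(Segment(first.start, second.end, second.highlight))
    return results
```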
In an embodiment of the present application, when it is determined that the first candidate segment and the second candidate segment have no overlapping portion on the time axis, the method may further include the following steps:
determining the probability that each first candidate segment has an action tag, and determining the first candidate segments whose probability is greater than a probability threshold, or the first candidate segments corresponding to the top N3 highest probabilities, as highlight segments of the target video, N3 being a positive integer;
or,
determining the second candidate segments whose highlight degree is higher than a third highlight degree threshold, or the second candidate segments corresponding to the top N4 highest highlight degrees, as highlight segments of the target video, N4 being a positive integer.
In the embodiments of the present application, after the first and second candidate segments of the target video are obtained, it is determined whether they have an overlapping portion on the time axis. If there is no overlap on the time axis, a highlight segment of the target video may be determined by the action tag alone or by the highlight degree alone.
Specifically, the probability that each first candidate segment has an action tag may be determined, and the first candidate segments whose probability is greater than the probability threshold, or those corresponding to the top N3 highest probabilities, may be determined as highlight segments of the target video. When determining whether each first candidate segment has an action tag, the probability of it having the tag can be obtained. For each first candidate segment, the higher that probability, the more pronounced the action in the segment can be considered; determining the first candidate segments with higher probabilities as highlight segments therefore yields highlight segments with richer actions.
Alternatively, the second candidate segments whose highlight degree is higher than the third highlight degree threshold, or those corresponding to the top N4 highest highlight degrees, may be determined as highlight segments of the target video, so that the determined highlight segments have higher highlight degrees and are more attractive to users.
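The two fallback strategies can be sketched as follows; tag_probability is a hypothetical helper returning the probability that a first candidate segment has the set action tag, and only one of the two functions would be used in a given run:

```python
def fallback_by_tag_probability(first_candidates, tag_probability, n3=3):
    """No overlap on the time axis: keep the first candidate segments with
    the top-N3 highest action-tag probabilities (a probability-threshold
    variant is equally possible, as described above)."""
    return sorted(first_candidates, key=tag_probability, reverse=True)[:n3]

def fallback_by_highlight_degree(second_candidates, n4=3):
    """Alternative fallback: keep the second candidate segments with the
    top-N4 highest highlight degrees."""
    return sorted(second_candidates, key=lambda s: s.highlight, reverse=True)[:n4]
```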
In an embodiment of the present application, before determining the highlight segment of the target video based on the first and second candidate segments, the method may further include the following steps:
Step 1: determine the dynamic rate of each second candidate segment, where the dynamic rate characterizes how much the image changes between video frames;
Step 2: remove the second candidate segments whose dynamic rate is smaller than the average dynamic rate of the target video.
For convenience of description, the two steps are described together.
In an embodiment of the present application, after the second candidate segments are obtained, the dynamic rate of each may be determined. The dynamic rate characterizes how much the image changes between video frames and may be computed, for example, from inter-frame image differences. The average dynamic rate of the target video may be determined at the same time, before, or afterwards; it may be the average of the dynamic rates of all second candidate segments of the target video.
The dynamic rate of each second candidate segment is then compared with the average dynamic rate of the target video. For each second candidate segment, if its dynamic rate is higher than the average, the inter-frame image changes can be considered larger, making the segment more likely to raise users' interest; if its dynamic rate is less than or equal to the average, the changes are smaller and the segment is less likely to do so.
The second candidate segments whose dynamic rate is below the average dynamic rate of the target video may therefore be removed before the highlight segment is determined from the first and second candidate segments, so that the determined highlight segment has a higher dynamic rate.
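One plausible reading of the dynamic rate, sketched with NumPy under the assumption that a segment's decoded frames are available as an array; frames_of is a hypothetical decoding helper:

```python
import numpy as np

def dynamic_rate(frames: np.ndarray) -> float:
    """Mean absolute pixel change between consecutive frames;
    frames has shape (T, H, W) or (T, H, W, C)."""
    diffs = np.abs(np.diff(frames.astype(np.float32), axis=0))
    return float(diffs.mean())

def filter_by_dynamic_rate(candidates, frames_of):
    """Drop second candidate segments whose dynamic rate falls below the
    average dynamic rate over all second candidate segments."""
    rates = [dynamic_rate(frames_of(s)) for s in candidates]
    average = sum(rates) / len(rates)
    return [s for s, r in zip(candidates, rates) if r >= average]
```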
In an embodiment of the present application, after determining the highlight segment of the target video, the method may further include the following steps:
Step 1: detect whether a target object is present within the first second-duration of the highlight segment; if not, perform Step 2;
Step 2: search forward along the time axis of the target video, starting from the start time of the highlight segment, for the time at which the target object appears;
Step 3: update the start time of the highlight segment to the time at which the target object appears.
For convenience of description, the three steps are described together.
In an embodiment of the present application, after the highlight segment of the target video is determined, it may be detected whether a target object is present within the first second-duration of the segment. The target object may be a primary and/or secondary character in the target video; detection may use a character recognition algorithm, for example to detect whether a cartoon character is present. The second duration may be set and adjusted according to the actual situation, for example to 5 seconds.
It will be appreciated that target objects such as characters are what users are most likely to attend to when watching a video. If the target object is present within the first second-duration of the highlight segment, no further processing is needed. If it is not, the time at which the target object appears may be searched for forward from the start time of the highlight segment along the time axis, and the start time of the highlight segment is then updated to the first found time at which the target object is present. The target object thus appears in the opening stage of the adjusted highlight segment, further improving the probability that users watch it.
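A sketch of the start-time adjustment; frame_at and has_target (e.g. a cartoon-character detector) are hypothetical helpers, and the 0.5 s scan step is an assumption:

```python
def adjust_start_to_target(seg, frame_at, has_target, second_duration=5.0, step=0.5):
    """If no target object appears within the first second_duration of the
    highlight segment, scan forward along the time axis and move the start
    time to the first moment the target object is detected."""
    t = seg.start
    while t < min(seg.start + second_duration, seg.end):
        if has_target(frame_at(t)):
            return seg  # target present in the opening stage; keep the segment
        t += step
    while t < seg.end:  # keep scanning forward for the first appearance
        if has_target(frame_at(t)):
            return Segment(t, seg.end, seg.highlight)
        t += step
    return seg  # target never found; leave the segment unchanged
```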
In an embodiment of the present application, after determining the highlight segment of the target video, the method may further include the following steps:
Step 1: determine whether the duration of the highlight segment meets the first duration requirement;
Step 2: if the duration of the highlight segment does not meet the first duration requirement, perform a truncation or supplementation operation on it based on the first duration requirement.
For convenience of description, the two steps are described together.
In the embodiments of the present application, the duration of the highlight segment may be constrained: if it is too short, too little highlight content is presented to effectively raise the user's interest in watching the target video; if it is too long, too much highlight content is presented, which tends to reduce the probability that the user watches the target video itself.
Therefore, after the highlight segment of the target video is determined, it may also be determined whether its duration meets the first duration requirement. The first duration requirement may be preset, such as a duration in the range of 30 seconds to 2 minutes.
If the duration of the highlight segment does not meet the first duration requirement, e.g. exceeds its maximum, the highlight segment may be truncated so that the truncated duration is less than or equal to that maximum. For example, taking the start time of the highlight segment on the time axis as the starting point, one minute of video may be cut from the target video and used as the highlight segment. If the duration is below the minimum of the first duration requirement, the highlight segment may be supplemented so that the supplemented duration is greater than or equal to that minimum: a video clip may be cut from the target video starting at the end time of the highlight segment and appended to it, and the extended clip is used as the highlight segment.
Adjusting the highlight segment according to its duration and the first duration requirement ensures that it is neither too long nor too short, which can effectively improve the probability that users watch the target video.
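A sketch of the duration adjustment, assuming the 30 s to 2 min range given above; truncation keeps the opening of the segment, and supplementation extends past its end, clamped to the end of the video:

```python
MIN_LEN, MAX_LEN = 30.0, 120.0  # example first duration requirement

def enforce_duration(seg, video_end):
    length = seg.end - seg.start
    if length > MAX_LEN:  # too long: truncate from the start time
        return Segment(seg.start, seg.start + MAX_LEN, seg.highlight)
    if length < MIN_LEN:  # too short: append video after the end time
        return Segment(seg.start, min(seg.start + MIN_LEN, video_end), seg.highlight)
    return seg
```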
In an embodiment of the present application, the first candidate segment includes one or more first video segments of a first video segment set of the target video, which may be obtained as follows:
Step 1: divide the target video by shots to obtain a plurality of shot segments;
Step 2: merge the shot segments according to the second duration requirement to obtain the first video segment set, where the duration of each first video segment in the set meets the second duration requirement.
For convenience of description, the two steps are described together.
In the embodiments of the present application, the target video may be divided by shots into a plurality of shot segments, for example using image similarity, histogram statistics, or kernel temporal segmentation (KTS) algorithms.
However, rapid shot switching often leads to discontinuous motion that makes action tags hard to judge. Therefore, after the plurality of shot segments is obtained, the shot segments may be merged according to the second duration requirement to obtain the first video segment set. For example, if the second duration requirement is 5 to 6 seconds and the obtained shot durations are [1 s, 3 s, 2 s, 5.1 s, 4 s, 1 s], merging yields a first video segment set of [6 s, 5.1 s, 5 s].
The duration of each first video segment in the first video segment set thus meets the second duration requirement, which facilitates the determination of action tags.
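A minimal greedy merge reproducing the worked example above; how a run that overshoots the upper bound should be handled is not specified in the disclosure, so this sketch simply emits it once it is long enough:

```python
def merge_shots(durations, lo=5.0, hi=6.0):
    """Accumulate consecutive shot durations until the running total reaches
    the lower bound of the second duration requirement."""
    merged, acc = [], 0.0
    for d in durations:
        acc += d
        if acc >= lo:        # long enough; may slightly exceed hi
            merged.append(acc)
            acc = 0.0
    if acc:                  # trailing shots that never reached the lower bound
        merged.append(acc)
    return merged

print(merge_shots([1, 3, 2, 5.1, 4, 1]))  # [6.0, 5.1, 5.0]
```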
In an embodiment of the present application, the second candidate segments comprise one or more second video segments of a second video segment set of the target video, and the highlight degree of each second video segment in the set may be determined by the following steps:
Step 1: obtain first viewing data of the target video;
Step 2: determine the highlight degree of each second video segment in the second video segment set according to the first viewing data.
For convenience of description, the two steps are described together.
In the embodiments of the present application, if the target video already has abundant user viewing behavior, such as playing, pausing, replaying, fast-forwarding, and sending bullet comments, the highlight degree of each second video segment may be determined from the viewing data.
First viewing data of the target video, i.e. data covering the above viewing behavior, may be obtained first. From the first viewing data, the highlight degree of each second video segment in the set may be determined.
Specifically, the number of times each second video segment in the set has been watched may be determined from the first viewing data, and the highlight degree of each second video segment may be determined from its watch count, for example according to a preset correspondence between watch counts and highlight degrees.
The highlight degree of each second video segment can thus be determined quickly from the viewing data.
Further, when determining the second candidate segments according to the highlight degrees of the second video segments, the highlight degree threshold may be set according to watch counts. For example, the highlight degree corresponding to the average watch count over the second video segment set may be used as the threshold, in which case the determined second candidate segments are the second video segments watched more often than the average; or the highlight degree corresponding to the median watch count may be used as the threshold, in which case the second candidate segments are those watched more often than the median.
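A sketch of the watch-count variant, assuming the identity mapping from watch count to highlight degree and the mean-as-threshold option:

```python
def second_candidates_from_views(view_counts):
    """view_counts: mapping from clip id to watch count, derived from the
    first viewing data. Uses the watch count itself as the highlight degree
    and the average watch count as the highlight degree threshold."""
    threshold = sum(view_counts.values()) / len(view_counts)
    return [clip for clip, count in view_counts.items() if count > threshold]
```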
In an embodiment of the present application, the second video segments in the second video segment set do not include the opening and ending segments of the video.
It will be appreciated that the opening and ending of a video are generally watched frequently, yet using them as highlight segments is of little value. Therefore, before determining the highlight degrees of the second video segments from the first viewing data, the second video segments belonging to the opening and the ending may be removed from the second video segment set; i.e., the set does not include opening or ending segments. This effectively avoids the final highlight segment being the opening or ending of the target video and improves the accuracy of highlight segment determination.
In one embodiment of the present application, the second alternative segments comprise one or more second video segments in a second video segment set of the target video, and the highlight degree of each second video segment in the second video segment set may alternatively be determined through the following steps:
the first step: for each second video segment in the second video segment set, inputting the current second video segment into a pre-trained classification model to obtain the probability, output by the classification model, that the current second video segment is a highlight positive example;
the second step: determining the highlight degree of the current second video segment according to the probability that the current second video segment is a highlight positive example.
For convenience of description, the above two steps are described in combination.
When the target video has not yet gone online, or has been online for only a short time, little viewing data is available, and it is difficult to determine the highlight degree of the second video segments from user viewing behavior. Therefore, the embodiments of the present application provide another way of determining the highlight degree.
For each second video segment in the second video segment set, the current second video segment may be input into a pre-trained classification model to obtain the probability, output by the classification model, that the current second video segment is a highlight positive example. The classification model is trained in advance; given a second video segment as input, it outputs the probability that the segment is a highlight positive example. The higher this probability, the more highlight-worthy the current second video segment. The current second video segment is the second video segment currently being processed.
The highlight degree of the current second video segment may then be determined from this probability. Specifically, a correspondence between the probability of being a highlight positive example and the highlight degree may be preset, and the highlight degree of the second video segment determined from this correspondence and the probability output by the classification model.
The highlight degree of each second video segment can thus be accurately determined through the classification model, providing a basis for the subsequent determination of highlight segments.
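For concreteness, the sketch below shows what such an inference step could look like, assuming a trained binary classifier that maps a clip tensor to a single logit; the even, linear probability-to-degree mapping is only one illustrative choice of the preset correspondence.

```python
import torch

def highlight_degree(model, clip_tensor, degree_levels=10):
    """Map the classifier's highlight probability to a discrete degree.

    model: a pre-trained binary classifier whose output is the logit of
    the clip being a highlight positive example (assumption).
    clip_tensor: one video clip, e.g. of shape (1, frames, C, H, W).
    Splitting [0, 1] evenly into `degree_levels` bins is an illustrative
    preset correspondence between probability and highlight degree.
    """
    model.eval()
    with torch.no_grad():
        prob = torch.sigmoid(model(clip_tensor)).item()
    return min(int(prob * degree_levels) + 1, degree_levels)
```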
In one embodiment of the present application, the classification model may be pre-trained by:
step one: obtaining a training sample set, wherein the training sample set comprises a plurality of sample pairs, and each sample pair comprises a highlight positive example and a negative example;
step two: training a pre-constructed initial model with the training sample set until a set training termination condition is reached, so as to obtain the classification model;
wherein the training of the pre-constructed initial model with the training sample set comprises the following steps:
for each sample pair, inputting the current sample pair into the pre-constructed initial model;
determining a model loss according to the current sample pair and the output result of the initial model for the current sample pair;
adjusting parameters of the initial model according to the model loss.
In an embodiment of the present application, a training sample set may be obtained in advance, where the training sample set includes a plurality of sample pairs, and each sample pair includes a highlight positive example and a negative example. Specifically, a plurality of historical videos and the highlight clip of each historical video may be obtained in advance. For each historical video, its highlight clip may be extracted as the highlight positive example, and a section of a set duration, for example 5 seconds or 10 seconds, may be randomly extracted from the parts of the historical video other than the highlight clip as the negative example. The highlight positive example and the negative example form one sample pair, and the plurality of sample pairs form the training sample set.
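A minimal sketch of this pair-construction step follows, assuming each historical video carries a (start, end) annotation for its highlight clip; the data layout and names are hypothetical.

```python
import random

def build_sample_pairs(history_videos, neg_duration=5.0):
    """Build (positive, negative) training pairs from historical videos.

    history_videos: list of dicts with 'duration' (seconds) and
    'highlight' = (start, end) of the annotated highlight clip.
    The negative example is a random span of `neg_duration` seconds taken
    outside the highlight clip, as described above. Returned spans are
    (start, end) times; decoding frames from them is left out for brevity.
    """
    pairs = []
    for video in history_videos:
        h_start, h_end = video["highlight"]
        positive = (h_start, h_end)
        # Candidate regions outside the highlight long enough for a negative.
        regions = [(0.0, h_start), (h_end, video["duration"])]
        regions = [(s, e) for s, e in regions if e - s >= neg_duration]
        if not regions:
            continue  # no room for a negative example in this video
        s, e = random.choice(regions)
        n_start = random.uniform(s, e - neg_duration)
        pairs.append((positive, (n_start, n_start + neg_duration)))
    return pairs
```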
Training the pre-constructed initial model with the training sample set until the set training termination condition is reached yields the classification model. The initial model may be a TimeSformer model, with its parameters in their initial state.
The training termination condition may be that the number of training iterations reaches a preset threshold, or that the model accuracy reaches a set accuracy threshold.
During model training, for each sample pair, the current sample pair may be input into the pre-constructed initial model, which may include a backbone network and a classification network. The features of the highlight positive example and the negative example in the current sample pair may be extracted by a Transformer serving as the backbone network of the initial model and then input into the classification network, finally obtaining the output result of the initial model for the current sample pair.
The model loss can be determined based on the current sample pair and the output result produced by the initial model for it. The model loss may include a cross-entropy classification loss and a ranking loss; specifically, the model loss may be the sum, or a weighted sum, of the two.
For the cross-entropy classification loss, the initial model's output as to whether the highlight positive example in the current sample pair belongs to the positive or the negative class can be compared with its ground-truth label, and likewise for the negative example in the current sample pair.
For the ranking loss, the scores output by the initial model for the highlight positive example and the negative example in the current sample pair are obtained, and the ranking loss is determined from these scores, where a score may correspond to a probability.
The ranking loss L_p(s+, s-) can be determined according to the following formula:
L_p(s+, s-) = max(0, 1 - h(s+) + h(s-))
where s+ denotes the highlight positive example, s- denotes the negative example, h(s+) denotes the score output by the initial model for the highlight positive example, h(s-) denotes the score output by the initial model for the negative example, and the subscript p denotes the sample pair. The ranking loss drives the score of the highlight positive example to exceed the score of the negative example, preferably by a margin close to 1.
After the model loss is determined, the parameters of the initial model can be adjusted according to the model loss so as to reduce it, so that the loss continuously converges and the accuracy of the model continuously improves.
Obtaining the classification model through such pre-training facilitates the subsequent determination of the highlight degree.
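As a concrete illustration of the loss described above, the sketch below shows one PyTorch-style training step, under the assumptions that the model maps a clip tensor to a single highlight score h(s) and that the cross-entropy and ranking losses are simply summed; the tensor shapes are hypothetical.

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, pos_clip, neg_clip):
    """One training step combining cross-entropy and ranking losses.

    model: maps a clip tensor to a single logit, the highlight score h(s).
    pos_clip / neg_clip: the highlight positive example and the negative
    example of one sample pair, each of shape (1, frames, C, H, W).
    """
    pos_logit = model(pos_clip).squeeze(-1)   # h(s+)
    neg_logit = model(neg_clip).squeeze(-1)   # h(s-)

    # Cross-entropy classification loss: positive labeled 1, negative 0.
    logits = torch.cat([pos_logit, neg_logit])
    labels = torch.tensor([1.0, 0.0])
    ce_loss = F.binary_cross_entropy_with_logits(logits, labels)

    # Ranking loss: max(0, 1 - h(s+) + h(s-)), i.e. a margin of 1.
    rank_loss = F.relu(1.0 - pos_logit + neg_logit).mean()

    loss = ce_loss + rank_loss  # a weighted sum is an equally valid choice
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```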
In one embodiment of the present application, the method may further comprise the steps of:
obtaining second viewing data of the target video;
wherein the determining the highlight degree of the current second video segment according to the probability that the current second video segment is a highlight positive example may comprise the following step:
determining the highlight degree of the current second video segment according to the second viewing data and the probability that the current second video segment is a highlight positive example.
In the embodiment of the present application, second viewing data of the target video may also be obtained.
For each second video segment in the second video segment set, after the current second video segment is input into the pre-trained classification model and the probability of its being a highlight positive example is obtained, the highlight degree of the current second video segment may be determined according to both the second viewing data and that probability. Specifically, a first reference highlight degree of the current second video segment may be determined from the second viewing data, a second reference highlight degree may be determined from the probability of being a highlight positive example, and the average of the two may be taken as the highlight degree of the current second video segment. That is, user viewing behavior is combined with the prediction of the classification model, so that the determined highlight degree is more accurate, providing a basis for the subsequent determination of highlight segments.
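A minimal sketch of this averaging step is given below; the two reference-degree mappings are illustrative assumptions, since the text fixes only the final averaging.

```python
def combined_highlight_degree(view_count, avg_view_count, prob, degree_levels=10):
    """Average a viewing-based and a model-based reference highlight degree.

    The two mappings below (count relative to the average, and a linear
    probability mapping) are illustrative choices of the preset
    correspondences; only the final averaging step is prescribed.
    """
    # First reference degree, from the second viewing data.
    ratio = min(view_count / max(avg_view_count, 1), 2.0)  # cap at 2x average
    first_ref = ratio / 2.0 * degree_levels
    # Second reference degree, from the classifier probability.
    second_ref = prob * degree_levels
    return (first_ref + second_ref) / 2.0

# A segment viewed at the average rate with probability 0.8 gets degree 6.5.
print(combined_highlight_degree(100, 100, 0.8))  # 6.5
```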
As shown in fig. 2, one specific procedure for video highlight determination is:
firstly, a first video segment set and a second video segment set of the target video are obtained;
then, whether each first video segment in the first video segment set has the set action tag is determined, and the highlight degree of each second video segment in the second video segment set is determined;
the first alternative segments are determined according to the first video segments having the set action tag, and the second alternative segments are determined according to the highlight degree of each second video segment;
then, the highlight segment of the target video is determined based on the first alternative segments and the second alternative segments;
finally, the determined highlight segment is adjusted. Whether a target object exists within the second duration immediately preceding the highlight segment may be detected; if no target object exists there, the time at which the target object appears is searched backwards on the time axis from the starting time of the highlight segment, and the starting time of the highlight segment is updated to that time. Whether the duration of the highlight segment meets the first duration requirement may also be determined: if the duration exceeds the maximum value of the first duration requirement, the highlight segment is truncated so that its duration is less than or equal to that maximum value; and if the duration is below the minimum value of the first duration requirement, the highlight segment is supplemented so that its duration is greater than or equal to that minimum value.
After the second alternative segments are determined and before the highlight segment of the target video is determined based on the first alternative segments and the second alternative segments, the dynamic rate of each second alternative segment may also be determined, and the second alternative segments whose dynamic rate is smaller than the average dynamic rate of the target video are culled.
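The duration adjustment in this flow can be sketched as follows, assuming segments are (start, end) times in seconds; extending the segment at its end and clamping to the video bounds is an illustrative supplement strategy, since the text only requires the adjusted duration to fall inside the requirement.

```python
def adjust_duration(start, end, min_len, max_len, video_len):
    """Truncate or supplement a highlight segment so that its duration
    meets the first duration requirement [min_len, max_len]."""
    duration = end - start
    if duration > max_len:
        end = start + max_len              # truncate the tail
    elif duration < min_len:
        end = min(start + min_len, video_len)
        start = max(end - min_len, 0.0)    # borrow from before the start if needed
    return start, end

print(adjust_duration(10.0, 90.0, 15.0, 60.0, 120.0))  # (10.0, 70.0)
print(adjust_duration(10.0, 18.0, 15.0, 60.0, 120.0))  # (10.0, 25.0)
```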
With the method and apparatus of the embodiments of the present application, the determined highlight segment fuses the action tag with the highlight degree and is further adjusted, so that it is more conducive to raising users' interest in watching and can increase the probability of users watching the target video.
It should be noted that the times referred to in the embodiments of the present application are times on the time axis of the target video.
Corresponding to the above method embodiments, the embodiments of the present application further provide a video highlight determining apparatus; the video highlight determining apparatus described below and the video highlight determining method described above may be cross-referenced.
Referring to fig. 3, the video highlight determining apparatus 300 may include the following modules:
an obtaining module 310, configured to obtain a first alternative segment and a second alternative segment of the target video, where the first alternative segment is a video segment with a set action tag, and the second alternative segment is a video segment with a highlight degree that meets a highlight degree condition;
a determining module 320, configured to determine a highlight segment of the target video based on the first alternative segment and the second alternative segment.
With the apparatus provided by the embodiment of the present application, after the first alternative segment and the second alternative segment of the target video are obtained, the highlight segment of the target video is determined based on them. Since the first alternative segment is a video segment having the set action tag and the second alternative segment is a video segment whose highlight degree meets the highlight degree condition, the highlight segment determined on this basis combines the set action tag with the highlight degree and is therefore more accurate. Using the determined highlight segment for video distribution or information delivery can help users locate videos of interest, improve the click-through rate of the video, and increase the probability of users watching it.
In a specific embodiment of the present application, the determining module 320 is configured to:
determine whether the first alternative segment and the second alternative segment have an overlapping portion on the time axis of the target video;
and if an overlapping portion exists, determine the highlight segment of the target video according to the first alternative segment and the second alternative segment that have the overlapping portion on the time axis.
In a specific embodiment of the present application, the determining module 320 is configured to:
take each second alternative segment having an overlapping portion on the time axis and the corresponding first alternative segment as a segment pair;
and determine the highlight segment of the target video according to the times of the second alternative segment and the first alternative segment in each segment pair on the time axis.
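A minimal sketch of this pairing step follows, assuming alternative segments are given as (start, end) times on the target video's time axis.

```python
def build_segment_pairs(first_alts, second_alts):
    """Pair each second alternative segment with every first alternative
    segment it overlaps on the time axis.

    Segments are (start, end) tuples in seconds (assumption). Two
    segments overlap when each starts before the other ends.
    """
    pairs = []
    for second in second_alts:
        for first in first_alts:
            if first[0] < second[1] and second[0] < first[1]:
                pairs.append((first, second))
    return pairs

# The first alternative (5, 12) overlaps the second alternative (10, 20).
print(build_segment_pairs([(5, 12), (30, 40)], [(10, 20)]))
# [((5, 12), (10, 20))]
```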
In a specific embodiment of the present application, the determining module 320 is configured to:
for each segment pair, if the starting time of the first alternative segment in the current segment pair on the time axis is before the starting time of the second alternative segment in the current segment pair on the time axis, obtain a first standby segment of the target video, where the starting time of the first standby segment is the starting time of the second alternative segment in the current segment pair on the time axis, and the duration of the first standby segment is a first duration;
determine whether the first standby segment has the action tag;
determine the segment pair corresponding to a first standby segment having the action tag as a standby segment pair;
for each standby segment pair, if the highlight degree of the second alternative segment in the current standby segment pair is above a first highlight degree threshold, or is one of the top N1 highlight degrees, determine the second alternative segment in the current segment pair as the highlight segment of the target video, N1 being a positive integer.
In a specific embodiment of the present application, the determining module 320 is configured to:
for each segment pair, if the starting time of the first alternative segment in the current segment pair on the time axis is after the starting time of the second alternative segment in the current segment pair on the time axis, obtain a second standby segment of the target video, where the starting time of the second standby segment is the starting time of the first alternative segment in the current segment pair on the time axis, and the ending time of the second standby segment is the ending time of the second alternative segment in the current segment pair on the time axis;
and if the highlight degree of the second alternative segment in the current segment pair is above a second highlight degree threshold, or is one of the top N2 highlight degrees, determine the second standby segment as the highlight segment of the target video, N2 being a positive integer.
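Combining the two cases above, the decision for one overlapping segment pair can be sketched as follows; for brevity, a single threshold stands in for the first and second highlight degree thresholds, the top-N variants are omitted, and the action-tag check is a caller-supplied predicate (all assumptions).

```python
def pick_highlight_from_pair(first, second, degree, threshold, has_action_tag,
                             first_duration):
    """Decide the highlight segment for one overlapping segment pair.

    first / second: (start, end) of the first and second alternative
    segments; degree is the second segment's highlight degree;
    has_action_tag is a predicate over a (start, end) span.
    """
    if degree <= threshold:
        return None
    if first[0] < second[0]:
        # First alternative starts earlier: the first standby segment
        # (starting with the second alternative, lasting first_duration)
        # must carry the action tag for the pair to count.
        standby = (second[0], second[0] + first_duration)
        return second if has_action_tag(standby) else None
    # First alternative starts later: the highlight is the second standby
    # segment, from the first segment's start to the second segment's end.
    return (first[0], second[1])

# First alternative (8, 15) starts before second (10, 30): the first
# standby segment (10, 15) must carry the action tag.
print(pick_highlight_from_pair((8, 15), (10, 30), degree=7, threshold=5,
                               has_action_tag=lambda seg: True,
                               first_duration=5.0))  # (10, 30)
```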
In a specific embodiment of the present application, the determining module 320 is further configured to:
in the case where the first alternative segments and the second alternative segments have no overlapping portion on the time axis, determine the probability that each first alternative segment has the action tag, and determine the first alternative segments whose probability is greater than a probability threshold, or the first alternative segments corresponding to the top N3 probabilities, as highlight segments of the target video, N3 being a positive integer;
or,
determine the second alternative segments whose highlight degree is above a third highlight degree threshold, or the second alternative segments corresponding to the top N4 highlight degrees, as highlight segments of the target video, N4 being a positive integer.
In a specific embodiment of the present application, the apparatus further includes a culling module, configured to:
before the highlight segment of the target video is determined based on the first alternative segments and the second alternative segments, determine the dynamic rate of each second alternative segment, where the dynamic rate characterizes the degree of image change between video frames;
and cull the second alternative segments whose dynamic rate is smaller than the average dynamic rate of the target video.
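The sketch below shows one way the dynamic rate could be computed, taking the mean absolute pixel difference between consecutive grayscale frames as an illustrative measure of inter-frame image change.

```python
import numpy as np

def dynamic_rate(frames):
    """Estimate a segment's dynamic rate as the mean absolute pixel
    difference between consecutive frames.

    frames: ndarray of shape (num_frames, H, W) in grayscale (assumption);
    mean inter-frame difference is one illustrative definition of the
    image change that the dynamic rate characterizes.
    """
    if len(frames) < 2:
        return 0.0
    diffs = np.abs(np.diff(frames.astype(np.float32), axis=0))
    return float(diffs.mean())

def cull_static_segments(segment_frames, video_rate):
    """Drop second alternative segments whose dynamic rate falls below
    the average dynamic rate of the whole target video."""
    return [seg for seg in segment_frames if dynamic_rate(seg) >= video_rate]
```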
In a specific embodiment of the present application, the apparatus further includes a first adjustment module, configured to:
after the highlight segment of the target video is determined, detect whether a target object exists within the second duration preceding the highlight segment;
if the target object does not exist, search backwards from the starting time of the highlight segment on the time axis of the target video for the time at which the target object appears;
and update the starting time of the highlight segment to the time at which the target object appears.
In a specific embodiment of the present application, the apparatus further includes a second adjustment module, configured to:
after the highlight segment of the target video is determined, determine whether the duration of the highlight segment meets a first duration requirement;
and if the duration of the highlight segment does not meet the first duration requirement, perform a truncating operation or a supplementing operation on the highlight segment based on the first duration requirement.
In a specific embodiment of the present application, the first alternative segments include one or more first video segments in a first video segment set of the target video, and the obtaining module 310 is configured to obtain the first video segment set by:
dividing the target video by shots to obtain a plurality of shot segments;
and merging the shot segments according to a second duration requirement to obtain the first video segment set, where the duration of each first video segment in the first video segment set meets the second duration requirement.
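A minimal sketch of this merging step follows, assuming shots are given as ordered (start, end) boundaries; greedily accumulating adjacent shots until the minimum duration is reached is one illustrative strategy.

```python
def merge_shots(shot_segments, min_len, max_len):
    """Merge consecutive shot segments until each merged segment's
    duration meets the second duration requirement [min_len, max_len].

    shot_segments: list of (start, end) shot boundaries in seconds,
    in temporal order (assumption). Merged segments longer than max_len
    are clipped at max_len.
    """
    merged, cur_start, cur_end = [], None, None
    for start, end in shot_segments:
        if cur_start is None:
            cur_start, cur_end = start, end
        else:
            cur_end = end
        if cur_end - cur_start >= min_len:
            merged.append((cur_start, min(cur_end, cur_start + max_len)))
            cur_start = cur_end = None
    return merged

# Shots of 2s, 3s and 6s merge into one 5s segment and one 6s segment.
print(merge_shots([(0, 2), (2, 5), (5, 11)], min_len=4, max_len=12))
# [(0, 5), (5, 11)]
```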
In one embodiment of the present application, the second alternative segments include one or more second video segments in a second video segment set of the target video, and the obtaining module 310 is configured to determine the highlight degree of each second video segment in the second video segment set through the following steps:
for each second video segment in the second video segment set, inputting the current second video segment into a pre-trained classification model to obtain the probability, output by the classification model, that the current second video segment is a highlight positive example;
and determining the highlight degree of the current second video segment according to the probability that the current second video segment is a highlight positive example.
In a specific embodiment of the present application, a second determining module 330 is further included, configured to obtain the classification model through the following steps:
obtaining a training sample set, wherein the training sample set comprises a plurality of sample pairs, and each sample pair comprises a highlight positive example and a negative example;
training a pre-constructed initial model with the training sample set until a set training termination condition is reached, so as to obtain the classification model;
wherein the training of the pre-constructed initial model with the training sample set comprises the following steps:
for each sample pair, inputting the current sample pair into the pre-constructed initial model;
determining a model loss according to the current sample pair and the output result of the initial model for the current sample pair;
adjusting parameters of the initial model according to the model loss.
In one embodiment of the present application, the model loss includes a cross-entropy classification loss and a ranking loss.
In a specific embodiment of the present application, the apparatus further includes a fourth determining module, configured to:
obtain second viewing data of the target video;
and the obtaining module 310 is specifically configured to:
determine the highlight degree of the current second video segment according to the second viewing data and the probability that the current second video segment is a highlight positive example.
The specific manner in which the modules of the apparatus in the above embodiments perform their operations has been described in detail in the method embodiments and will not be elaborated here.
The embodiment of the present application further provides an electronic device, as shown in fig. 4, including a processor 401, a communication interface 402, a memory 403, and a communication bus 404, where the processor 401, the communication interface 402, and the memory 403 communicate with each other through the communication bus 404;
a memory 403 for storing a computer program;
the processor 401, when executing the program stored in the memory 403, implements the following steps:
obtaining a first alternative segment and a second alternative segment of the target video, wherein the first alternative segment is a video segment with a set action tag, and the second alternative segment is a video segment with a highlight degree meeting a highlight degree condition;
a highlight segment of the target video is determined based on the first alternative segment and the second alternative segment.
The communication bus 404 mentioned for the above electronic device may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc. The communication bus 404 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface 402 is used for communication between the above electronic device and other devices.
The memory 403 may include a Random Access Memory (RAM) or a non-volatile memory, such as at least one magnetic disk memory. Optionally, the memory 403 may also be at least one storage device located remotely from the aforementioned processor.
The processor 401 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In yet another embodiment provided herein, there is also provided a computer-readable storage medium having instructions stored therein that, when run on a computer, cause the computer to perform the steps of the video highlight determining method of any of the above embodiments.
In yet another embodiment provided herein, there is also provided a computer program product containing instructions that, when run on a computer, cause the computer to perform the steps of the video highlight determining method of any of the above embodiments.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be realized in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another by wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, integrating one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), etc.
It is noted that relational terms such as first and second are used herein solely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between such entities or operations. Moreover, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
In this specification, the embodiments are described in a related manner; for identical or similar parts between the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiments are described relatively simply since they are substantially similar to the method embodiments; for relevant details, see the description of the method embodiments.
The foregoing is merely a preferred embodiment of the present application, and is not intended to limit the scope of the present application. Any modifications, equivalent substitutions, improvements, etc. that are within the spirit and principles of the present application are intended to be included within the scope of the present application.

Claims (16)

1. A method for determining a video highlight, comprising:
obtaining a first alternative segment and a second alternative segment of a target video, wherein the first alternative segment is a video segment with a set action tag, and the second alternative segment is a video segment with a highlight degree meeting a highlight degree condition;
determining a highlight segment of the target video based on the first alternative segment and the second alternative segment;
the first alternative segment is determined according to the first video segment with the action tag in the first video segment set; the second alternative segments are determined according to the highlighting degree of each second video segment in the second video segment set; the first alternative segment and the second alternative segment are obtained after the target video is segmented;
in the case that the starting time of the first alternative segment in a current segment pair on the time axis is after the starting time of the second alternative segment in the current segment pair on the time axis, if the highlight degree of the second alternative segment in the current segment pair is one of the top N2 highlight degrees, the highlight segment is a second standby segment; the starting time of the second standby segment is the starting time of the first alternative segment in the current segment pair on the time axis, the ending time of the second standby segment is the ending time of the second alternative segment in the current segment pair on the time axis, and N2 is a positive integer;
in the case that the starting time of the first alternative segment in the current segment pair on the time axis is before the starting time of the second alternative segment in the current segment pair on the time axis, and the first standby segment has the action tag, if the highlight degree of the second alternative segment in the current standby segment pair is one of the top N1 highlight degrees, the highlight segment is the second alternative segment; the starting time of the first standby segment is the starting time of the second alternative segment in the current segment pair, and the duration of the first standby segment is a first duration; N1 is a positive integer; the current standby segment pair is the segment pair corresponding to the first standby segment;
the current segment pair is composed of a second alternative segment and a corresponding first alternative segment that have an overlapping portion on the time axis.
2. The video highlight determining method according to claim 1, wherein the determining the highlight segment of the target video based on the first alternative segment and the second alternative segment comprises:
determining whether the first alternative segment and the second alternative segment have an overlapping portion on the time axis of the target video;
if an overlapping portion exists, determining the highlight segment of the target video according to the first alternative segment and the second alternative segment that have the overlapping portion on the time axis.
3. The video highlight determining method according to claim 2, wherein the determining the highlight segment of the target video according to the first alternative segment and the second alternative segment having the overlapping portion on the time axis comprises:
taking each second alternative segment having an overlapping portion on the time axis and the corresponding first alternative segment as a segment pair;
and determining the highlight segment of the target video according to the times of the second alternative segment and the first alternative segment in each segment pair on the time axis.
4. The video highlight determining method according to claim 3, wherein the determining the highlight segment of the target video according to the times of the second alternative segment and the first alternative segment in each segment pair on the time axis comprises:
for each segment pair, if the starting time of the first alternative segment in the current segment pair on the time axis is before the starting time of the second alternative segment in the current segment pair on the time axis, obtaining a first standby segment of the target video, wherein the starting time of the first standby segment is the starting time of the second alternative segment in the current segment pair on the time axis, and the duration of the first standby segment is a first duration;
determining whether the first standby segment has the action tag;
determining the segment pair corresponding to the first standby segment having the action tag as a standby segment pair;
for each standby segment pair, if the highlight degree of the second alternative segment in the current standby segment pair is above a first highlight degree threshold, or the highlight degree of the second alternative segment in the current standby segment pair is one of the top N1 highlight degrees, determining the second alternative segment in the current segment pair as the highlight segment of the target video, N1 being a positive integer.
5. The video highlight determining method according to claim 3, wherein the determining the highlight segment of the target video according to the times of the second alternative segment and the first alternative segment in each segment pair on the time axis comprises:
for each segment pair, if the starting time of the first alternative segment in the current segment pair on the time axis is after the starting time of the second alternative segment in the current segment pair on the time axis, obtaining a second standby segment of the target video, wherein the starting time of the second standby segment is the starting time of the first alternative segment in the current segment pair on the time axis, and the ending time of the second standby segment is the ending time of the second alternative segment in the current segment pair on the time axis;
and if the highlight degree of the second alternative segment in the current segment pair is above a second highlight degree threshold, or the highlight degree of the second alternative segment in the current segment pair is one of the top N2 highlight degrees, determining the second standby segment as the highlight segment of the target video, N2 being a positive integer.
6. The video highlight determining method according to claim 2, further comprising, in the case where it is determined that the first alternative segment and the second alternative segment have no overlapping portion on the time axis:
determining the probability that each first alternative segment has the action tag, and determining the first alternative segments whose probability is greater than a probability threshold, or the first alternative segments corresponding to the top N3 probabilities, as highlight segments of the target video, N3 being a positive integer;
or,
determining the second alternative segments whose highlight degree is above a third highlight degree threshold, or the second alternative segments corresponding to the top N4 highlight degrees, as highlight segments of the target video, N4 being a positive integer.
7. The video highlight determining method according to claim 1, further comprising, before the determining the highlight segment of the target video based on the first alternative segment and the second alternative segment:
determining the dynamic rate of each second alternative segment, wherein the dynamic rate characterizes the degree of image change between video frames;
and culling the second alternative segments whose dynamic rate is smaller than the average dynamic rate of the target video.
8. The video highlight determining method according to claim 1, further comprising, after the determining the highlight segment of the target video:
detecting whether a target object exists within the second duration preceding the highlight segment;
if the target object does not exist, searching backwards from the starting time of the highlight segment on the time axis of the target video for the time at which the target object appears;
and updating the starting time of the highlight segment to the time at which the target object appears.
9. The video highlight determining method according to claim 1, further comprising, after the determining the highlight segment of the target video:
determining whether the duration of the highlight segment meets a first duration requirement;
and if the duration of the highlight segment does not meet the first duration requirement, performing a truncating operation or a supplementing operation on the highlight segment based on the first duration requirement.
10. The video highlight determining method according to claim 1, wherein the first alternative segments comprise one or more first video segments in a first video segment set of the target video, the first video segment set being obtained by:
dividing the target video by shots to obtain a plurality of shot segments;
and merging the shot segments according to a second duration requirement to obtain the first video segment set, wherein the duration of each first video segment in the first video segment set meets the second duration requirement.
11. The video highlight determining method according to any one of claims 1 to 10, wherein the second alternative segments comprise one or more second video segments in a second video segment set of the target video, the highlight degree of each of the second video segments being determined by:
for each second video segment in the second video segment set, inputting the current second video segment into a pre-trained classification model to obtain the probability, output by the classification model, that the current second video segment is a highlight positive example;
and determining the highlight degree of the current second video segment according to the probability that the current second video segment is a highlight positive example.
12. The video highlight determining method according to claim 11, wherein the classification model is obtained by training in advance through the following steps:
obtaining a training sample set, wherein the training sample set comprises a plurality of sample pairs, and each sample pair comprises a highlight positive example and a negative example;
training a pre-constructed initial model with the training sample set until a set training termination condition is reached, so as to obtain the classification model;
wherein the training the pre-constructed initial model with the training sample set comprises the following steps:
for each sample pair, inputting the current sample pair into the pre-constructed initial model;
determining a model loss according to the current sample pair and the output result of the initial model for the current sample pair;
and adjusting parameters of the initial model according to the model loss.
13. The video highlight determining method according to claim 11, further comprising:
obtaining second viewing data of the target video;
wherein the determining the highlight degree of the current second video segment according to the probability that the current second video segment is a highlight positive example comprises:
determining the highlight degree of the current second video segment according to the second viewing data and the probability that the current second video segment is a highlight positive example.
14. A video highlight determination apparatus, comprising:
an obtaining module, configured to obtain a first alternative segment and a second alternative segment of a target video, wherein the first alternative segment is a video segment with a set action tag, and the second alternative segment is a video segment with a highlight degree meeting a highlight degree condition;
a determining module configured to determine a highlight segment of the target video based on the first candidate segment and the second candidate segment;
the first alternative segment is determined according to the first video segment with the action tag in the first video segment set; the second alternative segments are determined according to the highlighting degree of each second video segment in the second video segment set; the first alternative segment and the second alternative segment are obtained after the target video is segmented;
in the case that the starting time of the first alternative segment in a current segment pair on the time axis is after the starting time of the second alternative segment in the current segment pair on the time axis, if the highlight degree of the second alternative segment in the current segment pair is one of the top N2 highlight degrees, the highlight segment is a second standby segment; the starting time of the second standby segment is the starting time of the first alternative segment in the current segment pair on the time axis, the ending time of the second standby segment is the ending time of the second alternative segment in the current segment pair on the time axis, and N2 is a positive integer;
in the case that the starting time of the first alternative segment in the current segment pair on the time axis is before the starting time of the second alternative segment in the current segment pair on the time axis, and the first standby segment has the action tag, if the highlight degree of the second alternative segment in the current standby segment pair is one of the top N1 highlight degrees, the highlight segment is the second alternative segment; the starting time of the first standby segment is the starting time of the second alternative segment in the current segment pair, and the duration of the first standby segment is a first duration; N1 is a positive integer; the current standby segment pair is the segment pair corresponding to the first standby segment;
the current segment pair is composed of a second alternative segment and a corresponding first alternative segment that have an overlapping portion on the time axis.
15. An electronic device, comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with each other through the communication bus;
a memory for storing a computer program;
a processor, configured to implement the steps of the video highlight determining method according to any one of claims 1 to 13 when executing the program stored in the memory.
16. A computer-readable storage medium having stored thereon a computer program, wherein the program, when executed by a processor, implements the steps of the video highlight determining method according to any one of claims 1 to 13.
CN202211054859.1A 2022-08-31 2022-08-31 Video highlight determination method and device, electronic equipment and storage medium Active CN115412765B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211054859.1A CN115412765B (en) 2022-08-31 2022-08-31 Video highlight determination method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211054859.1A CN115412765B (en) 2022-08-31 2022-08-31 Video highlight determination method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN115412765A CN115412765A (en) 2022-11-29
CN115412765B true CN115412765B (en) 2024-03-26

Family

ID=84163759

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211054859.1A Active CN115412765B (en) 2022-08-31 2022-08-31 Video highlight determination method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115412765B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116017041A (en) * 2022-12-05 2023-04-25 北京有竹居网络技术有限公司 Video pushing method and device, computer equipment and storage medium

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102427507A (en) * 2011-09-30 2012-04-25 北京航空航天大学 Football video highlight automatic synthesis method based on event model
CN104994425A (en) * 2015-06-30 2015-10-21 北京奇艺世纪科技有限公司 Video labeling method and device
CN109977735A (en) * 2017-12-28 2019-07-05 优酷网络技术(北京)有限公司 Move the extracting method and device of wonderful
CN110191357A (en) * 2019-06-28 2019-08-30 北京奇艺世纪科技有限公司 The excellent degree assessment of video clip, dynamic seal face generate method and device
CN111669656A (en) * 2020-06-19 2020-09-15 北京奇艺世纪科技有限公司 Method and device for determining wonderful degree of video clip
CN112511854A (en) * 2020-11-27 2021-03-16 刘亚虹 Live video highlight generation method, device, medium and equipment
CN113194359A (en) * 2021-04-27 2021-07-30 武汉星巡智能科技有限公司 Method, device, equipment and medium for automatically grabbing baby wonderful video highlights
CN113365147A (en) * 2021-08-11 2021-09-07 腾讯科技(深圳)有限公司 Video editing method, device, equipment and storage medium based on music card point
WO2021184852A1 (en) * 2020-03-16 2021-09-23 平安科技(深圳)有限公司 Action region extraction method, device and apparatus, and computer-readable storage medium
CN114329072A (en) * 2021-12-23 2022-04-12 北京市商汤科技开发有限公司 Video processing method and device, electronic equipment and storage medium
CN114845149A (en) * 2021-02-01 2022-08-02 腾讯科技(北京)有限公司 Editing method of video clip, video recommendation method, device, equipment and medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109547859B (en) * 2017-09-21 2021-12-07 腾讯科技(深圳)有限公司 Video clip determination method and device
US11025964B2 (en) * 2019-04-02 2021-06-01 Wangsu Science & Technology Co., Ltd. Method, apparatus, server, and storage medium for generating live broadcast video of highlight collection
US11678029B2 (en) * 2019-12-17 2023-06-13 Tencent Technology (Shenzhen) Company Limited Video labeling method and apparatus, device, and computer-readable storage medium
CN113132752B (en) * 2019-12-30 2023-02-24 阿里巴巴集团控股有限公司 Video processing method and device

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102427507A (en) * 2011-09-30 2012-04-25 北京航空航天大学 Football video highlight automatic synthesis method based on event model
CN104994425A (en) * 2015-06-30 2015-10-21 北京奇艺世纪科技有限公司 Video labeling method and device
CN109977735A (en) * 2017-12-28 2019-07-05 优酷网络技术(北京)有限公司 Move the extracting method and device of wonderful
CN110191357A (en) * 2019-06-28 2019-08-30 北京奇艺世纪科技有限公司 The excellent degree assessment of video clip, dynamic seal face generate method and device
WO2021184852A1 (en) * 2020-03-16 2021-09-23 平安科技(深圳)有限公司 Action region extraction method, device and apparatus, and computer-readable storage medium
CN111669656A (en) * 2020-06-19 2020-09-15 北京奇艺世纪科技有限公司 Method and device for determining wonderful degree of video clip
CN112511854A (en) * 2020-11-27 2021-03-16 刘亚虹 Live video highlight generation method, device, medium and equipment
CN114845149A (en) * 2021-02-01 2022-08-02 腾讯科技(北京)有限公司 Editing method of video clip, video recommendation method, device, equipment and medium
CN113194359A (en) * 2021-04-27 2021-07-30 武汉星巡智能科技有限公司 Method, device, equipment and medium for automatically grabbing baby wonderful video highlights
CN113365147A (en) * 2021-08-11 2021-09-07 腾讯科技(深圳)有限公司 Video editing method, device, equipment and storage medium based on music card point
CN114329072A (en) * 2021-12-23 2022-04-12 北京市商汤科技开发有限公司 Video processing method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN115412765A (en) 2022-11-29

Similar Documents

Publication Publication Date Title
US10575037B2 (en) Video recommending method, server, and storage media
CN109145784B (en) Method and apparatus for processing video
US10990877B2 (en) Frame selection based on a trained neural network
CN110309795B (en) Video detection method, device, electronic equipment and storage medium
CN110909205B (en) Video cover determination method and device, electronic equipment and readable storage medium
CN110856037B (en) Video cover determination method and device, electronic equipment and readable storage medium
US20090116695A1 (en) System and method for processing digital media
JP7394809B2 (en) Methods, devices, electronic devices, media and computer programs for processing video
CN110321845B (en) Method and device for extracting emotion packets from video and electronic equipment
CN111274442B (en) Method for determining video tag, server and storage medium
CN113613065A (en) Video editing method and device, electronic equipment and storage medium
CN113469298B (en) Model training method and resource recommendation method
CN110287375B (en) Method and device for determining video tag and server
CN111708909B (en) Video tag adding method and device, electronic equipment and computer readable storage medium
CN111314732A (en) Method for determining video label, server and storage medium
CN113779381B (en) Resource recommendation method, device, electronic equipment and storage medium
CN115412765B (en) Video highlight determination method and device, electronic equipment and storage medium
CN110674345A (en) Video searching method and device and server
CN112150457A (en) Video detection method, device and computer readable storage medium
CN112040325B (en) Video playing method and device, electronic equipment and storage medium
CN113239183A (en) Training method and device of ranking model, electronic equipment and storage medium
CN113472834A (en) Object pushing method and device
CN109800326B (en) Video processing method, device, equipment and storage medium
CN115190357B (en) Video abstract generation method and device
CN108882024B (en) Video playing method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant