CN115731500A - Video highlight detection method, device, medium and electronic equipment


Info

Publication number
CN115731500A
Authority
CN
China
Prior art keywords
shot
highlight
segment
determining
segments
Prior art date
Legal status
Pending
Application number
CN202211551849.9A
Other languages
Chinese (zh)
Inventor
杜臣
周文
Current Assignee
Beijing Youzhuju Network Technology Co Ltd
Original Assignee
Beijing Youzhuju Network Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Youzhuju Network Technology Co Ltd
Priority to CN202211551849.9A
Publication of CN115731500A

Landscapes

  • Television Signal Processing For Recording (AREA)

Abstract

The present disclosure relates to the field of videos, and in particular, to a method, an apparatus, a medium, and an electronic device for highlight detection of a video. The method comprises the following steps: dividing a target video into a plurality of video segments, and determining a highlight score of each video segment; performing lens splitting processing on a plurality of video clips to obtain a plurality of shot clips, wherein each shot clip comprises one or more continuous video clips; for each shot, determining the highlight score of the shot according to the highlight score of the video clip included by the shot; and determining highlight segments of the target video according to the highlight scores of each shot segment. Therefore, the complete highlight segment can be accurately determined, and the watching experience of the user is improved.

Description

Video highlight detection method, device, medium and electronic equipment
Technical Field
The present disclosure relates to the field of videos, and in particular, to a method, an apparatus, a medium, and an electronic device for highlight detection of a video.
Background
A video highlight refers to a segment of a video that is particularly exciting. Compared with medium-length and short videos, long videos have a relatively slow pace, and viewers need to invest considerable time to screen them and get into the plot. Automatically surfacing the highlight moments or highlight segments of a long video on an online video platform can improve the efficiency with which users screen and discover long videos, make it easier for users to perceive and consume long-video content, and drive traffic to the long video.
In a common video highlight detection method, a preset model scores the highlight level at each moment of the video. If the moments with high scores are directly taken as highlights, the resulting highlight segments may be highly fragmented, and their starts or ends may feel abrupt, which degrades the user's viewing experience.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
In a first aspect, the present disclosure provides a video highlight detection method, including:
dividing a target video into a plurality of video segments and determining a highlight score of each video segment; performing lens splitting processing on the plurality of video clips to obtain a plurality of shot clips, wherein each shot clip comprises one or more continuous video clips;
for each of the shot segments, determining a highlight score for the shot segment according to the highlight scores of the video segments that the shot segment comprises;
and determining highlight segments of the target video according to the highlight scores of each shot segment.
In a second aspect, the present disclosure provides a video highlight detection apparatus, including:
the first determining module is used for dividing a target video into a plurality of video segments and determining the highlight score of each video segment;
the processing module is used for performing mirror splitting processing on the plurality of video clips to obtain a plurality of shot clips, wherein each shot clip comprises one or more continuous video clips;
a second determining module, configured to determine, for each of the shot segments, a highlight score of the shot segment according to a highlight score of the video segment included in the shot segment;
and the third determining module is used for determining highlight segments of the target video according to the highlight scores of all the shot segments.
In a third aspect, the present disclosure provides a computer readable medium, on which a computer program is stored, which program, when being executed by a processing apparatus, realizes the steps of the video highlight detection method described above.
In a fourth aspect, the present disclosure provides an electronic device comprising:
a storage device having a computer program stored thereon;
and the processing device is used for executing the computer program in the storage device so as to realize the steps of the video highlight detection method.
In the technical scheme, the plurality of video clips are subjected to the lens splitting processing to obtain the plurality of shot clips, and the shot clips comprise one or more continuous video clips, so that the shot clips are more complete compared with the video clips. According to the highlight scores of the video clips included by the shot clips, the highlight scores of the shot clips can be accurately determined. Therefore, the complete highlight segment can be accurately determined, and the watching experience of the user is improved.
Additional features and advantages of the present disclosure will be set forth in the detailed description which follows.
Drawings
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale. In the drawings:
fig. 1 is a flowchart of a video highlight detection method provided according to an embodiment of the present disclosure.
Fig. 2 is a schematic diagram of highlight segment determination provided according to an embodiment of the present disclosure.
Fig. 3 is a schematic diagram of a highlight score prediction process provided in accordance with one embodiment of the present disclosure.
Fig. 4 is a block diagram of a video highlight detection apparatus provided according to an embodiment of the present disclosure.
Fig. 5 is a schematic structural diagram of an electronic device provided according to an embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that references to "a", "an", and "the" modifications in this disclosure are intended to be illustrative rather than limiting, and that those skilled in the art will recognize that "one or more" may be used unless the context clearly dictates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
It should be understood that, before the technical solutions disclosed in the embodiments of the present disclosure are used, the user should be informed, in an appropriate manner in accordance with relevant laws and regulations, of the type, scope of use, usage scenarios, etc. of the personal information involved in the present disclosure, and the user's authorization should be obtained.
For example, in response to receiving an active request from the user, prompt information is sent to the user to explicitly prompt the user that the requested operation will require the acquisition and use of the user's personal information. In this way, the user can autonomously choose, according to the prompt information, whether to provide personal information to software or hardware such as an electronic device, application program, server, or storage medium that performs the operations of the disclosed technical solution.
As an alternative but non-limiting implementation manner, in response to receiving an active request from the user, the manner of sending the prompt information to the user may be, for example, a pop-up window manner, and the prompt information may be presented in a text manner in the pop-up window. In addition, a selection control for providing personal information to the electronic device by the user's selection of "agreeing" or "disagreeing" can be carried in the pop-up window.
It is understood that the above notification and user authorization process is only illustrative and not limiting, and other ways of satisfying relevant laws and regulations may be applied to the implementation of the present disclosure.
Meanwhile, it is understood that the data involved in the present technical solution (including but not limited to the data itself, the acquisition or use of the data) should comply with the requirements of the corresponding laws and regulations and the related regulations.
Fig. 1 is a flowchart of a video highlight detection method provided according to an embodiment of the present disclosure, where the method may be applied to a terminal, such as a smart phone, a tablet computer, a Personal Computer (PC), a notebook computer, or the like, and may also be applied to a server. As shown in fig. 1, the method may include S101 to S104.
S101, dividing a target video into a plurality of video segments, and determining a highlight score of each video segment.
Wherein the target video may be a video file that is stored in advance. The format, the acquisition mode, and the like of the video file are not particularly limited in this disclosure. The video segment may be determined based on a sliding time window or may be determined based on the current time and a video segment duration threshold. Illustratively, the video segment duration threshold may be set to 1s. In this way, the target video can be finely divided.
The highlight score of each video clip can be determined by a pre-trained model. Specifically, the video clip can be input into a pre-trained highlight score prediction model, and the output of the model is the highlight score of the video clip. It should be noted that the highlight score prediction model may be a machine learning model trained in a machine learning manner and capable of predicting highlight scores from video segments. The highlight score prediction model may, for example, be stored locally and invoked locally each time it is used, or stored on a third-party platform and invoked from the third party each time it is used, which is not specifically limited here. The higher the highlight score, the more exciting the video segment is.
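As an illustrative sketch only (the disclosure does not prescribe an implementation), the following Python snippet divides a video into fixed-length segments and scores each one; `predict_highlight_score` is a hypothetical stand-in for the pre-trained highlight score prediction model.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class VideoSegment:
    start: float          # segment start time in seconds
    end: float            # segment end time in seconds
    score: float = 0.0    # highlight score, filled in by the prediction model

def split_and_score(video_duration: float,
                    segment_len: float,
                    predict_highlight_score: Callable[[float, float], float]) -> List[VideoSegment]:
    """Divide the target video into fixed-length segments (e.g. 1 s each) and
    score each segment with a pre-trained highlight score prediction model."""
    segments: List[VideoSegment] = []
    t = 0.0
    while t < video_duration:
        seg = VideoSegment(start=t, end=min(t + segment_len, video_duration))
        seg.score = predict_highlight_score(seg.start, seg.end)  # model inference stub
        segments.append(seg)
        t += segment_len
    return segments

# e.g. a 120 s video split into 1 s segments, scored by a dummy model
segments = split_and_score(120.0, 1.0, lambda start, end: 0.5)
```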
S102, performing lens splitting processing on the plurality of video clips to obtain a plurality of shot clips.
Wherein each shot segment comprises one or more consecutive video segments.
In order to ensure the integrity of the finally obtained highlight segments, the plurality of video segments can be subjected to lens splitting processing to obtain shot segments with higher integrity relative to the video segments. There are various ways to perform the lens splitting processing on the video clips. For example, for two adjacent video clips, the image of the last frame of the preceding clip and the image of the first frame of the following clip can be compared in terms of features: if the difference between the two images is large, the two video clips can be divided into two different shot segments; conversely, if the difference between the two images is small, the two video clips can be divided into the same shot segment.
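The frame-comparison idea above can be sketched as follows; the per-pixel difference metric and the threshold value are assumptions for illustration, since the disclosure only requires that a "large difference" start a new shot segment.

```python
import numpy as np

def frame_difference(prev_last_frame: np.ndarray, next_first_frame: np.ndarray) -> float:
    """Mean absolute per-pixel difference between two frames of shape (H, W, C);
    a histogram or embedding distance could be used instead."""
    return float(np.mean(np.abs(prev_last_frame.astype(float) - next_first_frame.astype(float))))

def group_into_shots(segment_frames, diff_threshold: float = 30.0):
    """segment_frames[i] = (first_frame, last_frame) of the i-th video segment.
    Returns shot segments as lists of consecutive video segment indices."""
    if not segment_frames:
        return []
    shots = [[0]]
    for i in range(1, len(segment_frames)):
        prev_last = segment_frames[i - 1][1]
        cur_first = segment_frames[i][0]
        if frame_difference(prev_last, cur_first) > diff_threshold:
            shots.append([i])        # large difference: a new shot segment starts here
        else:
            shots[-1].append(i)      # small difference: same shot segment continues
    return shots
```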
S103, for each shot segment, determining the highlight score of the shot segment according to the highlight scores of the video clips it includes.
Wherein, an average of highlight scores of the video segments included in the shot may be determined as the highlight score of the shot. Specifically, the highlight score of the ith shot can be determined by the following formula:
f(s_i) = \frac{1}{k_i} \sum_{j=1}^{k_i} p_{ij}
where s_i is the i-th shot segment, f(s_i) is the highlight score of the i-th shot segment, k_i is the number of video segments in the i-th shot segment, and p_{ij} is the highlight score of the j-th video segment in the i-th shot segment. If a shot segment includes three video segments, the average of the highlight scores of the three video segments can be determined as the highlight score of the shot segment. The highlight score of a shot segment can intuitively and simply reflect the overall highlight level of all the video segments it includes.
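A minimal sketch of this averaging (the formula above); the example scores are taken from an embodiment described below.

```python
def shot_highlight_score(segment_scores):
    """f(s_i) = (1 / k_i) * sum_j p_ij : the mean highlight score of the
    video segments contained in a shot segment."""
    return sum(segment_scores) / len(segment_scores)

# a shot segment containing three video segments
print(shot_highlight_score([0.652, 0.58, 0.7]))  # ~0.644
```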
S104, determining highlight segments of the target video according to the highlight score of each shot segment.
For example, a highlight score threshold may be preset, and if the highlight score of a shot is greater than the highlight score threshold, the shot may be determined as a highlight section of the target video. For example, if the highlight scores of the three shots are 0.2, 0.56, and 0.84, respectively, and the highlight score threshold is 0.5, the shot with the highlight score of 0.56 and the shot with the highlight score of 0.84 may be determined as the highlight segments of the target video.
The highlight score threshold may be a fixed threshold, or a dynamic threshold determined according to the highlight scores of the video segments in the target video. For example, the highlight score threshold may be the average of the highlight scores of the video segments in the target video; alternatively, the video segments whose highlight scores rank in the top 75% of all video segments may first be screened out, and the median of the highlight scores of the screened video segments used as the highlight score threshold. A dynamic highlight score threshold can better adapt to the target video, thereby improving the accuracy of the determined highlight segments of the target video.
For example, a preset number may be set, where the preset number represents the number of highlight segments that the user wants to acquire; specifically, the preset number of shot segments with the highest highlight scores may be determined as the highlight segments of the target video. For example, if the target video includes three shot segments with highlight scores of 0.2, 0.56, and 0.84, respectively, and the preset number is 2, the shot segment with the highlight score of 0.56 and the shot segment with the highlight score of 0.84 may be determined as the highlight segments of the target video.
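Both selection strategies (threshold-based, including a dynamic threshold, and top-k by preset number) can be sketched as follows; the top-75%/median rule mirrors the example above, and the function names are illustrative.

```python
import statistics

def select_by_threshold(shot_scores, threshold=None, segment_scores=None):
    """Keep indices of shot segments whose highlight score exceeds a threshold.
    Without a fixed threshold, derive a dynamic one from the per-segment scores,
    e.g. the median of the top-75% highest segment scores."""
    if threshold is None:
        ranked = sorted(segment_scores, reverse=True)
        top = ranked[: max(1, int(len(ranked) * 0.75))]
        threshold = statistics.median(top)
    return [i for i, s in enumerate(shot_scores) if s > threshold]

def select_top_k(shot_scores, k):
    """Keep indices of the k shot segments with the highest highlight scores."""
    return sorted(range(len(shot_scores)), key=lambda i: shot_scores[i], reverse=True)[:k]

# shot scores 0.2, 0.56, 0.84 with a fixed threshold of 0.5, or preset number 2
print(select_by_threshold([0.2, 0.56, 0.84], threshold=0.5))  # [1, 2]
print(select_top_k([0.2, 0.56, 0.84], k=2))                   # [2, 1]
```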
In the technical scheme, the plurality of video clips are subjected to lens splitting processing to obtain a plurality of shot clips, and the shot clips comprise one or more continuous video clips, so that the shot clips are more complete compared with the video clips. According to the highlight scores of the video clips included by the shot clips, the highlight scores of the shot clips can be accurately determined. Therefore, the complete highlight segment can be accurately determined, and the watching experience of the user is improved.
The specific implementation of determining the highlight segments of the target video according to the highlight score of each shot segment in S104 may further be:
determining a shot segment with a highlight score larger than a preset highlight score threshold value as a first shot segment;
if the number of the first shot sections is multiple, combining the multiple first shot sections to obtain at least one second shot section, wherein the multiple first shot sections combined together to form one second shot section satisfy the following conditions: in any two adjacent first shot sections, the time difference between the ending time of the preceding first shot section and the starting time of the following first shot section is smaller than a preset time difference threshold value;
and determining highlight segments of the target video according to the at least one second shot segment.
The highlight score threshold may be a fixed threshold or a dynamic threshold; if the highlight score of a shot segment is greater than the preset highlight score threshold, the shot segment may be determined as a first shot segment. Adjacent shot segments often have a certain continuity, and the possibility of continuity between two adjacent first shot segments can be characterized by the time difference: the smaller the time difference between the end time of the preceding first shot segment and the start time of the following first shot segment, the higher the possibility that the two first shot segments are continuous. If the merging condition is met, it can be determined that the two adjacent first shot segments have a certain continuity, and they can be merged to further ensure the integrity of the highlight segments of the target video; if a first shot segment and its adjacent first shot segments do not satisfy the merging condition, that first shot segment can stand alone as one of the second shot segments.
As shown in fig. 2, each video segment is 1s long, and several shot segments are obtained from the video segments: the highlight score of the first shot segment A1 is 0.76; the highlight score of the second shot segment A2 is 0.26; the highlight score of the third shot segment A3 is 0.1; the highlight score of the fourth shot segment A4 is 0.652; the highlight score of the fifth shot segment A5 is 0.19; the highlight score of the sixth shot segment A6 is 0.58; the highlight score of the seventh shot segment A7 is 0.38; the highlight score of the eighth shot segment A8 is 0.7. If the preset highlight score threshold is 0.5, the first shot segment A1, the fourth shot segment A4, the sixth shot segment A6, and the eighth shot segment A8 can be determined as first shot segments. The time difference between the first shot segment A1 and the fourth shot segment A4 is 5s, the time difference between the fourth shot segment A4 and the sixth shot segment A6 is 2s, and the time difference between the sixth shot segment A6 and the eighth shot segment A8 is 3s. If the preset time difference threshold is 4s, it may be determined that the first shot segment A1 and the fourth shot segment A4 do not satisfy the merging condition, so the first shot segment A1 may be determined as the first second shot segment B1; the fourth shot segment A4 and the sixth shot segment A6 satisfy the merging condition, and the sixth shot segment A6 and the eighth shot segment A8 satisfy the merging condition, so the fourth shot segment A4, the sixth shot segment A6, and the eighth shot segment A8 can be merged into the second second shot segment B2.
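A hedged sketch of this merging rule; the start/end timestamps below are made up so that the gaps reproduce the 5s, 2s and 3s differences of the example, and the 4s gap threshold matches the example.

```python
def merge_first_shots(first_shots, max_gap=4.0):
    """first_shots: (start, end, score) tuples sorted by start time.
    Adjacent first shot segments whose gap (next start - previous end) is
    smaller than max_gap are merged; each group is one second shot segment."""
    if not first_shots:
        return []
    groups = [[first_shots[0]]]
    for shot in first_shots[1:]:
        prev_end = groups[-1][-1][1]
        if shot[0] - prev_end < max_gap:
            groups[-1].append(shot)   # continuous enough: same second shot segment
        else:
            groups.append([shot])     # gap too large: start a new second shot segment
    return groups

# Hypothetical timestamps reproducing the 5 s / 2 s / 3 s gaps of fig. 2:
A1, A4, A6, A8 = (0, 3, 0.76), (8, 12, 0.652), (14, 18, 0.58), (21, 25, 0.7)
print(merge_first_shots([A1, A4, A6, A8]))  # [[A1], [A4, A6, A8]] -> B1 and B2
```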
In an embodiment, determining a highlight segment of the target video according to the at least one second shot segment may include: and directly determining the second shot segment as a highlight segment of the target video. As shown in fig. 2, the first second shot segment B1 and the second shot segment B2 may be determined as highlight segments of the target video.
In other embodiments, the specific implementation of determining the highlight segment of the target video according to the at least one second shot segment may further be:
determining the highlight score of the second shot section according to the highlight score of the first shot section;
and if the number of the second shot sections exceeds the preset number, determining the second shot sections with the highest highlight score ranking in the preset number as highlight sections.
For example, an average of the highlight scores of the first shot segments included in a second shot segment may be determined as the highlight score of that second shot segment. As described above, the first second shot segment B1 includes only the first shot segment A1, so the highlight score 0.76 of the first shot segment A1 can be determined as the highlight score of the first second shot segment B1; the second second shot segment B2 includes the fourth shot segment A4, the sixth shot segment A6, and the eighth shot segment A8, whose highlight scores are 0.652, 0.58, and 0.7, respectively, with an average of 0.644, so 0.644 can be determined as the highlight score of the second second shot segment B2.
The preset number may represent the number of highlight segments that the user wants to acquire. The preset number of second shot segments with the highest highlight scores are determined as highlight segments, so that the number of determined highlight segments matches the number the user expects, the determined highlight segments have a high degree of highlight, and it is easier to drive traffic to the target video.
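Continuing the sketch above, the highlight score of a second shot segment can be taken as the mean score of its first shot segments, and the preset number of best second shot segments kept; this is an illustration, not the patent's reference code.

```python
def second_shot_score(group):
    """Highlight score of a second shot segment = mean highlight score of the
    first shot segments (start, end, score) it contains."""
    return sum(shot[2] for shot in group) / len(group)

def top_k_second_shots(groups, k):
    """If there are more second shot segments than the preset number k,
    keep only the k with the highest highlight scores."""
    return sorted(groups, key=second_shot_score, reverse=True)[:k]

# Continuing the fig. 2 example: B1 scores 0.76, B2 scores (0.652 + 0.58 + 0.7) / 3 = 0.644
B1 = [(0, 3, 0.76)]
B2 = [(8, 12, 0.652), (14, 18, 0.58), (21, 25, 0.7)]
print([round(second_shot_score(b), 3) for b in (B1, B2)])  # [0.76, 0.644]
print(top_k_second_shots([B1, B2], k=1) == [B1])           # True
```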
In other embodiments, the specific implementation of determining the highlight segment of the target video according to the at least one second shot segment may further be:
determining candidate shot segments according to a preset duration and at least one second shot segment;
and determining highlight segments according to the candidate shot segments.
The duration of a second shot segment is determined by the first shot segments that constitute it. If a second shot segment includes too many first shot segments, it may be too long, making the resulting highlight segment too long, so that the user has to spend considerable time further screening within the determined highlight segment. If a second shot segment includes too few first shot segments, for example only one, it may be too short, making the resulting highlight segment too short, so that it is difficult for the user to get into the plot from the highlight segment. Whether the duration of a second shot segment is appropriate can be determined by comparing it with a preset duration, so as to screen out suitable candidate shot segments and then obtain the highlight segments.
The specific implementation of determining the candidate shot segments according to the preset duration and the at least one second shot segment may be:
and determining a second shot segment with the segment duration not shorter than the preset duration as a candidate shot segment.
The preset duration may be set in advance, for example, to 10s. If the segment duration of a second shot segment is 15s, the second shot segment is determined as a candidate shot segment; if the segment duration of a second shot segment is 6s, it is not determined as a candidate shot segment. In this way, the determined candidate shot segments all have durations not shorter than the preset duration, which prevents the candidate shot segments, and thus the subsequently determined highlight segments, from being too short, so that the user can get into the plot from the highlight segments.
The specific implementation manner of determining the candidate shot according to the preset duration and the at least one second shot may further be:
and cutting a second shot section of which the section duration is longer than the preset duration according to the preset duration, and determining the second shot section of which the section duration is not longer than the preset duration and the cut second shot section as candidate shot sections.
The preset duration may be set in advance, for example, to 20s. If the segment duration of a second shot segment is 35s, the second shot segment is clipped. Illustratively, starting from the start time of the second shot segment, a segment of the preset duration is retained and determined as the clipped second shot segment, that is, the portion exceeding the preset duration is deleted. For example, for a second shot segment with a duration of 35s, the first 20s of the original segment can be retained as the clipped second shot segment and used as a candidate shot segment. If the segment duration of a second shot segment is 15s, no clipping is performed, and the second shot segment can be directly determined as a candidate shot segment. In this way, the determined candidate shot segments all have durations not longer than the preset duration, which prevents the candidate shot segments, and thus the subsequently determined highlight segments, from being too long, so that the user does not need to spend too much time further screening the content of the determined highlight segments.
The specific implementation manner of determining the candidate shot according to the preset duration and the at least one second shot may further be:
and cutting a second shot section of which the section duration is longer than the preset duration according to the preset duration, and determining the second shot section of which the section duration is equal to the preset duration and the cut second shot section as candidate shot sections.
The preset duration may be set in advance, for example, to 15s. If the segment duration of a second shot segment is 20s, the second shot segment is clipped: for example, taking the start time of the second shot segment as the starting point, a segment of the preset duration is retained and determined as the clipped second shot segment, that is, the first 15s of the second shot segment can be retained and determined as a candidate shot segment. If the segment duration of a second shot segment is 10s, it is not determined as a candidate shot segment and is discarded. If the segment duration of a second shot segment is exactly 15s, it can be determined as a candidate shot segment directly. In this way, the determined candidate shot segments all have exactly the preset duration, which keeps the durations of the candidate shot segments, and thus of the highlight segments, consistent, so that the user can get into the plot from the determined highlight segments without spending too much time further screening their content.
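The three duration-handling strategies can be summarised in one hypothetical helper; the mode names are illustrative, and each branch mirrors one of the embodiments above.

```python
def candidate_shots(second_shots, preset, mode="clip"):
    """second_shots: (start, end) pairs in seconds.
    mode='filter': keep only segments whose duration is not shorter than preset;
    mode='clip'  : clip over-long segments to preset, keep the others unchanged;
    mode='fixed' : clip over-long segments to preset and drop shorter ones, so
                   every candidate shot segment has exactly the preset duration."""
    out = []
    for start, end in second_shots:
        duration = end - start
        if mode == "filter" and duration >= preset:
            out.append((start, end))
        elif mode == "clip":
            out.append((start, min(end, start + preset)))
        elif mode == "fixed" and duration >= preset:
            out.append((start, start + preset))
    return out

print(candidate_shots([(0, 15), (20, 26)], preset=10, mode="filter"))  # [(0, 15)]
print(candidate_shots([(0, 35), (40, 55)], preset=20, mode="clip"))    # [(0, 20), (40, 55)]
print(candidate_shots([(0, 20), (30, 40)], preset=15, mode="fixed"))   # [(0, 15)]
```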
In another embodiment, the specific embodiment of determining the highlight segment of the target video according to the preset duration and the at least one second shot segment may further include: determining a highlight score of the second shot segment according to the highlight score of the first shot segment. Thus, when determining a highlight segment from the candidate shot segments, it can be determined by:
and if the number of the candidate shot segments exceeds the preset number, determining the candidate shot segments with the highest highlight score ranking in the preset number as highlight segments.
Specifically, the average of the highlight scores of the first shot segments included in a second shot segment may be determined as the highlight score of that second shot segment. After each second shot segment has been processed according to the preset duration, if the number of resulting candidate shot segments exceeds the preset number, the preset number of candidate shot segments with the highest highlight scores can be determined as highlight segments. In this way, the number of determined highlight segments matches the number the user expects while the quality of the highlight segments is ensured.
In the case where the number of the first shot sections is one, the first shot section may be directly determined as a highlight section.
Alternatively, in a case where the number of the first shot sections is one, it may be further determined whether a section duration of the first shot section exceeds a preset duration. And if the segment time length of the first shot segment is longer than the preset time length, cutting the first shot segment according to the preset time length, and determining the cut first shot segment as a highlight segment.
In the case where the number of first shot segments is one, if the segment duration of the first shot segment is short, the first shot segment can be directly determined as the highlight segment. If the segment duration of the first shot segment is long, the first shot segment can be clipped, and the clipped first shot segment determined as the highlight segment. Illustratively, taking the start time of the first shot segment as the starting point, a segment of the preset duration is retained and determined as the clipped first shot segment, and the portion exceeding the preset duration is deleted. In this way, the determined highlight segment is prevented from being too long, and the user does not need to spend too much time further screening within it.
In some embodiments, the specific implementation of determining the highlight score of each of the video segments in S101 may be:
and aiming at each video clip, extracting the multi-modal characteristics of the video clip, and determining the highlight score of the video clip according to the multi-modal characteristics and a highlight score prediction model trained in advance.
Wherein the multi-modal features may include at least two of: image features, audio features, and text features.
The pre-trained highlight score prediction model may be a neural network model. FIG. 3 is a schematic diagram of a highlight score prediction process provided in accordance with an embodiment of the present disclosure. As shown in fig. 3, the target video is divided into n video segments, and at least two of an image frame, an audio segment, and text information of each video segment may be extracted.
Visual features of the image frames of the video clip may be extracted through a visual model, which may be, for example, a 3D convolutional neural network model, and the corresponding visual feature d1 may be output.
Audio features of the audio segment of the video clip may be extracted through an audio model, which may be, for example, a VGGish neural network model, and the corresponding audio feature d2 may be output.
Text information of the video clip can be extracted through a language model, and corresponding text features d3 can be output, wherein the language model can recognize words in audio based on ASR (Automatic Speech Recognition), recognize subtitle text in an image based on OCR (Optical Character Recognition), and extract text information of the video clip.
For each video clip, the video sequence feature of the video clip can be obtained according to the multi-modal features corresponding to the video clip; further, the fused feature corresponding to the video clip can be obtained based on the video sequence feature of the video clip and the video sequence features corresponding to adjacent video clips; the highlight score of the video clip can then be determined based on the fused feature.
Take the case where the multi-modal features include image features, audio features, and text features as an example. The visual feature d1, the audio feature d2, and the text feature d3 corresponding to the video segment l_n can be concatenated, and the corresponding position encoding added, to obtain the video sequence feature of the video segment l_n. The video sequence features can be further fused through a multi-layer Transformer encoder model; that is, within the Transformer encoder model, the fused video sequence feature v_n is determined by combining the context information of the video segment (i.e., the video sequence features corresponding to the adjacent video segments). The fused feature is then input into the fully-connected layer (MLP head) of the highlight score prediction model to obtain the highlight score p_n of the video segment.
In this way, multi-modal features such as image, audio, and text can be fused, so that the highlight recognition strengths of the different modalities complement each other, effectively alleviating the problem of insufficient feature description from a single modality during highlight detection. Meanwhile, when determining the highlight score of a video clip, the video sequence features corresponding to adjacent video clips are taken into account, so that the highlight scores of the video clips have a certain correlation, which improves the accuracy of the highlight segments determined based on these highlight scores.
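A condensed PyTorch-style sketch of the fusion path in fig. 3; the layer sizes, two-layer encoder depth, learned positional embedding, and sigmoid output are assumptions, since the disclosure describes the model only functionally.

```python
import torch
import torch.nn as nn

class HighlightScorer(nn.Module):
    """Concatenate per-segment visual (d1), audio (d2) and text (d3) features,
    add a positional encoding, fuse neighbouring segments with a Transformer
    encoder, and map each fused feature v_n to a highlight score p_n."""

    def __init__(self, d_visual=512, d_audio=128, d_text=256, d_model=512, max_len=512):
        super().__init__()
        self.proj = nn.Linear(d_visual + d_audio + d_text, d_model)  # splice -> model dim
        self.pos = nn.Embedding(max_len, d_model)                    # position code
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)    # multi-layer fusion
        self.mlp_head = nn.Sequential(nn.Linear(d_model, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, visual, audio, text):
        # visual: (B, n, d_visual), audio: (B, n, d_audio), text: (B, n, d_text)
        x = torch.cat([visual, audio, text], dim=-1)                 # multi-modal splice
        x = self.proj(x) + self.pos(torch.arange(x.size(1), device=x.device))
        v = self.encoder(x)                                          # context-fused features v_n
        return torch.sigmoid(self.mlp_head(v)).squeeze(-1)           # highlight scores p_n

scores = HighlightScorer()(torch.randn(1, 8, 512), torch.randn(1, 8, 128), torch.randn(1, 8, 256))
```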
Based on the same inventive concept, the disclosure also provides a video highlight detection device. Fig. 4 is a block diagram illustrating a video highlight detection apparatus according to an exemplary embodiment, and as shown in fig. 4, the video highlight detection apparatus 400 may include:
a first determining module 401, configured to divide a target video into a plurality of video segments, and determine a highlight score of each of the video segments;
a processing module 402, configured to perform a lens splitting process on the multiple video segments to obtain multiple shot segments, where each shot segment includes one or more consecutive video segments;
a second determining module 403, configured to determine, for each of the shots, a highlight score of the shot according to the highlight scores of the video clips included in the shot;
a third determining module 404, configured to determine highlight segments of the target video according to the highlight scores of each of the shot segments.
In the technical scheme, the plurality of video clips are subjected to lens splitting processing to obtain a plurality of shot clips, and the shot clips comprise one or more continuous video clips, so that the shot clips are more complete compared with the video clips. According to the highlight scores of the video clips included by the shot clips, the highlight scores of the shot clips can be accurately determined. Therefore, the complete highlight segment can be accurately determined, and the watching experience of the user is improved.
Optionally, the third determining module 404 may include:
the first determining submodule is used for determining the shot segments with the highlight scores larger than a preset highlight score threshold value as first shot segments;
a merging submodule, configured to, if the number of the first shot segments is multiple, merge the multiple first shot segments to obtain at least one second shot segment, where multiple first shot segments merged together to form one second shot segment satisfy the following conditions: in any two adjacent first shot segments, the time difference between the ending time of the preceding first shot segment and the starting time of the following first shot segment is smaller than a preset time difference threshold value;
and the second determining sub-module is used for determining the highlight segment of the target video according to the at least one second shot segment.
Optionally, the second determining submodule includes:
a third determining submodule, configured to determine a highlight score of the second shot according to the highlight score of the first shot;
a fourth determining sub-module, configured to determine, if the number of the second shot segments exceeds the preset number, the preset number of second shot segments with the highest highlight scores as the highlight segments.
Optionally, the second determining sub-module includes:
a fifth determining submodule, configured to determine a candidate shot according to a preset duration and the at least one second shot;
and the sixth determining submodule is used for determining the highlight segments according to the candidate shot segments.
Optionally, the fifth determining sub-module is configured to determine the candidate shots by one of:
determining a second shot segment with segment duration not shorter than the preset duration as the candidate shot segment; or,
cutting a second shot section of which the section duration is longer than the preset duration according to the preset duration, and determining the second shot section of which the section duration is not longer than the preset duration and the cut second shot section as the candidate shot sections; or,
and cutting a second shot section of which the section duration is longer than the preset duration according to the preset duration, and determining the second shot section of which the section duration is equal to the preset duration and the cut second shot section as the candidate shot sections.
Optionally, the sixth determining sub-module includes:
a seventh determining sub-module, configured to determine a highlight score of the second shot according to the highlight score of the first shot;
an eighth determining submodule, configured to determine, as the highlight segment, the candidate shot segment with the highest highlight score rank in the preset number if the number of the candidate shot segments exceeds a preset number.
Optionally, the third determining module 404 further includes:
and the ninth determining submodule is used for cutting the first shot section according to the preset time length and determining the cut first shot section as the highlight section if the number of the first shot section is one and the section time length of the first shot section is longer than the preset time length.
Optionally, the first determining module 401 is configured to determine the highlight score of each of the video segments by:
and aiming at each video clip, extracting multi-modal characteristics of the video clip, and determining the highlight score of the video clip according to the multi-modal characteristics and a highlight score prediction model trained in advance.
Optionally, the multi-modal features comprise at least two of: image features, audio features, and text features.
Referring now to FIG. 5, a block diagram of an electronic device 600 suitable for use in implementing embodiments of the present disclosure is shown. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a fixed terminal such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 5, the electronic device 600 may include a processing means (e.g., central processing unit, graphics processor, etc.) 601 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage means 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the electronic apparatus 600 are also stored. The processing device 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, or the like; output devices 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 5 illustrates an electronic device 600 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 609, or installed from the storage means 608, or installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
It should be noted that the computer readable medium of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to:
dividing a target video into a plurality of video segments and determining a highlight score of each video segment;
performing lens splitting processing on the plurality of video clips to obtain a plurality of shot clips, wherein each shot clip comprises one or more continuous video clips;
for each of the shots, determining a highlight score for the shot from the highlight scores of the video segments that the shot comprises;
and determining highlight segments of the target video according to the highlight scores of each shot segment.
Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including but not limited to object oriented programming languages such as Java, Smalltalk, C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present disclosure may be implemented by software or hardware. Where the name of a module does not in some cases constitute a limitation of the module itself, for example, a processing module may also be described as a "shot determination module".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), application Specific Integrated Circuits (ASICs), application Specific Standard Products (ASSPs), systems on a chip (SOCs), complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In accordance with one or more embodiments of the present disclosure, example 1 provides a video highlight detection method, including:
dividing a target video into a plurality of video segments and determining a highlight score of each video segment;
performing lens splitting processing on the plurality of video clips to obtain a plurality of shot clips, wherein each shot clip comprises one or more continuous video clips;
for each of the shot segments, determining a highlight score for the shot segment according to the highlight scores of the video segments that the shot segment comprises;
and determining highlight segments of the target video according to the highlight scores of each shot segment.
Example 2 provides the method of example 1, including, in accordance with one or more embodiments of the present disclosure:
determining a shot segment with a highlight score larger than a preset highlight score threshold value as a first shot segment;
if the number of the first shot sections is multiple, combining the multiple first shot sections to obtain at least one second shot section, wherein the multiple first shot sections combined together to form one second shot section satisfy the following conditions: in any two adjacent first shot sections, the time difference between the ending time of the preceding first shot section and the starting time of the following first shot section is smaller than a preset time difference threshold value;
determining a highlight segment of the target video according to the at least one second shot segment.
Example 3 provides the method of example 2, the determining, from the at least one second shot section, a highlight section of the target video, including:
determining a highlight score of the second shot segment according to the highlight score of the first shot segment;
and if the number of the second shot segments exceeds a preset number, determining the second shot segments with the highest highlight score ranking in the preset number as the highlight segments.
Example 4 provides the method of example 2, the determining, from the at least one second shot section, a highlight section of the target video, including:
determining candidate shot segments according to a preset duration and the at least one second shot segment;
and determining the highlight segments according to the candidate shot segments.
Example 5 provides the method of example 4, wherein determining the candidate shot segments according to the preset duration and the at least one second shot segment comprises one of:
determining a second shot segment with segment duration not shorter than the preset duration as the candidate shot segment; or,
cutting a second shot section of which the section duration is longer than the preset duration according to the preset duration, and determining the second shot section of which the section duration is not longer than the preset duration and the cut second shot section as the candidate shot sections; or,
and cutting a second shot section of which the section duration is longer than the preset duration according to the preset duration, and determining the second shot section of which the section duration is equal to the preset duration and the cut second shot section as the candidate shot sections.
Example 6 provides the method of example 4, the determining, from the at least one second shot section, a highlight section of the target video, further comprising:
determining a highlight score of the second shot segment according to the highlight score of the first shot segment;
the determining the highlight segments according to the candidate shot segments comprises:
and if the number of the candidate shot segments exceeds a preset number, determining the candidate shot segments with the highest highlight score ranking in the preset number as the highlight segments.
Example 7 provides the method of example 2, the determining highlight segments of the target video according to the highlight scores of each of the shot segments, further comprising:
if the number of the first shot segments is one, and the segment duration of the first shot segments is longer than the preset duration, cutting the first shot segments according to the preset duration, and determining the cut first shot segments as the highlight segments.
Example 8 provides the method of any one of examples 1 to 7, wherein, for each of the video segments, multi-modal features of the video segment are extracted, and a highlight score of the video segment is determined according to the multi-modal features and a pre-trained highlight score prediction model.
In accordance with one or more embodiments of the present disclosure, example 9 provides the method of example 8, wherein the multi-modal features comprise at least two of: image features, audio features, and text features.
example 10 provides, in accordance with one or more embodiments of the present disclosure, a video highlight detection apparatus, comprising:
the device comprises a first determining module, a second determining module and a third determining module, wherein the first determining module is used for dividing a target video into a plurality of video segments and determining the highlight score of each video segment;
the processing module is used for performing split-lens processing on the plurality of video clips to obtain a plurality of shot clips, wherein each shot clip comprises one or more continuous video clips;
a second determining module, configured to determine, for each of the shots, a highlight score of the shot according to the highlight score of the video clip included in the shot;
and the third determining module is used for determining highlight segments of the target video according to the highlight scores of all the shot segments.
Example 11 provides a computer-readable medium, on which is stored a computer program that, when executed by a processing device, implements the steps of the method of any of examples 1-9.
Example 12 provides, in accordance with one or more embodiments of the present disclosure, an electronic device, comprising: a storage device having a computer program stored thereon; and processing means for executing the computer program in the storage device to implement the steps of the method of any one of examples 1 to 9.
The foregoing description is merely illustrative of the preferred embodiments of the present disclosure and of the technical principles employed. Those skilled in the art will appreciate that the scope of the disclosure is not limited to technical solutions formed by the particular combination of the above features, and also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the spirit of the disclosure, for example, technical solutions formed by replacing the above features with technical features disclosed in the present disclosure (but not limited thereto) that have similar functions.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims. As for the apparatus in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method and will not be elaborated here.

Claims (12)

1. A method for highlight detection of a video, comprising:
dividing a target video into a plurality of video segments and determining a highlight score of each video segment;
performing shot splitting processing on the plurality of video segments to obtain a plurality of shot segments, wherein each shot segment comprises one or more consecutive video segments;
for each shot segment, determining a highlight score for the shot segment from the highlight scores of the video segments that the shot segment comprises;
and determining highlight segments of the target video according to the highlight score of each shot segment.
2. The method of claim 1, wherein said determining highlight segments of said target video according to the highlight score of each said shot segment comprises:
determining a shot segment with a highlight score larger than a preset highlight score threshold value as a first shot segment;
if the number of the first shot segments is multiple, merging the plurality of first shot segments to obtain at least one second shot segment, wherein the plurality of first shot segments merged into one second shot segment satisfy the following condition: for any two adjacent first shot segments, the time difference between the end time of the preceding first shot segment and the start time of the following first shot segment is smaller than a preset time difference threshold;
and determining highlight segments of the target video according to the at least one second shot segment.
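Purely as an illustration (not part of the claims), a short Python sketch of the merging condition above, assuming first shot segments are (start, end) pairs sorted by start time; the function name and the gap-based grouping loop are assumptions consistent with the wording of claim 2.

```python
# Illustrative only: merge consecutive first shot segments whose gap (end of the
# preceding segment to start of the following segment) is below the preset threshold.

def merge_first_shot_segments(first_shots, gap_threshold):
    merged = []
    for start, end in sorted(first_shots):
        if merged and start - merged[-1][1] < gap_threshold:
            last_start, last_end = merged[-1]
            merged[-1] = (last_start, max(last_end, end))  # extend the current second shot segment
        else:
            merged.append((start, end))                    # begin a new second shot segment
    return merged
```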
3. The method of claim 2, wherein determining highlight segments of the target video according to the at least one second shot segment comprises:
determining a highlight score of the second shot segment according to the highlight scores of the first shot segments;
and if the number of the second shot segments exceeds a preset number, determining the preset number of second shot segments with the highest highlight scores as the highlight segments.
4. The method of claim 2, wherein determining highlight segments of the target video according to the at least one second shot segment comprises:
determining candidate shot segments according to a preset duration and the at least one second shot segment;
and determining the highlight segments according to the candidate shot segments.
5. The method according to claim 4, wherein the determining candidate shot segments according to the preset duration and the at least one second shot segment comprises one of:
determining a second shot segment whose segment duration is not shorter than the preset duration as a candidate shot segment; or
cutting, according to the preset duration, a second shot segment whose segment duration is longer than the preset duration, and determining the second shot segments whose segment duration is not longer than the preset duration, together with the cut second shot segments, as the candidate shot segments; or
cutting, according to the preset duration, a second shot segment whose segment duration is longer than the preset duration, and determining the second shot segments whose segment duration is equal to the preset duration, together with the cut second shot segments, as the candidate shot segments.
6. The method of claim 4, wherein determining highlight segments of the target video according to the at least one second shot segment further comprises:
determining a highlight score of the second shot segment according to the highlight scores of the first shot segments;
wherein said determining the highlight segments according to the candidate shot segments comprises:
and if the number of the candidate shot segments exceeds a preset number, determining the preset number of candidate shot segments with the highest highlight scores as the highlight segments.
7. The method of claim 2, wherein said determining highlight segments of said target video according to the highlight score of each said shot segment further comprises:
if the number of the first shot segments is one and the segment duration of the first shot segment is longer than the preset duration, cutting the first shot segment according to the preset duration and determining the cut first shot segment as the highlight segment.
8. The method of any one of claims 1-7, wherein said determining a highlight score for each of said video segments comprises:
extracting, for each video segment, multi-modal features of the video segment, and determining the highlight score of the video segment according to the multi-modal features and a pre-trained highlight score prediction model.
9. The method of claim 8, wherein the multi-modal features comprise at least two of: image features, audio features, and text features.
10. A video highlight detection device, comprising:
a first determining module, configured to divide a target video into a plurality of video segments and determine a highlight score of each video segment;
a processing module, configured to perform shot splitting on the plurality of video segments to obtain a plurality of shot segments, wherein each shot segment comprises one or more consecutive video segments;
a second determining module, configured to determine, for each shot segment, a highlight score of the shot segment according to the highlight scores of the video segments included in the shot segment;
and a third determining module, configured to determine highlight segments of the target video according to the highlight score of each shot segment.
11. A computer-readable medium, on which a computer program is stored which, when being executed by a processing means, carries out the steps of the method according to any one of claims 1-9.
12. An electronic device, comprising:
a storage device having a computer program stored thereon;
processing means for executing the computer program in the storage device to carry out the steps of the method according to any one of claims 1 to 9.
CN202211551849.9A 2022-12-05 2022-12-05 Video highlight detection method, device, medium and electronic equipment Pending CN115731500A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211551849.9A CN115731500A (en) 2022-12-05 2022-12-05 Video highlight detection method, device, medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211551849.9A CN115731500A (en) 2022-12-05 2022-12-05 Video highlight detection method, device, medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN115731500A true CN115731500A (en) 2023-03-03

Family

ID=85300213

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211551849.9A Pending CN115731500A (en) 2022-12-05 2022-12-05 Video highlight detection method, device, medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN115731500A (en)

Similar Documents

Publication Publication Date Title
CN110677711B (en) Video dubbing method and device, electronic equipment and computer readable medium
CN111510760B (en) Video information display method and device, storage medium and electronic equipment
CN111445902B (en) Data collection method, device, storage medium and electronic equipment
CN113889113A (en) Sentence dividing method and device, storage medium and electronic equipment
CN111726691A (en) Video recommendation method and device, electronic equipment and computer-readable storage medium
CN114329223A (en) Media content searching method, device, equipment and medium
CN112053286A (en) Image processing method, image processing device, electronic equipment and readable medium
CN113395538A (en) Sound effect rendering method and device, computer readable medium and electronic equipment
CN114860139A (en) Video playing method, video playing device, electronic equipment, storage medium and program product
CN114445754A (en) Video processing method and device, readable medium and electronic equipment
CN114331828A (en) Method, device and equipment for converting picture into video and storage medium
CN114528433B (en) Template selection method and device, electronic equipment and storage medium
CN115269920A (en) Interaction method, interaction device, electronic equipment and storage medium
CN114339402A (en) Video playing completion rate prediction method, device, medium and electronic equipment
CN111709342B (en) Subtitle segmentation method, device, equipment and storage medium
CN111787257B (en) Video recording method and device, electronic equipment and storage medium
CN115731500A (en) Video highlight detection method, device, medium and electronic equipment
CN114786069A (en) Video generation method, device, medium and electronic equipment
CN114757840A (en) Image processing method, image processing device, readable medium and electronic equipment
CN113473236A (en) Processing method and device for screen recording video, readable medium and electronic equipment
CN110188833B (en) Method and apparatus for training a model
CN110780966A (en) Social interface processing method and device, electronic equipment and storage medium
CN112395530A (en) Data content output method and device, electronic equipment and computer readable medium
CN111246313A (en) Video association method and device, server, terminal equipment and storage medium
CN114697763B (en) Video processing method, device, electronic equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination