CN114708287A - Shot boundary detection method, device and storage medium - Google Patents

Shot boundary detection method, device and storage medium

Publication number: CN114708287A
Authority: CN (China)
Prior art keywords: frame, video, candidate, shot boundary, frames
Legal status: Pending
Application number: CN202011492635.XA
Other languages: Chinese (zh)
Inventors: 吴悠 (Wu You), 陈长国 (Chen Changguo)
Current Assignee: Alibaba Group Holding Ltd
Original Assignee: Alibaba Group Holding Ltd
Application filed by Alibaba Group Holding Ltd
Priority: CN202011492635.XA
Publication: CN114708287A
Legal status: Pending

Classifications

    • G06T 7/13: Physics; Computing; Image data processing or generation, in general; Image analysis; Segmentation; Edge detection
    • G06F 18/22: Physics; Computing; Electric digital data processing; Pattern recognition; Analysing; Matching criteria, e.g. proximity measures
    • G06T 7/136: Physics; Computing; Image data processing or generation, in general; Image analysis; Segmentation; Edge detection involving thresholding
    • G06T 2207/10016: Physics; Computing; Image data processing or generation, in general; Indexing scheme for image analysis or image enhancement; Image acquisition modality; Video; Image sequence

Abstract

The embodiments of the present application provide a shot boundary detection method, device, and storage medium. In an embodiment, a shot boundary detection request for a target video may be received; spatio-temporal slices are extracted from the target video; video frames at suspected shot boundaries are determined in the target video according to the spatio-temporal slices and taken as candidate frames; and target frames that meet a preset local-feature-matching requirement are selected from the candidate frames as shot boundary frames in the target video. By combining spatio-temporal slices with local feature matching, video frames at suspected shot boundaries can first be screened out cheaply from the slices, and true shot boundary frames can then be selected through local feature matching. This greatly reduces the amount of computation while searching for shot boundary frames at multiple levels, so the efficiency, recall, and accuracy of shot boundary detection can all be effectively improved.

Description

Shot boundary detection method, device and storage medium
Technical Field
The present application relates to the field of video processing technologies, and in particular, to a shot boundary detection method, device, and storage medium.
Background
A shot is the basic semantic unit of a video. Detecting shot boundaries is the basis of video segmentation and video indexing, and is also key to semantic acquisition and content analysis in video retrieval.
At present, deep learning methods are increasingly used to solve the shot boundary detection problem, but deep learning inference is time-consuming, and videos contain many forms of shot switching; in particular, with the popularity of short videos, new transition effects emerge constantly, which makes detection models costly to implement and leaves their detection performance insufficient.
Disclosure of Invention
Aspects of the present disclosure provide a shot boundary detection method, device, and storage medium to improve the efficiency and/or accuracy of shot boundary detection.
The embodiment of the application provides a shot boundary detection method, which comprises the following steps:
receiving a shot boundary detection request for a target video;
extracting spatio-temporal slices from the target video;
determining a video frame of a suspected shot boundary in the target video as a candidate frame according to the space-time slice;
and selecting a target frame which meets the preset local feature matching requirement from the candidate frames as a shot boundary frame in the target video.
The embodiments of the present application also provide a computing device comprising a memory and a processor;
the memory is configured to store one or more computer instructions;
the processor, coupled with the memory, is configured to execute the one or more computer instructions to:
receiving a shot boundary detection request for a target video;
extracting spatio-temporal slices from the target video;
determining a video frame of a suspected shot boundary in the target video as a candidate frame according to the space-time slice;
and selecting a target frame which meets the preset local feature matching requirement from the candidate frames as a shot boundary frame in the target video.
Embodiments of the present application also provide a computer-readable storage medium storing computer instructions that, when executed by one or more processors, cause the one or more processors to perform the aforementioned shot boundary detection method.
In an embodiment of the present application, a shot boundary detection request for a target video may be received; spatio-temporal slices are extracted from the target video; video frames at suspected shot boundaries are determined in the target video as candidate frames according to the spatio-temporal slices; and target frames that meet a preset local-feature-matching requirement are selected from the candidate frames as shot boundary frames in the target video. By combining spatio-temporal slices with local feature matching, suspected shot boundary frames are first screened out from the slices, and true shot boundary frames are then selected through local feature matching; this greatly reduces the amount of computation and searches for shot boundary frames at multiple levels, so the efficiency, recall, and accuracy of shot boundary detection can be effectively improved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
fig. 1 is a schematic flowchart of a shot boundary detection method according to an exemplary embodiment of the present application;
FIG. 2 is a logic diagram of a shot boundary detection scheme according to an exemplary embodiment of the present application;
Fig. 3 is a schematic flowchart of an implementation manner of shot boundary detection according to an exemplary embodiment of the present application;
fig. 4 is a logic diagram of another implementation manner of a shot boundary detection method according to an exemplary embodiment of the present application;
fig. 5 is a schematic structural diagram of a computing device according to another exemplary embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort belong to the protection scope of the present application.
To address the technical problems of the existing deep-learning approach to shot boundary detection, such as heavy computation and limited precision, some embodiments of the present application proceed as follows: a shot boundary detection request for a target video may be received; spatio-temporal slices are extracted from the target video; video frames at suspected shot boundaries are determined in the target video as candidate frames according to the spatio-temporal slices; and target frames that meet a preset local-feature-matching requirement are selected from the candidate frames as shot boundary frames in the target video. By combining spatio-temporal slices with local feature matching, video frames at suspected shot boundaries can first be screened out from the slices, and true shot boundary frames can then be selected through local feature matching, which greatly reduces the amount of computation and searches for shot boundary frames at multiple levels; the efficiency, recall, and accuracy of shot boundary detection can therefore be effectively improved.
Before introducing the technical solution proposed in the present application, several technical concepts are explained:
A shot, which is a segment composed of several temporally consecutive video frames, describes one basic unit of a continuous scene in a video. A video usually comprises a plurality of shots; shots have boundaries, and playback switches from one shot to another.
Shot boundary detection refers to the process of segmenting and extracting shots from a video; shot boundary frames are generally used to represent the boundaries of shots.
The technical solutions provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings.
Fig. 1 is a schematic flowchart of a shot boundary detection method according to an exemplary embodiment of the present application. Fig. 2 is a logic diagram of a shot boundary detection scheme according to an exemplary embodiment of the present application. The shot boundary detection method provided by the embodiment may be executed by a shot boundary detection apparatus, which may be implemented as software or as a combination of software and hardware, and may be integrally disposed in a computing device. As shown in fig. 1, the method includes:
Step 100: receiving a shot boundary detection request for a target video;
Step 101: extracting spatio-temporal slices from the target video;
Step 102: determining video frames at suspected shot boundaries in the target video as candidate frames according to the spatio-temporal slices;
Step 103: selecting target frames that meet a preset local-feature-matching requirement from the candidate frames as shot boundary frames in the target video.
The shot boundary detection method provided by this embodiment can be applied to various scenarios that require shot boundary detection, such as video segmentation, video indexing, and video content analysis, and can also assist video-editing instruction and the study of transition techniques.
In step 100, a shot boundary detection request for a target video may be received. The target video may be a video in any scenario that requires shot boundary detection. This embodiment does not limit attributes such as the specification or format of the target video, nor the video content it contains.
In step 101, spatio-temporal slices may be extracted from the target video. A spatio-temporal slice is a two-dimensional image formed by combining pixel lines extracted from a fixed position in a sequence of consecutive video frames, where a pixel line is the set of pixels along one row or column of a video frame. In this embodiment, L rows (or columns) of pixels may be extracted from the same position in each of the video frames of the target video to form a two-dimensional image, yielding a spatio-temporal slice. The value of L may be set according to actual needs; for example, with L set to 3, three rows (or columns) of pixels are extracted at a given location in each video frame and combined with the pixels extracted from the other video frames to form the two-dimensional image.
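The extraction just described can be sketched in a few lines of NumPy. This is an illustrative sketch, not the patent's implementation: the function name, the grayscale frames, and the default choice of a center row are assumptions; only the idea of stacking L = 3 fixed-position pixel rows per frame comes from the text.

```python
import numpy as np

def extract_spatiotemporal_slice(frames, row=None, num_rows=3):
    """Stack num_rows pixel rows, taken at the same fixed position in every
    frame, into one 2-D slice image of shape (T * num_rows, W)."""
    frames = [np.asarray(f) for f in frames]
    if row is None:
        row = frames[0].shape[0] // 2  # assumed default: slice through the center
    strips = [f[row:row + num_rows, :] for f in frames]
    return np.concatenate(strips, axis=0)  # the temporal axis runs downward

# Toy video: 5 grayscale 8x8 frames, frame t filled with value 10*t.
video = [np.full((8, 8), 10 * t, dtype=np.uint8) for t in range(5)]
slice_img = extract_spatiotemporal_slice(video, num_rows=3)
print(slice_img.shape)  # (15, 8)
```

Each shot then appears as a band of smoothly varying rows in `slice_img`, and a cut appears as an abrupt change between adjacent bands.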
Based on this, in step 102, video frames at suspected shot boundaries may be determined in the target video as candidate frames according to the spatio-temporal slices. A spatio-temporal slice carries slice features such as color and texture. During research, the inventors found that because the video frames inside a single shot are generally continuous in time, space, and image structure, the slice features on a spatio-temporal slice typically vary continuously within a shot, whereas a shot cut typically appears as an obvious discontinuity in the slice features.
For this reason, in this embodiment, in step 102, video frames that meet shot-cut characteristics may be searched according to continuity of slice features in the spatio-temporal slices, and such video frames may be determined as video frames of the suspected shot boundary. It should be understood that in step 102, shot boundary detection may be performed based on spatio-temporal slices, and the accuracy of the shot boundary detected in this manner is insufficient, so in the present embodiment, the result of the shot boundary detection based on spatio-temporal slices is described as a video frame of a suspected shot boundary.
In this embodiment, the video frames at suspected shot boundaries determined in the target video according to the spatio-temporal slices may be used as candidate frames, and there may be multiple candidate frames. It should be noted that, in practical applications, the start frame of a shot may be used as the shot boundary frame representing the shot boundary; accordingly, the video frames at suspected shot starts, that is, suspected shot start frames, may be selected from the video frames at suspected shot boundaries determined from the spatio-temporal slices. Of course, this embodiment does not limit this; the end frame of a shot may also be used to represent the shot boundary.
In this embodiment, at least one video frame in the target video may be traversed to respectively determine whether the at least one video frame may be a candidate frame. Since the analysis process for each video frame is similar, the process of determining the candidate frame will be described below by taking the first video frame in the target video as an example. It should be understood that the first video frame may be any one of the at least one video frame in the target video.
In this embodiment, on the spatio-temporal slice, if the slice feature corresponding to the first video frame meets the preset continuity requirement, it may be determined that the first video frame is a video frame of the suspected shot boundary, that is, a candidate frame.
In an alternative implementation, the disparity between the slice features of the first video frame and its previous frame may be calculated; if the disparity is greater than a preset threshold, it may be determined that the slice feature corresponding to the first video frame meets the preset continuity requirement, and the first video frame is taken as a candidate frame. Similarly, the disparity between the slice features of the first video frame and its next frame may be calculated, and the first video frame may be determined to be a candidate frame if that disparity exceeds the threshold.
In this implementation manner, based on the difference degree of each video frame in the target video and the adjacent frame thereof on the slice feature, the difference degree greater than the preset threshold value is selected, so as to directly screen out the candidate frames. However, the inventor finds that the candidate frames screened by the implementation method are insufficient in precision in the research process, and the magnitude of the candidate frames is still large.
In another optional implementation, the first video frame together with its N preceding frames and M following frames may be obtained to form a first frame image sequence, where N and M are positive integers; the disparity between the slice features of the first video frame and its adjacent previous frame, and the disparities between the slice features of the other adjacent frame pairs in the first frame image sequence, are calculated; mutation detection is performed on the disparities over the first frame image sequence; and if a disparity mutation is detected at the first video frame, it is determined that the slice feature corresponding to the first video frame meets the preset continuity requirement, and the first video frame is taken as a candidate frame.
Fig. 3 is a flowchart illustrating an implementation of shot boundary detection according to an exemplary embodiment of the present application. Referring to fig. 3, in this implementation a frame image queue may be maintained; in practice a first-in first-out queue may be used. When traversal reaches the first video frame, the first video frame and its N preceding and M following frames may be loaded into the frame image queue in sequence, that is, the first frame image sequence is loaded into the queue. For example, with N set to 6 and M set to 7, the first frame image sequence loaded into the queue contains 14 video frames. When the analysis of the video frame after the first video frame begins, the video frames in the queue may be replaced with the preceding and following frames of that next video frame.
Referring to fig. 3, a disparity queue may also be maintained. Continuing the example above, for the first video frame, 13 disparities may be obtained over the 14-frame first frame image sequence; these 13 disparities may be loaded into the disparity queue, over which sudden changes can then be detected.
In the process of performing mutation detection on the disparities over the first frame image sequence, a disparity change curve may be constructed from those disparities; if the disparity corresponding to the first video frame lies in a peak region of the disparity change curve, it is determined that a disparity mutation has been detected at the first video frame. The peak region includes not only the peak position itself but also its neighborhood. For example, in a coordinate system where the horizontal axis indexes video frames and the vertical axis plots disparity, if the first video frame sits exactly at a peak of the disparity change curve, it may be determined to be a candidate frame; likewise, if the frame distance between the first video frame and the video frame at the peak is smaller than a preset distance threshold (for example, a distance of 2 against a preset threshold of 5), the first video frame may also be determined to be a candidate frame.
Accordingly, in this implementation, the first video frame and the frame image sequence formed by the preceding frame and the following frame thereof may be used as an analysis unit, and the change state of the degree of difference of each adjacent frame in the slice feature may be analyzed in the frame image sequence, and if a sudden change is detected on the first video frame, the first video frame may be determined as a candidate frame. In this way, a wider reference frame can be used to detect whether the first video frame has the characteristics of the shot boundary, and therefore, compared with the former implementation mode in which the absolute value of the difference between the first video frame and its adjacent frame is used to directly analyze, higher analysis accuracy can be obtained, which can effectively reduce the number of candidate frames and improve the accuracy of the candidate frames.
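The windowed mutation test above can be sketched as follows. The mean-absolute-difference disparity measure, the threshold values, and the function names are illustrative assumptions; the patent only requires that a candidate frame's disparity lie in the peak region of its window's disparity curve.

```python
import numpy as np

def frame_disparity(rows_a, rows_b):
    """Mean absolute difference between the slice rows of two adjacent frames
    (an assumed, simple slice-feature disparity)."""
    return float(np.mean(np.abs(rows_a.astype(np.int32) - rows_b.astype(np.int32))))

def has_mutation(disparities, idx, dist_thresh=5, min_peak=10.0):
    """True if position idx lies inside the peak region of the disparity
    curve: the maximum is large enough and idx is within dist_thresh of it."""
    peak = int(np.argmax(disparities))
    return disparities[peak] >= min_peak and abs(idx - peak) <= dist_thresh

# 13 disparities for a 14-frame window; the spike at position 6 suggests a cut.
curve = [1.0, 1.2, 0.8, 1.1, 0.9, 1.0, 42.0, 1.3, 0.7, 1.0, 1.1, 0.9, 1.2]
print(has_mutation(curve, idx=6))  # True: idx sits exactly at the peak
print(has_mutation(curve, idx=0))  # False: too far from the peak
```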
On the basis of determining the candidate frames, referring to fig. 1-3, in step 103, a target frame meeting a preset local feature matching requirement may be selected from the candidate frames as a shot boundary frame in the target video.
In this embodiment, the local feature may be an ORB (Oriented FAST and Rotated BRIEF) feature, a SURF (Speeded-Up Robust Features) feature, a SIFT (Scale-Invariant Feature Transform) feature, or the like; the type of local feature is not limited in this embodiment.
The following will exemplify a process of local feature matching by taking the ORB feature as an example. Since the processing procedure for each candidate frame is similar, the procedure for determining local feature matching will be described below by taking the first candidate frame as an example. It should be understood that the first candidate frame may be any one of the candidate frames.
In one implementation, a frame preceding the first candidate frame may be determined in the target video; respectively extracting local features in the first candidate frame and the previous frame; and if the local features of the first candidate frame and the previous frame are matched, determining the first candidate frame as the target frame. By carrying out local feature matching on the candidate frames, secondary judgment on the candidate frames can be realized to determine whether the candidate frames are shot boundary frames, so that the recall rate and the accuracy of shot boundary detection can be effectively improved.
In step 103, the frame image queue mentioned above may be continuously used, so that the frame before the first candidate frame can be quickly determined in the frame image queue.
On this basis, ORB local features may be extracted from the first candidate frame and its previous frame. In practical applications, feature points in the first candidate frame and the frame before the first candidate frame may be determined first, for example, the feature points may be detected by using a FAST feature point detection method, and then P feature points with the maximum Harris corner response value may be selected from the FAST feature points by using a Harris corner measurement method. Then, the BRIEF can be used as a feature description method to generate a BRIEF descriptor for each feature point, for example, the BRIEF descriptor can be a binary code string with length n.
In this implementation, a feature matching scheme based on Grid-based Motion Statistics (GMS) may be employed to determine whether the local features of the first candidate frame and its previous frame match. In practice, a preliminary comparison can be performed on the BRIEF descriptors of the determined feature points: the feature points in the first candidate frame may be traversed, and for each one a matching point is sought in the previous frame based on the BRIEF descriptors. The number of matching pairs produced by this preliminary comparison is typically large. The first candidate frame may then be divided into a plurality of grid cells and each matching pair traversed; taking the first matching pair as an example, the number of matching pairs in the grid cell containing it may be counted, and if that number is sufficient, the first matching pair may be judged correct. In this way the correct matching pairs in the preliminary result are found, and if there are sufficiently many of them, it may be determined that the local features of the first candidate frame and its previous frame match.
It should be noted that the above ORB local feature matching scheme is only exemplary, and the present embodiment is not limited thereto. In addition, the specific matching process for other local features is not described in detail herein.
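The grid-verification idea behind GMS can be sketched without a full feature pipeline: after the preliminary descriptor comparison produces putative matches, a match is kept only if its grid cell contains enough other matches. This is a simplified illustration under stated assumptions; a production implementation would use a real ORB detector and GMS matcher (OpenCV's contrib module ships one), and the grid size, support threshold, and function name here are assumptions, not the patent's parameters.

```python
import numpy as np

def gms_filter(pts_a, pts_b, img_shape, grid=(8, 8), min_support=4):
    """Simplified grid-based motion statistics: a putative match (pts_a[i] in
    the candidate frame, pts_b[i] in the previous frame) is kept only if the
    grid cell it falls into holds at least min_support matches, on the idea
    that true matches cluster while false ones scatter."""
    h, w = img_shape
    gy, gx = grid
    ys = np.minimum(pts_a[:, 1] * gy // h, gy - 1)   # cell row per match
    xs = np.minimum(pts_a[:, 0] * gx // w, gx - 1)   # cell column per match
    cells = ys * gx + xs
    support = np.bincount(cells, minlength=gy * gx)[cells]
    keep = support >= min_support
    return pts_a[keep], pts_b[keep]

# Putative matches in an 80x80 frame: six clustered (likely correct), two isolated.
pts_a = np.array([[2, 2], [3, 3], [5, 5], [6, 6], [7, 7], [8, 8], [50, 50], [70, 12]])
kept_a, kept_b = gms_filter(pts_a, pts_a + 1, img_shape=(80, 80))
print(len(kept_a))  # 6: the isolated matches are rejected
```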
In this embodiment, the annotation information may also be configured for the shot boundary frame determined in the target video, so as to prompt the user about the shot boundary frame. In practical application, the corresponding annotation information can be displayed under the condition that the target video is played to the shot boundary frame, so that a user can find the shot boundary frame in time, and assistance can be provided for work such as video switching, video clipping and the like.
In this embodiment, different shot boundary detection schemes may also be determined according to different application requirements, that is, different processing parameters are configured for the space-time slice mode and the local feature matching mode, so as to obtain shot boundary frames meeting different application requirements.
Therefore, in the embodiment, on one hand, the shot boundary detection efficiency can be effectively improved because only the space-time slice in the target video is analyzed and the whole frame image is not required to be analyzed; only the candidate frames in the target video are subjected to local feature analysis, and the order of magnitude of the local feature analysis can be reduced, so that the shot boundary detection efficiency is improved. On the other hand, the secondary judgment is carried out on the candidate frames based on the local features, so that the shot boundary frame in the target video can be more accurately determined, and therefore, the recall rate and the accuracy of the shot boundary detection can be effectively improved.
Fig. 4 is a logic diagram of another implementation of a shot boundary detection method according to an exemplary embodiment of the present application. Referring to fig. 4, in the embodiments above or below, if the K consecutive video frames before the first video frame are all non-shot-boundary frames, the difference between the first video frame and the previous shot boundary frame is calculated, where K is a positive integer; if that difference is greater than a preset difference threshold, the first video frame is determined to be a boundary frame of an overlay/wipe shot.
In this embodiment, a shot boundary frame otherwise refers to the boundary frame of an ordinary shot transition, for example a hard cut. This embodiment can also support boundary detection for special transitions, such as the overlay/wipe shots above and the fade-in/fade-out shots discussed below.
Referring to fig. 4, in this embodiment, K may be set to 15, so that if none of the consecutive 15 frames before the first video frame is the shot boundary frame, the first video frame may be compared with the previous shot boundary frame, and if the difference between the first video frame and the previous shot boundary frame is large enough, the first video frame may be determined to be the boundary frame of the overlap/wipe shot.
In addition, referring to fig. 4, if the number of previous frames of the first video frame satisfies N and/or the number of subsequent frames satisfies M, it may be determined whether there is a sudden change on the first video frame according to the sudden change detection scheme provided in the foregoing embodiment; if the number of previous frames of the first video frame is less than N and/or the number of subsequent frames is less than M, then it can be determined whether there is a sudden change in the first video frame by referring to the last shot boundary frame, as proposed in the present implementation.
For example, with N set to 6 and M set to 7, if the first video frame is the 3rd frame in the target video, it has fewer than N = 6 preceding frames and the frame image queue is not full, which may make the mutation detection scheme of the foregoing embodiment infeasible, or inaccurate due to insufficient data. In this case, if the last shot boundary frame is the 1st frame in the target video, the difference between the 3rd frame and the 1st frame may be calculated; if that difference is large enough, it may be determined that there is a sudden change in the slice features at the 3rd frame.
Accordingly, the detection of the overlay/wipe shot can be achieved.
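A minimal sketch of this fallback comparison, assuming grayscale frames and a mean-absolute-difference measure; the threshold values and the function name are illustrative assumptions rather than the patent's parameters.

```python
import numpy as np

def overlay_wipe_check(cur_frame, last_boundary_frame, frames_since_boundary,
                       k=15, diff_thresh=30.0):
    """If no cut has been found for k consecutive frames, compare the current
    frame against the last detected boundary frame; a large accumulated
    difference suggests a gradual (overlay/wipe) transition."""
    if frames_since_boundary < k:
        return False
    diff = np.mean(np.abs(cur_frame.astype(np.int32)
                          - last_boundary_frame.astype(np.int32)))
    return float(diff) > diff_thresh
```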
Referring to fig. 4, in the present embodiment, a spatiotemporal slice segment corresponding to the first frame image sequence may also be determined; calculating the variance between slice pixels of each video frame under the space-time slice segment; and if the variance is smaller than a preset variance threshold value, determining that the first video frame is a boundary frame of the fade-in fade-out shot.
Accordingly, detection of a fade-in fade-out shot can be achieved.
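The variance test for fades might look like the following sketch: a fade passes through nearly flat frames, so each frame's slice pixels show almost no variance. The variance threshold and function name are assumptions.

```python
import numpy as np

def is_fade_segment(slice_rows_per_frame, var_thresh=5.0):
    """slice_rows_per_frame: one array of slice pixels per video frame in the
    segment. Flag the segment as a fade if every per-frame variance stays
    below var_thresh (i.e. every frame is close to a flat color)."""
    variances = [float(np.var(rows)) for rows in slice_rows_per_frame]
    return max(variances) < var_thresh
```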
Continuing to refer to fig. 4, in the process of performing local feature matching on the first candidate frame, if the local feature matching between the first candidate frame and its previous frame is unsuccessful, histograms of the first candidate frame and the previous frame are extracted respectively; if the histograms of the two frames match, the first candidate frame is determined to be a shot boundary frame.
Accordingly, this embodiment can realize boundary detection for various types of shots: after the secondary judgment based on local features, a further judgment can be made based on histograms to avoid missing shot boundary frames, which effectively broadens the applicable scenarios of shot boundary detection and improves its recall and accuracy.
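One common way to compare two frames' histograms is histogram intersection; the sketch below assumes grayscale frames, 32 bins, and an arbitrary similarity threshold, none of which are specified by the patent.

```python
import numpy as np

def histograms_match(frame_a, frame_b, bins=32, sim_thresh=0.9):
    """Normalized histogram-intersection similarity of two grayscale frames,
    in [0, 1]; identical distributions score 1.0."""
    ha, _ = np.histogram(frame_a, bins=bins, range=(0, 256))
    hb, _ = np.histogram(frame_b, bins=bins, range=(0, 256))
    ha = ha / max(ha.sum(), 1)  # normalize so the comparison is size-independent
    hb = hb / max(hb.sum(), 1)
    return float(np.minimum(ha, hb).sum()) >= sim_thresh
```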
It should be noted that the execution subjects of the steps of the methods provided in the above embodiments may be the same device, or different devices may be used as the execution subjects of the methods. For example, the execution subjects of steps 101 to 103 may be device a; for another example, the execution subject of steps 101 and 102 may be device a, and the execution subject of step 103 may be device B; and so on.
In addition, in some of the flows described in the above embodiments and the drawings, a plurality of operations are included in a specific order, but it should be clearly understood that the operations may be executed out of the order presented herein or in parallel, and the sequence numbers of the operations, such as 101, 102, etc., are merely used for distinguishing different operations, and the sequence numbers do not represent any execution order per se. Additionally, the flows may include more or fewer operations, and the operations may be performed sequentially or in parallel. It should be noted that, the descriptions of "first", "second", etc. in this document are used to distinguish different video frames, devices, modules, etc., and do not represent a sequential order, nor limit the types of "first" and "second" to be different.
Fig. 5 is a schematic structural diagram of a computing device according to another exemplary embodiment of the present application. As shown in fig. 5, the computing device includes: a memory 50 and a processor 51.
Memory 50 is used to store computer programs and may be configured to store other various data to support operations on the computing platform. Examples of such data include instructions, messages, pictures, videos, etc. for any application or method operating on the computing platform.
The memory 50 may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
A processor 51, coupled to the memory 50, for executing the computer program in the memory 50 for:
receiving a shot boundary detection request aiming at a target video;
extracting space-time slices from a target video;
determining a video frame of a suspected shot boundary in a target video as a candidate frame according to the space-time slice;
and selecting a target frame meeting the preset local feature matching requirement from the candidate frames as a shot boundary frame in the target video.
In an alternative embodiment, the processor 51, when determining video frames of suspected shot boundaries in the target video from the spatio-temporal slices, is configured to:
on the space-time slice, if slice characteristics corresponding to a first video frame meet a preset continuity requirement, determining that the first video frame is a video frame of a suspected shot boundary;
the first video frame is any one of at least one video frame contained in the target video.
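As context for the continuity check above, a space-time slice can be built by stacking one fixed line of pixels from every frame. The sketch below assumes the centre row (or column) as the slice position; the patent does not fix it:

```python
import numpy as np

def extract_spatiotemporal_slice(frames, axis="horizontal"):
    """Stack one line of pixels from every decoded frame into a 2-D
    (time x space) image. A horizontal slice takes the middle row of
    each frame; a vertical slice takes the middle column."""
    lines = []
    for frame in frames:
        h, w = frame.shape[:2]
        if axis == "horizontal":
            lines.append(frame[h // 2])     # shape (W, C)
        else:
            lines.append(frame[:, w // 2])  # shape (H, C)
    return np.stack(lines, axis=0)          # shape (num_frames, length, C)
```

Each row of the result then corresponds to one video frame, so discontinuities along the time axis hint at shot boundaries.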
In an alternative embodiment, the processor 51 is further configured to:
acquiring a first video frame and N preceding frames and M following frames corresponding to the first video frame to form a first frame image sequence, wherein N and M are positive integers;
calculating the difference degree between the slice characteristics corresponding to the first video frame and the adjacent previous frame and the difference degree between the slice characteristics corresponding to other adjacent frames in the first frame image sequence;
performing abrupt-change detection on the difference degrees under the first frame image sequence;
and if the sudden change of the difference degree is detected on the first video frame, determining that the slice characteristics corresponding to the first video frame meet the preset continuity requirement.
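The per-frame "difference degree" in the steps above can be computed, for instance, as the mean absolute pixel difference between each slice line and the previous one; the concrete metric is an assumption, as the patent leaves it open:

```python
import numpy as np

def slice_difference(slice_img):
    """Degree of difference between each slice line and its previous
    line; slice_img has shape (num_frames, length, channels).
    Returns one value per adjacent-frame pair (num_frames - 1 values)."""
    s = slice_img.astype(np.float32)
    return np.abs(s[1:] - s[:-1]).mean(axis=(1, 2))
```

A hard cut shows up as a single large value in this sequence, which the subsequent abrupt-change detection looks for.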
In an alternative embodiment, the processor 51, when performing abrupt change detection on the difference degree under the first frame image sequence, is configured to:
constructing a difference degree change curve according to the difference degree under the first frame image sequence;
and if the difference degree corresponding to the first video frame is located in a peak area in the change curve of the difference degree, determining that the sudden change of the difference degree is detected on the first video frame.
In an alternative embodiment, the processor 51 is further configured to:
if K continuous video frames before the first video frame are all non-shot boundary frames, calculating the difference degree between the first video frame and the last shot boundary frame, wherein K is a positive integer;
and if the difference between the first video frame and the last shot boundary frame is greater than a preset difference threshold, determining that the first video frame is a boundary frame of a dissolve/wipe shot.
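The dissolve/wipe check above compares the current frame against the last detected boundary once K boundary-free frames have accumulated. A sketch, assuming per-frame normalised histograms as the feature and illustrative values for K and the threshold:

```python
import numpy as np

def is_gradual_boundary(hist_feats, i, last_boundary_idx, k=10, threshold=0.5):
    """hist_feats: array of per-frame normalised histograms.
    Fires only after more than k consecutive non-boundary frames, then
    compares frame i against the last detected shot boundary frame."""
    if i - last_boundary_idx <= k:
        return False
    # L1 distance between normalised histograms, scaled into [0, 1]
    diff = np.abs(hist_feats[i] - hist_feats[last_boundary_idx]).sum() / 2.0
    return bool(diff > threshold)
```

The point of the accumulated comparison is that no single adjacent-frame difference is large during a dissolve, yet the total drift since the previous boundary is.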
In an alternative embodiment, the processor 51 is further configured to:
determining a space-time slice segment corresponding to a first frame image sequence;
calculating the variance between slice pixels of each video frame under the space-time slice segment;
and if the variance is smaller than a preset variance threshold value, determining that the first video frame is a boundary frame of the fade-in fade-out shot.
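The fade-in/fade-out check exploits the fact that slice lines are nearly uniform while the picture fades through black or white, so the per-frame variance of slice pixels stays low across the segment. The variance threshold below is an assumption:

```python
import numpy as np

def is_fade_segment(slice_segment, var_threshold=10.0):
    """slice_segment: the space-time slice rows belonging to the first
    frame image sequence, shape (num_frames, length[, channels]).
    True if every frame's slice line has low pixel variance."""
    axes = tuple(range(1, slice_segment.ndim))
    per_frame_var = slice_segment.astype(np.float32).var(axis=axes)
    return bool(per_frame_var.max() < var_threshold)
```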
In an alternative embodiment, the processor 51, when selecting a target frame meeting a preset local feature matching requirement from the candidate frames, is configured to:
determining a frame previous to the first candidate frame in the target video;
respectively extracting local features in the first candidate frame and the previous frame;
if the local features of the first candidate frame and the previous frame are matched, determining the first candidate frame as a target frame;
wherein the first candidate frame is any one of the candidate frames.
In an alternative embodiment, the processor 51 is further configured to:
and determining whether the local features between the first candidate frame and the previous frame are matched or not by adopting a feature matching scheme based on grid motion statistics.
In an alternative embodiment, the processor 51 is further configured to:
if the local feature matching between the first candidate frame and the previous frame is unsuccessful, extracting histograms of the first candidate frame and the previous frame respectively;
and if the histograms of the first candidate frame and the previous frame are matched, determining the first candidate frame as the target frame.
In an alternative embodiment, the processor 51 is further configured to:
and configuring marking information for the shot boundary frame so as to prompt the shot boundary frame to a user.
Further, as shown in fig. 5, the computing device further includes: communication components 52, power components 53, and the like. Only some components are shown schematically in fig. 5; this does not mean that the computing device includes only the components shown in fig. 5.
It should be noted that, for technical details of the computing device embodiments, reference may be made to the related descriptions in the foregoing method embodiments; for brevity they are not repeated here, but this should not be construed as limiting the scope of the present application.
Accordingly, the present application further provides a computer-readable storage medium storing a computer program, where the computer program can implement the steps that can be executed by a computing device in the foregoing method embodiments when executed.
The communication component in fig. 5 is configured to facilitate wired or wireless communication between the device in which it is located and other devices. That device can access a wireless network based on a communication standard, such as WiFi, or a 2G, 3G, 4G/LTE, or 5G mobile communication network, or a combination thereof. In an exemplary embodiment, the communication component receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
The power supply assembly of fig. 5 described above provides power to the various components of the device in which the power supply assembly is located. The power components may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device in which the power component is located.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer readable medium, Random Access Memory (RAM) and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer readable media do not include transitory computer readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (21)

1. A shot boundary detection method is characterized by comprising the following steps:
receiving a shot boundary detection request aiming at a target video;
extracting spatio-temporal slices from the target video;
determining a video frame of a suspected shot boundary in the target video as a candidate frame according to the space-time slice;
and selecting a target frame which meets the preset local feature matching requirement from the candidate frames as a shot boundary frame in the target video.
2. The method of claim 1, wherein determining video frames of a suspected shot boundary in the target video from the spatio-temporal slices comprises:
on the space-time slice, if slice characteristics corresponding to a first video frame meet a preset continuity requirement, determining that the first video frame is a video frame of a suspected shot boundary;
the first video frame is any one of at least one video frame contained in the target video.
3. The method of claim 2, further comprising:
acquiring the first video frame and N preceding frames and M following frames corresponding to the first video frame to form a first frame image sequence, wherein N and M are positive integers;
calculating the difference degree between the slice characteristics corresponding to the first video frame and the adjacent previous frame and the difference degree between the slice characteristics corresponding to other adjacent frames in the first frame image sequence;
performing abrupt-change detection on the difference degrees under the first frame image sequence;
and if the sudden change of the difference degree is detected on the first video frame, determining that the slice characteristics corresponding to the first video frame meet the preset continuity requirement.
4. The method according to claim 3, wherein the detecting the abrupt change of the difference degree under the first frame image sequence comprises:
constructing a difference degree change curve according to the difference degree under the first frame image sequence;
and if the difference degree corresponding to the first video frame is located in a peak area in the difference degree change curve, determining that the difference degree mutation is detected on the first video frame.
5. The method of claim 3, further comprising:
if K consecutive video frames before the first video frame are all non-shot boundary frames, calculating the difference degree between the first video frame and the last shot boundary frame, wherein K is a positive integer;
and if the difference between the first video frame and the last shot boundary frame is greater than a preset difference threshold, determining that the first video frame is a boundary frame of a dissolve/wipe shot.
6. The method of claim 3, further comprising:
determining a space-time slice segment corresponding to the first frame image sequence;
calculating the variance between the slice pixels of each video frame under the space-time slice segment;
and if the variance is smaller than a preset variance threshold value, determining that the first video frame is a boundary frame of the fade-in fade-out shot.
7. The method according to claim 1, wherein the selecting a target frame meeting a preset local feature matching requirement from the candidate frames comprises:
determining a frame prior to a first candidate frame in the target video;
respectively extracting local features in the first candidate frame and a previous frame thereof;
if the local features of the first candidate frame and the previous frame are matched, determining the first candidate frame as a target frame;
wherein the first candidate frame is any one of the candidate frames.
8. The method of claim 7, further comprising:
and determining whether the local features between the first candidate frame and the previous frame are matched or not by adopting a feature matching scheme based on grid motion statistics.
9. The method of claim 7, further comprising:
if the local feature matching between the first candidate frame and the previous frame is unsuccessful, extracting histograms of the first candidate frame and the previous frame respectively;
and if the histogram between the first candidate frame and the previous frame is matched, determining that the first candidate frame is a shot boundary frame.
10. The method of claim 7, wherein the local features comprise an Oriented FAST and Rotated BRIEF (ORB) feature, a Speeded-Up Robust Features (SURF) feature, or a Scale-Invariant Feature Transform (SIFT) feature.
11. The method of claim 1, further comprising:
and configuring marking information for the shot boundary frame to prompt the shot boundary frame to a user.
12. A computing device comprising a memory and a processor;
the memory is to store one or more computer instructions;
the processor is coupled with the memory for executing the one or more computer instructions for:
receiving a shot boundary detection request aiming at a target video;
extracting spatio-temporal slices from the target video;
determining a video frame of a suspected shot boundary in the target video as a candidate frame according to the space-time slice;
and selecting a target frame which meets the preset local feature matching requirement from the candidate frames as a shot boundary frame in the target video.
13. The apparatus of claim 12, wherein the processor, when determining video frames of a suspected shot boundary in the target video from the spatio-temporal slices, is configured to:
on the space-time slice, if slice characteristics corresponding to a first video frame meet a preset continuity requirement, determining that the first video frame is a video frame of a suspected shot boundary;
the first video frame is any one of at least one video frame contained in the target video.
14. The device of claim 13, wherein the processor is further configured to:
acquiring the first video frame and N preceding frames and M following frames corresponding to the first video frame to form a first frame image sequence, wherein N and M are positive integers;
calculating the difference degree between the slice characteristics corresponding to the first video frame and the adjacent previous frame and the difference degree between the slice characteristics corresponding to other adjacent frames in the first frame image sequence;
performing abrupt-change detection on the difference degrees under the first frame image sequence;
and if the sudden change of the difference degree is detected on the first video frame, determining that the slice characteristics corresponding to the first video frame meet the preset continuity requirement.
15. The apparatus of claim 14, wherein the processor, when performing abrupt change detection on the degree of difference under the first sequence of frame images, is configured to:
constructing a difference degree change curve according to the difference degree under the first frame image sequence;
and if the difference degree corresponding to the first video frame is located in a peak area in the difference degree change curve, determining that the difference degree mutation is detected on the first video frame.
16. The device of claim 14, wherein the processor is further configured to:
if K consecutive video frames before the first video frame are all non-shot boundary frames, calculating the difference degree between the first video frame and the last shot boundary frame, wherein K is a positive integer;
and if the difference between the first video frame and the last shot boundary frame is greater than a preset difference threshold, determining that the first video frame is a boundary frame of a dissolve/wipe shot.
17. The device of claim 14, wherein the processor is further configured to:
determining a space-time slice segment corresponding to the first frame image sequence;
calculating the variance between the slice pixels of each video frame under the space-time slice segment;
and if the variance is smaller than a preset variance threshold value, determining that the first video frame is a boundary frame of the fade-in fade-out shot.
18. The apparatus of claim 12, wherein the processor, when selecting the target frame meeting a preset local feature matching requirement from the candidate frames, is configured to:
determining a frame previous to a first candidate frame in the target video;
respectively extracting local features in the first candidate frame and a previous frame thereof;
if the local features of the first candidate frame and the previous frame are matched, determining the first candidate frame as a target frame;
wherein the first candidate frame is any one of the candidate frames.
19. The device of claim 18, wherein the processor is further configured to:
and determining whether the local features between the first candidate frame and the previous frame are matched or not by adopting a feature matching scheme based on grid motion statistics.
20. The device of claim 18, wherein the processor is further configured to:
if the local feature matching between the first candidate frame and the previous frame is unsuccessful, extracting histograms of the first candidate frame and the previous frame respectively;
and if the histogram between the first candidate frame and the previous frame is matched, determining that the first candidate frame is a shot boundary frame.
21. A computer-readable storage medium storing computer instructions, which when executed by one or more processors, cause the one or more processors to perform the shot boundary detection method of any one of claims 1-11.
CN202011492635.XA 2020-12-16 2020-12-16 Shot boundary detection method, device and storage medium Pending CN114708287A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011492635.XA CN114708287A (en) 2020-12-16 2020-12-16 Shot boundary detection method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011492635.XA CN114708287A (en) 2020-12-16 2020-12-16 Shot boundary detection method, device and storage medium

Publications (1)

Publication Number Publication Date
CN114708287A true CN114708287A (en) 2022-07-05

Family

ID=82166147

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011492635.XA Pending CN114708287A (en) 2020-12-16 2020-12-16 Shot boundary detection method, device and storage medium

Country Status (1)

Country Link
CN (1) CN114708287A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116248918A (en) * 2023-02-08 2023-06-09 北京明朝万达科技股份有限公司 Video shot segmentation method and device, electronic equipment and readable medium
CN116248918B (en) * 2023-02-08 2023-12-01 北京明朝万达科技股份有限公司 Video shot segmentation method and device, electronic equipment and readable medium
CN116168045A (en) * 2023-04-21 2023-05-26 青岛尘元科技信息有限公司 Method and system for dividing sweeping lens, storage medium and electronic equipment
CN116168045B (en) * 2023-04-21 2023-08-18 青岛尘元科技信息有限公司 Method and system for dividing sweeping lens, storage medium and electronic equipment

Similar Documents

Publication Publication Date Title
Hannane et al. An efficient method for video shot boundary detection and keyframe extraction using SIFT-point distribution histogram
US20230077355A1 (en) Tracker assisted image capture
US9595098B2 (en) Image overlaying and comparison for inventory display auditing
CN108353208B (en) Optimizing media fingerprint retention to improve system resource utilization
CN106937114B (en) Method and device for detecting video scene switching
US8879894B2 (en) Pixel analysis and frame alignment for background frames
CN111327945A (en) Method and apparatus for segmenting video
CN110189333B (en) Semi-automatic marking method and device for semantic segmentation of picture
RU2697649C1 (en) Methods and systems of document segmentation
CN111311475A (en) Detection model training method and device, storage medium and computer equipment
US20220172476A1 (en) Video similarity detection method, apparatus, and device
CN111369599B (en) Image matching method, device, apparatus and storage medium
CN114708287A (en) Shot boundary detection method, device and storage medium
CN111914682B (en) Teaching video segmentation method, device and equipment containing presentation file
US20210012511A1 (en) Visual search method, computer device, and storage medium
CN114170425A (en) Model training method, image classification method, server and storage medium
CN110826365B (en) Video fingerprint generation method and device
US9798932B2 (en) Video extraction method and device
CN104754248A (en) Method and device for acquiring target snapshot
CN114219938A (en) Region-of-interest acquisition method
CN114626994A (en) Image processing method, video processing method, computer equipment and storage medium
CN113810782B (en) Video processing method and device, server and electronic device
CN113810751B (en) Video processing method and device, electronic device and server
CN113516615A (en) Sample generation method, system, equipment and storage medium
CN113888608A (en) Target tracking method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination