CN111601162A - Video segmentation method and device and computer storage medium


Publication number
CN111601162A
Authority
CN
China
Prior art keywords
video
image frame
timestamp
audio
image
Prior art date
Legal status
Granted
Application number
CN202010514070.4A
Other languages
Chinese (zh)
Other versions
CN111601162B (en)
Inventor
邓玉龙
向宇
丁文彪
刘子韬
Current Assignee
Beijing Century TAL Education Technology Co Ltd
Original Assignee
Beijing Century TAL Education Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Century TAL Education Technology Co Ltd filed Critical Beijing Century TAL Education Technology Co Ltd
Priority to CN202010514070.4A
Publication of CN111601162A
Application granted
Publication of CN111601162B
Legal status: Active

Classifications

    • H04N 21/44: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N 21/4402: Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N 21/8456: Structuring of content, e.g. decomposing content into time segments, by decomposing the content in the time domain
    • G06F 16/783: Retrieval of video data characterised by using metadata automatically derived from the content
    • G06F 16/7834: Retrieval of video data using metadata automatically derived from the content, using audio features
    • G06F 16/7867: Retrieval of video data using manually generated information, e.g. tags, keywords, comments, title and artist information

Abstract

A video segmentation method, a video segmentation device, and a computer storage medium are disclosed. The method mainly includes: extracting image features of image frames from video data of a target video, and determining video start-end timestamps of video segments according to the image features; extracting audio endpoints from audio data of the target video, and determining audio start-end timestamps of audio segments according to the audio endpoints; selectively updating the video start-end timestamps of the video segments according to the video start-end timestamps and the audio start-end timestamps; and slicing the target video according to the selectively updated video start-end timestamps to generate target sub-videos. In this way, the integrity of the video segmentation content can be improved.

Description

Video segmentation method and device and computer storage medium
Technical Field
Embodiments of the present invention relate to video processing technology, and in particular to a video segmentation method, a video segmentation device, and a computer storage medium.
Background
Video segmentation refers to cutting an original video according to given requirements so as to extract the desired small portion of video clips from it.
At present, the most effective video segmentation method is still to watch the video manually and segment it by hand according to the actual segmentation criteria during viewing. However, this approach consumes a large amount of labor and time, and its operating efficiency is low.
In view of this, how to improve video segmentation quality and video segmentation efficiency is a technical problem to be solved by the present application.
Disclosure of Invention
In view of this, an object of the embodiments of the present invention is to provide a video segmentation method, a video segmentation device, and a computer storage medium that can ensure the integrity of video segmentation content, improve video segmentation efficiency, and allow the segmentation precision to be adjusted flexibly.
According to a first aspect of the present invention, there is provided a video segmentation method, comprising: extracting a plurality of image features corresponding to a plurality of image frames from video data of a target video, and determining a video start-end timestamp of at least one video segment according to the image features; extracting a plurality of audio endpoints from audio data of the target video, and determining an audio start-end timestamp of at least one audio segment according to the audio endpoints; selectively updating the video start-end timestamp of the video segment according to the audio start-end timestamp and the video start-end timestamp; and segmenting the target video according to the selectively updated video start-end timestamp to generate at least one target sub-video.
According to a second aspect of the present invention, there is provided a computer storage medium having stored therein instructions for performing the steps of the video segmentation method of the first aspect.
According to a third aspect of the present invention, there is provided a video slicing apparatus, comprising: a video segmentation module, configured to extract a plurality of image features corresponding to a plurality of image frames from video data of a target video and determine a video start-end timestamp of at least one video segment according to the image features; an audio segmentation module, configured to extract a plurality of audio endpoints from audio data of the target video and determine an audio start-end timestamp of at least one audio segment according to the audio endpoints; a segmentation update module, configured to selectively update the video start-end timestamp of the video segment according to the audio start-end timestamp and the video start-end timestamp, obtaining the selectively updated video start-end timestamp of the video segment; and a video slicing module, configured to segment the target video according to the selectively updated video start-end timestamp of the video segment to generate at least one target sub-video.
As can be seen from the foregoing technical solutions, the video segmentation method, device, and computer storage medium provided in the embodiments of the present invention perform video segmentation by jointly using the video data and the audio data of the target video, which not only ensures the integrity of the segmented content but also reduces the operation cost of video segmentation.
Moreover, the video segmentation method, the video segmentation device and the computer storage medium provided by the embodiment of the invention can flexibly configure the video segmentation precision so as to meet various customized video segmentation requirements.
Drawings
Some specific embodiments of the present application will be described in detail hereinafter by way of illustration and not limitation with reference to the accompanying drawings. The same reference numbers in the drawings identify the same or similar elements or components. Those skilled in the art will appreciate that the drawings are not necessarily drawn to scale. In the drawings:
Fig. 1 is a schematic flowchart of a video segmentation method according to a first embodiment of the present invention;
Fig. 2A and Fig. 2B are schematic flowcharts of a video segmentation method according to a second embodiment of the present invention;
Fig. 3 is a schematic flowchart of a video segmentation method according to a third embodiment of the present invention;
Fig. 4A and Fig. 4B are schematic diagrams of different embodiments of determining a previous image frame, a current image frame, and a subsequent image frame using a sliding window according to the present invention;
Fig. 5 is a schematic flowchart of a video segmentation method according to a fourth embodiment of the present invention;
Fig. 6 is a schematic architecture diagram of a video slicing apparatus according to a sixth embodiment of the present invention;
Fig. 7 is a schematic architecture diagram of a video slicing apparatus according to a seventh embodiment of the present invention.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in the embodiments of the present invention, these solutions will be described clearly and completely with reference to the accompanying drawings. It is obvious that the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention shall fall within the protection scope of the embodiments of the present invention.
As described in the background section, most existing video segmentation operations are performed manually, and therefore suffer from low operating efficiency.
At present, the industry has proposed some automatic video segmentation technologies that require no manual intervention, so as to improve processing efficiency. However, most of them use only the image information of the video data to segment the video, so the integrity of the segmented content cannot be guaranteed, and desynchronization between video and speech is likely to occur. In addition, some segmentation technologies can only handle videos that carry subtitles, which greatly limits their application range. Moreover, most of the techniques currently offered in the industry rely on deep learning models, whose training consumes a great deal of time and thus increases the processing cost of video segmentation.
In view of this, embodiments of the present invention provide a video segmentation method, a video segmentation device, and a computer storage medium, which can solve various problems in the video segmentation technology. The following will further describe specific implementations of embodiments of the present invention with reference to the drawings of the embodiments of the present invention.
First embodiment
Fig. 1 shows a schematic flow chart of a video slicing method according to a first embodiment of the present invention. As shown in fig. 1, the video segmentation method of the present embodiment mainly includes the following steps:
In step S1, image features of a plurality of image frames are extracted from the video data of the target video, and a video start-end timestamp of at least one video segment is determined according to the image features.
Optionally, the target video is any video that contains both video data and audio data.
Alternatively, a plurality of image frames may be extracted from video data of the target video according to a preset time interval.
Specifically, the video data of the target video may be converted into an image frame sequence, a plurality of image frames may be extracted from the image frame sequence according to a preset time interval, and then image features in each image frame may be extracted.
Alternatively, the preset time interval may be set to 1 second, that is, one image frame is extracted from the image frame sequence every 1 second.
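As a concrete illustration of this sampling step, the following Python sketch (an assumption for exposition: OpenCV is available and the container reports a reliable frame rate; none of the names below come from the patent) extracts one image frame per preset interval together with its image timestamp:

```python
import cv2  # hypothetical choice of decoder; the patent does not prescribe one

def sample_frames(video_path, interval_sec=1.0):
    """Decode a video and keep one frame per interval_sec seconds,
    recording each kept frame's image timestamp in seconds."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    step = max(1, round(fps * interval_sec))
    frames, timestamps = [], []
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            frames.append(frame)
            timestamps.append(index / fps)
        index += 1
    cap.release()
    return frames, timestamps
```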
In this embodiment, an image frame list may be generated from the extracted image frames: {x_1, x_2, ..., x_n}, where x_1 denotes the 1st extracted image frame, x_2 the 2nd extracted image frame, x_n the n-th extracted image frame, and so on.
Similarly, an image feature list can be generated from the image features corresponding to the image frames: {a_1, a_2, ..., a_n}, where a_1 denotes the image features of the 1st image frame, a_2 the image features of the 2nd image frame, a_n the image features of the n-th image frame, and so on.
In this embodiment, the video start-end timestamp of a video segment comprises a video start timestamp and a video end timestamp, and the video start-end timestamps of the video segments determined according to the image features can be expressed as: {(o_{t1}, o_{t2}), (o_{t2}, o_{t3}), ..., (o_{ti-1}, o_{ti}), (o_{ti}, o_{ti+1}), ...}.
Optionally, a residual network model (ResNet) may be used to extract the corresponding image features from each image frame.
In this embodiment, the residual network model is a ResNet-101 model, which is used to extract ResNet-101 features from each image frame. It should be noted that the residual network model is not limited to ResNet-101 and may be changed to a ResNet-50 or ResNet-152 model according to actual requirements, so as to extract ResNet-50 or ResNet-152 features from the image frames; this embodiment is not limited in this respect.
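A minimal sketch of this feature extraction step, assuming torchvision's pretrained ResNet-101 and taking the 2048-dimensional globally pooled activation before the classifier as the "image feature" (the patent does not specify which layer is used, so this choice is an assumption):

```python
import torch
import torchvision.models as models
import torchvision.transforms as T

backbone = models.resnet101(weights=models.ResNet101_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()  # drop the classifier; keep the pooled feature
backbone.eval()

preprocess = T.Compose([
    T.ToPILImage(),
    T.Resize(256), T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def extract_feature(frame_bgr):
    """Map one sampled image frame to its image feature vector a_i."""
    rgb = frame_bgr[:, :, ::-1].copy()  # OpenCV frames are BGR; the model expects RGB
    x = preprocess(rgb).unsqueeze(0)
    return backbone(x).squeeze(0).numpy()
```

Swapping in ResNet-50 or ResNet-152 changes only the constructor call.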
Optionally, the similarity between the image frames may be calculated according to the image features, and the video start timestamp and the video end timestamp of at least one video segment may be determined according to the similarity between the image frames.
In step S2, a plurality of audio endpoints are extracted from the audio data of the target video, and an audio start-end timestamp of at least one audio segment is determined according to the audio endpoints.
In this embodiment, the audio start-end timestamp of an audio segment includes an audio start timestamp and an audio end timestamp, and the audio start-end timestamps of the audio segments can be expressed as: {(v_1, v_2), ..., (v_{2j-1}, v_{2j}), ...}.
Optionally, endpoint detection may be performed on the audio data of the target video according to a preset audio feature; that is, the appearance time point and the disappearance time point of the preset audio feature in the audio data are detected, and the audio start timestamp and audio end timestamp of the audio segment containing the preset audio feature are determined accordingly. For example, the appearance time point of the preset audio feature in the audio data is used as the audio start timestamp of that segment, and its disappearance time point as the audio end timestamp.
Furthermore, any existing audio endpoint detection software may be utilized to perform endpoint detection on the audio data, and the invention is not limited in this respect.
Optionally, the preset audio feature is a speech feature; however, it may also be set to other sound features, such as music or animal calls.
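Since the text leaves the endpoint detector to existing software, the sketch below substitutes a deliberately simple energy-based voice activity detector; the frame length and threshold are illustrative assumptions, not values from the patent:

```python
import numpy as np

def detect_speech_segments(samples, sr, frame_ms=30, energy_thresh=0.01):
    """Return (start, end) audio timestamps of segments whose short-time
    energy exceeds a threshold; samples is a mono waveform in [-1, 1].
    The appearance time point becomes the audio start timestamp and the
    disappearance time point the audio end timestamp."""
    hop = int(sr * frame_ms / 1000)
    segments, start = [], None
    for k in range(0, max(len(samples) - hop, 0), hop):
        t = k / sr
        active = float(np.mean(samples[k:k + hop] ** 2)) >= energy_thresh
        if active and start is None:
            start = t                     # preset audio feature appears
        elif not active and start is not None:
            segments.append((start, t))   # preset audio feature disappears
            start = None
    if start is not None:
        segments.append((start, len(samples) / sr))
    return segments
```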
In step S3, the video start-end timestamp of the video segment is selectively updated according to the audio start-end timestamp and the video start-end timestamp, obtaining the selectively updated video start-end timestamp of the video segment.
Optionally, a segmentation update condition may be determined according to the video start-end timestamp of the video segment; when an audio start-end timestamp is determined to satisfy the segmentation update condition, the video start-end timestamp of the video segment is updated using that audio start-end timestamp; otherwise, the video start-end timestamp of the video segment is left unchanged.
In step S4, the target video is segmented according to the selectively updated video start-end timestamp of the video segment, generating at least one target sub-video.
Specifically, the video start-end timestamps selectively updated in step S3, {(o_{t1}, o_{t2}), ..., (o_{2ti-1}, o_{2ti}), ...}, may be taken as the final basis for slicing, whereby the target video is sliced to generate the target sub-videos corresponding to the video start-end timestamps.
It should be noted that the execution order of steps S1 and S2 is not particularly limited in the embodiments of the present invention: step S1 may be executed before step S2, step S2 before step S1, or both may be executed simultaneously.
In another embodiment, after the step S4 is completed, the video slicing method may further include the following steps:
and determining the target sub-video meeting the preset screening conditions from the target sub-videos according to the preset screening conditions.
Optionally, the preset screening condition may be a duration constraint on the target sub-videos. For example, target sub-videos longer than 1 hour or shorter than 2 minutes can be discarded; this screening mechanism improves the video quality of the generated target sub-videos.
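The screening step itself reduces to a duration filter; a one-function sketch using the example bounds above (2 minutes and 1 hour, expressed in seconds):

```python
def filter_sub_videos(segments, min_sec=120, max_sec=3600):
    """Keep only target sub-videos whose duration satisfies the preset
    screening condition; segments are (start, end) timestamp pairs."""
    return [(s, e) for (s, e) in segments if min_sec <= e - s <= max_sec]
```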
In summary, the embodiment of the present invention updates the video segmentation points by jointly using the video data and the audio data of the target video, which better ensures the integrity of the video segmentation content (i.e., the integrity of each target sub-video) and avoids the abnormal situation in which content is cut off mid-stream by an improperly placed segmentation point, thereby achieving the technical effect of segmenting the video reasonably.
Second embodiment
Fig. 2A is a flowchart illustrating a video segmentation method according to a second embodiment of the present invention. This embodiment mainly shows a specific processing flow for determining the video start-end timestamp of at least one video segment according to the image features (i.e., step S1), which mainly includes the following steps:
step S21, calculating the similarity between each image frame and the image frame adjacent to the image frame according to the image features, and obtaining the adjacent similarity value corresponding to each image frame.
Specifically, according to the image frame list obtained in step S1, {x_1, x_2, ..., x_n}, the similarity between the image features of the current image frame and those of the previous image frame is calculated frame by frame starting from the second frame; for example, the similarity between x_2 and x_1, between x_3 and x_2, ..., and between x_n and x_{n-1} is calculated in turn, obtaining the similarity values between adjacent image frames.
In this embodiment, an adjacent similarity value list may be obtained from the adjacent similarity values corresponding to the image frames: {s_1, ..., s_i, ..., s_n}, where s_1 represents the similarity between x_2 and x_1 (i.e., between the 2nd and 1st image frames) and is marked as the adjacent similarity value of the 2nd image frame; likewise, s_i represents the similarity between x_{i+1} and x_i (i.e., between the (i+1)-th and i-th image frames) and is marked as the adjacent similarity value of the (i+1)-th image frame.
Alternatively, the Euclidean distance may be used to calculate the similarity between two image frames, expressed as:

dist(x_i, x_j) = \sqrt{ \sum_{m=1}^{k} (x_{i,m} - x_{j,m})^2 }

where x_i denotes the i-th image frame, x_j the j-th image frame, dist(x_i, x_j) the similarity between the i-th and j-th image frames, and k the feature dimension of the image. When j = i - 1 or j = i + 1, dist(x_i, x_j) represents the adjacent similarity value between two adjacent image frames.
Here k is the image feature dimension; generally, k is not greater than 3000, and preferably k may be set to 1000. This is not limiting, however: k may also be set greater than 3000, depending on actual requirements and the configuration of the relevant software and hardware.
It should be noted that the similarity calculation method between two image frames is not limited to the above euclidean distance calculation formula, and may be performed in other manners, for example, by using a cosine similarity calculation formula.
In the present embodiment, the similarity value obtained using the Euclidean distance lies between 0 and 1: the closer the value is to 0, the higher the similarity between the two image frames (image features); conversely, the closer the value is to 1, the lower the similarity.
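In code, the similarity of the formula above is a plain Euclidean distance over the k-dimensional features; the 0-to-1 range described in this embodiment presupposes suitably scaled features, which is left here as an assumption:

```python
import numpy as np

def dist(a_i, a_j):
    """Euclidean distance between the image features of two frames;
    smaller values mean higher similarity."""
    a_i, a_j = np.asarray(a_i, dtype=float), np.asarray(a_j, dtype=float)
    return float(np.sqrt(np.sum((a_i - a_j) ** 2)))
```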
In step S22, an adjacent similarity value is extracted.
In this embodiment, the adjacent similarity values can be extracted sequentially from the list {s_1, ..., s_i, ..., s_n}.
In step S23, the currently extracted adjacent similarity value is compared with a first preset threshold. If the adjacent similarity value is greater than the first preset threshold, steps S241 and S242 are performed; if not, step S231 is performed.
In this embodiment, the value range of the first preset threshold is between 0 and 1; preferably, the first preset threshold may be set to 0.55.
In step S231, it is determined whether all adjacent similarity values have been compared. If not, the flow returns to step S22 to extract and analyze the next adjacent similarity value from the list {s_1, ..., s_i, ..., s_n}; if yes, step S291 is performed.
In step S241, the image frame corresponding to the current adjacent similarity value is taken as the current image frame, at least one previous image frame is determined from the image frames according to the image timestamp of the current image frame, and then step S251 is performed.
In this embodiment, the at least one previous image frame consists of consecutive frames, among the extracted image frames, whose image timestamps precede the image timestamp of the current image frame by at most the first comparison frame number.
In this embodiment, the first comparison frame number is at least 1 frame; preferably, it is not less than 6 frames and not more than 20 frames, and may for example be set to 8 frames.
For example, when s_i (the adjacent similarity value of the (i+1)-th image frame) is judged to be greater than the first preset threshold: if the first comparison frame number is set to 1 frame, the i-th image frame is determined as the previous image frame; if the first comparison frame number is set to 8 frames, the i-th, (i-1)-th, (i-2)-th, (i-3)-th, (i-4)-th, (i-5)-th, (i-6)-th, and (i-7)-th image frames are determined as the previous image frames.
In step S251, the similarity between the current image frame and the at least one previous image frame is calculated to obtain at least one previous similarity value, and then step S26 is performed.
For example, if the first comparison frame number is 1 frame, the similarity between the current image frame (i.e., x_{i+1}) and the previous image frame (i.e., x_i) is calculated, obtaining 1 previous similarity value ws_i.
For another example, if the first comparison frame number is 8 frames, the similarity between the current image frame (i.e., x_{i+1}) and each previous image frame (i.e., x_{i-7}, x_{i-6}, x_{i-5}, x_{i-4}, x_{i-3}, x_{i-2}, x_{i-1}, x_i) is calculated separately, obtaining the corresponding 8 previous similarity values {ws_{i-7}, ws_{i-6}, ws_{i-5}, ws_{i-4}, ws_{i-3}, ws_{i-2}, ws_{i-1}, ws_i}.
In step S242, the image frame corresponding to the current adjacent similarity value is determined as the current image frame, at least one subsequent image frame is determined from the image frames according to the image timestamp of the current image frame, and then step S252 is performed.
In this embodiment, the at least one subsequent image frame consists of consecutive frames, among the extracted image frames, whose image timestamps follow the image timestamp of the current image frame by at most the second comparison frame number.
In this embodiment, the second comparison frame number is at least 1 frame; preferably, it is not less than 6 frames and not more than 20 frames, and may for example be set to 8 frames.
In this embodiment, the smaller the first and second comparison frame numbers are set, the lower the video segmentation precision; conversely, the larger they are set, the higher the precision. A user can therefore adjust the first and second comparison frame numbers according to the actual precision requirement, so as to meet personalized video segmentation needs.
Preferably, the second comparison frame number is set equal to the first comparison frame number, to facilitate calculating the previous similarity values and the subsequent difference values.
For example, when s_i is judged to be greater than the first preset threshold: if the second comparison frame number is set to 1 frame, the (i+2)-th image frame is determined as the subsequent image frame; if the second comparison frame number is set to 8 frames, the (i+2)-th, (i+3)-th, (i+4)-th, (i+5)-th, (i+6)-th, (i+7)-th, (i+8)-th, and (i+9)-th image frames are determined as the subsequent image frames.
In step S252, a difference between the current image frame and at least one subsequent image frame is calculated to obtain at least one subsequent difference value.
In this embodiment, the formula 1 - dist(x_i, x_j) can be used to calculate the degree of difference between the current image frame and a subsequent image frame.
In this embodiment, if the subsequent image frame is 1 frame, the difference between the current image frame (i.e., x_{i+1}) and the subsequent image frame (i.e., x_{i+2}) is calculated, obtaining 1 subsequent difference value 1 - ws_{i+2}; if there are 8 subsequent image frames, the difference between the current image frame (i.e., x_{i+1}) and each subsequent image frame (i.e., x_{i+2}, x_{i+3}, x_{i+4}, x_{i+5}, x_{i+6}, x_{i+7}, x_{i+8}, x_{i+9}) is calculated separately, obtaining the corresponding 8 subsequent difference values {1 - ws_{i+2}, 1 - ws_{i+3}, 1 - ws_{i+4}, 1 - ws_{i+5}, 1 - ws_{i+6}, 1 - ws_{i+7}, 1 - ws_{i+8}, 1 - ws_{i+9}}. Step S26 is then performed.
In step S26, the total number of the previous similarity values and the subsequent difference values greater than the first preset threshold is counted, and then step S27 is performed.
In step S27, it is analyzed whether the total number of previous similarity values and subsequent difference values greater than the first preset threshold exceeds a second preset threshold. If so, step S28 is performed; otherwise, step S231 is performed.
Optionally, the second preset threshold is determined based on the first comparison frame number and the second comparison frame number: the larger these frame numbers are, the larger the value of the second preset threshold.
In this embodiment, the second preset threshold is (first comparison frame number + second comparison frame number) x F, where F is an adjustment parameter; the larger F is set, the higher the video segmentation precision. Optionally, F ranges between 0.65 and 0.8, and preferably F is 0.7.
In step S28, the image timestamp of the current image frame is determined as a video segmentation time point, and step S231 continues to be executed until all adjacent similarity values have been compared.
In this embodiment, the closer the calculated similarity value is to 0, the higher the similarity. Therefore, if the similarity between the current image frame and the previous image frames is low while the similarity between the current image frame and the subsequent image frames is high, the current image frame marks a video segmentation time point that meets the segmentation requirement.
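Putting steps S22 to S28 together, the following is a sketch of the split-point decision (function and parameter names are mine; the defaults follow the preferred values above: first preset threshold 0.55, comparison frame numbers 8, adjustment parameter F = 0.7; dist is the similarity function sketched earlier):

```python
def find_split_points(features, timestamps, c1=8, c2=8, thresh1=0.55, f=0.7):
    """Return the image timestamps judged to be video segmentation time
    points; features[i] and timestamps[i] belong to the (i+1)-th frame."""
    n = len(features)
    thresh2 = (c1 + c2) * f  # second preset threshold
    points = []
    for i in range(c1, n - c2):
        # adjacent similarity value: current frame vs. its previous frame
        if dist(features[i], features[i - 1]) <= thresh1:
            continue  # too similar to its neighbour: not a candidate
        # previous similarity values against the c1 preceding frames
        prev = [dist(features[i], features[i - m]) for m in range(1, c1 + 1)]
        # subsequent difference values (1 - dist) against the c2 following frames
        subs = [1 - dist(features[i], features[i + m]) for m in range(1, c2 + 1)]
        if sum(v > thresh1 for v in prev + subs) > thresh2:
            points.append(timestamps[i])
    return sorted(points)
```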
As shown in fig. 2B, in another embodiment of the present application, after all adjacent similarity values have been compared (i.e., "yes" in step S231 of fig. 2A), step S291 is performed to sort the image timestamps corresponding to the video segmentation time points.
Specifically, the video segmentation time points determined in step S28 of fig. 2A may be output, and the image timestamps corresponding to the video segmentation time points sorted in chronological order, obtaining a set of video segmentation time points: {o_1, ..., o_i, ..., o_n}.
In step S292, every two adjacent video segmentation time points are combined in turn to serve as the video start timestamp and video end timestamp of a video segment, obtaining the video start-end timestamp of at least one video segment.
Optionally, each video segmentation time point may be mapped onto the time axis of the target video according to the preset time interval used for image frame extraction, obtaining {o_{t1}, o_{t2}, o_{t3}, ..., o_{ti}, ..., o_{tn}}; two adjacent image timestamps are then combined in turn along the time axis to obtain the video start timestamp and video end timestamp of each video segment, expressed as: {(o_{t1}, o_{t2}), (o_{t2}, o_{t3}), ..., (o_{ti-1}, o_{ti}), (o_{ti}, o_{ti+1}), ...}.
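Step S292 then reduces to pairing consecutive sorted time points, as the following two-line sketch shows:

```python
def pair_split_points(split_points):
    """Combine each two adjacent video segmentation time points into the
    (video start timestamp, video end timestamp) of one video segment."""
    pts = sorted(split_points)
    return list(zip(pts[:-1], pts[1:]))  # [(o_t1, o_t2), (o_t2, o_t3), ...]
```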
In summary, the video segmentation technique provided in the embodiment of the present invention can meet different video segmentation precision requirements by flexibly adjusting the first comparison frame number for the previous image frames and the second comparison frame number for the subsequent image frames.
Moreover, since the current image frame is compared with a plurality of previous image frames and a plurality of subsequent image frames, rather than with only one previous and one subsequent image frame, the fault tolerance of the system and the stability of its operation are improved.
Third embodiment
Fig. 3 is a flowchart illustrating a video segmentation method according to a third embodiment of the present invention. As shown in fig. 3, in this embodiment a sliding-window mechanism may be used to analyze, frame by frame, whether each image frame is a video segmentation time point (i.e., the steps shown in fig. 2A above), mainly comprising the following steps:
in step S31, the image frames are sorted according to the image time stamps to obtain an image frame sequence.
The sequence of image frames is, for example: {X_1, ..., X_{i-2}, X_{i-1}, X_i, X_{i+1}, X_{i+2}, ..., X_n}.
In step S32, starting from the first image frame in the image frame sequence, the image frame sequence is processed with a sliding window of preset length, obtaining data blocks each covering a first number of image frames.
In this embodiment, the first number is determined by the preset length of the sliding window and is preferably odd. The preset length of the sliding window is determined by the set number of previous image frames, the set number of subsequent image frames, and the current image frame. In general, the current image frame is fixed at 1 frame, while the first and second comparison frame numbers can be set arbitrarily according to the required video segmentation precision.
In one embodiment, assuming the first comparison frame number of the previous image frames is set to C1 and the second comparison frame number of the subsequent image frames to C2, the preset length of the sliding window is W = C1 + C2 + 1, and the first number determined by it is likewise C1 + C2 + 1, so that each generated data block covers C1 previous image frames, one current image frame, and C2 subsequent image frames.
For example, referring to fig. 4A, when the first and second comparison frame numbers are both set to 1 and the preset length of the sliding window is 3, sliding the window over the image frame sequence yields data blocks (see the solid-line or dashed-line boxes in fig. 4A) each covering 3 consecutive image frames: 1 previous image frame, 1 current image frame, and 1 subsequent image frame.
For another example, referring to fig. 4B, when the first and second comparison frame numbers are both set to 2, the preset length of the sliding window is 5; in this case each data block (see the solid-line or dashed-line boxes in fig. 4B) covers 5 consecutive image frames: 2 previous image frames, 1 current image frame, and 2 subsequent image frames.
In step S33, it is determined whether the target image frame in the data block is the current image frame and, if so, whether the image timestamp corresponding to the current image frame is a video segmentation time point.
Specifically, it may first be determined whether the target image frame in the data block is the current image frame (i.e., steps S21 to S23 above); after the target image frame is determined to be the current image frame, it is further determined whether the image timestamp corresponding to the current image frame is a video segmentation time point (i.e., steps S241/S242 to S27 above). After the analysis of the target image frame in the currently processed data block is completed, step S34 is performed.
In an embodiment, the first and second comparison frame numbers may be set equal; in this case, the preset length of the sliding window and the first number are both odd, and within a data block the previous and subsequent image frames are distributed symmetrically on both sides of the current image frame (the target image frame).
Specifically, when the first and second comparison frame numbers are equal, the image frame located in the middle of the data block is the target image frame (the frame to be analyzed). For example, in the data block shown by the solid-line box of fig. 4A, X_i is the target image frame; in the data block shown by the dashed-line box of fig. 4A, X_{i+1} is the target image frame. As can be seen from fig. 4A, the previous and subsequent image frames are distributed symmetrically on both sides of the current image frame (the target image frame); this layout facilitates the subsequent computation of the previous similarity values and subsequent difference values.
In step S34, it is determined whether the last image frame in the image frame sequence has been covered by a data block; if yes, the flow ends, and if not, step S35 is performed.
In step S35, the sliding window is slid over the image frame sequence by a preset step size from the current position of the data block to update the data block, and the flow returns to step S33 to analyze the target image frame in the updated data block, until every image frame in the image frame sequence has been covered by at least one data block.
In this embodiment, the image frames covered by a data block change as the sliding window moves; hence the target image frame to be analyzed can be updated by sliding the window over the image frame sequence from the data block's current position, until every image frame in the sequence has been covered by at least one data block.
Specifically, after the analysis of the target image frame (e.g., X_i) in the currently processed data block is completed, the sliding window automatically slides forward by one position to make the next image frame (e.g., X_{i+1}) in the sequence the new target image frame. By sliding over the image frame sequence with a preset step size (for example, frame by frame), the target image frames are updated frame by frame, achieving the technical effect of analyzing each image frame in turn.
In addition, when the target image frame in a data block is determined as the current image frame, the previous and subsequent image frames associated with it can be determined from the coverage of the data block, facilitating the computation of the previous similarity values and subsequent difference values.
The following will exemplarily describe the specific implementation steps for determining whether each image frame is a video slicing time point by using a sliding window in conjunction with fig. 4A and 4B:
referring to fig. 4A, in an embodiment, the first comparison frame number and the second comparison frame number are both set to 1 frame, and the predetermined length of the sliding window is 3, that is, the data block obtained based on the sliding window can cover 3 consecutive image frames.
Specifically, W_i denotes the position of the sliding window at the i-th slide, and W_{i+1} its position at the (i+1)-th slide. As can be seen in fig. 4A, when the sliding window is located at W_i relative to the image frame sequence, the data block obtained from it (see the solid-line box) covers the (i-1)-th image frame (X_{i-1}), the i-th image frame (X_i), and the (i+1)-th image frame (X_{i+1}); the i-th image frame, located in the middle of the data block, is the target image frame (i.e., the current frame to be analyzed). If the i-th image frame is determined as the current image frame, the (i-1)-th image frame in the data block is determined as the previous image frame and the (i+1)-th image frame as the subsequent image frame, so as to perform the analysis of whether the image timestamp corresponding to the i-th image frame is a video segmentation time point.
Furthermore, after the i-th image frame has been analyzed, the sliding window moves forward by one position from the data block's current place in the sequence, i.e., from W_i to W_{i+1}. The image frames covered by the resulting data block (see the dashed-line box) are updated to the i-th (X_i), (i+1)-th (X_{i+1}), and (i+2)-th (X_{i+2}) image frames; correspondingly, the target image frame in the data block is updated from the i-th image frame to the (i+1)-th image frame. When the (i+1)-th image frame is determined as the current image frame, the i-th image frame in the data block is determined as the previous image frame and the (i+2)-th image frame as the subsequent image frame, so as to perform the analysis of whether the image timestamp corresponding to the (i+1)-th image frame is a video segmentation time point.
After the (i+1)-th image frame has been analyzed, the sliding window continues to slide forward one position at a time, and so on, until the last image frame in the sequence (e.g., X_n) has been covered by a data block.
Referring to fig. 4B, in another embodiment, the first and second comparison frame numbers are both set to 2 frames; correspondingly, the preset length of the sliding window is 5, i.e., each data block obtained from the sliding window covers 5 consecutive image frames.
As can be seen in fig. 4B, when the sliding window is located at W_i relative to the image frame sequence, the data block obtained from it (see the solid-line box) covers the (i-2)-th (X_{i-2}), (i-1)-th (X_{i-1}), i-th (X_i), (i+1)-th (X_{i+1}), and (i+2)-th (X_{i+2}) image frames, the i-th image frame in the middle of the data block being the target image frame (i.e., the current frame to be analyzed). When the i-th image frame is determined as the current image frame, the (i-2)-th and (i-1)-th image frames in the data block are determined as the previous image frames, and the (i+1)-th and (i+2)-th image frames as the subsequent image frames, so as to perform the analysis of whether the image timestamp corresponding to the i-th image frame is a video segmentation time point.
After the i-th image frame has been analyzed, the sliding window automatically moves forward one position from the data block's current place in the sequence, i.e., from W_i to W_{i+1}. The image frames covered by the resulting data block (see the dashed-line box) are updated accordingly to the (i-1)-th (X_{i-1}), i-th (X_i), (i+1)-th (X_{i+1}), (i+2)-th (X_{i+2}), and (i+3)-th (X_{i+3}) image frames, and the target image frame (i.e., the frame currently to be analyzed) is updated from the i-th to the (i+1)-th image frame. When the (i+1)-th image frame is determined as the current image frame, the (i-1)-th and i-th image frames in the data block are determined as the previous image frames, and the (i+2)-th and (i+3)-th image frames as the subsequent image frames, so as to perform the analysis of whether the image timestamp corresponding to the (i+1)-th image frame is a video segmentation time point.
After the (i+1)-th image frame has been analyzed, the sliding window continues to slide forward one position at a time, and so on, until the last image frame in the sequence (X_n) has been covered by a data block.
It should be noted that the initial image frame to be analyzed in the image frame sequence is determined by the first comparison frame number: for example, when the first comparison frame number is set to 1 frame, analysis starts from the 2nd image frame in the sequence, and when it is set to 8 frames, analysis starts from the 9th image frame.
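A sketch of the sliding-window mechanism itself (helper names are mine): a window of preset length C1 + C2 + 1 slides over the frame sequence with step 1, and each data block exposes the previous frames, the target (current) frame, and the subsequent frames for the analysis above. Since image frames and image features correspond one to one (see the note below), the same generator works equally over the image feature list:

```python
def sliding_blocks(frames, c1=2, c2=2):
    """Yield (previous_frames, target_frame, subsequent_frames) for each
    position of a sliding window of preset length c1 + c2 + 1."""
    w = c1 + c2 + 1
    for start in range(0, len(frames) - w + 1):
        block = frames[start:start + w]
        yield block[:c1], block[c1], block[c1 + 1:]
```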
It should be further noted that, in the embodiments of the present application, both the determination of the current image frame and the determination of the video segmentation time points are implemented based on the image features of the image frames, and the image frames and image features correspond one to one. In practice, therefore, the image frame parameters in the sliding window may equally be replaced by the corresponding image feature parameters to complete the processing steps shown in fig. 3; the technical principle is the same and is not repeated here.
As can be seen from the above, by introducing the sliding-window mechanism, the embodiment of the present invention can update the target image frame to be analyzed frame by frame over the image frame sequence; and when a target image frame is determined as the current image frame, the previous image frames of the first comparison frame number and the subsequent image frames of the second comparison frame number associated with it can be read off directly from the coverage of the data block.
It should be noted that the first and second comparison frame numbers are not limited to the values described in the embodiments of fig. 4A and 4B and may be changed arbitrarily according to actual requirements; for example, they may be set to different values. The present invention is not limited in this respect.
Fourth embodiment
Fig. 5 is a flowchart illustrating a video segmentation method according to a fourth embodiment of the present invention. This embodiment exemplarily shows a specific implementation of selectively updating the video start-end timestamp of a video segment according to the audio start-end timestamp and the video start-end timestamp (i.e., step S3), and includes the following steps:
in step S51, a video source-destination timestamp of a video segment is extracted, and a slicing update condition is determined according to the extracted video source-destination timestamp of the video segment.
For example, from the video start-end timestamps of the plurality of video segments generated in step S292, the video start-end timestamp of one video segment is extracted, and the segmentation update condition is determined according to it.
In step S52, it is determined whether there is an audio start-end timestamp satisfying the segmentation update condition; if so, step S531 is performed, and if not, step S532 is performed.
In this embodiment, an audio start-end timestamp satisfies the segmentation update condition when its audio start timestamp is not greater than the video start timestamp of the video start-end timestamp and its audio end timestamp is not less than the video end timestamp; that is, the time interval of the video segment falls completely within the time interval of one audio segment.
In step S531, the video start-end timestamp of the video segment is updated using the audio start-end timestamp that satisfies the segmentation update condition.
Specifically, the video start timestamp can be updated with the audio start timestamp, the video end timestamp with the audio end timestamp, and the video start-end timestamp of the video segment updated on the basis of the updated video start timestamp and video end timestamp.
In this embodiment, when the audio segment whose audio start-end timestamp is (v_{2j-1}, v_{2j}) completely covers the video segment whose video start-end timestamp is (o_{2ti-1}, o_{2ti}), the audio start-end timestamp (v_{2j-1}, v_{2j}) is used to update the video start-end timestamp (o_{2ti-1}, o_{2ti}) of the video segment.
For example, if v_{2j-1} <= o_{2ti-1} < v_{2j}, i.e., the video start timestamp of the video segment is greater than or equal to the audio start timestamp and less than the audio end timestamp, the audio start timestamp (v_{2j-1}) is used to update the video start timestamp (o_{2ti-1}).
If v_{2j-1} < o_{2ti} <= v_{2j}, i.e., the video end timestamp of the video segment is greater than the audio start timestamp and less than or equal to the audio end timestamp, the audio end timestamp (v_{2j}) is used to update the video end timestamp (o_{2ti}).
In step S532, if none of the audio start-end timestamps satisfies the segmentation update condition, i.e., no audio start-end timestamp completely covering the segment (o_{2ti-1}, o_{2ti}) is found, the video start-end timestamp of the video segment is not updated.
In step S54, it is determined whether all video segments have been checked; if yes, step S55 is performed, otherwise the flow returns to step S51 to determine whether the video start-end timestamp of the next video segment needs to be updated.
In step S55, the selectively updated video start-end timestamps of the video segments are obtained.
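A compact sketch of steps S51 to S55, under the complete-coverage reading of the segmentation update condition described above (function name and data layout are mine):

```python
def selective_update(video_segments, audio_segments):
    """For each video segment (start, end), if some audio segment
    completely covers it (audio start <= video start and
    audio end >= video end), replace its start-end timestamps with the
    audio segment's; otherwise leave the video segment unchanged."""
    updated = []
    for v_start, v_end in video_segments:
        for a_start, a_end in audio_segments:
            if a_start <= v_start and a_end >= v_end:
                v_start, v_end = a_start, a_end  # update condition satisfied
                break
        updated.append((v_start, v_end))
    return updated
```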
Fifth embodiment
A fifth embodiment of the present invention provides a computer storage medium in which are stored instructions for executing the steps of the video segmentation method of the first to fourth embodiments.
Sixth embodiment
Fig. 6 shows the main architecture of a video slicing apparatus according to a sixth embodiment of the present invention. As shown in the figure, the video slicing apparatus 600 of this embodiment mainly includes: a video segmentation module 610, an audio segmentation module 620, a segmentation update module 630, and a video slicing module 640.
The video segmentation module 610 is configured to extract a plurality of image features corresponding to a plurality of image frames from the video data of a target video and determine the video start-end timestamp of at least one video segment according to the image features.
Optionally, the video segmentation module 610 further extracts the image frames from the video data of the target video at a preset time interval and extracts the image features corresponding to each image frame.
Optionally, the video segmentation module 610 further extracts the image features corresponding to each image frame using a residual network model.
Optionally, the video segmentation module 610 further calculates the similarity between image frames according to the image features and determines the video start timestamp and video end timestamp of at least one video segment according to that similarity.
The audio segmentation module 620 is configured to extract a plurality of audio endpoints from the audio data of the target video and determine the audio start-end timestamp of at least one audio segment according to the audio endpoints.
Optionally, the audio segmentation module 620 further performs endpoint detection on the audio data of the target video based on a preset audio feature to obtain the audio endpoints, each audio endpoint identifying an appearance time point or a disappearance time point of the preset audio feature in the audio data, and determines the audio start timestamp and audio end timestamp of the audio segment containing the preset audio feature according to the audio endpoints.
The segmentation update module 630 is configured to selectively update the video origin-destination timestamp of the video segment according to the audio origin-destination timestamp and the video origin-destination timestamp, so as to obtain the selectively updated video origin-destination timestamp of the video segment.
Optionally, the segmentation update module 630 further determines a segmentation update condition according to the video origin-destination timestamp of one of the video segments. If one of the audio origin-destination timestamps meets the segmentation update condition, the video origin-destination timestamp of the video segment is updated by using that audio origin-destination timestamp; if none of the audio origin-destination timestamps meets the segmentation update condition, the video origin-destination timestamp of the video segment is not updated. An audio origin-destination timestamp meets the segmentation update condition when its audio start timestamp is not greater than the video start timestamp and its audio end timestamp is not less than the video end timestamp.
The video slicing module 640 is configured to slice the target video according to the selectively updated video origin-destination timestamp of the video segment to generate at least one target sub-video.
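The final slicing step can be illustrated with a short sketch that cuts the target video at the selectively updated timestamps using the ffmpeg command-line tool; the function name and output naming pattern are hypothetical.

```python
import subprocess


def slice_video(src_path, segments, out_pattern="clip_{:03d}.mp4"):
    """Cut src_path into sub-videos at the given (start, end) timestamps.

    Stream copy (-c copy) is fast but cuts on keyframes; re-encode
    instead if frame-exact boundaries are required.
    """
    for i, (start, end) in enumerate(segments):
        subprocess.run([
            "ffmpeg", "-y",
            "-ss", str(start),       # seek to the segment start
            "-i", src_path,
            "-t", str(end - start),  # keep the segment duration
            "-c", "copy",            # copy streams without re-encoding
            out_pattern.format(i),
        ], check=True)
```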
As can be seen from the above, the video slicing apparatus provided in the embodiment of the present invention determines video slicing points by jointly using the video data and the audio data of the target video, and can thus better ensure the integrity of the sliced video content.
Seventh embodiment
Fig. 7 shows the architecture of a video slicing apparatus according to a seventh embodiment of the present invention. In this embodiment, the video segmentation module 610 further includes a current image frame determining unit 611 and a video segmentation time point determining unit 612.
The current image frame determining unit 611 is configured to calculate, according to each image feature, the similarity between each image frame and its adjacent previous image frame to obtain the adjacent similarity value corresponding to each image frame, and to determine an image frame whose adjacent similarity value is greater than a first preset threshold as a current image frame.
The video segmentation time point determining unit 612 is configured to, for each current image frame: calculate the similarity between the current image frame and at least one previous image frame to obtain at least one previous similarity value; calculate the difference between the current image frame and at least one subsequent image frame to obtain at least one subsequent difference value; count the total number of previous similarity values and subsequent difference values that are greater than the first preset threshold; and, if the total number is greater than a second preset threshold, determine the image timestamp of the current image frame as a video segmentation time point. A video start timestamp and a video end timestamp of at least one video segment are then determined based on the video segmentation time points.
Optionally, the at least one previous image frame is a first comparison frame number of consecutive image frames, among the plurality of image frames, whose image timestamps are less than the image timestamp of the current image frame; the at least one subsequent image frame is a second comparison frame number of consecutive image frames whose image timestamps are greater than the image timestamp of the current image frame.
Optionally, the first comparison frame number and the second comparison frame number are each not less than 6 frames and not more than 20 frames.
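As a sketch of how unit 612 might apply this test, assuming per-frame feature vectors and a similarity function returning values in [0, 1] (both hypothetical names), the decision for one target frame can be written as follows; the pairing helper reflects one plausible reading of how adjacent segmentation time points delimit segments.

```python
def is_slicing_point(feats, k, n_prev, n_next, sim_thresh, count_thresh,
                     similarity):
    """Test whether frame k is a video segmentation time point.

    Counts the preceding frames whose similarity to frame k exceeds the
    first threshold, plus the subsequent frames whose difference
    (1 - similarity) from frame k exceeds the same threshold; frame k
    qualifies when the total exceeds the second threshold.
    """
    prev = feats[max(0, k - n_prev):k]
    nxt = feats[k + 1:k + 1 + n_next]
    n_similar = sum(similarity(feats[k], f) > sim_thresh for f in prev)
    n_different = sum((1 - similarity(feats[k], f)) > sim_thresh for f in nxt)
    return n_similar + n_different > count_thresh


def pair_slicing_points(points):
    # One plausible reading of the pairing rule: each pair of adjacent
    # segmentation time points delimits one candidate video segment.
    return list(zip(points, points[1:]))
```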
Preferably, the video segmentation module 610 further includes a sliding window 613 with a preset length, which is configured to slide over the image frame sequence generated from the image frames to obtain data blocks each covering a first number of image frames, so that the current image frame determining unit 611 can determine whether the target image frame in a data block is a current image frame and the video segmentation time point determining unit 612 can determine whether the image timestamp corresponding to the current image frame is a video segmentation time point. The sliding window then advances over the image frame sequence by a preset step size, based on the current position of the data block, to update the data block, until each image frame in the image frame sequence is covered by at least one data block.
Optionally, the first number is determined according to a preset length of the sliding window, the preset length of the sliding window is determined according to a first comparison frame number of a previous image frame, a second comparison frame number of a subsequent image frame, and a frame number of a current image frame, and the first comparison frame number is the same as the second comparison frame number.
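A corresponding sketch of the sliding-window mechanism, reusing is_slicing_point from the sketch above and assuming the two comparison frame numbers are equal, so that the window length 2 * n_cmp + 1 is odd and the target frame sits at its center:

```python
def sliding_window_slicing_points(feats, timestamps, n_cmp, sim_thresh,
                                  count_thresh, similarity, step=1):
    """Slide an odd-length window of 2 * n_cmp + 1 frames over the image
    frame sequence and test the centered target frame of each data block."""
    win = 2 * n_cmp + 1
    points = []
    for left in range(0, max(0, len(feats) - win + 1), step):
        k = left + n_cmp  # target frame at the center of the data block
        if is_slicing_point(feats, k, n_cmp, n_cmp,
                            sim_thresh, count_thresh, similarity):
            points.append(timestamps[k])
    return points
```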
With the video segmentation apparatus provided by the embodiments of the present invention, different video segmentation precision requirements can be met by adjusting the first comparison frame number and the second comparison frame number. With the sliding-window mechanism, each image frame in the image frame sequence can be analyzed frame by frame.
In addition, the video slicing apparatus 600 according to each embodiment of the present invention can also be used to implement other steps in each of the foregoing video slicing method embodiments, and has the beneficial effects of the corresponding method step embodiments, which are not described herein again.
In summary, the video segmentation method, the video segmentation device, and the computer storage medium provided in the embodiments of the present invention combine and utilize video data and audio data in a target video, so as to ensure the integrity of video segmentation content and improve the working efficiency of video segmentation.
In addition, the comparison frame number of the previous image frame and the subsequent image frame can be flexibly configured to meet different video segmentation precision requirements.
Moreover, by introducing the sliding-window mechanism, each image frame can be analyzed frame by frame, which improves the fault tolerance of the system and further improves the quality of the video slicing.
It should be noted that, according to the implementation requirement, each component/step described in the embodiment of the present invention may be divided into more components/steps, and two or more components/steps or partial operations of the components/steps may also be combined into a new component/step to achieve the purpose of the embodiment of the present invention.
The above-described method according to an embodiment of the present invention may be implemented in hardware or firmware, or as software or computer code that can be stored in a recording medium such as a CD-ROM, a RAM, a floppy disk, a hard disk, or a magneto-optical disk, or as computer code originally stored in a remote recording medium or a non-transitory machine-readable medium and downloaded through a network to be stored in a local recording medium, so that the method described herein can be processed by such software on a recording medium using a general-purpose computer, a dedicated processor, or programmable or dedicated hardware such as an ASIC or an FPGA. It will be appreciated that the computer, processor, microprocessor controller, or programmable hardware includes memory components (e.g., RAM, ROM, flash memory, etc.) that can store or receive software or computer code that, when accessed and executed by the computer, processor, or hardware, implements the video slicing method described herein. Further, when a general-purpose computer accesses code for implementing the video slicing method shown herein, execution of the code transforms the general-purpose computer into a special-purpose computer for performing the video slicing method shown herein.
Those of ordinary skill in the art will appreciate that the various illustrative elements and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present embodiments.
The above embodiments are only for illustrating the embodiments of the present invention and not for limiting the embodiments of the present invention, and those skilled in the art can make various changes and modifications without departing from the spirit and scope of the embodiments of the present invention, so that all equivalent technical solutions also belong to the scope of the embodiments of the present invention, and the scope of patent protection of the embodiments of the present invention should be defined by the claims.

Claims (20)

1. A method for video segmentation, the method comprising:
extracting a plurality of image characteristics corresponding to a plurality of image frames from video data of a target video, and determining a video origin-destination timestamp of at least one video segment according to each image characteristic;
extracting a plurality of audio endpoints from the audio data of the target video, and determining an audio origin-destination timestamp of at least one audio segment according to each audio endpoint;
selectively updating the video origin-destination timestamp of the video clip according to the audio origin-destination timestamp and the video origin-destination timestamp to obtain the selectively updated video origin-destination timestamp of the video clip; and
segmenting the target video according to the selectively updated video origin-destination timestamp of the video segment to generate at least one target sub-video.
2. The method according to claim 1, wherein said extracting image features of a plurality of image frames from video data of a target video comprises:
extracting a plurality of image frames from the video data of the target video according to a preset time interval, and extracting each image feature corresponding to each image frame.
3. The method according to claim 2, wherein said extracting each image feature corresponding to each image frame comprises:
extracting each image feature corresponding to each image frame by using a residual network model.
4. The method according to claim 1, wherein said determining a video origin-destination timestamp of at least one video segment according to each of said image features comprises:
calculating the similarity between the image frames according to the image features, and determining the video start timestamp and the video end timestamp of the at least one video segment according to the similarity between the image frames.
5. The method of claim 4, wherein said calculating a similarity between each of the image frames according to each of the image features and determining a video start timestamp and a video end timestamp of the at least one video segment according to the similarity between each of the image frames comprises:
according to the image features, calculating the similarity between each image frame and the image frame adjacent to the image frame, and obtaining each adjacent similarity value corresponding to each image frame;
determining the image frame with the adjacent similarity value larger than the first preset threshold value as a current image frame according to the first preset threshold value and each adjacent similarity value;
for each current image frame, calculating the similarity between the current image frame and at least one previous image frame to obtain at least one previous similarity value, calculating the difference between the current image frame and at least one subsequent image frame to obtain at least one subsequent difference value, counting the total number of the previous similarity values and the subsequent difference values which are greater than a first preset threshold value, and if the total number is greater than a second preset threshold value, determining the image timestamp of the current image frame as a video segmentation time point;
determining a video start timestamp and a video end timestamp of the at least one video segment based on each of the video slicing time points.
6. The video slicing method according to claim 5, wherein:
the at least one previous image frame is a first comparison frame number of consecutive image frames, among the plurality of image frames, whose image timestamps are less than the image timestamp of the current image frame; and
the at least one subsequent image frame is a second comparison frame number of consecutive image frames, among the plurality of image frames, whose image timestamps are greater than the image timestamp of the current image frame.
7. The video slicing method according to claim 6, wherein the first comparison frame number and the second comparison frame number are each not less than 6 frames and not more than 20 frames.
8. The video slicing method according to claim 7, further comprising: determining the second preset threshold based on the first comparison frame number and the second comparison frame number.
9. The video slicing method according to claim 6, further comprising:
sorting the image frames according to their corresponding image timestamps to obtain an image frame sequence;
performing sliding processing on the image frame sequence by using a sliding window with a preset length to obtain data blocks covering a first number of image frames, wherein the first number is an odd number;
determining whether a target image frame in the data blocks is the current image frame and determining whether an image timestamp corresponding to the current image frame is the video segmentation time point;
performing sliding processing on the image frame sequence by a preset step size based on the current position of the data block in the image frame sequence by using the sliding window to update the data block, and repeating the steps of determining whether a target image frame in the data block is the current image frame and determining whether an image timestamp corresponding to the current image frame is the video segmentation time point until each image frame in the image frame sequence is covered by at least one data block.
10. The method according to claim 9, wherein the first number is determined according to the preset length of the sliding window, the preset length of the sliding window is determined according to the first comparison frame number of the previous image frames, the second comparison frame number of the subsequent image frames, and the frame number of the current image frame, the frame number of the current image frame being one frame, and the first comparison frame number is the same as the second comparison frame number.
11. The method of claim 5, wherein said determining a video start timestamp and a video end timestamp of said at least one video segment based on each of said video slicing time points comprises:
sorting the video segmentation time points according to their corresponding image timestamps, and sequentially combining every two adjacent video segmentation time points as the video start timestamp and the video end timestamp of a video segment.
12. The video slicing method of claim 1, wherein said extracting a plurality of audio endpoints from audio data of the target video comprises:
performing endpoint detection on the audio data of the target video based on preset audio features to obtain a plurality of audio endpoints, wherein each audio endpoint is used for identifying an appearance time point and a disappearance time point of the preset audio features in the audio data; and
determining, according to each audio endpoint, an audio start timestamp and an audio end timestamp of the audio segment containing the preset audio features.
13. The method according to claim 1, wherein said selectively updating the video origin-destination timestamp of the video segment according to the audio origin-destination timestamp and the video origin-destination timestamp comprises:
determining a segmentation update condition according to a video origin-destination timestamp of one of the video segments;
according to the segmentation update condition, if one of the audio origin-destination timestamps meets the segmentation update condition, updating the video origin-destination timestamp of the video segment by using the audio origin-destination timestamp meeting the segmentation update condition; and if none of the audio origin-destination timestamps meets the segmentation update condition, not updating the video origin-destination timestamp of the video segment.
14. The video slicing method according to claim 13, wherein an audio origin-destination timestamp satisfies the segmentation update condition when the audio start timestamp in the audio origin-destination timestamp is not greater than the video start timestamp in the video origin-destination timestamp and the audio end timestamp in the audio origin-destination timestamp is not less than the video end timestamp in the video origin-destination timestamp, and wherein said updating the video origin-destination timestamp of the video segment with the audio origin-destination timestamp satisfying the segmentation update condition comprises:
updating the video start timestamp with the audio start timestamp in the audio origin-destination timestamp, and updating the video end timestamp with the audio end timestamp in the audio origin-destination timestamp; and
updating the video origin-destination timestamp of the video segment based on the updated video start timestamp and video end timestamp.
15. The method of claim 1, wherein after said slicing said target video according to said video start timestamp and said video end timestamp to generate at least one target sub-video, said method further comprises:
determining, from the at least one target sub-video, the target sub-video meeting a preset screening condition.
16. A computer storage medium having stored therein instructions for carrying out the steps of the video slicing method according to any one of claims 1 to 15.
17. A video slicing apparatus, characterized in that said device comprises:
the video segmentation module is used for extracting a plurality of image characteristics corresponding to a plurality of image frames from video data of a target video and determining a video origin-destination timestamp of at least one video segment according to each image characteristic;
the audio segmentation module is used for extracting a plurality of audio endpoints from the audio data of the target video and determining an audio origin-destination timestamp of at least one audio fragment according to each audio endpoint;
the segmentation update module is used for selectively updating the video origin-destination timestamp of the video segment according to the audio origin-destination timestamp and the video origin-destination timestamp to obtain the selectively updated video origin-destination timestamp of the video segment; and
the video slicing module is used for slicing the target video according to the selectively updated video origin-destination timestamp of the video segment to generate at least one target sub-video.
18. The video slicing apparatus according to claim 17, wherein said video segmentation module comprises:
a current image frame determining unit, configured to calculate, according to each image feature, a similarity between each image frame and an adjacent previous image frame, obtain each adjacent similarity value corresponding to each image frame, and determine, as a current image frame, the image frame whose adjacent similarity value is greater than a first preset threshold;
a video segmentation time point determining unit, configured to calculate, for each current image frame, the similarity between the current image frame and at least one previous image frame to obtain at least one previous similarity value, calculate the difference between the current image frame and at least one subsequent image frame to obtain at least one subsequent difference value, and count the total number of the previous similarity values and the subsequent difference values that are greater than the first preset threshold; if the total number is greater than a second preset threshold, determine the image timestamp of the current image frame as a video segmentation time point, and determine a video start timestamp and a video end timestamp of the at least one video segment based on the video segmentation time points.
19. The video slicing apparatus according to claim 18, wherein said video segmentation module further comprises a sliding window having a preset length, said sliding window being configured to:
performing sliding processing on an image frame sequence generated based on each image frame to obtain data blocks covering a first number of image frames, so that the current image frame determining unit determines whether a target image frame in the data blocks is the current image frame, and the video segmentation time point determining unit determines whether an image timestamp corresponding to the current image frame is the video segmentation time point; and
sliding the image frame sequence by a preset step size to update the data blocks based on the current positions of the data blocks in the image frame sequence until each of the image frames in the image frame sequence is covered by at least one of the data blocks.
20. The video slicing apparatus according to claim 17, wherein said segmentation update module is configured to:
determine a segmentation update condition according to the video origin-destination timestamp of one of the video segments, and, according to the segmentation update condition, if one of the audio origin-destination timestamps meets the segmentation update condition, update the video origin-destination timestamp of the video segment by using the audio origin-destination timestamp meeting the segmentation update condition; if none of the audio origin-destination timestamps meets the segmentation update condition, not update the video origin-destination timestamp of the video segment;
wherein an audio origin-destination timestamp satisfies the segmentation update condition when the audio start timestamp in the audio origin-destination timestamp is not greater than the video start timestamp in the video origin-destination timestamp and the audio end timestamp in the audio origin-destination timestamp is not less than the video end timestamp in the video origin-destination timestamp.
CN202010514070.4A 2020-06-08 2020-06-08 Video segmentation method and device and computer storage medium Active CN111601162B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010514070.4A CN111601162B (en) 2020-06-08 2020-06-08 Video segmentation method and device and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010514070.4A CN111601162B (en) 2020-06-08 2020-06-08 Video segmentation method and device and computer storage medium

Publications (2)

Publication Number Publication Date
CN111601162A true CN111601162A (en) 2020-08-28
CN111601162B CN111601162B (en) 2022-08-02

Family

ID=72186336

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010514070.4A Active CN111601162B (en) 2020-06-08 2020-06-08 Video segmentation method and device and computer storage medium

Country Status (1)

Country Link
CN (1) CN111601162B (en)

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6363380B1 (en) * 1998-01-13 2002-03-26 U.S. Philips Corporation Multimedia computer system with story segmentation capability and operating program therefor including finite automation video parser
JP2006270301A (en) * 2005-03-23 2006-10-05 Nippon Hoso Kyokai <Nhk> Scene change detecting apparatus and scene change detection program
CN104519401A (en) * 2013-09-30 2015-04-15 华为技术有限公司 Video division point acquiring method and equipment
CN104780388A (en) * 2015-03-31 2015-07-15 北京奇艺世纪科技有限公司 Video data partitioning method and device
CN105611401A (en) * 2015-12-18 2016-05-25 无锡天脉聚源传媒科技有限公司 Video cutting method and video cutting device
CN106162223A (en) * 2016-05-27 2016-11-23 北京奇虎科技有限公司 A kind of news video cutting method and device
US20200057890A1 (en) * 2017-04-28 2020-02-20 Alibaba Group Holding Limited Method and device for determining inter-cut time range in media item
CN107623860A (en) * 2017-08-09 2018-01-23 北京奇艺世纪科技有限公司 Multi-medium data dividing method and device
CN109600665A (en) * 2018-08-01 2019-04-09 北京微播视界科技有限公司 Method and apparatus for handling data
CN109740499A (en) * 2018-12-28 2019-05-10 北京旷视科技有限公司 Methods of video segmentation, video actions recognition methods, device, equipment and medium
CN110213670A (en) * 2019-05-31 2019-09-06 北京奇艺世纪科技有限公司 Method for processing video frequency, device, electronic equipment and storage medium
CN110602552A (en) * 2019-09-16 2019-12-20 广州酷狗计算机科技有限公司 Video synthesis method, device, terminal and computer readable storage medium
CN110598014A (en) * 2019-09-27 2019-12-20 腾讯科技(深圳)有限公司 Multimedia data processing method, device and storage medium
CN111126197A (en) * 2019-12-10 2020-05-08 苏宁云计算有限公司 Video processing method and device based on deep learning

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112257545A (en) * 2020-10-19 2021-01-22 安徽领云物联科技有限公司 Violation real-time monitoring and analyzing method and device and storage medium
CN112650880A (en) * 2020-11-30 2021-04-13 重庆紫光华山智安科技有限公司 Video analysis method and device, computer equipment and storage medium
CN112650880B (en) * 2020-11-30 2022-06-03 重庆紫光华山智安科技有限公司 Video analysis method and device, computer equipment and storage medium
CN113194333A (en) * 2021-03-01 2021-07-30 招商银行股份有限公司 Video clipping method, device, equipment and computer readable storage medium
CN113194333B (en) * 2021-03-01 2023-05-16 招商银行股份有限公司 Video editing method, device, equipment and computer readable storage medium
CN113096643A (en) * 2021-03-25 2021-07-09 北京百度网讯科技有限公司 Video processing method and device
CN115379290A (en) * 2022-08-22 2022-11-22 上海商汤智能科技有限公司 Video processing method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN111601162B (en) 2022-08-02

Similar Documents

Publication Publication Date Title
CN111601162B (en) Video segmentation method and device and computer storage medium
CN110213670B (en) Video processing method and device, electronic equipment and storage medium
JP4132589B2 (en) Method and apparatus for tracking speakers in an audio stream
JP2003177778A (en) Audio excerpts extracting method, audio data excerpts extracting system, audio excerpts extracting system, program, and audio excerpts selecting method
CN107609149B (en) Video positioning method and device
US20210319230A1 (en) Keyframe Extractor
CN108682436B (en) Voice alignment method and device
CN111444255B (en) Training method and device for data model
Lu et al. Unsupervised speaker segmentation and tracking in real-time audio content analysis
Perez-Freire et al. A multimedia approach for audio segmentation in TV broadcast news
US20230353797A1 (en) Classifying segments of media content using closed captioning
JP2003346147A (en) Object discriminating method, object discriminating device, object discriminating program, and recording medium having the object discriminating program recorded therein
JP4859130B2 (en) Monitoring system
JP4924423B2 (en) Device for detecting cut point of moving image based on prediction error of feature amount
JP2021093627A (en) Editing system
JP4979029B2 (en) Scene segmentation apparatus for moving image data
JP2006140707A (en) Method, device and program for processing image and computer-readable recording medium recording program
CN114363720B (en) Video slicing method, system, equipment and medium based on computer vision
CN115880737B (en) Subtitle generation method, system, equipment and medium based on noise reduction self-learning
CN117099159A (en) Information processing device, information processing method, and program
JP2005252859A (en) Scene-splitting device for dynamic image data
US20220286737A1 (en) Separating Media Content into Program Segments and Advertisement Segments
CN108235137B (en) Method and device for judging channel switching action through sound waveform and television
Ruiz-Muñoz et al. Threshold estimation in energy-based methods for segmenting birdsong recordings
US20020078438A1 (en) Video signal analysis and storage

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant