CN110213670A - Video processing method, apparatus, electronic device and storage medium - Google Patents

Video processing method, apparatus, electronic device and storage medium

Info

Publication number
CN110213670A
CN110213670A (application CN201910472453.7A)
Authority
CN
China
Prior art keywords
video
scene
processed
segmentation
segmentation point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910472453.7A
Other languages
Chinese (zh)
Other versions
CN110213670B (en)
Inventor
贾少勇 (Jia Shaoyong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority claimed from CN201910472453.7A
Publication of CN110213670A
Application granted
Publication of CN110213670B
Legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/233 Processing of audio elementary streams
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23418 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439 Processing of audio elementary streams
    • H04N21/4394 Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845 Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456 Structuring of content by decomposing the content in the time domain, e.g. in time segments

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Television Signal Processing For Recording (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a video processing method, apparatus, electronic device and storage medium. The video processing method comprises: obtaining a video to be processed, and dividing it into multiple unit videos; obtaining, for each unit video, a corresponding scene feature vector and a corresponding audio feature vector; determining scene pre-segmentation points according to the scene feature vectors corresponding to every two adjacent unit videos, and audio pre-segmentation points according to the audio feature vectors corresponding to every two adjacent unit videos; performing scene segmentation on the video to be processed according to the scene pre-segmentation points, and finding, among the video clips obtained by scene segmentation, the clips whose duration exceeds a set maximum duration threshold, as clips to be split; and performing audio segmentation on the clips to be split according to the audio pre-segmentation points, to obtain the final video clips. The present invention avoids overly long clips, improves the accuracy of splitting, and better meets user needs.

Description

Video processing method, apparatus, electronic device and storage medium
Technical field
The present invention relates to the field of Internet technologies, and in particular to a video processing method, apparatus, electronic device and storage medium.
Background art
Film and television works, carried on media such as copies, tapes, film and digital storage and intended for presentation on cinema or television screens, are a comprehensive modern art form combining vision and hearing; they include films, TV series, animation and similar content. Film and television videos are usually long videos, and in some cases a user is not interested in the entire content of a long video but rather in particular segments of it. Therefore, to meet user needs, a film or television video can be split into multiple video clips for the user to choose from and watch.
In the prior art, film and television videos are generally split with a method based on scene change: the video is split according to whether the scene image information in the video changes significantly, and the time points at which the scene image information changes significantly are taken as split points.
However, in a film or television video the same scene may last for a long time, and the content under that scene may in fact contain more than one segment. In such a case the above approach treats all the video under the same scene as a single video clip, so the resulting clips are too long; the splitting is inaccurate and cannot satisfy user needs.
Summary of the invention
Embodiments of the present invention provide a video processing method, apparatus, electronic device and storage medium, to solve the problems that split video clips are too long, splitting is inaccurate, and user needs cannot be satisfied.
In a first aspect, an embodiment of the present invention provides a video processing method, the method comprising:
obtaining a video to be processed, and dividing the video to be processed into multiple unit videos;
obtaining, for each unit video, a corresponding scene feature vector and a corresponding audio feature vector;
determining scene pre-segmentation points according to the scene feature vectors corresponding to every two adjacent unit videos, and determining audio pre-segmentation points according to the audio feature vectors corresponding to every two adjacent unit videos;
performing scene segmentation on the video to be processed according to the scene pre-segmentation points, and finding, among the video clips obtained by scene segmentation, the clips whose duration exceeds a set maximum duration threshold, as clips to be split;
performing audio segmentation on the clips to be split according to the audio pre-segmentation points, to obtain the final video clips.
Optionally, performing audio segmentation on the clip to be split according to the audio pre-segmentation points comprises: finding, among the audio pre-segmentation points, the audio pre-segmentation point nearest to the midpoint of the clip to be split; performing audio segmentation on the clip to be split at the found audio pre-segmentation point; judging whether, among the clips obtained by the audio segmentation, there is a clip whose duration exceeds the maximum duration threshold; and, when there is such a clip, taking the clip whose duration exceeds the maximum duration threshold as the clip to be split and returning to the step of finding, among the audio pre-segmentation points, the audio pre-segmentation point nearest to the midpoint of the clip to be split. A sketch of this recursion follows.
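A minimal sketch of this optional recursive split, assuming clips are (start, end) pairs in seconds and the candidate audio pre-segmentation points are given as a list of times; all names and values are illustrative, not taken from the patent.

```python
# Recursively split a clip at the audio pre-segmentation point nearest to
# its midpoint until every piece is at most t_max seconds long.
def audio_split(clip, audio_points, t_max):
    start, end = clip
    if end - start <= t_max:
        return [clip]
    midpoint = (start + end) / 2.0
    # Only points strictly inside the clip are usable as cuts.
    inside = [p for p in audio_points if start < p < end]
    if not inside:
        return [clip]  # no usable audio point; keep the clip as-is
    cut = min(inside, key=lambda p: abs(p - midpoint))
    return (audio_split((start, cut), audio_points, t_max)
            + audio_split((cut, end), audio_points, t_max))

# Example: an 18-minute clip with a 7-minute maximum duration threshold.
print(audio_split((0.0, 1080.0), [250.0, 540.0, 800.0], 420.0))
# -> [(0.0, 250.0), (250.0, 540.0), (540.0, 800.0), (800.0, 1080.0)]
```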
Optionally, after determining the scene pre-segmentation points according to the scene feature vectors corresponding to every two adjacent unit videos, the method further comprises: finding, among the scene pre-segmentation points, those caused by a short-lived scene change; and deleting the scene pre-segmentation points caused by a short-lived scene change, to obtain the remaining scene pre-segmentation points. Performing scene segmentation on the video to be processed according to the scene pre-segmentation points then comprises: performing scene segmentation on the video to be processed according to the remaining scene pre-segmentation points.
Optionally, finding, among the scene pre-segmentation points, those caused by a short-lived scene change comprises: obtaining the duration between every two adjacent scene pre-segmentation points, and finding pairs of adjacent scene pre-segmentation points whose interval is shorter than a set scene-change threshold; obtaining, for a found pair, at least one unit video before the earlier scene pre-segmentation point and at least one unit video after the later scene pre-segmentation point; computing, among the obtained unit videos, the similarity of the scene feature vectors corresponding to every two adjacent unit videos; computing the average of these similarities; and, when the average similarity is greater than a preset scene similarity threshold, determining at least one of the found pair of adjacent scene pre-segmentation points as a scene pre-segmentation point caused by a short-lived scene change (a sketch follows this paragraph).
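A compact sketch of the short-lived scene change test above, assuming one scene feature vector per 1 s unit video and cosine similarity as the similarity measure; the thresholds, the context size and the measure itself are illustrative assumptions, since the text does not fix them.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def is_short_lived_change(points, i, features, change_threshold=3.0,
                          sim_threshold=0.9, context=2):
    """Return True if points[i] and points[i+1] bracket a short-lived scene
    change: they are close in time, and the unit videos just before points[i]
    and just after points[i+1] still look alike on average."""
    if points[i + 1] - points[i] >= change_threshold:
        return False
    # features is a list of per-second scene feature vectors; take a few
    # units before the earlier point and after the later point.
    before = features[max(0, int(points[i]) - context):int(points[i])]
    after = features[int(points[i + 1]):int(points[i + 1]) + context]
    units = before + after
    sims = [cosine(units[k], units[k + 1]) for k in range(len(units) - 1)]
    return bool(sims) and float(np.mean(sims)) > sim_threshold
```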
Optionally, after determining the scene pre-segmentation points according to the scene feature vectors corresponding to every two adjacent unit videos, the method further comprises: finding, among the scene pre-segmentation points, those that do not correspond to an actual scene change; and deleting the scene pre-segmentation points that do not correspond to an actual scene change, to obtain the remaining scene pre-segmentation points. Performing scene segmentation on the video to be processed according to the scene pre-segmentation points then comprises: performing scene segmentation on the video to be processed according to the remaining scene pre-segmentation points.
Optionally, finding, among the scene pre-segmentation points, those that do not correspond to an actual scene change comprises: for each scene pre-segmentation point, obtaining at least one unit video before the current scene pre-segmentation point and at least one unit video after it; determining the histogram feature vector corresponding to each obtained unit video; computing, among the obtained unit videos, the similarity of the histogram feature vectors corresponding to every two adjacent unit videos; computing the average of these similarities; and, when the average similarity is greater than a preset histogram similarity threshold, determining the current scene pre-segmentation point as one that does not correspond to an actual scene change.
Optionally, after determining the scene pre-segmentation points according to the scene feature vectors corresponding to every two adjacent unit videos, the method further comprises: obtaining the duration between every two adjacent scene pre-segmentation points, and finding pairs of adjacent scene pre-segmentation points whose interval is shorter than a set minimum duration threshold; and deleting at least one of each found pair, to obtain the remaining scene pre-segmentation points. Performing scene segmentation on the video to be processed according to the scene pre-segmentation points then comprises: performing scene segmentation on the video to be processed according to the remaining scene pre-segmentation points.
Optionally, before dividing the video to be processed into multiple unit videos, the method further comprises: detecting the theme song segments in the video to be processed, and deleting the theme song segments from the video to be processed. Dividing the video to be processed into multiple unit videos then comprises: dividing the video to be processed, after the theme song segments have been deleted, into multiple unit videos.
Optionally, obtaining, for each unit video, the corresponding scene feature vector and audio feature vector comprises: invoking a first process and a second process simultaneously; obtaining the scene feature vector corresponding to each unit video with the first process; and obtaining the audio feature vector corresponding to each unit video with the second process; see the sketch after this paragraph.
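A minimal sketch of the two-process variant, using Python's standard multiprocessing; the extractor functions are placeholders for the image and audio models described in the embodiments.

```python
from multiprocessing import Pool

def extract_scene_features(unit_paths):
    # Placeholder: run an image model on each unit video.
    return [None for _ in unit_paths]

def extract_audio_features(unit_paths):
    # Placeholder: run an audio model on each unit video.
    return [None for _ in unit_paths]

def extract_all(unit_paths):
    # First process: scene features; second process: audio features.
    with Pool(processes=2) as pool:
        scene_job = pool.apply_async(extract_scene_features, (unit_paths,))
        audio_job = pool.apply_async(extract_audio_features, (unit_paths,))
        return scene_job.get(), audio_job.get()
```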
In a second aspect, an embodiment of the present invention provides a video processing apparatus, the apparatus comprising:
a division module, configured to obtain a video to be processed and divide the video to be processed into multiple unit videos;
an obtaining module, configured to obtain, for each unit video, a corresponding scene feature vector and a corresponding audio feature vector;
a determining module, configured to determine scene pre-segmentation points according to the scene feature vectors corresponding to every two adjacent unit videos, and to determine audio pre-segmentation points according to the audio feature vectors corresponding to every two adjacent unit videos;
a scene segmentation module, configured to perform scene segmentation on the video to be processed according to the scene pre-segmentation points;
a first search module, configured to find, among the video clips obtained by scene segmentation, the clips whose duration exceeds a set maximum duration threshold, as clips to be split;
an audio segmentation module, configured to perform audio segmentation on the clips to be split according to the audio pre-segmentation points, to obtain the final video clips.
Optionally, the audio segmentation module comprises: an audio split point search unit, configured to find, among the audio pre-segmentation points, the point nearest to the midpoint of the clip to be split; a clip segmentation unit, configured to perform audio segmentation on the clip to be split at the found audio pre-segmentation point; and a clip determination unit, configured to judge whether, among the clips obtained by audio segmentation, there is a clip whose duration exceeds the maximum duration threshold, and, when there is such a clip, to take the clip whose duration exceeds the maximum duration threshold as the clip to be split and invoke the split point search unit again.
Optionally, the apparatus further comprises: a second search module, configured to find, among the scene pre-segmentation points, those caused by a short-lived scene change; and a first deletion module, configured to delete the scene pre-segmentation points caused by a short-lived scene change, to obtain the remaining scene pre-segmentation points. The scene segmentation module is then specifically configured to perform scene segmentation on the video to be processed according to the remaining scene pre-segmentation points.
Optionally, the second search module comprises: a scene split point search unit, configured to obtain the duration between every two adjacent scene pre-segmentation points and to find pairs of adjacent scene pre-segmentation points whose interval is shorter than the set scene-change threshold; a first video obtaining unit, configured to obtain, for a found pair, at least one unit video before the earlier scene pre-segmentation point and at least one unit video after the later scene pre-segmentation point; a first similarity calculation unit, configured to compute, among the obtained unit videos, the similarity of the scene feature vectors corresponding to every two adjacent unit videos; a first averaging unit, configured to compute the average of these similarities; and a first split point determination unit, configured to determine, when the average similarity is greater than a preset scene similarity threshold, at least one of the found pair as a scene pre-segmentation point caused by a short-lived scene change.
Optionally, the apparatus further comprises: a third search module, configured to find, among the scene pre-segmentation points, those that do not correspond to an actual scene change; and a second deletion module, configured to delete the scene pre-segmentation points that do not correspond to an actual scene change, to obtain the remaining scene pre-segmentation points. The scene segmentation module is then specifically configured to perform scene segmentation on the video to be processed according to the remaining scene pre-segmentation points.
Optionally, the third search module comprises: a second video obtaining unit, configured to obtain, for each scene pre-segmentation point, at least one unit video before the current scene pre-segmentation point and at least one unit video after it; a histogram determination unit, configured to determine the histogram feature vector corresponding to each obtained unit video; a second similarity calculation unit, configured to compute, among the obtained unit videos, the similarity of the histogram feature vectors corresponding to every two adjacent unit videos; a second averaging unit, configured to compute the average of these similarities; and a second split point determination unit, configured to determine, when the average similarity is greater than a preset histogram similarity threshold, the current scene pre-segmentation point as one that does not correspond to an actual scene change.
Optionally, the apparatus further comprises: a fourth search module, configured to obtain the duration between every two adjacent scene pre-segmentation points and to find pairs of adjacent scene pre-segmentation points whose interval is shorter than a set minimum duration threshold; and a third deletion module, configured to delete at least one of each found pair, to obtain the remaining scene pre-segmentation points. The scene segmentation module is then specifically configured to perform scene segmentation on the video to be processed according to the remaining scene pre-segmentation points.
Optionally, the apparatus further comprises: a detection module, configured to detect the theme song segments in the video to be processed; and a fourth deletion module, configured to delete the theme song segments from the video to be processed. The division module is then specifically configured to divide the video to be processed, after the theme song segments have been deleted, into multiple unit videos.
Optionally, the obtaining module comprises: an invoking unit, configured to invoke a first process and a second process simultaneously; a scene feature obtaining unit, configured to obtain the scene feature vector corresponding to each unit video with the first process; and an audio feature obtaining unit, configured to obtain the audio feature vector corresponding to each unit video with the second process.
In a third aspect, an embodiment of the present invention provides an electronic device, comprising: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to perform any of the video processing methods described above.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium; when the instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform any of the video processing methods described above.
In embodiments of the present invention, a video to be processed is obtained and divided into multiple unit videos; a scene feature vector and an audio feature vector corresponding to each unit video are obtained; scene pre-segmentation points are determined according to the scene feature vectors corresponding to every two adjacent unit videos, and audio pre-segmentation points according to the audio feature vectors corresponding to every two adjacent unit videos; scene segmentation is performed on the video to be processed according to the scene pre-segmentation points, and the clips whose duration exceeds the set maximum duration threshold are found among the clips obtained by scene segmentation, as clips to be split; audio segmentation is then performed on the clips to be split according to the audio pre-segmentation points, to obtain the final video clips. Thus, in embodiments of the present invention, after scene segmentation is performed on the video to be processed according to the scene pre-segmentation points from the angle of scene change, the clips that are still too long are further segmented, from the angle of audio change, according to the audio pre-segmentation points. This avoids the overly long clips produced by splitting based on scene change alone, improves the accuracy of splitting, and better meets user needs.
Brief description of the drawings
Fig. 1 is a flow chart of the steps of a video processing method according to an embodiment of the present invention;
Fig. 2 is a flow chart of the steps of another video processing method according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of a video processing procedure according to an embodiment of the present invention;
Fig. 4 is a structural block diagram of a video processing apparatus according to an embodiment of the present invention;
Fig. 5 is a structural block diagram of another video processing apparatus according to an embodiment of the present invention.
Specific embodiments
The technical solutions in the embodiments of the present invention will be described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Referring to Fig. 1, a flow chart of the steps of a video processing method according to an embodiment of the present invention is shown.
The video processing method of this embodiment comprises the following steps:
Step 101: obtain a video to be processed, and divide the video to be processed into multiple unit videos.
A video to be processed is a film or television video that needs to be split. For example, each episode of a TV series can serve as a video to be processed, a film can serve as a video to be processed, each episode of an animation can serve as a video to be processed, and so on.
Splitting a video to be processed means finding split points in it. A video to be processed is divided into multiple unit videos for analysis.
In an optional implementation, the video to be processed can be divided into multiple unit videos in units of a set duration. Any suitable value of the set duration can be chosen by those skilled in the art based on practical experience; for ease of processing, for example, the set duration can be set to 1 s. A sketch of such a division follows.
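A minimal sketch of the fixed-duration division, wrapping FFmpeg's segment muxer in Python; the 1 s unit length and the file names are illustrative.

```python
import subprocess

def split_into_units(src: str, unit_seconds: int = 1) -> None:
    subprocess.run(
        ["ffmpeg", "-i", src,
         "-f", "segment",                     # segment muxer
         "-segment_time", str(unit_seconds),  # unit duration in seconds
         "-reset_timestamps", "1",
         # "-c copy" avoids re-encoding but cuts only at keyframes;
         # re-encode instead if exact 1 s units are required.
         "-c", "copy",
         "unit_%05d.mp4"],
        check=True,
    )

split_into_units("to_be_processed.mp4")
```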
Step 102: obtain, for each unit video, a corresponding scene feature vector and a corresponding audio feature vector.
When splitting a video to be processed, the embodiment of the present invention considers not only the angle of scene change but also the angle of audio change. From the angle of scene change, whether the scene changes can be determined according to the scene feature vectors; from the angle of audio change, whether the audio changes can be determined according to the audio feature vectors.
Therefore, in this embodiment the scene feature vector and audio feature vector corresponding to each unit video are obtained respectively. For example, a model capable of recognizing image information and producing a corresponding feature vector can be used to recognize the image information in a unit video and obtain the scene feature vector corresponding to the unit video; a model capable of recognizing audio signals and producing a corresponding feature vector can be used to recognize the audio signal in a unit video and obtain the audio feature vector corresponding to the unit video. The specific obtaining process is described in detail in the following embodiments.
Step 103: determine scene pre-segmentation points according to the scene feature vectors corresponding to every two adjacent unit videos, and determine audio pre-segmentation points according to the audio feature vectors corresponding to every two adjacent unit videos.
From the difference between the scene feature vectors corresponding to two adjacent unit videos, it can be learned whether the scene changes between the two adjacent unit videos, and hence it can be judged whether the boundary between them is a scene pre-segmentation point. From the difference between the audio feature vectors corresponding to two adjacent unit videos, it can be learned whether the audio changes between the two adjacent unit videos, and hence it can be judged whether the boundary between them is an audio pre-segmentation point.
Therefore, multiple scene pre-segmentation points can be determined according to the scene feature vectors corresponding to every two adjacent unit videos, and multiple audio pre-segmentation points according to the audio feature vectors corresponding to every two adjacent unit videos; a sketch follows.
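A minimal sketch of deriving pre-segmentation points by comparing the feature vectors of adjacent unit videos; cosine similarity and the threshold values are illustrative assumptions, since the text does not fix a similarity measure.

```python
import numpy as np

def pre_segmentation_points(features, threshold):
    """features: list of per-unit feature vectors (one per 1 s unit video).
    Returns the boundary times (seconds) where two adjacent unit videos
    differ enough to be treated as a pre-segmentation point."""
    points = []
    for i in range(len(features) - 1):
        a, b = features[i], features[i + 1]
        sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
        if sim < threshold:           # low similarity: change between units
            points.append(i + 1)      # boundary after unit i
    return points

# Scene and audio points come from the same rule with different features,
# e.g. (thresholds illustrative):
# scene_points = pre_segmentation_points(scene_vecs, threshold=0.75)
# audio_points = pre_segmentation_points(audio_vecs, threshold=0.60)
```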
Step 104: perform scene segmentation on the video to be processed according to the scene pre-segmentation points, and find, among the video clips obtained by scene segmentation, the clips whose duration exceeds the set maximum duration threshold, as clips to be split.
Scene segmentation is performed on the video to be processed at the scene pre-segmentation points, producing multiple video clips. Each video clip corresponds to one scene, so the video clips may differ in length. If some scene in the video to be processed lasts a long time, the video clip obtained for that scene by scene segmentation is long, and in fact it may be further divisible into shorter video clips.
Therefore, among the video clips obtained by scene segmentation, the clips whose duration exceeds the set maximum duration threshold Tmax are found, and each found clip is taken as a clip to be split, i.e. a video clip on which audio segmentation will be further performed. Any suitable value of the maximum duration threshold can be chosen by those skilled in the art based on practical experience, for example 6 min, 7 min, 8 min or 9 min.
Step 105: perform audio segmentation on the clips to be split according to the audio pre-segmentation points, to obtain the final video clips.
For a clip to be split, audio segmentation is further performed on it, from the angle of audio change, according to the audio pre-segmentation points; the specific segmentation process is described in detail in the foregoing optional implementation. After audio segmentation is complete, the final video clips are obtained.
In this embodiment, after scene segmentation is performed on the video to be processed according to the scene pre-segmentation points from the angle of scene change, the clips that are still too long are further segmented, from the angle of audio change, according to the audio pre-segmentation points. This avoids the overly long clips produced by splitting based on scene change alone, improves the accuracy of splitting, and better meets user needs.
Referring to Fig. 2, a flow chart of the steps of another video processing method according to an embodiment of the present invention is shown.
The video processing method of this embodiment comprises the following steps:
Step 201: obtain a video to be processed, detect the theme song segments in the video to be processed, and delete the theme song segments from it.
A film or television video generally contains theme song segments; the theme song may include an opening song and an ending song. Before the video to be processed is split, the theme song segments in it can be deleted, and splitting is then performed on the remaining main content of the video.
In this embodiment, a video processing model for detecting the theme song segments in a video can first be generated. Generating the video processing model may comprise the following steps A1 to A4.
Step A1: obtain training samples.
When training the model, a large number of sample videos taken from film and television videos can first be obtained from the Internet. The sample videos may include theme song videos and non-theme-song videos: the theme song videos may include opening song videos and ending song videos from film and television videos, and the non-theme-song videos may include speech videos, cheering videos, applause videos and the like. The sample videos are labelled by annotators, producing label information that indicates whether each sample video belongs to the theme song category; for example, a label of "1" indicates that the sample video belongs to the theme song category, and a label of "0" indicates that it does not. Each obtained sample video together with its label forms one training sample, and a large number of such training samples form the training sample set. Every training sample is processed in the same way, so this embodiment mainly describes the processing of a single training sample.
In this embodiment, sample diversity can be ensured by collecting sample videos from several different types of film and television video, and sample balance by collecting equal numbers of theme song videos and non-theme-song videos. For example: 2000 sample videos from TV series, of which 1000 are theme song videos and 1000 are not; 2000 sample videos from films, of which 1000 are theme song videos and 1000 are not; and 2000 sample videos from animations, of which 1000 are theme song videos and 1000 are not. These 6000 sample videos and their labels form the training sample set.
Any suitable duration for each sample video can be chosen by those skilled in the art based on practical experience, for example 3 s, 4 s or 5 s.
Step A2: divide the sample video into multiple unit sample videos.
This embodiment trains a video processing model for detecting the theme song segments in a video. Considering that the theme song segments of videos in the theme song category are consistent in audio, whether a video belongs to the theme song category can be determined from its audio feature vector; the video processing model in this embodiment therefore detects the theme song category mainly on the basis of audio feature vectors.
A sample video is divided into multiple unit sample videos for analysis.
In an optional implementation, the sample video can be divided into multiple unit sample videos in units of a set duration. Any suitable value of the set duration can be chosen by those skilled in the art based on practical experience. For example, if a neural network model that can process a 1 s audio signal is used to obtain the audio feature vectors, the set duration can be set to 1 s.
Step A3: for each unit sample video, obtain the corresponding audio feature vector.
For each unit sample video, its corresponding audio feature vector is obtained respectively.
For example, for a sample video A with a duration of 5 s, dividing it in units of 1 s yields unit sample videos 1, 2, 3, 4 and 5, i.e. five unit sample videos. The audio feature vectors corresponding to unit sample videos 1 to 5 are therefore obtained respectively.
In an optional implementation, obtaining the audio feature vector corresponding to one unit sample video may comprise the following steps A31 and A32.
Step A31: generate the spectrogram corresponding to the audio signal in the unit sample video.
Step A31 may further comprise the following steps A311 to A313.
Step A311: perform framing on the audio signal in the unit sample video to obtain multiple audio signal frames.
The audio signal is extracted from the unit sample video, and framing is performed on the audio signal in the unit sample video.
In an optional implementation, the multimedia processing tool FFmpeg can be used to extract the audio signal from the unit sample video. FFmpeg is an open-source program suite that can record, convert and stream digital audio and video; it provides a complete solution for recording, converting and streaming audio and video, and contains the advanced audio/video codec library libavcodec, much of which was developed from scratch to ensure high portability and codec quality. FFmpeg offers very powerful functions, including video capture, video format conversion, frame grabbing and watermarking. For example, FFmpeg can be used to extract the audio signal from the unit sample video at a 16 kHz sample rate in the PCM_S16LE (Pulse Code Modulation, signed 16-bit little-endian) coding format, and the extracted audio signal can be saved in a format such as WAV; a typical invocation is sketched below.
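A typical extraction wrapped in Python, following the parameters named above; the mono down-mix and the file names are assumptions beyond the text.

```python
import subprocess

def extract_audio(src: str, dst: str = "audio.wav") -> None:
    subprocess.run(
        ["ffmpeg", "-i", src,
         "-vn",                   # drop the video stream
         "-acodec", "pcm_s16le",  # signed 16-bit little-endian PCM
         "-ar", "16000",          # 16 kHz sample rate
         "-ac", "1",              # mono (an assumption; not specified above)
         dst],
        check=True,
    )

extract_audio("unit_sample.mp4")
```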
An audio signal is non-stationary macroscopically but stationary microscopically: it has short-time stationarity (within 10-30 ms the signal can be regarded as approximately constant). The audio signal can therefore be divided into short sections for processing; this is framing, and each short section after framing is called an audio signal frame. For example, an overlapped framing method can be used: rather than cutting sections back-to-back, successive sections overlap by a part. The overlapping part between one frame and the next is called the frame shift, and the ratio of the frame shift to the frame length is generally 0 to 0.5. The specific frame length can be set according to the actual situation, and the number of frames per second can be set to 33 to 100; a sketch follows.
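A minimal framing sketch over a 1-D NumPy signal; the 25 ms frame length and 10 ms frame shift are illustrative values consistent with the ranges above.

```python
import numpy as np

def frame_signal(signal, sr=16000, frame_ms=25, shift_ms=10):
    frame_len = int(sr * frame_ms / 1000)    # 400 samples per frame
    shift = int(sr * shift_ms / 1000)        # 160 samples, i.e. 100 frames/s
    n_frames = 1 + max(0, (len(signal) - frame_len) // shift)
    return np.stack([signal[i * shift: i * shift + frame_len]
                     for i in range(n_frames)])

frames = frame_signal(np.zeros(16000))       # 1 s of silence -> shape (98, 400)
```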
Step A312 carries out windowing process to each audio signal frame and Fourier transformation is handled, obtains the unit sample The corresponding initial spectrum figure of audio signal in this video.
Audio is not stop to change in long range, and the characteristic that do not fix can not process, so each audio is believed Number frame carries out windowing process, and audio signal frame is multiplied by adding window with a window function.The purpose of adding window is to eliminate each audio The signal discontinuity that signal frame both ends are likely to result in makes global more continuous.The cost of adding window is an audio signal frame Both ends part be weakened, so to have when framing, between frame and frame overlapping.In practical applications, audio is believed Number frame, which carries out common window function when windowing process, to be square window, Hamming window, Hanning window, etc..According to the frequency domain of window function Characteristic can preferably use Hamming window.
Since the transformation of audio signal in the time domain is generally difficult to find out the characteristic of signal, so usually converting it to frequency Energy distribution on domain is observed, and different Energy distributions can represent the characteristic of different phonetic.So after windowing process, Fourier transformation processing is carried out to each audio signal frame after windowing process, to obtain the Energy distribution on frequency spectrum, is obtained each The frequency spectrum of audio signal frame, and then obtain the corresponding initial spectrum figure of the audio signal in unit sample video.
Step A313 carries out Meier conversion process to the initial spectrum figure and obtains Meier spectrogram, by the Meier frequency Spectrogram is as the corresponding spectrogram of audio signal in the unit sample video.
Initial spectrum figure is often a biggish figure, in order to obtain the audio frequency characteristics of suitable size, can be initial frequency Spectrogram carries out Meier conversion process by Meier (Mel) filter group, is transformed to Meier spectrogram.
The unit of frequency is the hertz (Hz), and the human ear can hear frequencies in the range 20-20000 Hz, but the ear does not perceive this scale linearly. For example, having adapted to a 1000 Hz tone, if the pitch frequency is raised to 2000 Hz, the ear can only perceive that the frequency has risen a little, and cannot tell that it has doubled. The mapping from ordinary frequency to Mel frequency is:
mel(f) = 2595 * log10(1 + f / 700)
where f is the ordinary frequency and mel(f) is the Mel frequency. (As a check, mel(1000) = 2595 * log10(1 + 1000/700), which is approximately 1000.)
Under this mapping, the ear's perception of frequency becomes linear. That is, on the Mel scale, if the Mel frequencies of two audio sections differ by a factor of two, the pitch the human ear perceives also differs by roughly a factor of two.
Based on the sensitivity of the human ear, the frequency range is divided according to the actual situation into multiple Mel filters, forming a Mel filter bank; the Mel filter bank may include 20 to 40 Mel filters. On the Mel scale, the centre frequencies of the Mel filters are linearly distributed at equal intervals, but on the ordinary frequency scale they are not equally spaced. The initial spectrogram is filtered with the Mel filter bank to obtain the Mel spectrogram, which is determined as the spectrogram corresponding to the audio signal in the unit sample video; a sketch of the whole spectrogram computation follows.
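A minimal sketch of steps A311 to A313 using librosa; the frame and filter parameters are illustrative values consistent with the ranges given above.

```python
import librosa
import numpy as np

def mel_spectrogram(wav_path: str) -> np.ndarray:
    y, sr = librosa.load(wav_path, sr=16000)
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr,
        n_fft=400,         # 25 ms frames at 16 kHz
        hop_length=160,    # 10 ms frame shift (overlapped framing)
        window="hamming",  # Hamming window, as preferred above
        n_mels=40,         # 40 Mel filters, within the 20-40 range above
    )
    return librosa.power_to_db(mel)  # log scale, as is usual for model input

spec = mel_spectrogram("audio.wav")
```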
Step A32: input the spectrogram corresponding to the audio signal in the unit sample video into a preset neural network model, and determine the audio feature vector output by the neural network model as the audio feature vector corresponding to the unit sample video.
In this embodiment, a neural network model can be used: the spectrogram corresponding to the audio signal in the unit sample video is input into the neural network model, features are extracted inside the neural network model, and the model outputs an audio feature vector, which is the audio feature vector corresponding to the unit sample video.
In an optional implementation, a VGGish model (named after the Visual Geometry Group's VGG architecture) can be used to extract the audio feature vector. A VGGish model may include convolutional layers, fully connected layers and so on, where the convolutional layers can be used to extract features and the fully connected layers can be used to map the extracted features to a corresponding feature vector. The spectrogram corresponding to the audio signal in the unit sample video is therefore input into the VGGish model; the convolutional layers extract the audio features in the spectrogram and pass them to the fully connected layers, which produce a 128-dimensional audio feature vector that the fully connected layers output. A stand-in sketch follows.
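Since the exact VGGish weights and layer sizes are not given here, the following PyTorch sketch is only a VGG-style stand-in that mirrors the described shape of the computation: convolutional feature extraction followed by fully connected layers producing a 128-dimensional vector.

```python
import torch
import torch.nn as nn

class AudioEmbedder(nn.Module):
    def __init__(self, n_mels: int = 40, n_frames: int = 98):
        super().__init__()
        # Convolutional layers: extract features from the spectrogram.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        flat = 64 * (n_mels // 4) * (n_frames // 4)
        # Fully connected layers: map features to a 128-dim embedding.
        self.fc = nn.Sequential(nn.Linear(flat, 256), nn.ReLU(),
                                nn.Linear(256, 128))

    def forward(self, spec):            # spec: (batch, 1, n_mels, n_frames)
        x = self.conv(spec)
        return self.fc(x.flatten(start_dim=1))

emb = AudioEmbedder()(torch.randn(1, 1, 40, 98))   # -> shape (1, 128)
```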
In this embodiment, the audio feature vector corresponding to each unit sample video can be saved in the TFRecord format. Data in the TFRecord format are stored in binary, occupy less disk space, and are faster to read.
Step A4: take the audio feature vectors corresponding to at least two consecutive unit sample videos as the input and the label of the sample video as the target output, train a preset initial model, and determine the trained model as the video processing model.
If the feature vector corresponding to a single unit sample video were used to represent a whole sample video for training, the representation might be neither accurate nor comprehensive, because a single unit sample video is short. Therefore, in this embodiment the audio feature vectors corresponding to at least two consecutive unit sample videos are used to represent one sample video for training.
For one sample video, the audio feature vectors corresponding to at least two consecutive unit sample videos divided from that sample video are used as the input, and the label of the sample video as the target output, to train the preset initial model.
Training the preset initial model may comprise steps A41 to A43.
Step A41: randomly select at least two consecutive unit sample videos, splice the audio feature vectors corresponding to the selected unit sample videos, input the result into the initial model, and obtain the predicted probability that the sample video belongs to the theme song category.
The initial model is a model with a classification function that has not yet been trained. The initial model can analyse the input audio feature vectors and output the predicted probability that the sample video belongs to the theme song category, but the predictions output by the initial model are usually inaccurate, so the initial model has to be trained to obtain an accurate video processing model.
From the unit sample videos divided from the sample video, at least two consecutive unit sample videos are randomly selected; the audio feature vectors corresponding to the selected unit sample videos are input into the initial model, and the initial model outputs the predicted probability that the sample video belongs to the theme song category.
For example, sample video A is divided in units of 1 s into unit sample videos 1, 2, 3, 4 and 5, i.e. five unit sample videos. Three consecutive unit sample videos are randomly selected from the five; each unit sample video corresponds to a 128-dimensional audio feature vector, and the feature vectors of the three unit sample videos are spliced into a 128*3 = 384-dimensional audio feature vector and input into the initial model. The initial model outputs the predicted probability that sample video A belongs to the theme song category.
Step A42 belongs to the prediction probability of theme song classification and the mark of the Sample video according to the Sample video Information is infused, the corresponding penalty values of the Sample video are calculated.
The prediction probability that Sample video belongs to theme song classification is the reality output of initial model, the mark letter of Sample video Breath is the target of output, according to reality output penalty values corresponding with the Sample video that the target of output calculates extraction.Penalty values It can indicate that Sample video belongs to the extent of deviation of the markup information of the prediction probability of theme song classification and the Sample video of extraction.
In a kind of optional embodiment, the markup information of Sample video and Sample video can be belonged into theme song classification Prediction probability between difference as penalty values.For example, the prediction probability that Sample video belongs to theme song classification is 0.8, sample The markup information of this video is 1, then penalty values can be 0.2.
Step A43 determines that training is completed when the penalty values are less than setting loss threshold value.
Penalty values are smaller, and the robustness of model is better.It is preset in the embodiment of the present invention for measuring whether model instructs Practice the loss threshold value completed.If penalty values are less than setting loss threshold value, it may be said that bright Sample video belongs to theme song classification The extent of deviation of the markup information of prediction probability and Sample video is smaller, at this time it is considered that training is completed;If penalty values are big In or equal to setting loss threshold value, it may be said that bright Sample video belongs to the prediction probability of theme song classification and the mark of Sample video The extent of deviation for infusing information is larger, and the parameter of adjustable model, continues with next training sample and be trained at this time.
For the specific value of setting loss threshold value, those skilled in the art select any suitable value based on practical experience ?.For example it can be set to 0.1,0.2,0.3, etc..
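A minimal sketch of steps A41 to A43, assuming a simple fully connected classifier head over the spliced 384-dimensional input; the head, the optimizer and the threshold are illustrative, and the absolute difference is used as the loss exactly as in the example above.

```python
import random
import torch
import torch.nn as nn

classifier = nn.Sequential(nn.Linear(384, 64), nn.ReLU(),
                           nn.Linear(64, 1), nn.Sigmoid())
optim = torch.optim.Adam(classifier.parameters(), lr=1e-3)

def train(samples, loss_threshold=0.2):
    """samples: list of (unit_vectors, label) pairs, where unit_vectors is a
    list of at least three 128-dim tensors and label is 1.0 (theme song)
    or 0.0 (not theme song)."""
    for unit_vectors, label in samples:
        i = random.randint(0, len(unit_vectors) - 3)       # 3 consecutive units
        x = torch.cat(unit_vectors[i:i + 3]).unsqueeze(0)  # splice to (1, 384)
        prob = classifier(x)
        loss = (prob - label).abs().mean()                 # |prediction - label|
        if loss.item() < loss_threshold:
            return classifier                              # training complete
        optim.zero_grad(); loss.backward(); optim.step()   # adjust parameters
    return classifier
```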
The trained model can be used as the video processing model for detecting the theme song segments in a video to be processed.
In this embodiment, detecting the theme song segments in the video to be processed may comprise the following steps B1 to B5.
Step B1: extract the opening segment and the ending segment from the video to be processed.
The theme song includes the opening song and the ending song; the opening song is located at the beginning of the video to be processed, and the ending song at its end. Therefore, to save processing time, the opening segment and the ending segment can be extracted from the video to be processed, and only the opening segment, where the opening song lies, and the ending segment, where the ending song lies, are detected.
In an optional implementation, a segment can be extracted from the beginning of the video to be processed according to a set percentage as the opening segment of the video, and a segment can be extracted from the end of the video according to a set percentage as its ending segment. Any suitable value of the set percentage can be chosen by those skilled in the art according to the actual situation; for example, the set percentage can be 10%, 15% or 20%.
The head segment and the run-out segment are divided into multiple units video to be processed respectively by step B2.
It is similar with above-mentioned steps A2, based on consistency of the theme song segment in audio in video to be processed, Ke Yitong It crosses audio feature vector and determines whether the bent classification that is the theme.
Head segment and run-out segment in video to be processed for one, are divided into multiple units video to be processed It is analyzed.It waits locating for example, multiple units can be divided into for head segment and run-out segment as unit of setting duration respectively Manage video.The setting duration being related in step B2 can be identical as the setting duration being related in above-mentioned steps A2.
Step B3, for each unit video to be processed, obtain the corresponding audio frequency characteristics of unit video to be processed to Amount.
Obtaining the corresponding audio feature vector of unit video to be processed may include: to generate the unit view to be processed The corresponding spectrogram of audio signal in frequency;The corresponding spectrogram input of audio signal in unit video to be processed is pre- If neural network model, the audio feature vector that the neural network model exports is determined as unit video to be processed Corresponding audio feature vector.
The corresponding spectrogram of audio signal generated in unit video to be processed may include: to wait locating to the unit The audio signal managed in video carries out sub-frame processing, obtains multiple audio signal frames;Each audio signal frame is carried out at adding window Reason and Fourier transformation processing, obtain the corresponding initial spectrum figure of audio signal in unit video to be processed;To described Initial spectrum figure carries out Meier conversion process and obtains Meier spectrogram, using the Meier spectrogram as unit view to be processed The corresponding spectrogram of audio signal in frequency.
Step B3 is similar to step A3 above; refer to the related description of step A3, which is not repeated in detail here.
Step B4: input the audio feature vectors corresponding to at least two consecutive unit videos to be processed, including the current unit video to be processed, into the pre-generated video processing model, and determine whether the current unit video to be processed belongs to the theme song category according to the output of the video processing model.
If only the feature vector of a single unit video to be processed were used to detect whether that unit video belongs to the theme song category, the detection might be inaccurate: the duration of a single unit video is short, so its feature vector alone may not reliably indicate whether the unit video truly belongs to the theme song category. Therefore, in the embodiment of the present invention, the audio feature vectors corresponding to at least two consecutive unit videos to be processed, including the current one, are used to determine whether the current unit video to be processed belongs to the theme song category.
For one unit video to be processed, the audio feature vectors corresponding to at least two consecutive unit videos to be processed including that unit video are input into the video processing model generated above. After analyzing the audio feature vectors, the video processing model outputs the predicted probability that the unit video to be processed belongs to the theme song category. After the output of the video processing model is obtained, the predicted probability is compared with a set probability threshold; if the probability is greater than or equal to the threshold, the unit video to be processed is determined to belong to the theme song category.
As for the specific value of the probability threshold, those skilled in the art may select any suitable value based on practical experience; for example, it may be set to 0.7, 0.8, 0.9, and so on.
For example, for unit video 3 to be processed, the three consecutive unit videos including unit video 3 may be unit videos 1, 2, and 3; or unit videos 2, 3, and 4; or unit videos 3, 4, and 5. Among these, the scheme using unit videos 2, 3, and 4 considers both the audio feature vector before unit video 3 and the audio feature vector after it; therefore, using the audio feature vectors corresponding to the three consecutive unit videos 2, 3, and 4 yields a more accurate result for unit video 3 than the other two schemes.
Taking the three consecutive unit videos 2, 3, and 4 (including unit video 3) as an example: the 128-dimensional audio feature vectors corresponding to unit videos 2, 3, and 4 are concatenated into a 128*3 = 384-dimensional audio feature vector and input into the video processing model. The video processing model outputs the predicted probability that unit video 3 belongs to the theme song category; if this probability is greater than the set probability threshold, unit video 3 is determined to belong to the theme song category.
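A minimal sketch of this windowed classification is given below; the model is represented by a placeholder callable, and the 0.8 threshold is only an example value.

```python
import numpy as np

def is_theme_song(features, i, model, threshold=0.8):
    """Classify unit video i using the 128-d audio feature vectors of
    units i-1, i, i+1 concatenated into a single 384-d input vector."""
    window = np.concatenate([features[i - 1], features[i], features[i + 1]])
    prob = model(window)            # predicted probability of "theme song"
    return prob >= threshold

# toy stand-in for the trained video processing model, for illustration only
model = lambda v: 1.0 / (1.0 + np.exp(-v.mean()))
features = [np.random.randn(128) for _ in range(10)]   # one vector per unit
flags = [is_theme_song(features, i, model) for i in range(1, 9)]
```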
Step B5: splice the consecutive unit videos to be processed that belong to the theme song category, to obtain the opening theme segment and the ending theme segment in the video to be processed.
After determining whether each unit video to be processed belongs to the theme song category, a unit video that belongs to the theme song category can be regarded as part of a theme song segment, and a unit video that does not belong to the theme song category can be regarded as part of a non-theme-song segment. Thus, if multiple consecutive unit videos to be processed belong to the theme song category, these consecutive unit videos are spliced together, yielding the opening theme segment and the ending theme segment in the video to be processed.
The theme song in the video to be processed includes the opening theme and the ending theme, so both an opening theme segment and an ending theme segment can be determined. Among the unit videos divided from the opening segment that belong to the theme song category, the consecutive ones are spliced to obtain the opening theme segment in the video to be processed; among the unit videos divided from the ending segment that belong to the theme song category, the consecutive ones are spliced to obtain the ending theme segment in the video to be processed.
When the opening segment and the ending segment of the video to be processed are divided into multiple unit videos to be processed, the start time and end time of each unit video can also be recorded. Therefore, after the consecutive unit videos belonging to the theme song category are spliced into the opening theme segment and the ending theme segment, the start time of the first unit video in the opening theme segment can be taken as the start time of the opening theme segment, and the end time of the last unit video in it as its end time; likewise, the start time of the first unit video in the ending theme segment is taken as its start time, and the end time of the last unit video in it as its end time. According to the start and end times of the theme song segments (the opening theme segment and the ending theme segment), the theme song segments are deleted from the video to be processed.
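The splicing of consecutive theme-song units into timed segments can be sketched as follows; the unit duration of 1 s is an assumed example value.

```python
def splice_theme_segments(flags, unit_duration_s=1.0):
    """Merge runs of consecutive unit videos flagged as theme song into
    (start_time, end_time) segments, in seconds."""
    segments, run_start = [], None
    for i, flag in enumerate(flags):
        if flag and run_start is None:
            run_start = i
        elif not flag and run_start is not None:
            segments.append((run_start * unit_duration_s, i * unit_duration_s))
            run_start = None
    if run_start is not None:   # run extends to the last unit video
        segments.append((run_start * unit_duration_s,
                         len(flags) * unit_duration_s))
    return segments

print(splice_theme_segments([True, True, True, False, False, True, True]))
# [(0.0, 3.0), (5.0, 7.0)]
```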
Step 202: divide the video to be processed, after the theme song segments have been deleted, into multiple unit videos to be processed.
In an optional embodiment, the video to be processed after deletion of the theme song segments may be divided into multiple unit videos to be processed in units of a set duration. The set duration involved in step 202 may be the same as the set duration involved in step 101.
Fig. 3 is a schematic diagram of a video processing procedure according to an embodiment of the present invention. As shown in Fig. 3, the long video in Fig. 3 is the video to be processed; the long video is divided to obtain multiple unit videos to be processed.
Step 203: call a first process and a second process simultaneously; use the first process to obtain the scene feature vector corresponding to each unit video to be processed, and use the second process to obtain the audio feature vector corresponding to each unit video to be processed.
In the embodiment of the present invention, if a single process were used to handle the multiple unit videos to be processed, processing efficiency would be low. Therefore, a first process and a second process can be set up to work in parallel: the two processes are called simultaneously, the first process obtains the scene feature vector corresponding to each unit video to be processed, and the second process obtains the audio feature vector corresponding to each unit video to be processed, thereby improving processing efficiency. The first process and the second process may be stored in a process pool.
As shown in Fig. 3, the process pool in Fig. 3 includes the first process process2 and the second process process3.
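A minimal sketch of this parallel scheme, assuming Python's multiprocessing pool and placeholder feature extractors standing in for the two models of Fig. 3:

```python
from multiprocessing import Pool

def scene_features(unit_paths):   # placeholder for the RGB ResNet50 worker
    return [f"scene:{p}" for p in unit_paths]

def audio_features(unit_paths):   # placeholder for the Audio VGGish worker
    return [f"audio:{p}" for p in unit_paths]

if __name__ == "__main__":
    units = [f"unit_{i:04d}.mp4" for i in range(8)]
    with Pool(processes=2) as pool:           # the process pool of Fig. 3
        scene_job = pool.apply_async(scene_features, (units,))
        audio_job = pool.apply_async(audio_features, (units,))
        scene_vecs = scene_job.get()          # first process (process2)
        audio_vecs = audio_job.get()          # second process (process3)
```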
The scene feature vector can be obtained from image information, so a frame of image can be extracted from the unit video to be processed. The image is input into a neural network model; after feature extraction is performed inside the model, the neural network model outputs a scene feature vector, which is taken as the scene feature vector corresponding to the unit video to be processed.
In an optional embodiment, the multimedia video processing tool FFmpeg can be used to extract images from the unit video to be processed. For example, the image size may be 255*255, and the extracted images may be saved in a format such as jpg.
In an optional embodiment, the neural network model ResNet50 can be used to obtain the scene feature vector corresponding to the image extracted from the unit video to be processed. ResNet50 is a residual network model; in a residual network, the network is not made to fit the original mapping directly, but instead fits the residual mapping. ResNet50 may include convolutional layers, a fully connected layer, and so on, where the convolutional layers are used to extract features and the fully connected layer is used to classify the extracted features to obtain the corresponding feature vector. Therefore, the image extracted from the unit video to be processed is input into the ResNet50 model; the convolutional layers extract the scene features in the image and pass them to the fully connected layer, which classifies the scene features and yields the 2048-dimensional scene feature vector output by the model.
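A sketch of such an extraction with torchvision's pretrained ResNet50 is shown below. Note one assumption: the sketch takes the 2048-dimensional pooled output that feeds the classification layer, which is the usual way to obtain a 2048-dimensional feature, whereas the text above attributes the vector to the fully connected layer itself.

```python
import torch
from torchvision import models, transforms
from PIL import Image

resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
resnet.fc = torch.nn.Identity()   # expose the 2048-d pooled features
resnet.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def scene_feature_vector(image_path):
    """2048-d scene feature vector for one frame of a unit video."""
    img = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return resnet(img).squeeze(0)   # shape: (2048,)
```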
As shown in Fig. 3, the first process process2 contains a neural network model for obtaining scene feature vectors; the neural network model may specifically be RGB ResNet50, which is used to obtain the 2048-dimensional scene feature vector corresponding to each unit video to be processed.
The audio feature vector can be obtained from the audio signal. For the specific process of obtaining the audio feature vector corresponding to a unit video to be processed, refer to the related descriptions of steps A3 and B3 above, which are not repeated in detail here.
As shown in Fig. 3, the second process process3 contains a neural network model for obtaining audio feature vectors; the neural network model may specifically be Audio VGGish, which is used to obtain the 128-dimensional audio feature vector corresponding to each unit video to be processed.
Step 204: determine scene pre-segmentation points according to the scene feature vectors corresponding to every two adjacent unit videos to be processed, and determine audio pre-segmentation points according to the audio feature vectors corresponding to every two adjacent unit videos to be processed.
Scene pre-segmentation points can be determined according to the scene feature vectors corresponding to every two adjacent unit videos to be processed. The process may include: obtaining the similarity of the scene feature vectors corresponding to every two adjacent unit videos to be processed, and taking the midpoint of two adjacent unit videos whose similarity is less than a set scene similarity threshold T_scene as a scene pre-segmentation point. If the similarity of the scene feature vectors corresponding to two adjacent unit videos is less than the set scene similarity threshold, it indicates that the scene changes between the two adjacent unit videos; therefore the midpoint of the two adjacent unit videos can be taken as a scene pre-segmentation point.
Audio pre-segmentation points can be determined according to the audio feature vectors corresponding to every two adjacent unit videos to be processed. The process may include: obtaining the similarity of the audio feature vectors corresponding to every two adjacent unit videos to be processed, and taking the midpoint of two adjacent unit videos whose similarity is less than a set audio similarity threshold T_audio as an audio pre-segmentation point. If the similarity of the audio feature vectors corresponding to two adjacent unit videos is less than the set audio similarity threshold, it indicates that the audio changes between the two adjacent unit videos; therefore the midpoint of the two adjacent unit videos can be taken as an audio pre-segmentation point.
As for the specific values of the scene similarity threshold and the audio similarity threshold, those skilled in the art may set any suitable values according to the actual situation, and the embodiment of the present invention places no restriction on this. For example, the scene similarity threshold and the audio similarity threshold may be set to 0.1, 0.2, 0.3, and so on.
In an optional embodiment, the similarity of two feature vectors can be measured by the cosine distance between them. The cosine distance uses the cosine of the angle between two vectors in a vector space as a measure of the difference between two individuals: the larger the cosine distance between two feature vectors, the smaller their similarity. Therefore, if similarity is measured by cosine distance, a scene distance threshold and an audio distance threshold can be set. When the cosine distance between two scene feature vectors is greater than the scene distance threshold, the similarity of the two scene feature vectors is determined to be less than the scene similarity threshold; when the cosine distance between two audio feature vectors is greater than the audio distance threshold, the similarity of the two audio feature vectors is determined to be less than the audio similarity threshold. For example, the scene distance threshold and the audio distance threshold may be set to 0.7, 0.8, 0.9, and so on.
Assume that the two feature vectors are $x = (x_1, x_2, \ldots, x_N)^T$ and $y = (y_1, y_2, \ldots, y_N)^T$, where $T$ denotes transposition. The cosine distance between the two feature vectors is:

$$d = 1 - \frac{\sum_{i=1}^{N} x_i y_i}{\sqrt{\sum_{i=1}^{N} x_i^{2}}\,\sqrt{\sum_{i=1}^{N} y_i^{2}}}$$

where $N$ denotes the dimension of the feature vectors and $d$ denotes the cosine distance. (The formula is reconstructed here in the standard 1 − cosine-similarity form, consistent with the statement above that a larger cosine distance indicates a smaller similarity.)
Of course, the similarity of two feature vectors may also be measured in other ways, such as Euclidean distance, Mahalanobis distance, Manhattan distance, and so on; the embodiment of the present invention places no restriction on this.
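Under the reconstruction above (d = 1 − cosine similarity), the determination of pre-segmentation points can be sketched as follows; the function names and the boundary-time convention are assumptions for illustration.

```python
import numpy as np

def cosine_distance(x, y):
    """d = 1 - cos(x, y); a larger distance means a smaller similarity."""
    return 1.0 - float(np.dot(x, y) /
                       (np.linalg.norm(x) * np.linalg.norm(y)))

def pre_segmentation_points(vectors, unit_duration_s, distance_threshold):
    """Midpoints between adjacent unit videos whose feature vectors are
    farther apart than the distance threshold (works for both the scene
    and the audio feature vectors)."""
    points = []
    for i in range(len(vectors) - 1):
        if cosine_distance(vectors[i], vectors[i + 1]) > distance_threshold:
            points.append((i + 1) * unit_duration_s)  # boundary of i, i+1
    return points
```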
Step 205: find the scene pre-segmentation points caused by brief scene changes among the scene pre-segmentation points, delete them, and obtain the remaining scene pre-segmentation points.
Consider that the video to be processed may contain brief scene changes (lasting from a few seconds to ten-odd seconds), for example a cut from the current scene to a flashback scene, or a cut back to the current scene after a brief flashback. According to the similarity between the scene feature vectors corresponding to adjacent unit videos to be processed, the start and end positions of such a brief scene would also be identified as scene pre-segmentation points, even though the content before and after the brief scene may actually belong to the same scene's plot. Scene pre-segmentation points produced in this way are referred to as scene pre-segmentation points of brief scene changes. For this situation, the embodiment of the present invention finds the scene pre-segmentation points of brief scene changes among the scene pre-segmentation points and deletes them, so as to further improve the accuracy of the scene pre-segmentation points and preserve the integrity of the plot after segmentation.
In an optional embodiment, finding the scene pre-segmentation points of brief scene changes among the scene pre-segmentation points may include steps C1 to C5.
Step C1: obtain the duration between every two adjacent scene pre-segmentation points, and find two adjacent scene pre-segmentation points whose interval is less than a set scene change threshold.
The duration between two adjacent scene pre-segmentation points caused by a brief scene change is small; therefore, two adjacent scene pre-segmentation points whose interval is less than the set scene change threshold T_short can be found first, and it is subsequently determined whether these two adjacent scene pre-segmentation points were caused by a brief scene change. As for the specific value of the scene change threshold, those skilled in the art may set any suitable value according to the actual situation, and the embodiment of the present invention places no restriction on this. For example, the scene change threshold may be set to 7 s, 8 s, 9 s, and so on.
Step C2: of the two adjacent scene pre-segmentation points found, obtain at least one unit video to be processed before the earlier scene pre-segmentation point and at least one unit video to be processed after the later scene pre-segmentation point.
Step C3: among the unit videos to be processed obtained, calculate the similarity of the scene feature vectors corresponding to every two adjacent unit videos to be processed.
Step C4: calculate an average value according to each similarity.
Step C5: when the average value of the similarities is greater than the preset scene similarity threshold, determine at least one of the two adjacent scene pre-segmentation points found as a scene pre-segmentation point of a brief scene change.
For example, suppose the interval between two adjacent scene pre-segmentation points 2 and 3 is less than the set scene change threshold. Obtain the 5 unit videos to be processed before scene pre-segmentation point 2 and the 5 unit videos to be processed after scene pre-segmentation point 3; compute the similarity of the scene feature vectors corresponding to every two adjacent unit videos among the 5 unit videos before point 2, and likewise among the 5 unit videos after point 3; then calculate the average of all the similarities obtained. If the average is greater than the scene similarity threshold, the scene before point 2 and the scene after point 3 are similar, so it can be determined that scene pre-segmentation points 2 and 3 were caused by a brief scene change. In this case, either one or both of points 2 and 3 can be deleted, so that the brief scene between them is merged with the preceding video segment, with the following video segment, or with both.
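Steps C1 to C5 can be sketched as follows; the thresholds, the window of k = 5 unit videos, and the choice to flag both points are assumed example values consistent with the example above.

```python
import numpy as np

def brief_change_points(points, scene_vecs, unit_duration_s,
                        t_short=8.0, sim_threshold=0.3, k=5):
    """Steps C1-C5: among pairs of adjacent pre-segmentation points closer
    than t_short, flag both points when the k unit videos on either side
    are, on average, still similar (i.e. the same ongoing scene)."""
    def sim(a, b):   # cosine similarity of two feature vectors
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    flagged = set()
    for p, q in zip(points, points[1:]):
        if q - p >= t_short:                          # C1: interval check
            continue
        i, j = int(p // unit_duration_s), int(q // unit_duration_s)
        before = scene_vecs[max(0, i - k):i]          # C2: units before p
        after = scene_vecs[j:j + k]                   #     units after q
        sims = [sim(a, b) for seq in (before, after)  # C3: adjacent sims
                for a, b in zip(seq, seq[1:])]
        if sims and np.mean(sims) > sim_threshold:    # C4 + C5
            flagged.update((p, q))
    return flagged
```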
Step 206: find the scene pre-segmentation points not caused by actual scene changes among the scene pre-segmentation points, delete them, and obtain the remaining scene pre-segmentation points.
Consider that, in the video to be processed, external factors may cause the similarity of the scene feature vectors corresponding to adjacent unit videos to be small, so that a scene pre-segmentation point is determined even though the scene itself has not changed. For example, changes in lighting or switches between near and far shots can affect the determination of scene pre-segmentation points although the scene itself does not change. Such scene pre-segmentation points are caused by external factors rather than by actual scene changes and should not be used as a basis for segmentation; they are referred to as scene pre-segmentation points of non-scene changes. For this situation, the embodiment of the present invention finds the scene pre-segmentation points of non-scene changes among the scene pre-segmentation points and deletes them, so as to further improve the accuracy of the scene pre-segmentation points and preserve the integrity of the plot after segmentation.
In an optional embodiment, finding the scene pre-segmentation points of non-scene changes among the scene pre-segmentation points may include steps D1 to D5.
Step D1: for each scene pre-segmentation point, obtain at least one unit video to be processed before the current scene pre-segmentation point and at least one unit video to be processed after it.
Step D2: determine the histogram feature vector corresponding to each of the unit videos to be processed obtained.
A histogram feature shows the tonal distribution of an image, revealing the number of pixels at each gray level; the exposure of the image can be preliminarily judged from the shape drawn by these values, so the histogram is the best feedback on the image's exposure. The histogram feature vector can be obtained from image information; therefore a frame of image can be extracted from the unit video to be processed, and the histogram feature vector of that frame can be taken as the histogram feature vector corresponding to the unit video to be processed.
In an optional embodiment, a Lab color space model can be used to obtain the histogram feature vector corresponding to the image extracted from the unit video to be processed. The Lab color space model is based on human perception of color: it describes how a color looks rather than the amounts of specific colorants a display device needs to produce the color, so Lab is also regarded as a device-independent color model. The Lab color model consists of lightness (L) and the two color-related components a and b: a represents the range from magenta to green, and b represents the range from yellow to blue. All colors can be composed by varying these three values. The image extracted from the unit video to be processed is input into the Lab color space model, which extracts the image's histogram feature internally and outputs a 4096-dimensional histogram feature vector.
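One plausible realization of a 4096-dimensional Lab histogram feature is sketched below, assuming OpenCV and a joint 16×16×16 binning (16³ = 4096 bins); the patent does not specify the binning, so this is an assumption.

```python
import cv2

def lab_histogram_vector(image_path, bins=16):
    """4096-d histogram feature vector: a joint 16x16x16 histogram over
    the L, a, b channels (16**3 = 4096 bins), L1-normalized."""
    bgr = cv2.imread(image_path)
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)
    hist = cv2.calcHist([lab], [0, 1, 2], None,
                        [bins, bins, bins],
                        [0, 256, 0, 256, 0, 256]).flatten()
    return hist / max(hist.sum(), 1.0)   # normalize to a distribution
```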
Similar to step 203 above, to improve processing efficiency, after the multiple unit videos to be processed are obtained by division, an independent third process can be used to obtain the histogram feature vector corresponding to each unit video to be processed. As shown in Fig. 3, the process pool in Fig. 3 includes a third process process1; process1 contains a color space model for obtaining histogram feature vectors, which may specifically be Lab Histogram and is used to obtain the 4096-dimensional histogram feature vector corresponding to each unit video to be processed.
Step D3: among the unit videos to be processed obtained, calculate the similarity of the histogram feature vectors corresponding to every two adjacent unit videos to be processed.
Step D4: calculate an average value according to each similarity.
Step D5: when the average value of the similarities is greater than a preset histogram similarity threshold, determine the current scene pre-segmentation point as a scene pre-segmentation point of a non-scene change.
As for the specific value of the histogram similarity threshold, those skilled in the art may set any suitable value according to the actual situation, and the embodiment of the present invention places no restriction on this. For example, the histogram similarity threshold may be set to 0.2, 0.3, 0.4, and so on.
For example, for a scene pre-segmentation point 4, obtain the 5 unit videos to be processed before it and the 5 unit videos to be processed after it, obtain the histogram feature vectors corresponding to these 10 unit videos, and compute the similarity of the histogram feature vectors corresponding to every two adjacent unit videos among the 10. Calculate the average of all the similarities obtained; if the average is greater than the histogram similarity threshold, the scenes before and after scene pre-segmentation point 4 are similar, so it can be determined that point 4 was caused by something other than an actual scene change. In this case, scene pre-segmentation point 4 can be deleted, so that the video segments before and after it are merged.
Step 207: obtain the duration between every two adjacent scene pre-segmentation points, find two adjacent scene pre-segmentation points whose interval is less than a set minimum duration threshold, delete at least one of the two adjacent scene pre-segmentation points found, and obtain the remaining scene pre-segmentation points.
Consider that the video to be processed may contain scenes of short duration, for example a scene lasting only ten-odd seconds. According to the similarity between the scene feature vectors corresponding to adjacent unit videos to be processed, the start and end positions of such a short scene would also be identified as scene pre-segmentation points, but cutting the short scene out as a separate video segment is of little significance.
For this situation, two adjacent scene pre-segmentation points whose interval is less than the set minimum duration threshold are found in the set of scene pre-segmentation points, and at least one of the two adjacent scene pre-segmentation points found is deleted. As for the specific value of the minimum duration threshold, those skilled in the art may set any suitable value according to the actual situation, and the embodiment of the present invention places no restriction on this. For example, the minimum duration threshold may be set to 10 s, 15 s, 20 s, and so on.
For example, when the interval between two adjacent scene pre-segmentation points 5 and 6 is less than the set minimum duration threshold, either one or both of points 5 and 6 can be deleted, so that the scene between them is merged with the preceding video segment, with the following video segment, or with both.
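A minimal sketch of this pruning, assuming pre-segmentation points given as sorted times in seconds and a 15 s threshold:

```python
def prune_close_points(points, min_len=15.0):
    """Step 207: drop a pre-segmentation point whenever it is closer than
    the minimum duration threshold to the previously kept point, so the
    short scene merges with a neighboring segment."""
    kept = []
    for p in sorted(points):
        if not kept or p - kept[-1] >= min_len:
            kept.append(p)
    return kept

print(prune_close_points([10.0, 18.0, 60.0, 65.0, 200.0]))
# [10.0, 60.0, 200.0]
```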
The above steps 205, 206, and 207 are all optimizations of the scene pre-segmentation points. In the embodiment of the present invention, at least one of these steps may be selected for execution, and the selected steps may be executed in any order.
Step 208: perform scene segmentation on the video to be processed, after the theme song segments have been deleted, according to the remaining scene pre-segmentation points; among the video segments obtained by the scene segmentation, find the video segments whose duration exceeds a set maximum duration threshold, as the video segments to be split.
After the scene pre-segmentation points have been pruned by at least one of steps 205, 206, and 207 above, scene segmentation is performed on the video to be processed (after the theme song segments have been deleted) according to each remaining scene pre-segmentation point, obtaining multiple video segments.
Step 209: find, among the audio pre-segmentation points, the audio pre-segmentation point closest to the midpoint of the video segment to be split, and perform audio segmentation on the video segment to be split according to the audio pre-segmentation point found.
Among the video segments obtained by the scene segmentation in step 208, the video segments whose duration exceeds the set maximum duration threshold are found as the video segments to be split. For a video segment to be split, further audio segmentation is performed on it according to the audio pre-segmentation points, from the perspective of audio changes.
The audio pre-segmentation point closest to the midpoint of the video segment to be split is found among the audio pre-segmentation points. It can be determined as follows: for each audio pre-segmentation point, calculate a first distance from the audio pre-segmentation point to the start of the video segment to be split and a second distance from the audio pre-segmentation point to the end of the video segment to be split, then calculate the ratio of the first distance to the second distance; the audio pre-segmentation point whose ratio is closest to 1 is the one closest to the midpoint of the video segment to be split.
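The ratio test described above can be sketched as follows; segment boundaries and pre-segmentation points are assumed to be given as times in seconds.

```python
def nearest_audio_point(segment_start, segment_end, audio_points):
    """Pick the audio pre-segmentation point whose ratio of
    (distance to segment start) / (distance to segment end) is closest
    to 1, i.e. the point nearest the segment's midpoint."""
    best, best_score = None, float("inf")
    for p in audio_points:
        if not segment_start < p < segment_end:
            continue                       # only points inside the segment
        ratio = (p - segment_start) / (segment_end - p)
        score = abs(ratio - 1.0)
        if score < best_score:
            best, best_score = p, score
    return best

print(nearest_audio_point(0.0, 600.0, [120.0, 290.0, 540.0]))  # 290.0
```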
As shown in Fig. 3, from the curve graph in Fig. 3 (in which the dotted line represents the audio feature vectors and the solid line represents the scene feature vectors), after segmentation by the scene pre-segmentation points determined from the cosine distance between the scene feature vectors of adjacent unit videos to be processed, segments 2 and 3 form a single video segment, namely the middle video segment shown on the long video in Fig. 3. The duration of this video segment is greater than 7 min, so it is further split into segments 2 and 3 by the audio pre-segmentation point determined from the cosine distance between the audio feature vectors of adjacent unit videos to be processed.
Step 210: judge whether, among the video segments obtained by the audio segmentation, there is any video segment whose duration exceeds the maximum duration threshold.
After audio segmentation is performed on the video segment to be split according to the audio pre-segmentation point found, it is judged again whether any of the video segments obtained by the audio segmentation has a duration exceeding the maximum duration threshold. If there is such a video segment, it is taken as a new video segment to be split, and the procedure returns to step 209 to continue audio segmentation on it according to the audio pre-segmentation points. If there is no video segment whose duration exceeds the maximum duration threshold, segmentation is determined to be finished and step 211 is executed.
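Steps 209 to 211 then amount to the following loop, reusing nearest_audio_point from the previous sketch; the 420 s (7 min) maximum duration is an example value taken from the description of Fig. 3.

```python
def split_long_segments(segments, audio_points, max_len=420.0):
    """Steps 209-211: repeatedly split any segment longer than the
    maximum duration threshold at the audio pre-segmentation point
    nearest its midpoint."""
    done = []
    stack = list(segments)                  # (start, end) pairs in seconds
    while stack:
        start, end = stack.pop()
        p = nearest_audio_point(start, end, audio_points)
        if end - start <= max_len or p is None:
            done.append((start, end))       # short enough, or no point left
        else:
            stack.extend([(start, p), (p, end)])
    return sorted(done)
```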
Step 211: segmentation ends, and the completed video segments are obtained.
The completed video segments are the segments obtained by splitting the video to be processed.
In the embodiment of the present invention, the video to be processed is segmented by combining scene pre-segmentation points and audio pre-segmentation points, which avoids the problem of overly long video segments that arises when segmenting by scene pre-segmentation points alone. By filtering the set of scene pre-segmentation points, the influence of brief scene changes is reduced, and the influence of lighting changes and near/far shot changes is reduced, ensuring the integrity of the plot after segmentation. Processing efficiency is improved by multi-process parallel processing based on the process pool technique.
It should be noted that, for simplicity of description, the method embodiments are described as a series of action combinations; however, those skilled in the art should understand that the embodiments of the present invention are not limited by the described order of actions, because according to the embodiments of the present invention, some steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present invention.
Referring to Fig. 4, a structural block diagram of a video processing apparatus according to an embodiment of the present invention is shown.
The video processing apparatus of the embodiment of the present invention includes the following modules:
a division module 401, configured to obtain a video to be processed and divide the video to be processed into multiple unit videos to be processed;
an obtaining module 402, configured to obtain a scene feature vector and an audio feature vector corresponding to each unit video to be processed, respectively;
a determining module 403, configured to determine scene pre-segmentation points according to the scene feature vectors corresponding to every two adjacent unit videos to be processed, and determine audio pre-segmentation points according to the audio feature vectors corresponding to every two adjacent unit videos to be processed;
a scene segmentation module 404, configured to perform scene segmentation on the video to be processed according to the scene pre-segmentation points;
a first finding module 405, configured to find, among the video segments obtained by the scene segmentation, a video segment whose duration exceeds a set maximum duration threshold, as a video segment to be split;
an audio segmentation module 406, configured to perform audio segmentation on the video segment to be split according to the audio pre-segmentation points, to obtain the video segments of the completed splitting.
In the embodiment of the present invention, after scene segmentation is performed on the video to be processed according to the scene pre-segmentation points from the perspective of scene changes, audio segmentation is further performed, from the perspective of audio changes, on the longer video segments according to the audio pre-segmentation points. This avoids the problem of overly long video segments produced when splitting is based on scene changes alone, improves the accuracy of the splitting, and better meets user needs.
Referring to Fig. 5, a structural block diagram of another video processing apparatus according to an embodiment of the present invention is shown.
The video processing apparatus of the embodiment of the present invention includes the following modules:
a division module 501, configured to obtain a video to be processed and divide the video to be processed into multiple unit videos to be processed;
an obtaining module 502, configured to obtain a scene feature vector and an audio feature vector corresponding to each unit video to be processed, respectively;
a determining module 503, configured to determine scene pre-segmentation points according to the scene feature vectors corresponding to every two adjacent unit videos to be processed, and determine audio pre-segmentation points according to the audio feature vectors corresponding to every two adjacent unit videos to be processed;
a scene segmentation module 504, configured to perform scene segmentation on the video to be processed according to the scene pre-segmentation points;
a first finding module 505, configured to find, among the video segments obtained by the scene segmentation, a video segment whose duration exceeds a set maximum duration threshold, as a video segment to be split;
an audio segmentation module 506, configured to perform audio segmentation on the video segment to be split according to the audio pre-segmentation points, to obtain the video segments of the completed splitting.
In an optional embodiment, the audio segmentation module 506 includes: an audio segmentation point finding unit, configured to find, among the audio pre-segmentation points, the audio pre-segmentation point closest to the midpoint of the video segment to be split; a segment splitting unit, configured to perform audio segmentation on the video segment to be split according to the audio pre-segmentation point found; and a segment determination unit, configured to judge whether, among the video segments obtained by the audio segmentation, there is a video segment whose duration exceeds the maximum duration threshold, and, when there is such a video segment, take it as a video segment to be split and call the audio segmentation point finding unit.
In an optional embodiment, the apparatus further includes: a second finding module 507, configured to find, among the scene pre-segmentation points, the scene pre-segmentation points of brief scene changes; and a first deletion module 508, configured to delete the scene pre-segmentation points of brief scene changes, to obtain the remaining scene pre-segmentation points. The scene segmentation module 504 is specifically configured to perform scene segmentation on the video to be processed according to the remaining scene pre-segmentation points.
In an optional embodiment, the second finding module 507 includes: a scene segmentation point finding unit, configured to obtain the duration between every two adjacent scene pre-segmentation points and find two adjacent scene pre-segmentation points whose interval is less than a set scene change threshold; a first video obtaining unit, configured to obtain, of the two adjacent scene pre-segmentation points found, at least one unit video to be processed before the earlier scene pre-segmentation point and at least one unit video to be processed after the later scene pre-segmentation point; a first similarity calculation unit, configured to calculate, among the unit videos to be processed obtained, the similarity of the scene feature vectors corresponding to every two adjacent unit videos to be processed; a first average calculation unit, configured to calculate an average value according to each similarity; and a first segmentation point determination unit, configured to determine, when the average value of the similarities is greater than the preset scene similarity threshold, at least one of the two adjacent scene pre-segmentation points found as a scene pre-segmentation point of a brief scene change.
In an optional embodiment, the apparatus further includes: a third finding module 509, configured to find, among the scene pre-segmentation points, the scene pre-segmentation points not caused by actual scene changes; and a second deletion module 510, configured to delete the scene pre-segmentation points not caused by actual scene changes, to obtain the remaining scene pre-segmentation points. The scene segmentation module 504 is specifically configured to perform scene segmentation on the video to be processed according to the remaining scene pre-segmentation points.
In an optional embodiment, the third finding module 509 includes: a second video obtaining unit, configured to obtain, for each scene pre-segmentation point among the scene pre-segmentation points, at least one unit video to be processed before the current scene pre-segmentation point and at least one unit video to be processed after it; a histogram determination unit, configured to determine the histogram feature vector corresponding to each of the unit videos to be processed obtained; a second similarity calculation unit, configured to calculate, among the unit videos to be processed obtained, the similarity of the histogram feature vectors corresponding to every two adjacent unit videos to be processed; a second average calculation unit, configured to calculate an average value according to each similarity; and a second segmentation point determination unit, configured to determine, when the average value of the similarities is greater than the preset histogram similarity threshold, the current scene pre-segmentation point as a scene pre-segmentation point not caused by an actual scene change.
In an optional embodiment, the apparatus further includes: a fourth finding module 511, configured to obtain the duration between every two adjacent scene pre-segmentation points and find two adjacent scene pre-segmentation points whose interval is less than a set minimum duration threshold; and a third deletion module 512, configured to delete at least one of the two adjacent scene pre-segmentation points found, to obtain the remaining scene pre-segmentation points. The scene segmentation module 504 is specifically configured to perform scene segmentation on the video to be processed according to the remaining scene pre-segmentation points.
In an optional embodiment, the apparatus further includes: a detection module 513, configured to detect the theme song segments in the video to be processed; and a fourth deletion module 514, configured to delete the theme song segments from the video to be processed. The division module 501 is specifically configured to divide the video to be processed, after the theme song segments have been deleted, into multiple unit videos to be processed.
In an optional embodiment, the obtaining module 502 includes: a calling unit, configured to call a first process and a second process simultaneously; a scene feature obtaining unit, configured to obtain the scene feature vector corresponding to each unit video to be processed using the first process; and an audio feature obtaining unit, configured to obtain the audio feature vector corresponding to each unit video to be processed using the second process.
In the embodiment of the present invention, the video to be processed is segmented by combining scene pre-segmentation points and audio pre-segmentation points, which avoids the problem of overly long video segments that arises when segmenting by scene pre-segmentation points alone. By filtering the set of scene pre-segmentation points, the influence of brief scene changes is reduced, and the influence of lighting changes and near/far shot changes is reduced, ensuring the integrity of the plot after segmentation. Processing efficiency is improved by multi-process parallel processing based on the process pool technique.
As for the apparatus embodiments, since they are basically similar to the method embodiments, their description is relatively simple; for related details, refer to the corresponding parts of the description of the method embodiments.
In an embodiment of the present invention, an electronic device for video processing is also provided. The electronic device may include one or more processors and a memory for storing processor-executable instructions, such as an application program. The processor is configured to execute the above video processing method.
In an embodiment of the present invention, a non-transitory computer-readable storage medium including instructions is also provided, for example a memory including instructions, where the instructions can be executed by the processor of the electronic device to complete the above video processing method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
All the embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts between the embodiments can be referred to each other.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, an apparatus, or a computer program product. Therefore, the embodiments of the present invention may take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments of the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical memory, and the like) containing computer-usable program code.
The embodiments of the present invention are described with reference to flowcharts and/or block diagrams of the method, terminal device (system), and computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing terminal device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing terminal device produce an apparatus for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing terminal device to operate in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that realizes the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing terminal device, so that a series of operation steps are executed on the computer or other programmable terminal device to produce computer-implemented processing, and the instructions executed on the computer or other programmable terminal device provide steps for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the embodiments of the present invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications falling within the scope of the embodiments of the present invention.
Finally, it should also be noted that, herein, relational terms such as first and second are used merely to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or terminal device including a series of elements includes not only those elements but also other elements not explicitly listed, or also includes elements inherent to such a process, method, article, or terminal device. In the absence of further limitations, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or terminal device including the element.
The video processing method, apparatus, electronic device, and storage medium provided by the present invention have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of the present invention; the above description of the embodiments is only intended to help understand the method of the present invention and its core idea. Meanwhile, for those of ordinary skill in the art, there will be changes in the specific implementation and application scope according to the idea of the present invention. In summary, the content of this specification should not be construed as a limitation of the present invention.

Claims (20)

1. A video processing method, characterized in that the method comprises:
obtaining a video to be processed, and dividing the video to be processed into multiple unit videos to be processed;
obtaining a scene feature vector and an audio feature vector corresponding to each unit video to be processed, respectively;
determining scene pre-segmentation points according to the scene feature vectors corresponding to every two adjacent unit videos to be processed, and determining audio pre-segmentation points according to the audio feature vectors corresponding to every two adjacent unit videos to be processed;
performing scene segmentation on the video to be processed according to the scene pre-segmentation points, and finding, among the video segments obtained by the scene segmentation, a video segment whose duration exceeds a set maximum duration threshold, as a video segment to be split;
performing audio segmentation on the video segment to be split according to the audio pre-segmentation points, to obtain the video segments of the completed splitting.
2. The method according to claim 1, characterized in that the performing audio segmentation on the video segment to be split according to the audio pre-segmentation points comprises:
finding, among the audio pre-segmentation points, the audio pre-segmentation point closest to the midpoint of the video segment to be split;
performing audio segmentation on the video segment to be split according to the audio pre-segmentation point found;
judging whether, among the video segments obtained by the audio segmentation, there is a video segment whose duration exceeds the maximum duration threshold;
when there is a video segment whose duration exceeds the maximum duration threshold, taking that video segment as a video segment to be split, and returning to the step of finding, among the audio pre-segmentation points, the audio pre-segmentation point closest to the midpoint of the video segment to be split.
3. The method according to claim 1, characterized in that:
after the determining scene pre-segmentation points according to the scene feature vectors corresponding to every two adjacent unit videos to be processed, the method further comprises:
finding, among the scene pre-segmentation points, the scene pre-segmentation points of brief scene changes;
deleting the scene pre-segmentation points of brief scene changes, to obtain the remaining scene pre-segmentation points;
and the performing scene segmentation on the video to be processed according to the scene pre-segmentation points comprises:
performing scene segmentation on the video to be processed according to the remaining scene pre-segmentation points.
4. The method according to claim 3, characterized in that the finding, among the scene pre-segmentation points, the scene pre-segmentation points of brief scene changes comprises:
obtaining the duration between every two adjacent scene pre-segmentation points, and finding two adjacent scene pre-segmentation points whose interval is less than a set scene change threshold;
obtaining, of the two adjacent scene pre-segmentation points found, at least one unit video to be processed before the earlier scene pre-segmentation point and at least one unit video to be processed after the later scene pre-segmentation point;
calculating, among the unit videos to be processed obtained, the similarity of the scene feature vectors corresponding to every two adjacent unit videos to be processed;
calculating an average value according to each similarity;
when the average value of the similarities is greater than a preset scene similarity threshold, determining at least one of the two adjacent scene pre-segmentation points found as a scene pre-segmentation point of a brief scene change.
5. The method according to claim 1, characterized in that:
after the determining scene pre-segmentation points according to the scene feature vectors corresponding to every two adjacent unit videos to be processed, the method further comprises:
finding, among the scene pre-segmentation points, the scene pre-segmentation points not caused by actual scene changes;
deleting the scene pre-segmentation points not caused by actual scene changes, to obtain the remaining scene pre-segmentation points;
and the performing scene segmentation on the video to be processed according to the scene pre-segmentation points comprises:
performing scene segmentation on the video to be processed according to the remaining scene pre-segmentation points.
6. The method according to claim 5, characterized in that the finding, among the scene pre-segmentation points, the scene pre-segmentation points not caused by actual scene changes comprises:
for each scene pre-segmentation point, obtaining at least one unit video to be processed before the current scene pre-segmentation point and at least one unit video to be processed after it;
determining, for the unit videos to be processed obtained, the histogram feature vector corresponding to each unit video to be processed;
calculating, among the unit videos to be processed obtained, the similarity of the histogram feature vectors corresponding to every two adjacent unit videos to be processed;
calculating an average value according to each similarity;
when the average value of the similarities is greater than a preset histogram similarity threshold, determining the current scene pre-segmentation point as a scene pre-segmentation point not caused by an actual scene change.
7. The method according to claim 1, characterized in that:
after the determining scene pre-segmentation points according to the scene feature vectors corresponding to every two adjacent unit videos to be processed, the method further comprises:
obtaining the duration between every two adjacent scene pre-segmentation points, and finding two adjacent scene pre-segmentation points whose interval is less than a set minimum duration threshold;
deleting at least one of the two adjacent scene pre-segmentation points found, to obtain the remaining scene pre-segmentation points;
and the performing scene segmentation on the video to be processed according to the scene pre-segmentation points comprises:
performing scene segmentation on the video to be processed according to the remaining scene pre-segmentation points.
8. The method according to claim 1, wherein
before the dividing the video to be processed into a plurality of unit videos to be processed, the method further comprises:
detecting a theme song segment in the video to be processed, and deleting the theme song segment from the video to be processed; and
the dividing the video to be processed into a plurality of unit videos to be processed comprises:
dividing the video to be processed from which the theme song segment has been deleted into the plurality of unit videos to be processed.
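The claim leaves the theme song detector itself open (matching against the known opening and ending audio of a series is one possibility). Assuming the theme intervals have already been detected, deleting them before division reduces to interval subtraction, as sketched below.

```python
def delete_segments(duration, theme_intervals):
    """Return the kept (start, end) intervals, in seconds, of a video of
    the given duration after removing the detected theme song intervals."""
    kept, cursor = [], 0.0
    for start, end in sorted(theme_intervals):
        if start > cursor:
            kept.append((cursor, start))  # content before this theme segment
        cursor = max(cursor, end)
    if cursor < duration:
        kept.append((cursor, duration))   # content after the last segment
    return kept
```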
9. The method according to claim 1, wherein the obtaining the scene feature vector and the audio feature vector corresponding to each unit video to be processed comprises:
calling a first process and a second process simultaneously;
obtaining the scene feature vector corresponding to each unit video to be processed by the first process; and
obtaining the audio feature vector corresponding to each unit video to be processed by the second process.
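A sketch of the two concurrent workers using Python's multiprocessing; the two extractor functions are hypothetical stand-ins, since the patent does not fix how the feature vectors are computed.

```python
from multiprocessing import Pool

def extract_scene_vectors(unit_paths):
    # Hypothetical placeholder: one scene feature vector per unit video.
    return [[0.0] for _ in unit_paths]

def extract_audio_vectors(unit_paths):
    # Hypothetical placeholder: one audio feature vector per unit video.
    return [[0.0] for _ in unit_paths]

def extract_features(unit_paths):
    """Run the two extractors in two worker processes at the same time."""
    with Pool(processes=2) as pool:
        scene_job = pool.apply_async(extract_scene_vectors, (unit_paths,))
        audio_job = pool.apply_async(extract_audio_vectors, (unit_paths,))
        return scene_job.get(), audio_job.get()
```

On platforms that spawn rather than fork worker processes, the call to extract_features must sit under an `if __name__ == "__main__":` guard.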
10. A video processing apparatus, wherein the apparatus comprises:
a division module, configured to obtain a video to be processed and divide the video to be processed into a plurality of unit videos to be processed;
an obtaining module, configured to obtain the scene feature vector and the audio feature vector corresponding to each unit video to be processed;
a determining module, configured to determine scene pre-segmentation points according to the scene feature vectors corresponding to every two adjacent unit videos to be processed, and to determine audio pre-segmentation points according to the audio feature vectors corresponding to every two adjacent unit videos to be processed;
a scene segmentation module, configured to perform scene segmentation on the video to be processed according to the scene pre-segmentation points;
a first searching module, configured to search the video clips obtained by scene segmentation for video clips whose duration exceeds a set maximum duration threshold, as video clips to be split; and
an audio segmentation module, configured to perform audio segmentation on the video clips to be split according to the audio pre-segmentation points to obtain the segmented video clips.
11. The apparatus according to claim 10, wherein the audio segmentation module comprises:
an audio segmentation point searching unit, configured to search the audio pre-segmentation points for the audio pre-segmentation point nearest to the middle point of the video clip to be split;
a clip segmentation unit, configured to perform audio segmentation on the video clip to be split according to the found audio pre-segmentation point; and
a clip determination unit, configured to judge whether a video clip whose duration exceeds the maximum duration threshold exists among the video clips obtained by the audio segmentation, and, when such a video clip exists, to take the video clip whose duration exceeds the maximum duration threshold as the video clip to be split and call the audio segmentation point searching unit again.
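The loop run by these three units can be restated recursively, as sketched below; times in seconds and the restriction of candidate points to the strict interior of a clip are assumptions.

```python
def audio_split(start, end, audio_points, max_duration):
    """Split the clip (start, end) at audio pre-segmentation points until no
    sub-clip is longer than max_duration (audio_points: sorted times)."""
    if end - start <= max_duration:
        return [(start, end)]
    inner = [t for t in audio_points if start < t < end]
    if not inner:
        return [(start, end)]  # no usable audio point: leave the clip whole
    mid = (start + end) / 2.0
    cut = min(inner, key=lambda t: abs(t - mid))  # point nearest the middle
    return (audio_split(start, cut, audio_points, max_duration)
            + audio_split(cut, end, audio_points, max_duration))
```

For example, audio_split(0.0, 300.0, [95.0, 160.0, 210.0], 120.0) cuts first at 160.0 (the point nearest the midpoint 150.0), then recurses into both halves, yielding (0, 95), (95, 160), (160, 210) and (210, 300).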
12. The apparatus according to claim 10, wherein the apparatus further comprises:
a second searching module, configured to search the scene pre-segmentation points for scene pre-segmentation points of brief scene changes; and
a first deleting module, configured to delete the scene pre-segmentation points of brief scene changes to obtain remaining scene pre-segmentation points;
wherein the scene segmentation module is specifically configured to perform scene segmentation on the video to be processed according to the remaining scene pre-segmentation points.
13. The apparatus according to claim 12, wherein the second searching module comprises:
a scene segmentation point searching unit, configured to obtain the duration between every two adjacent scene pre-segmentation points and to search for two adjacent scene pre-segmentation points whose duration is less than a set scene change duration threshold;
a first video obtaining unit, configured to obtain, for the two found adjacent scene pre-segmentation points, at least one unit video to be processed before the former scene pre-segmentation point and at least one unit video to be processed after the latter scene pre-segmentation point;
a first similarity calculation unit, configured to calculate, among the obtained unit videos to be processed, the similarity of the scene feature vectors corresponding to every two adjacent unit videos to be processed;
a first average calculation unit, configured to calculate an average value of the similarities; and
a first segmentation point determination unit, configured to determine, when the average value of the similarities is greater than a preset scene similarity threshold, at least one of the two found adjacent scene pre-segmentation points as a scene pre-segmentation point of a brief scene change.
14. The apparatus according to claim 10, wherein the apparatus further comprises:
a third searching module, configured to search the scene pre-segmentation points for scene pre-segmentation points not caused by a change of the scene itself; and
a second deleting module, configured to delete the scene pre-segmentation points not caused by a change of the scene itself to obtain remaining scene pre-segmentation points;
wherein the scene segmentation module is specifically configured to perform scene segmentation on the video to be processed according to the remaining scene pre-segmentation points.
15. The apparatus according to claim 14, wherein the third searching module comprises:
a second video obtaining unit, configured to obtain, for each scene pre-segmentation point, at least one unit video to be processed before the current scene pre-segmentation point and at least one unit video to be processed after it;
a histogram determination unit, configured to determine the histogram feature vector corresponding to each obtained unit video to be processed;
a second similarity calculation unit, configured to calculate, among the obtained unit videos to be processed, the similarity of the histogram feature vectors corresponding to every two adjacent unit videos to be processed;
a second average calculation unit, configured to calculate an average value of the similarities; and
a second segmentation point determination unit, configured to determine, when the average value of the similarities is greater than a preset histogram similarity threshold, the current scene pre-segmentation point as a scene pre-segmentation point not caused by a change of the scene itself.
16. The apparatus according to claim 10, wherein the apparatus further comprises:
a fourth searching module, configured to obtain the duration between every two adjacent scene pre-segmentation points and to search for two adjacent scene pre-segmentation points whose duration is less than a set minimum duration threshold; and
a third deleting module, configured to delete at least one of the two found adjacent scene pre-segmentation points to obtain remaining scene pre-segmentation points;
wherein the scene segmentation module is specifically configured to perform scene segmentation on the video to be processed according to the remaining scene pre-segmentation points.
17. The apparatus according to claim 10, wherein the apparatus further comprises:
a detection module, configured to detect a theme song segment in the video to be processed; and
a fourth deleting module, configured to delete the theme song segment from the video to be processed;
wherein the division module is specifically configured to divide the video to be processed from which the theme song segment has been deleted into the plurality of unit videos to be processed.
18. The apparatus according to claim 10, wherein the obtaining module comprises:
a calling unit, configured to call a first process and a second process simultaneously;
a scene feature obtaining unit, configured to obtain the scene feature vector corresponding to each unit video to be processed by the first process; and
an audio feature obtaining unit, configured to obtain the audio feature vector corresponding to each unit video to be processed by the second process.
19. An electronic device, comprising:
a processor; and
a memory for storing instructions executable by the processor;
wherein the processor is configured to execute the video processing method according to any one of claims 1-9.
20. A non-transitory computer-readable storage medium, wherein, when instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform the video processing method according to any one of claims 1-9.
CN201910472453.7A 2019-05-31 2019-05-31 Video processing method and device, electronic equipment and storage medium Active CN110213670B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910472453.7A CN110213670B (en) 2019-05-31 2019-05-31 Video processing method and device, electronic equipment and storage medium


Publications (2)

Publication Number Publication Date
CN110213670A (en) 2019-09-06
CN110213670B (en) 2022-01-07

Family

ID=67790245

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910472453.7A Active CN110213670B (en) 2019-05-31 2019-05-31 Video processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110213670B (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1938714A (en) * 2004-03-23 2007-03-28 英国电讯有限公司 Method and system for semantically segmenting scenes of a video sequence
CN102170528A (en) * 2011-03-25 2011-08-31 天脉聚源(北京)传媒科技有限公司 Segmentation method of news program
CN102890778A (en) * 2011-07-21 2013-01-23 北京新岸线网络技术有限公司 Content-based video detection method and device
CN102685398A (en) * 2011-09-06 2012-09-19 天脉聚源(北京)传媒科技有限公司 News video scene generating method
US20140150043A1 (en) * 2012-11-23 2014-05-29 Institute For Information Industry Scene fragment transmitting system, scene fragment transmitting method and recording medium
CN104519401A (en) * 2013-09-30 2015-04-15 华为技术有限公司 Video division point acquiring method and equipment
CN106021496A (en) * 2016-05-19 2016-10-12 海信集团有限公司 Video search method and video search device
CN108376147A (en) * 2018-01-24 2018-08-07 北京览科技有限公司 Method and apparatus for obtaining evaluation result information of a video
CN108307229A (en) * 2018-02-02 2018-07-20 新华智云科技有限公司 Processing method and device for audio and video data
CN109344780A (en) * 2018-10-11 2019-02-15 上海极链网络科技有限公司 Multi-modal video scene segmentation method based on sound and vision

Cited By (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113259761A (en) * 2020-02-13 2021-08-13 华为技术有限公司 Video processing method, video processing apparatus, and storage medium
WO2021159896A1 (en) * 2020-02-13 2021-08-19 华为技术有限公司 Video processing method, video processing device, and storage medium
CN113259761B (en) * 2020-02-13 2022-08-26 华为技术有限公司 Video processing method, video processing apparatus, and storage medium
CN111400615A (en) * 2020-03-19 2020-07-10 腾讯科技(深圳)有限公司 Resource recommendation method, device, equipment and storage medium
CN113438500A (en) * 2020-03-23 2021-09-24 阿里巴巴集团控股有限公司 Video processing method and device, electronic equipment and computer storage medium
CN111641869A (en) * 2020-06-04 2020-09-08 虎博网络技术(上海)有限公司 Video shot segmentation method and device, electronic equipment and computer-readable storage medium
CN111601162A (en) * 2020-06-08 2020-08-28 北京世纪好未来教育科技有限公司 Video segmentation method and device and computer storage medium
CN111601162B (en) * 2020-06-08 2022-08-02 北京世纪好未来教育科技有限公司 Video segmentation method and device and computer storage medium
CN113810782A (en) * 2020-06-12 2021-12-17 阿里巴巴集团控股有限公司 Video processing method and device, server and electronic device
CN113810782B (en) * 2020-06-12 2022-09-27 阿里巴巴集团控股有限公司 Video processing method and device, server and electronic device
CN113992970A (en) * 2020-07-27 2022-01-28 阿里巴巴集团控股有限公司 Video data processing method and device, electronic equipment and computer storage medium
US11837028B2 (en) 2020-09-29 2023-12-05 New Oriental Education & Technology Group Inc. Dance segment recognition method, dance segment recognition apparatus, and storage medium
CN112100436A (en) * 2020-09-29 2020-12-18 新东方教育科技集团有限公司 Dance segment recognition method, dance segment recognition device and storage medium
CN112100436B (en) * 2020-09-29 2021-07-06 新东方教育科技集团有限公司 Dance segment recognition method, dance segment recognition device and storage medium
CN113435328A (en) * 2021-06-25 2021-09-24 上海众源网络有限公司 Video clip processing method and device, electronic equipment and readable storage medium
CN113435328B (en) * 2021-06-25 2024-05-31 上海众源网络有限公司 Video clip processing method and device, electronic equipment and readable storage medium
CN113569706B (en) * 2021-07-23 2024-03-01 上海明略人工智能(集团)有限公司 Video scene segmentation point judging method, system, storage medium and electronic equipment
CN113569704A (en) * 2021-07-23 2021-10-29 上海明略人工智能(集团)有限公司 Division point judgment method, system, storage medium and electronic device
CN113569704B (en) * 2021-07-23 2023-12-12 上海明略人工智能(集团)有限公司 Segmentation point judging method, system, storage medium and electronic equipment
CN113569706A (en) * 2021-07-23 2021-10-29 上海明略人工智能(集团)有限公司 Video scene segmentation point judgment method and system, storage medium and electronic equipment
CN113569703B (en) * 2021-07-23 2024-04-16 上海明略人工智能(集团)有限公司 Real division point judging method, system, storage medium and electronic equipment
CN113569703A (en) * 2021-07-23 2021-10-29 上海明略人工智能(集团)有限公司 Method and system for judging true segmentation point, storage medium and electronic equipment
CN114222159A (en) * 2021-12-01 2022-03-22 北京奇艺世纪科技有限公司 Method and system for determining video scene change point and generating video clip
CN114299074A (en) * 2021-12-14 2022-04-08 北京达佳互联信息技术有限公司 Video segmentation method, device, equipment and storage medium
CN115086759A (en) * 2022-05-13 2022-09-20 北京达佳互联信息技术有限公司 Video processing method, video processing device, computer equipment and medium
CN116546264A (en) * 2023-04-10 2023-08-04 北京度友信息技术有限公司 Video processing method and device, electronic equipment and storage medium
CN117499739A (en) * 2024-01-02 2024-02-02 腾讯科技(深圳)有限公司 Frame rate control method, device, computer equipment and storage medium
CN117499739B (en) * 2024-01-02 2024-06-07 腾讯科技(深圳)有限公司 Frame rate control method, device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN110213670B (en) 2022-01-07

Similar Documents

Publication Publication Date Title
CN110213670A (en) Method for processing video frequency, device, electronic equipment and storage medium
US20180374491A1 (en) Systems and Methods for Recognizing Sound and Music Signals in High Noise and Distortion
EP1081960B1 (en) Signal processing method and video/voice processing device
CN101819638B (en) Establishment method of pornographic detection model and pornographic detection method
CN105224581B (en) The method and apparatus of picture are presented when playing music
CN107562760B (en) Voice data processing method and device
CN114297439B (en) Short video tag determining method, system, device and storage medium
WO2023197979A1 (en) Data processing method and apparatus, and computer device and storage medium
US20170140226A1 (en) Apparatus and method for identifying a still image contained in moving image contents
JP2003259302A (en) Method for automatically producing music video, product including information storage medium for storing information, and program
CN103729368B (en) A kind of robust audio recognition methods based on local spectrum iamge description
US20150128788A1 (en) Method, device and system for automatically adjusting a duration of a song
CN110324726B (en) Model generation method, video processing method, model generation device, video processing device, electronic equipment and storage medium
US20130266147A1 (en) System and method for identification of highly-variable vocalizations
CN115359409B (en) Video splitting method and device, computer equipment and storage medium
CN110324657A (en) Model generation, method for processing video frequency, device, electronic equipment and storage medium
CN111510765A (en) Audio label intelligent labeling method and device based on teaching video
CN107066488A (en) Video display bridge section automatic division method based on movie and television contents semantic analysis
KR101634068B1 (en) Method and device for generating educational contents map
Felipe et al. Acoustic scene classification using spectrograms
CN116567351B (en) Video processing method, device, equipment and medium
CN117609548A (en) Video multi-mode target element extraction and video abstract synthesis method and system based on pre-training model
CN110555117B (en) Data processing method and device and electronic equipment
CN110516086B (en) Method for automatically acquiring movie label based on deep neural network
US9445210B1 (en) Waveform display control of visual characteristics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant