CN107222795B - Multi-feature fusion video abstract generation method - Google Patents
Multi-feature fusion video abstract generation method
- Publication number
- CN107222795B CN107222795B CN201710486660.9A CN201710486660A CN107222795B CN 107222795 B CN107222795 B CN 107222795B CN 201710486660 A CN201710486660 A CN 201710486660A CN 107222795 B CN107222795 B CN 107222795B
- Authority
- CN
- China
- Prior art keywords
- video
- importance
- frame
- video frame
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000000034 method Methods 0.000 title claims abstract description 24
- 230000004927 fusion Effects 0.000 title claims description 23
- 230000011218 segmentation Effects 0.000 claims abstract description 11
- 230000002194 synthesizing effect Effects 0.000 claims abstract description 6
- 238000004364 calculation method Methods 0.000 claims description 14
- 230000000007 visual effect Effects 0.000 claims description 6
- 230000015572 biosynthetic process Effects 0.000 claims description 2
- 238000013441 quality evaluation Methods 0.000 claims description 2
- 230000003068 static effect Effects 0.000 claims description 2
- 238000003786 synthesis reaction Methods 0.000 claims description 2
- 230000002123 temporal effect Effects 0.000 claims description 2
- 238000012545 processing Methods 0.000 description 5
- 238000005516 engineering process Methods 0.000 description 4
- 238000004458 analytical method Methods 0.000 description 3
- 238000010586 diagram Methods 0.000 description 3
- 238000000605 extraction Methods 0.000 description 3
- 238000011160 research Methods 0.000 description 3
- 238000007418 data mining Methods 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 230000018109 developmental process Effects 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 239000000284 extract Substances 0.000 description 1
- 238000003860 storage Methods 0.000 description 1
- 238000012549 training Methods 0.000 description 1
Images
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/85—Assembly of content; Generation of multimedia applications
- H04N21/854—Content authoring
- H04N21/8549—Creating video summaries, e.g. movie trailer
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/44008—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/44016—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Databases & Information Systems (AREA)
- Computer Security & Cryptography (AREA)
- Television Signal Processing For Recording (AREA)
- Other Investigation Or Analysis Of Materials By Electrical Means (AREA)
Abstract
The invention provides a multi-feature fusion video abstract generation method, which comprises the following steps: acquiring a video and taking it as input data; segmenting the input video data and recording the segmentation points and the number of video segments; extracting a video frame and a video frame center block from each video segment; calculating the features and the image quality of the extracted video frame and of the video frame center block respectively; calculating the global importance and the local importance from the obtained features; fusing the global importance and local importance of each frame to obtain the fused importance; calculating the importance of each video segment from the segmentation points; selecting video segments according to each segment's importance and a set threshold, obtaining an optimized subset of video segments; and synthesizing the video abstract from the selected subset of video segments.
Description
Technical Field
The invention relates to video analysis and image processing technology, in particular to a multi-feature fusion video abstract generation method.
Background
With the rapid development of Internet technology and devices, people acquire and browse more and more videos, and the volume of video data they face keeps growing. Given such a large amount of video data, how to find the needed videos or visual information within it is a current research hotspot and a core topic of video analysis technology. Methods for analyzing, processing and storing massive video data are still lacking, so users search for useful video data largely blindly. A robust video abstract generation method based on multi-feature fusion of global importance and local importance, built on data mining and image processing of video data, is therefore needed.
Disclosure of Invention
The invention aims to provide a video abstract generation method based on multi-feature fusion of global importance and local importance, which comprises the following steps:
step 1, acquiring a video and taking it as input data;
step 2, segmenting the input video data, and recording the segmentation points and the number of video segments;
step 3, extracting a video frame and a video frame center block in each video clip;
step 4, calculating the characteristics and the image quality of the extracted video frame and the central block of the video frame respectively;
step 5, calculating the global importance and the local importance according to the obtained features;
step 6, fusing the obtained global importance and the local importance of each frame to obtain fusion importance;
step 7, calculating the importance of each video segment according to the dividing points;
step 8, selecting the video clips according to the importance of each obtained video clip and a set threshold value, and selecting an optimized video clip subset;
and 9, synthesizing the video abstract according to the selected video segment subset.
The invention uses video data acquired by users from many sources, including smart devices and the Internet, so that the collected data can cover as many kinds of network video as possible. The method obtains the video abstract the user wants quickly and without training, saving the user a great deal of time and effort. In addition, the invention dynamically extracts the audio information in the video, when present, and places it in the video abstract. When the result is presented to the user, video analysis and image processing techniques condense the original video into a concise abstract, so the user quickly obtains the desired condensed video, which greatly improves the user experience.
The invention is further described below with reference to the accompanying drawings.
Drawings
FIG. 1 is a flow chart of a video summary generation method based on multi-feature fusion of global importance and local importance according to the present invention.
Fig. 2 is a schematic diagram of an original video frame extracted from an original video according to the present invention.
Fig. 3 is a schematic diagram of an extracted video frame that is first divided into 5x5 small blocks, from which the central 3x3 block is then extracted for calculating local importance, according to the present invention.
FIG. 4 is an effect diagram of a demonstration of a video summary generation system based on multi-feature fusion of global importance and local importance in the invention.
Detailed Description
With reference to fig. 1, a video abstract generation method based on multi-feature fusion of global importance and local importance includes the following steps:
step 1, acquiring a video and taking it as input data;
step 2, segmenting the input video data to obtain the segmentation points and the number of video segments;
step 3, extracting a video frame and a video frame center block in each video clip;
step 4, calculating the characteristics and the image quality of the extracted video frame and the central block of the video frame respectively;
step 5, calculating the global importance and the local importance according to the obtained features;
step 6, fusing the obtained global importance and the local importance of each frame to obtain final fusion importance;
step 7, calculating the importance of each video segment according to the dividing points;
step 8, setting a threshold value to select the video clips according to the importance of each obtained video clip, and selecting an optimized video clip subset;
and 9, synthesizing the video abstract according to the selected video segment subset.
The video data in step 1 can be obtained through the Internet and various smart devices; websites for obtaining videos include http://www.youku.com/, http://www.iqiyi.com/, and the like, and smart devices for obtaining videos include smartphones, tablets, and the like.
In step 2, the acquired video data is taken as the input video and segmented: a superframe segmentation method, combined with the foreground, background and motion information of the video, cuts the video into small segments, yielding the segmentation points and the number of video segments; both are stored for later calculation.
In step 3, video frames and their center blocks are extracted from the video. Frames are extracted with a conventional method; for the center block, each frame is first divided evenly into 5x5 blocks, in order to preserve the visual content well, and the 3x3 block at the center is then extracted for calculating local importance.
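A minimal sketch of this center-block extraction. How grid boundaries are rounded when the frame size is not divisible by 5 is our assumption; the patent does not specify it:

```python
import numpy as np

def center_block(frame):
    """Divide a frame into a 5x5 grid and return the central 3x3 region,
    as described in step 3. Edge cells absorb any remainder pixels."""
    h, w = frame.shape[:2]
    # Grid line positions for a 5x5 partition.
    rows = [round(h * i / 5) for i in range(6)]
    cols = [round(w * j / 5) for j in range(6)]
    # Keep grid cells 1..3 in both directions (the 3x3 center).
    return frame[rows[1]:rows[4], cols[1]:cols[4]]
```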
In step 4, picture features and image quality are calculated for the extracted video frame and the video frame center block. The calculated features comprise visual saliency, exposure, saturation, chroma, Rule of Thirds, contrast and directionality; in addition, the image quality of the video frame and of the video frame center block must be calculated. The calculation formula of the visual saliency is as follows:
in the formula, A_S is the static saliency, A_T is the temporal saliency, γ is a non-negative empirical parameter, and F_A is simply a function name representing the fusion of the two visual saliencies;
the exposure is calculated by the formula:
wherein X and Y are respectively the length and width of the HSV image converted from the extracted video frame, x and y index the pixel position in channel V, and I_V(x, y) is the V channel of the HSV image.
The formula for calculating the chroma is as follows:
wherein X and Y are respectively the length and width of the HSV image converted from the extracted video frame, x and y index the pixel position in channel S, and I_S(x, y) is the S channel of the HSV image.
The formula for calculating the saturation is:
wherein X and Y are respectively the length and width of the HSV image converted from the extracted video frame, x and y index the pixel position in channel H, and I_H(x, y) is the H channel of the HSV image.
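The exposure, chroma and saturation formulas themselves are not reproduced in this copy of the text. Based on the channel definitions above, a plausible sketch computes each feature as the mean of one HSV channel (f_2 from V, f_3 from S, f_4 from H); the function name and the mean-based form are our assumptions, not the patent's exact formulas:

```python
import numpy as np

def channel_mean_features(hsv):
    """Per-channel means of an HSV image, matching the variable definitions
    above (a sum over all X*Y pixels of one channel, normalized).
    hsv: float array of shape (X, Y, 3) with channels ordered (H, S, V)."""
    h, s, v = hsv[..., 0], hsv[..., 1], hsv[..., 2]
    f2 = v.mean()  # exposure feature, from the V (brightness) channel
    f3 = s.mean()  # chroma feature, from the S channel (per the text)
    f4 = h.mean()  # saturation feature, from the H channel (per the text)
    return f2, f3, f4
```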
The formula for the Rule of Thirds is:
wherein X and Y are respectively the length and width of the HSV image converted from the extracted video frame, x and y index the pixel position in each channel, and I_H(x, y), I_S(x, y), I_V(x, y) are the three channels of the HSV image. f_5, f_6, f_7 are the three feature values calculated according to the Rule of Thirds; they mainly reflect whether the main information in the image lies near the image's third lines.
The contrast and directionality are calculated with Tamura texture features. Tamura image texture comprises six features: coarseness, contrast, directionality, line-likeness, regularity and roughness; the first three of these play a very important role in the field of image retrieval.
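The patent only names the Tamura contrast feature. As a hedged sketch, the standard Tamura definition (standard deviation divided by the fourth root of kurtosis) would look like this:

```python
import numpy as np

def tamura_contrast(gray):
    """Tamura contrast of a grayscale image: sigma / kurtosis^(1/4), where
    kurtosis is the fourth central moment normalized by sigma^4. This is the
    standard Tamura definition; the patent does not spell it out."""
    g = gray.astype(float).ravel()
    sigma = g.std()
    if sigma == 0:
        return 0.0  # flat image: no contrast
    kurtosis = ((g - g.mean()) ** 4).mean() / sigma ** 4
    return sigma / kurtosis ** 0.25
```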
The image quality of the video frame, q_Gk, and the image quality of the video frame center block, q_Lk, are obtained with a no-reference image quality evaluation method. These quality scores weight the extracted frames and center blocks: some frames and center blocks extracted from a video may be of low quality, and we must consider whether features computed from distorted or blurred frames and blocks can still represent the video well, because image quality plays a very important role in generating the video abstract.
In step 5, the global importance and the local importance of each video frame are calculated. The calculation formula of the global importance is as follows:
where k denotes the k-th frame, q_Gk is the quality of the video frame, and f_G_1 ~ f_G_9 are respectively the values of the nine features of the video frame calculated in step 4.
The calculation formula of the local importance is as follows:
where k refers to the k-th frame, q_Lk is the quality of the video frame center block, and f_L_1 ~ f_L_9 are respectively the values of the nine features of the video frame center block.
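The global- and local-importance formulas themselves are not reproduced in this copy. This sketch assumes a quality-weighted sum of the nine feature values, which matches the named inputs (q_Gk or q_Lk plus nine features) but is only our assumption, not the patent's exact formula; the same function serves for I_Gk (whole-frame inputs) and I_Lk (center-block inputs):

```python
def frame_importance(quality, features):
    """Importance of one frame from its quality score and nine feature values.
    Assumed form: quality-weighted sum of the features (placeholder for the
    patent's unreproduced formula)."""
    assert len(features) == 9, "the method uses nine features per frame"
    return quality * sum(features)
```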
In step 6, the fused importance of each video frame is calculated; it consists of two parts, the global importance and the local importance. The calculation formula is as follows:
I_Gk&Lk = I_Gk + I_Lk (10)
wherein I_Gk and I_Lk are respectively the global importance and the local importance of the video frame.
In step 7, the importance of each video segment is calculated: the average fused importance of each segment is computed from the cut points obtained in step 2 and the per-frame fused importance obtained in step 6. This calculation prepares for the selection of the video-segment subset in the next step.
The calculation formulas for a video segment are as follows:
where I_C is the sum of the fused importance of the frames in a segment, I_j is the average fused importance of the segment, i is a cut point obtained in step 2, and next_i is the next cut point.
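The per-segment sum I_C and average I_j follow directly from the variable definitions above; a sketch (the function name is ours):

```python
def segment_importance(fused, cut_points):
    """Step 7: for each segment [i, next_i) given by the sorted cut points
    (starting at 0), compute the sum I_C of per-frame fused importances and
    the average I_j = I_C / (next_i - i)."""
    bounds = list(cut_points) + [len(fused)]
    sums, avgs = [], []
    for i, next_i in zip(bounds, bounds[1:]):
        ic = sum(fused[i:next_i])       # I_C: sum over the segment's frames
        sums.append(ic)
        avgs.append(ic / (next_i - i))  # I_j: average fused importance
    return sums, avgs
```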
In step 8, a subset of the video segments obtained in step 2 is selected according to the fused importance of each segment calculated in step 7 and a set threshold. The threshold is the proportion of summary segments to all video segments; it should be set neither too high nor too low, since selecting too many or too few segments necessarily degrades the quality of the video abstract. A proportion of 15% or 20%, for example, is quite suitable.
The calculation formula for selecting the subset is:
where {1,0} is a decision function that determines whether a video segment is selected as part of the video abstract: the function's value is 1 if the segment is selected, and 0 otherwise. Based on this formula, a suitable subset of video segments can be selected.
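One way to realize the {1,0} decision function with a proportional threshold, treating the set proportion as a top-fraction cutoff on average importance (our reading of the text):

```python
def select_segments(avg_importance, ratio=0.15):
    """Step 8: per-segment {1,0} decision. The top `ratio` fraction of
    segments by average fused importance receive 1 (kept), the rest 0."""
    n = len(avg_importance)
    n_keep = max(1, round(ratio * n))  # always keep at least one segment
    order = sorted(range(n), key=lambda j: -avg_importance[j])
    keep = set(order[:n_keep])
    return [1 if j in keep else 0 for j in range(n)]
```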
In step 9, the video abstract is synthesized from the video-segment subset selected in step 8: the segments of the subset are combined in the order they appear in the original video. Whether the video contains audio information is also considered; if it does, the audio information is included when the video abstract is synthesized. Fig. 4 shows a video abstract presentation system. The method presents the summarization result to the user in a concise mode and greatly improves the user's experience of browsing video data.
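The synthesis in step 9, combining selected segments in original-video order (audio handling omitted), can be sketched as:

```python
def synthesize(frames, selected_segments):
    """Step 9: concatenate the chosen segments in original-video order.
    Segments are (start, end) frame-index ranges into `frames`."""
    summary = []
    for start, end in sorted(selected_segments):  # restore original order
        summary.extend(frames[start:end])
    return summary
```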
Claims (7)
1. A multi-feature fused video abstract generation method is characterized by comprising the following steps:
step 1, acquiring a video and taking the video as input data;
step 2, segmenting input video data, and recording segmentation points and the number of video segments;
step 3, extracting a video frame and a video frame center block in each video clip;
step 4, calculating the features and the image quality of the extracted video frame and of the video frame center block respectively;
step 5, calculating the global importance and the local importance according to the obtained features;
step 6, fusing the obtained global importance and the local importance of each frame to obtain fusion importance;
step 7, calculating the importance of each video segment according to the dividing points;
step 8, selecting the video clips according to the importance of each obtained video clip and a set threshold value, and selecting an optimized video clip subset;
step 9, synthesizing the video abstract according to the selected video segment subset;
global importance I_Gk in step 5 is calculated by the formula:
where k is the index value of the video frame, and f_G_1 ~ f_G_9 are respectively values based on the 9 features of the video frame;
local importance I_Lk in step 5 is calculated by the formula:
where k is the index value of the video frame, and f_L_1 ~ f_L_9 are respectively values based on the 9 features of the video frame center block;
in step 7, the importance of each video segment comprises the sum I_C of the fused importance of the segment's frames and the average fused importance I_j of the segment,
where k is the index value of the video frame, I_Gk&Lk is the fused importance of each frame, i denotes the i-th segmentation point, and next_i is the next segmentation point;
the step 8 selects an optimized video segment subset by equation (13):
where N refers to the total number of video segments, and {1,0} is a decision function determining whether a video segment is selected as part of the video abstract: its value is 1 if the segment is selected, and 0 otherwise.
2. The method of claim 1, wherein in step 2 a superframe segmentation method segments the input video into a plurality of small video segments by calculating the foreground, background and motion information of the video, so as to obtain the segmentation points and the number of video segments.
3. The method according to claim 1, wherein the step 3 for extracting the central block of the video frame comprises: the video frame is divided into 5x5 blocks on average, and then the center block of 3x3 of the center portion is extracted.
4. The method of claim 1, wherein the features calculated in step 4 comprise visual saliency f_1, exposure f_2, chroma f_3, saturation f_4, the three Rule of Thirds feature values f_5, f_6, f_7, contrast f_8, and directionality f_9; the image quality calculated in step 4 comprises the image quality q_Gk of the video frame and the image quality q_Lk of the video frame center block; wherein
wherein A_S is the static saliency, A_T is the temporal saliency, and γ is a non-negative empirical parameter;
wherein X, Y are the length and width of the HSV image converted from the extracted video frame, x_v, y_v index the pixel position in channel V, and I_V(x_v, y_v) is the V channel of the HSV image;
wherein x_s, y_s index the pixel position in channel S, and I_S(x_s, y_s) is the S channel of the HSV image;
wherein x_h, y_h index the pixel position in channel H, and I_H(x_h, y_h) is the H channel of the HSV image;
calculating contrast and direction degree by adopting Tamura texture characteristics;
obtaining image quality q of video frame by image quality evaluation method without reference imageGkAnd the image quality q of the central block of the video frameLk。
5. The method according to claim 1, wherein the fusion importance is obtained by the formula (10) in step 6:
I_Gk&Lk = I_Gk + I_Lk (10)
where k is the index value of the video frame, I_Gk&Lk is the fused importance, and I_Gk and I_Lk are respectively the global importance and the local importance of the video frame.
6. The method of claim 1, wherein in step 9 the selected video segments are combined in the order in which each segment of the subset appears in the original video.
7. The method of claim 1, wherein, if the video contains audio information, the audio information is included when the video abstract is synthesized.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710486660.9A CN107222795B (en) | 2017-06-23 | 2017-06-23 | Multi-feature fusion video abstract generation method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710486660.9A CN107222795B (en) | 2017-06-23 | 2017-06-23 | Multi-feature fusion video abstract generation method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107222795A CN107222795A (en) | 2017-09-29 |
CN107222795B true CN107222795B (en) | 2020-07-31 |
Family
ID=59950929
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710486660.9A Active CN107222795B (en) | 2017-06-23 | 2017-06-23 | Multi-feature fusion video abstract generation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107222795B (en) |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108804578B (en) * | 2018-05-24 | 2022-06-07 | 南京理工大学 | Unsupervised video abstraction method based on consistency segment generation |
CN110868630A (en) * | 2018-08-27 | 2020-03-06 | 北京优酷科技有限公司 | Method and device for generating forecast report |
CN109413510B (en) * | 2018-10-19 | 2021-05-18 | 深圳市商汤科技有限公司 | Video abstract generation method and device, electronic equipment and computer storage medium |
CN111246246A (en) * | 2018-11-28 | 2020-06-05 | 华为技术有限公司 | Video playing method and device |
CN111401100B (en) * | 2018-12-28 | 2021-02-09 | 广州市百果园信息技术有限公司 | Video quality evaluation method, device, equipment and storage medium |
CN109819338B (en) | 2019-02-22 | 2021-09-14 | 影石创新科技股份有限公司 | Automatic video editing method and device and portable terminal |
CN111062284B (en) * | 2019-12-06 | 2023-09-29 | 浙江工业大学 | Visual understanding and diagnosis method for interactive video abstract model |
CN111641868A (en) * | 2020-05-27 | 2020-09-08 | 维沃移动通信有限公司 | Preview video generation method and device and electronic equipment |
CN112052841B (en) * | 2020-10-12 | 2021-06-29 | 腾讯科技(深圳)有限公司 | Video abstract generation method and related device |
CN112734733B (en) * | 2021-01-12 | 2022-11-01 | 天津大学 | Non-reference image quality monitoring method based on channel recombination and feature fusion |
CN113052149B (en) * | 2021-05-20 | 2021-08-13 | 平安科技(深圳)有限公司 | Video abstract generation method and device, computer equipment and medium |
CN114140461B (en) * | 2021-12-09 | 2023-02-14 | 成都智元汇信息技术股份有限公司 | Picture cutting method based on edge picture recognition box, electronic equipment and medium |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9076043B2 (en) * | 2012-08-03 | 2015-07-07 | Kodak Alaris Inc. | Video summarization using group sparsity analysis |
CN102930061B (en) * | 2012-11-28 | 2016-01-06 | 安徽水天信息科技有限公司 | A kind of video summarization method based on moving object detection |
US10095786B2 (en) * | 2015-04-09 | 2018-10-09 | Oath Inc. | Topical based media content summarization system and method |
CN105228033B (en) * | 2015-08-27 | 2018-11-09 | 联想(北京)有限公司 | A kind of method for processing video frequency and electronic equipment |
US20170148488A1 (en) * | 2015-11-20 | 2017-05-25 | Mediatek Inc. | Video data processing system and associated method for analyzing and summarizing recorded video data |
CN106713964A (en) * | 2016-12-05 | 2017-05-24 | 乐视控股(北京)有限公司 | Method of generating video abstract viewpoint graph and apparatus thereof |
-
2017
- 2017-06-23 CN CN201710486660.9A patent/CN107222795B/en active Active
Also Published As
Publication number | Publication date |
---|---|
CN107222795A (en) | 2017-09-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107222795B (en) | Multi-feature fusion video abstract generation method | |
US10735494B2 (en) | Media information presentation method, client, and server | |
US20210160556A1 (en) | Method for enhancing resolution of streaming file | |
US9892324B1 (en) | Actor/person centric auto thumbnail | |
CN109803180B (en) | Video preview generation method and device, computer equipment and storage medium | |
CN109218629B (en) | Video generation method, storage medium and device | |
CN104994426B (en) | Program video identification method and system | |
US20170285916A1 (en) | Camera effects for photo story generation | |
EP2568429A1 (en) | Method and system for pushing individual advertisement based on user interest learning | |
CN116916080A (en) | Video data processing method, device, computer equipment and readable storage medium | |
CN111930994A (en) | Video editing processing method and device, electronic equipment and storage medium | |
CN113870133B (en) | Multimedia display and matching method, device, equipment and medium | |
US20150278605A1 (en) | Apparatus and method for managing representative video images | |
US20150161094A1 (en) | Apparatus and method for automatically generating visual annotation based on visual language | |
CN114331820A (en) | Image processing method, image processing device, electronic equipment and storage medium | |
JP2016035607A (en) | Apparatus, method and program for generating digest | |
CN103984778A (en) | Video retrieval method and video retrieval system | |
CN113784171A (en) | Video data processing method, device, computer system and readable storage medium | |
CN111340101A (en) | Stability evaluation method and device, electronic equipment and computer readable storage medium | |
CN112383824A (en) | Video advertisement filtering method, device and storage medium | |
CN109618111B (en) | Cloud-shear multi-channel distribution system | |
Dev et al. | Localizing adverts in outdoor scenes | |
Husa et al. | HOST-ATS: automatic thumbnail selection with dashboard-controlled ML pipeline and dynamic user survey | |
JP2018206292A (en) | Video summary creation device and program | |
Hu et al. | Video summarization via exploring the global and local importance |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |