CN107222795B - Multi-feature fusion video abstract generation method - Google Patents


Info

Publication number
CN107222795B
Authority
CN
China
Prior art keywords
video
importance
frame
video frame
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710486660.9A
Other languages
Chinese (zh)
Other versions
CN107222795A (en
Inventor
李泽超
唐金辉
胡铜铃
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Science and Technology
Original Assignee
Nanjing University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Science and Technology filed Critical Nanjing University of Science and Technology
Priority to CN201710486660.9A priority Critical patent/CN107222795B/en
Publication of CN107222795A publication Critical patent/CN107222795A/en
Application granted granted Critical
Publication of CN107222795B publication Critical patent/CN107222795B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85Assembly of content; Generation of multimedia applications
    • H04N21/854Content authoring
    • H04N21/8549Creating video summaries, e.g. movie trailer
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44016Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Television Signal Processing For Recording (AREA)
  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)

Abstract

The invention provides a multi-feature fused video abstract generating method, which comprises the following steps: acquiring a video and taking the video as input data; segmenting input video data, and recording segmentation points and the number of video segments; extracting a video frame and a video frame center block in each video clip; respectively calculating the characteristics and the image quality of the extracted video frame and the center block of the video frame; calculating the global importance and the local importance according to the obtained features; fusing the obtained global importance and the local importance of each frame to obtain fused importance; calculating the importance of each video segment according to the dividing points; selecting the video clips according to the importance of each obtained video clip and a set threshold value, and selecting an optimized video clip subset; and synthesizing the video abstract according to the selected video segment subset.

Description

Multi-feature fusion video abstract generation method
Technical Field
The invention relates to video analysis and image processing technology, in particular to a multi-feature fusion video abstract generation method.
Background
With the rapid development of Internet technology and devices, it is increasingly easy for people to obtain and browse videos, and the amount of video data they face keeps growing. Faced with such a large amount of video data, how to find the needed video data or visual information is a current research hotspot and the subject of video analysis technology. Methods for analyzing, processing and storing massive video data are still lacking, so users search for useful video data blindly. Therefore, by performing data mining and image processing on video data, a strong video abstract generation method based on multi-feature fusion of global importance and local importance is needed.
Disclosure of Invention
The invention aims to provide a video abstract generating method based on multi-feature fusion of global importance and local importance, which comprises the following steps:
step 1, acquiring a video and taking the video as input data;
step 2, segmenting input video data, and recording segmentation points and the number of video segments;
step 3, extracting a video frame and a video frame center block in each video clip;
step 4, calculating the characteristics and the image quality of the extracted video frame and the central block of the video frame respectively;
step 5, calculating the global importance and the local importance according to the obtained features;
step 6, fusing the obtained global importance and the local importance of each frame to obtain fusion importance;
step 7, calculating the importance of each video segment according to the dividing points;
step 8, selecting the video clips according to the importance of each obtained video clip and a set threshold value, and selecting an optimized video clip subset;
and 9, synthesizing the video abstract according to the selected video segment subset.
The invention makes use of video data acquired by users from various sources, including video data captured with intelligent devices and obtained on the Internet, so that the acquired video data can cover as many kinds of video on the network as possible. The method can quickly produce the video abstract the user wants without any training, saving the user a great deal of time and energy. In addition, the invention dynamically extracts the audio information in the video, if present, and puts it into the video abstract. When presenting the video abstract to the user, video analysis and image processing techniques are used to analyze and process the original video into a condensed video abstract, so that the user can quickly obtain the desired condensed video, which greatly improves the user experience.
The invention is further described below with reference to the accompanying drawings.
Drawings
FIG. 1 is a flow chart of a video summary generation method based on multi-feature fusion of global importance and local importance according to the present invention.
Fig. 2 is a schematic diagram of an original video frame extracted from an original video according to the present invention.
Fig. 3 is a schematic diagram of a video frame extracted by the present invention, first divided into 5x5 small blocks, from which the 3x3 central block is extracted for calculating local importance.
FIG. 4 is an effect diagram of a demonstration of a video summary generation system based on multi-feature fusion of global importance and local importance in the invention.
Detailed Description
With reference to fig. 1, a video summary generation method based on multi-feature fusion of global importance and local importance includes the following steps:
step 1, acquiring a video and taking the video as input data;
step 2, segmenting the input video data to obtain the segmentation points and the number of video segments;
step 3, extracting a video frame and a video frame center block in each video clip;
step 4, calculating the characteristics and the image quality of the extracted video frame and the central block of the video frame respectively;
step 5, calculating the global importance and the local importance according to the obtained features;
step 6, fusing the obtained global importance and the local importance of each frame to obtain final fusion importance;
step 7, calculating the importance of each video segment according to the dividing points;
step 8, setting a threshold value to select the video clips according to the importance of each obtained video clip, and selecting an optimized video clip subset;
and 9, synthesizing the video abstract according to the selected video segment subset.
The video data in step 1 can be obtained through the Internet and various intelligent devices; websites for obtaining videos include http://www.youku.com/ and http://www.iqiyi.com/, and intelligent devices for obtaining videos include various smart phones, tablets, and the like.
In step 2, the acquired video data is taken as the input video and segmented into small video segments using a superframe segmentation method combined with the foreground, background and motion information of the video; this yields the segmentation points and the number of video segments, which are stored for later calculation.
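The superframe segmentation used here combines foreground, background and motion information. As a rough stand-in (not the patent's actual method), a simple frame-difference cut detector illustrates how segmentation points and a segment count can be produced and stored:

```python
import numpy as np

def segment_video(frames, threshold=30.0):
    """Split a list of grayscale frames into segments wherever the
    mean absolute frame-to-frame difference exceeds a threshold.
    A simple stand-in for superframe segmentation, which also uses
    foreground/background/motion cues."""
    cut_points = [0]
    for k in range(1, len(frames)):
        diff = np.mean(np.abs(frames[k].astype(float) - frames[k - 1].astype(float)))
        if diff > threshold:
            cut_points.append(k)
    num_segments = len(cut_points)
    return cut_points, num_segments
```

The returned cut points and segment count play the role of the stored segmentation data used in steps 7 and 8.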
In step 3, video frames and their center blocks are extracted from each video segment. The video frames are extracted with a conventional extraction method, but extracting the center block requires first partitioning the frame: to preserve the visual content well, each video frame is evenly divided into 5x5 blocks, and the 3x3 block of the central portion is then extracted for calculating local importance.
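The 5x5 partition with a 3x3 central block can be sketched as follows (assuming frames are NumPy arrays; cell sizes use integer division, so edge pixels of frames not divisible by 5 are ignored):

```python
import numpy as np

def extract_center_block(frame):
    """Divide the frame into a 5x5 grid and return the 3x3 central
    region (grid cells 1..3 in each dimension, 0-indexed)."""
    h, w = frame.shape[:2]
    bh, bw = h // 5, w // 5          # size of one grid cell
    return frame[bh:4 * bh, bw:4 * bw]
```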
In step 4, picture features and image quality are calculated for the extracted video frames and video frame center blocks. The calculated features comprise visual saliency, exposure, saturation, chroma, Rule of Thirds, contrast and directionality; in addition, the image quality of the video frame and of the video frame center block must be calculated. The calculation formula of the visual saliency is:
f_1 = F_A(A_S, A_T; γ)    (1)
where A_S is the static saliency, A_T is the temporal saliency, γ is a non-negative empirical parameter, and F_A is simply a function name representing the fusion of the two visual saliencies;
the exposure is calculated by the formula:
f_2 = (1/(X·Y)) Σ_{x=1..X} Σ_{y=1..Y} I_V(x, y)    (2)
where X and Y are the length and width of the HSV image converted from the extracted video image, x and y are the pixel position in channel V, and I_V(x, y) is the V channel of the HSV image.
The formula for calculating the chromaticity is as follows:
f_3 = (1/(X·Y)) Σ_{x=1..X} Σ_{y=1..Y} I_S(x, y)    (3)
where X and Y are the length and width of the HSV image converted from the extracted video image, x and y are the pixel position in channel S, and I_S(x, y) is the S channel of the HSV image.
The formula for calculating the saturation is:
f_4 = (1/(X·Y)) Σ_{x=1..X} Σ_{y=1..Y} I_H(x, y)    (4)
where X and Y are the length and width of the HSV image converted from the extracted video image, x and y are the pixel position in channel H, and I_H(x, y) is the H channel of the HSV image.
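Since equations (2)-(4) appear only as images in the original, their exact form is uncertain; assuming they are plain channel averages, as the variable definitions suggest, a sketch looks like:

```python
import numpy as np

def hsv_mean_features(hsv):
    """Exposure, chroma and saturation as channel averages of an HSV
    image of shape (X, Y, 3).  A sketch assuming equations (2)-(4)
    average the V, S and H channels respectively (the originals are
    only shown as images in the patent)."""
    X, Y = hsv.shape[:2]
    f2 = hsv[:, :, 2].sum() / (X * Y)  # exposure: average of channel V
    f3 = hsv[:, :, 1].sum() / (X * Y)  # chroma:   average of channel S
    f4 = hsv[:, :, 0].sum() / (X * Y)  # saturation (patent naming): channel H
    return f2, f3, f4
```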
The formulas for the Rule of Thirds are:
f_5 = (9/(X·Y)) Σ_{x=X/3..2X/3} Σ_{y=Y/3..2Y/3} I_H(x, y)    (5)
f_6 = (9/(X·Y)) Σ_{x=X/3..2X/3} Σ_{y=Y/3..2Y/3} I_S(x, y)    (6)
f_7 = (9/(X·Y)) Σ_{x=X/3..2X/3} Σ_{y=Y/3..2Y/3} I_V(x, y)    (7)
where X and Y are the length and width of the HSV image converted from the extracted video image, x and y are the pixel position in each channel, and I_H(x, y), I_S(x, y), I_V(x, y) are the three channels of the HSV image. f_5, f_6 and f_7 are the three feature values calculated according to the Rule of Thirds; they mainly reflect whether the main information of the image lies near the third lines of the image.
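The Rule-of-Thirds equations (5)-(7) are likewise shown only as images in the original; a common form of this aesthetic feature, assumed here, averages each HSV channel over the inner-third region of the image:

```python
import numpy as np

def rule_of_thirds_features(hsv):
    """Average each HSV channel over the inner-third region of the
    image, a common form of the Rule-of-Thirds aesthetic feature.
    The patent's exact equations are shown only as images, so this
    form is an assumption."""
    X, Y = hsv.shape[:2]
    inner = hsv[X // 3: 2 * X // 3, Y // 3: 2 * Y // 3]
    f5 = inner[:, :, 0].mean()  # H channel near the third lines
    f6 = inner[:, :, 1].mean()  # S channel
    f7 = inner[:, :, 2].mean()  # V channel
    return f5, f6, f7
```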
Contrast and directionality are calculated using Tamura texture features. The Tamura texture features comprise six features: coarseness, contrast, directionality, line-likeness, regularity and roughness; the first three play a very important role in the field of image retrieval.
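Of the Tamura features, contrast has a compact standard definition (the standard deviation normalised by the fourth root of the kurtosis), which can be computed directly:

```python
import numpy as np

def tamura_contrast(gray):
    """Tamura contrast: F_con = sigma / kurtosis ** 0.25, where
    kurtosis = mu4 / sigma^4 (mu4 is the fourth central moment)."""
    g = gray.astype(float)
    mu = g.mean()
    sigma2 = ((g - mu) ** 2).mean()          # variance
    if sigma2 == 0:
        return 0.0                           # flat image: no contrast
    mu4 = ((g - mu) ** 4).mean()             # fourth central moment
    kurtosis = mu4 / sigma2 ** 2
    return np.sqrt(sigma2) / kurtosis ** 0.25
```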
The image quality q_Gk of the video frame and the image quality q_Lk of the video frame center block are obtained by a no-reference image quality evaluation method.
The image quality is mainly used to measure the quality of the extracted video frames and center blocks: some frames and center blocks extracted from the video may be of low quality, and we must consider whether features calculated from distorted or blurred frames and center blocks can express the video well, because image quality plays a very important role in generating the video abstract.
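The patent does not name the no-reference quality metric it uses. Purely as an illustration, a variance-of-Laplacian sharpness score is a common simple proxy that scores blurred frames low:

```python
import numpy as np

def sharpness_quality(gray):
    """Variance of a 4-neighbour Laplacian: a simple no-reference
    sharpness score, used here only as a stand-in for the patent's
    (unnamed) no-reference quality metric.  Blurred or flat frames
    score low."""
    g = gray.astype(float)
    lap = (-4 * g[1:-1, 1:-1] + g[:-2, 1:-1] + g[2:, 1:-1]
           + g[1:-1, :-2] + g[1:-1, 2:])
    return lap.var()
```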
In step 5, the global importance and the local importance of each video frame are calculated. The calculation formula of the global importance is:
[Equation (8), shown only as an image in the original, combines the frame quality q_Gk with the nine feature values f_G_1~f_G_9.]
where k denotes the k-th video frame, q_Gk is the quality of the video frame, and f_G_1~f_G_9 are the values of the nine features of the video frame calculated in step 4.
The calculation formula of the local importance is as follows:
Figure GDA0002479049710000052
where k refers to the k-th frame of video,
Figure GDA0002479049710000053
is the quality of the video frame, fL_1~fL_9Respectively, based on the values of nine features of the central block of the video frame.
In step 6, fusion importance of each frame of video is calculated, and the fusion importance is composed of two parts: global importance and local importance. The calculation formula is as follows:
I_Gk&Lk = I_Gk + I_Lk    (10)
where I_Gk and I_Lk are the global importance and the local importance of the video frame, respectively.
In step 7, the importance of each video segment is calculated: the average fusion importance of each segment is computed from the segmentation points of the video segments obtained in step 2 and the fusion importance of each video frame obtained in step 6. This calculation prepares for the selection of the video segment subset in the next step.
The calculation formulas for the video segments are:
I_C = Σ_{k=i}^{next_i} I_Gk&Lk    (11)
I_j = I_C / (next_i − i)    (12)
where I_C is the sum of the fusion importance over a video segment, I_j is the average fusion importance of the video segment, i is a segmentation point obtained in step 2, and next_i is the next segmentation point.
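Assuming each segment runs from one segmentation point up to the next (and the last segment to the end of the video), the per-segment sum and average can be sketched as:

```python
def segment_importance(frame_importance, cut_points):
    """Average the per-frame fusion importance over each segment
    defined by consecutive cut points; the last segment runs to the
    final frame.  Boundary convention is an assumption, since the
    patent's equations are shown only as images."""
    n = len(frame_importance)
    bounds = list(cut_points) + [n]
    averages = []
    for i, next_i in zip(bounds[:-1], bounds[1:]):
        I_C = sum(frame_importance[i:next_i])   # sum of fusion importance
        averages.append(I_C / (next_i - i))     # average for the segment
    return averages
```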
In step 8, a subset of the video segment set obtained in step 2 is selected according to the fusion importance of each segment calculated in step 7 and a set threshold. The threshold is the proportion of the video abstract segments to all video segments; it cannot be set too high or too low, since too many or too few selected segments would necessarily affect the quality of the video abstract. For example, a proportion of 15% or 20% is rather suitable.
The calculation formula for selecting the subset is:
[Equation (13), shown only as an image in the original, sums the {1,0} selection decisions over the N video segments.]
where {1,0} is a decision function used to determine whether a video segment is selected as part of the video abstract: if the segment is selected, the value of the function is 1; otherwise it is 0. Based on the above formula, a suitable subset of video segments can be selected.
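One possible reading of the thresholded selection is to greedily keep the most important segments while the selected length stays within the 15-20% budget. The helper below is an illustrative sketch, not the patent's exact optimisation:

```python
def select_segments(avg_importance, seg_lengths, ratio=0.15):
    """Greedily pick segments in decreasing order of average fusion
    importance until adding another would exceed `ratio` of the total
    frame count.  An assumed reading of the image-only equation (13)."""
    total = sum(seg_lengths)
    order = sorted(range(len(avg_importance)),
                   key=lambda j: avg_importance[j], reverse=True)
    chosen, used = set(), 0
    for j in order:
        if used + seg_lengths[j] > ratio * total:
            continue                      # decision function value 0
        chosen.add(j)                     # decision function value 1
        used += seg_lengths[j]
    return sorted(chosen)                 # keep original temporal order
```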
In step 9, the video abstract is synthesized from the video segment subset selected in step 8: each segment in the subset is combined in the order of the original video. Whether the video contains audio information is also considered; if it does, the audio information is included when synthesizing the video abstract. Fig. 4 shows a video summary presentation system. The method presents the summarization result to the user in a concise form, greatly improving the user's experience of browsing video data.

Claims (7)

1. A multi-feature fused video abstract generation method is characterized by comprising the following steps:
step 1, acquiring a video and taking the video as input data;
step 2, segmenting input video data, and recording segmentation points and the number of video segments;
step 3, extracting a video frame and a video frame center block in each video clip;
step 4, calculating the features and the image quality of the extracted video frame and video frame center block respectively;
step 5, calculating the global importance and the local importance according to the obtained features;
step 6, fusing the obtained global importance and the local importance of each frame to obtain fusion importance;
step 7, calculating the importance of each video segment according to the dividing points;
step 8, selecting the video clips according to the importance of each obtained video clip and a set threshold value, and selecting an optimized video clip subset;
step 9, synthesizing the video abstract according to the selected video segment subset;
the global importance I_Gk in step 5 is calculated as:
[Equation (8), shown only as an image in the original.]
where k is the index value of the video frame and f_G_1~f_G_9 are the values based on the 9 features of the video frame;
the local importance I_Lk in step 5 is calculated as:
[Equation (9), shown only as an image in the original.]
where k is the index value of the video frame and f_L_1~f_L_9 are the values based on the 9 features of the video frame center block;
the importance of each video segment in step 7 comprises the sum I_C of the fusion importance of the video segment and the average fusion importance I_j of the video segment:
I_C = Σ_{k=i}^{next_i} I_Gk&Lk    (11)
I_j = I_C / (next_i − i)    (12)
where k is the index value of the video frame, I_Gk&Lk is the fusion importance of each frame, i is the i-th segmentation point, and next_i is the next segmentation point;
step 8 selects an optimized video segment subset by equation (13):
[Equation (13), shown only as an image in the original, sums the {1,0} selection decisions over the N video segments.]
where N refers to the total number of video segments, and {1,0} is a decision function used to determine whether a video segment is selected as part of the video summary: if the segment is selected, the value of the function is 1; otherwise it is 0.
2. The method of claim 1, wherein the superframe segmentation method is used for the input video in the step 2 to segment the video into a plurality of small video segments by calculating the foreground, background and motion information of the video, so as to obtain the segmentation points and the number of the video segments.
3. The method according to claim 1, wherein the step 3 for extracting the central block of the video frame comprises: the video frame is divided into 5x5 blocks on average, and then the center block of 3x3 of the center portion is extracted.
4. The method of claim 1, wherein the features calculated in step 4 comprise visual saliency f_1, exposure f_2, chroma f_3, saturation f_4, the three Rule of Thirds feature values f_5, f_6, f_7, contrast f_8 and directionality f_9, and the image quality calculated in step 4 comprises the image quality q_Gk of the video frame and the image quality q_Lk of the video frame center block, wherein
f_1 = F_A(A_S, A_T; γ)    (1)
where A_S is the static saliency, A_T is the temporal saliency, and γ is a non-negative empirical parameter;
f_2 = (1/(X·Y)) Σ_{x_v=1..X} Σ_{y_v=1..Y} I_V(x_v, y_v)    (2)
where X, Y are the length and width of the HSV image converted from the extracted video image, x_v, y_v are the pixel position in channel V, and I_V(x_v, y_v) is the V channel of the HSV image;
f_3 = (1/(X·Y)) Σ_{x_s=1..X} Σ_{y_s=1..Y} I_S(x_s, y_s)    (3)
where x_s, y_s are the pixel position in channel S, and I_S(x_s, y_s) is the S channel of the HSV image;
f_4 = (1/(X·Y)) Σ_{x_h=1..X} Σ_{y_h=1..Y} I_H(x_h, y_h)    (4)
where x_h, y_h are the pixel position in channel H, and I_H(x_h, y_h) is the H channel of the HSV image;
f_5 = (9/(X·Y)) Σ_{x=X/3..2X/3} Σ_{y=Y/3..2Y/3} I_H(x, y)    (5)
f_6 = (9/(X·Y)) Σ_{x=X/3..2X/3} Σ_{y=Y/3..2Y/3} I_S(x, y)    (6)
f_7 = (9/(X·Y)) Σ_{x=X/3..2X/3} Σ_{y=Y/3..2Y/3} I_V(x, y)    (7)
the contrast f_8 and the directionality f_9 are calculated using Tamura texture features;
the image quality q_Gk of the video frame and the image quality q_Lk of the video frame center block are obtained by a no-reference image quality evaluation method.
5. The method according to claim 1, wherein the fusion importance is obtained by the formula (10) in step 6:
I_Gk&Lk = I_Gk + I_Lk    (10)
where k is the index value of the video frame, I_Gk&Lk is the fusion importance, and I_Gk and I_Lk are the global importance and the local importance of the video frame, respectively.
6. The method of claim 1, wherein in the synthesis of step 9 the video segments in the selected subset are combined in their order in the original video.
7. The method of claim 1, wherein, if the video contains audio information, the audio information is included during the synthesis of the video summary.
CN201710486660.9A 2017-06-23 2017-06-23 Multi-feature fusion video abstract generation method Active CN107222795B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710486660.9A CN107222795B (en) 2017-06-23 2017-06-23 Multi-feature fusion video abstract generation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710486660.9A CN107222795B (en) 2017-06-23 2017-06-23 Multi-feature fusion video abstract generation method

Publications (2)

Publication Number Publication Date
CN107222795A CN107222795A (en) 2017-09-29
CN107222795B true CN107222795B (en) 2020-07-31

Family

ID=59950929

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710486660.9A Active CN107222795B (en) 2017-06-23 2017-06-23 Multi-feature fusion video abstract generation method

Country Status (1)

Country Link
CN (1) CN107222795B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108804578B (en) * 2018-05-24 2022-06-07 南京理工大学 Unsupervised video abstraction method based on consistency segment generation
CN110868630A (en) * 2018-08-27 2020-03-06 北京优酷科技有限公司 Method and device for generating forecast report
CN109413510B (en) * 2018-10-19 2021-05-18 深圳市商汤科技有限公司 Video abstract generation method and device, electronic equipment and computer storage medium
CN111246246A (en) * 2018-11-28 2020-06-05 华为技术有限公司 Video playing method and device
CN111401100B (en) * 2018-12-28 2021-02-09 广州市百果园信息技术有限公司 Video quality evaluation method, device, equipment and storage medium
CN109819338B (en) 2019-02-22 2021-09-14 影石创新科技股份有限公司 Automatic video editing method and device and portable terminal
CN111062284B (en) * 2019-12-06 2023-09-29 浙江工业大学 Visual understanding and diagnosis method for interactive video abstract model
CN111641868A (en) * 2020-05-27 2020-09-08 维沃移动通信有限公司 Preview video generation method and device and electronic equipment
CN112052841B (en) * 2020-10-12 2021-06-29 腾讯科技(深圳)有限公司 Video abstract generation method and related device
CN112734733B (en) * 2021-01-12 2022-11-01 天津大学 Non-reference image quality monitoring method based on channel recombination and feature fusion
CN113052149B (en) * 2021-05-20 2021-08-13 平安科技(深圳)有限公司 Video abstract generation method and device, computer equipment and medium
CN114140461B (en) * 2021-12-09 2023-02-14 成都智元汇信息技术股份有限公司 Picture cutting method based on edge picture recognition box, electronic equipment and medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9076043B2 (en) * 2012-08-03 2015-07-07 Kodak Alaris Inc. Video summarization using group sparsity analysis
CN102930061B (en) * 2012-11-28 2016-01-06 安徽水天信息科技有限公司 A kind of video summarization method based on moving object detection
US10095786B2 (en) * 2015-04-09 2018-10-09 Oath Inc. Topical based media content summarization system and method
CN105228033B (en) * 2015-08-27 2018-11-09 联想(北京)有限公司 A kind of method for processing video frequency and electronic equipment
US20170148488A1 (en) * 2015-11-20 2017-05-25 Mediatek Inc. Video data processing system and associated method for analyzing and summarizing recorded video data
CN106713964A (en) * 2016-12-05 2017-05-24 乐视控股(北京)有限公司 Method of generating video abstract viewpoint graph and apparatus thereof

Also Published As

Publication number Publication date
CN107222795A (en) 2017-09-29

Similar Documents

Publication Publication Date Title
CN107222795B (en) Multi-feature fusion video abstract generation method
US10735494B2 (en) Media information presentation method, client, and server
US20210160556A1 (en) Method for enhancing resolution of streaming file
US9892324B1 (en) Actor/person centric auto thumbnail
CN109803180B (en) Video preview generation method and device, computer equipment and storage medium
CN109218629B (en) Video generation method, storage medium and device
CN104994426B (en) Program video identification method and system
US20170285916A1 (en) Camera effects for photo story generation
EP2568429A1 (en) Method and system for pushing individual advertisement based on user interest learning
CN116916080A (en) Video data processing method, device, computer equipment and readable storage medium
CN111930994A (en) Video editing processing method and device, electronic equipment and storage medium
CN113870133B (en) Multimedia display and matching method, device, equipment and medium
US20150278605A1 (en) Apparatus and method for managing representative video images
US20150161094A1 (en) Apparatus and method for automatically generating visual annotation based on visual language
CN114331820A (en) Image processing method, image processing device, electronic equipment and storage medium
JP2016035607A (en) Apparatus, method and program for generating digest
CN103984778A (en) Video retrieval method and video retrieval system
CN113784171A (en) Video data processing method, device, computer system and readable storage medium
CN111340101A (en) Stability evaluation method and device, electronic equipment and computer readable storage medium
CN112383824A (en) Video advertisement filtering method, device and storage medium
CN109618111B (en) Cloud-shear multi-channel distribution system
Dev et al. Localizing adverts in outdoor scenes
Husa et al. HOST-ATS: automatic thumbnail selection with dashboard-controlled ML pipeline and dynamic user survey
JP2018206292A (en) Video summary creation device and program
Hu et al. Video summarization via exploring the global and local importance

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant