CN111008978B - Video scene segmentation method based on deep learning - Google Patents

Video scene segmentation method based on deep learning

Info

Publication number
CN111008978B
CN111008978B
Authority
CN
China
Prior art keywords
frame
background
similarity
image
frames
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911239331.XA
Other languages
Chinese (zh)
Other versions
CN111008978A (en)
Inventor
代成
刘欣刚
李辰奇
倪铭昊
韩硕
曾昕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN201911239331.XA priority Critical patent/CN111008978B/en
Publication of CN111008978A publication Critical patent/CN111008978A/en
Application granted granted Critical
Publication of CN111008978B publication Critical patent/CN111008978B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/136Segmentation; Edge detection involving thresholding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/194Segmentation; Edge detection involving foreground-background segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/60Analysis of geometric attributes
    • G06T7/62Analysis of geometric attributes of area, perimeter, diameter or volume
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20021Dividing image into blocks, subimages or windows
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a video scene segmentation method based on deep learning, belonging to the technical field of video scene segmentation. First, the video data to be segmented is converted into frame images, and target detection based on a deep learning algorithm is performed to obtain background candidate frames for each frame image; key background candidate frames are then selected for each frame image; based on the position information of each key background candidate frame, the corresponding background candidate frame is determined on the adjacent subsequent image frame; finally, the joint similarity of adjacent image frames is calculated, and if it falls below a similarity threshold, the video data to be segmented is cut at the frame position of the current adjacent frames. The method judges the similarity of video background information while automatically extracting local background regions, avoids the excessive algorithmic complexity of traditional methods, and achieves background-based segmentation in complex scenes.

Description

Video scene segmentation method based on deep learning
Technical Field
The invention relates to the technical field of video scene segmentation, in particular to a video scene segmentation method based on deep learning.
Background
With the rapid development of multimedia technology, video has become an important information transmission medium in people's daily lives. In recent years the amount of video data has grown explosively. While massive video data enriches people's work, study and life, its storage, management and retrieval have become the basis for using it efficiently, and in the big data era, accurately classifying and retrieving videos is a major challenge. Since video scene segmentation is important for identifying video data more flexibly and efficiently in video retrieval research, accurate scene segmentation has attracted increasing attention from researchers.
The main objective of scene segmentation is to accurately measure scene similarity and cut the video where scenes differ clearly. Traditional algorithms based on hand-crafted features suffer from heavy feature engineering, high computational complexity and low accuracy, so they cannot meet current real-time segmentation requirements; a new method is therefore needed to solve the video background segmentation problem more intelligently.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a more accurate and more convenient video background segmentation method for massive data in complex scenes.
The invention discloses a video scene segmentation method based on deep learning, which comprises the following steps:
step S1: image preprocessing: converting video data to be segmented into frame images;
For example, the video data to be segmented (a video frame sequence to be segmented) is sampled at fixed intervals to obtain a frame image sequence (a sampling sketch is given after these steps);
step S2: background candidate box identification:
based on a preset target object, performing target detection processing on each frame image by adopting a target detection algorithm, namely Faster R-CNN, generating candidate frames of the target object, and labeling the coordinate information of the candidate frames;
performing target object identification on each candidate frame, and retaining the candidate frames that contain no target object as background candidate frames of the frame image;
step S3: selecting key background candidate frames for the frame image:
step S31: screening out background candidate frames with the areas smaller than a preset area threshold;
step S32: and (3) screening out background candidate frames with high overlapping degree: when the overlapping degree of the two overlapped background candidate frames is larger than a preset overlapping degree threshold value, deleting the smaller one of the two overlapped background candidate frames;
wherein the overlap degree is calculated as:
Overlap(B-box_i, B-box_j) = Area(B-box_i ∩ B-box_j) / min(Area(B-box_i), Area(B-box_j))
wherein Area represents the area, and B-box_i and B-box_j represent the two overlapping background candidate frames, i and j being background candidate frame identifiers;
taking the currently remaining background candidate frames as key background candidate frames;
step S4: determining a background candidate frame corresponding to the position information on an adjacent subsequent image frame of the image frame where the key background candidate frame is located, based on the position information of the key background candidate frame;
step S5: calculating the similarity of adjacent image frames:
taking the position area of the key background candidate frame or the background candidate frame as a background area;
taking the key background candidate frame of the previous image frame obtained in step S4 and the corresponding background candidate frame on the adjacent next image frame as similarity calculation objects of the background areas at the same position of the adjacent image frames;
respectively calculating the structural similarity and the histogram similarity of the similarity calculation object;
setting a weight value w_i for each background region as:
w_i = A_i / Σ_{j=1..N} A_j
wherein A_i represents the area of the i-th background region and N represents the number of background regions contained in the frame image;
and calculating the joint similarity of the adjacent image frames according to the formula
Sim = 2 × SSIM × Hist / (SSIM + Hist)
wherein SSIM = Σ_{i=1..N} w_i × SSIM_i and Hist = Σ_{i=1..N} w_i × Hist_i, and SSIM_i and Hist_i respectively denote the structural similarity and the histogram similarity of the i-th background region between the two adjacent frame images;
step S6: video scene segmentation:
if the joint similarity is lower than a preset similarity threshold, the video data to be segmented is cut at the frame position of the current adjacent frames, so that the video data to be segmented is divided into multiple sub-video segments, each sub-video segment corresponding to one class of scene.
For example, in a frame image sequence obtained by fixed-interval sampling, adjacent sampled frames are not adjacent in the original video data; a certain number of original video frames lie between them, and any position between the two sampled frames may be chosen as the segmentation point. That is, when two adjacent sampled frames have a joint similarity below the preset similarity threshold, they are assigned to different scene classes: the previous frame image corresponds to one class of scene and the next frame image to another.
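As a minimal illustration of the fixed-interval sampling mentioned above, the following OpenCV sketch keeps every fifth frame (the stride of 5 follows the embodiment described later; the function name is an assumption for illustration only):

```python
import cv2

def sample_frames(video_path, step=5):
    """Read a video and keep every `step`-th frame as the working frame sequence."""
    cap = cv2.VideoCapture(video_path)
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            # convert BGR (OpenCV default) to RGB for the later detection step
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        idx += 1
    cap.release()
    return frames
```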
In summary, due to the adoption of the technical scheme, the invention has the beneficial effects that:
the target detection under the complex scene can be learned through a deep learning technology, and a local background candidate frame is obtained. And then labeling the corresponding coordinates of the candidate frame of the adjacent frame images, and through the weighted comparison of the structural similarity SSIM and the histogram similarity Hist of the local region of the image, the complexity of the algorithm can be reduced, and meanwhile, the feature region based on deep learning has universality and higher segmentation accuracy compared with the traditional manual region labeling.
Drawings
FIG. 1 is a schematic diagram of a specific implementation process in an embodiment;
FIG. 2 is a schematic diagram of tensor modeling in an example.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the following embodiments and accompanying drawings.
The invention discloses a video scene segmentation method based on deep learning, which comprises the following steps:
s1: image preprocessing, converting video data into frame images: namely, a conventional video frame extraction mode is adopted to complete the conversion from a video to a corresponding frame, so as to obtain a frame image to be processed;
s2: identifying a background area, determining a target object in the frame image by using a target detection algorithm, namely a Faster R-CNN algorithm, and further determining a background candidate frame of the frame image:
firstly, a CNN + RPN network (a convolutional neural network + a region generation network) is adopted to generate a candidate frame, namely a candidate region frame, and coordinate information of the candidate frame is labeled;
performing classification regression on the content features in the candidate frames so as to realize object target identification;
the candidate frames that contain no target object are retained as background candidate frames, and the coordinates of the background candidate frames of the frame image are obtained (the position area where a background candidate frame is located is the background area).
The Faster R-CNN algorithm is described in the literature "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks".
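As a rough illustration of this step, the sketch below uses a COCO-pretrained Faster R-CNN from torchvision as a stand-in for the VGG-16 model trained on VOC2007 described in the embodiment, and simply treats low-confidence detection regions as object-free background candidates; the function name and the score threshold are assumptions for illustration only:

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor

# stand-in detector; the patent uses its own VGG-16/VOC2007-trained Faster R-CNN
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()

def background_candidate_boxes(frame_rgb, score_thresh=0.5):
    """Return detector boxes whose object score is low, used here as a crude
    proxy for candidate regions that contain no target object."""
    with torch.no_grad():
        out = model([to_tensor(frame_rgb)])[0]
    keep = out["scores"] < score_thresh           # low confidence -> no clear object
    return out["boxes"][keep].cpu().numpy()       # rows of (x_left, y_top, x_right, y_bottom)
```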
S3: selecting a key background candidate box for each frame of image in the video:
The background candidate frames are evaluated by region area and quantified by a region-overlap detection function; heavily overlapped frames and frames with small areas are deleted, so that effective background candidate frames, namely key background candidate frames, are selected.
S31: screening out background candidate frames with small areas: a background candidate frame is ignored when its area falls below a certain threshold. The area of the i-th background candidate frame is
A_i = (x_r^i - x_l^i) × (y_d^i - y_u^i)
wherein x_l^i and x_r^i denote the left and right abscissas of the i-th background candidate frame, y_u^i and y_d^i denote its upper and lower ordinates, and A_i denotes its area.
S32: screening out background candidate frames with a high overlap degree: when two background candidate frames overlap heavily, the smaller of the two is deleted. The overlap detection function is
Overlap(B-box_i, B-box_j) = Area(B-box_i ∩ B-box_j) / min(Area(B-box_i), Area(B-box_j))
wherein Area denotes the area, and B-box_i and B-box_j denote the i-th and j-th background candidate frames.
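A minimal sketch of this selection step, assuming boxes are given as (x_left, y_top, x_right, y_bottom) tuples; the thresholds follow the values used later in the embodiment (area 800, overlap 70%):

```python
def overlap_degree(a, b):
    """Intersection area divided by the smaller box area, as in the formula above."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / min(area_a, area_b)

def select_key_background_boxes(boxes, area_thresh=800, overlap_thresh=0.7):
    # S31: drop boxes whose area is below the threshold
    boxes = [b for b in boxes if (b[2] - b[0]) * (b[3] - b[1]) >= area_thresh]
    # S32: when two boxes overlap too much, keep the larger and drop the smaller
    boxes = sorted(boxes, key=lambda b: (b[2] - b[0]) * (b[3] - b[1]), reverse=True)
    kept = []
    for b in boxes:
        if all(overlap_degree(b, k) <= overlap_thresh for k in kept):
            kept.append(b)
    return kept
```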
S4: extracting the characteristics of the background candidate frames, namely extracting the background candidate frames of the corresponding areas of the adjacent frames according to the coordinates;
Extracting the coordinates of the corresponding points of each key background candidate frame, and finding the corresponding background candidate frame in the adjacent subsequent frame according to the extracted coordinates.
S5: comparison of background frame similarity: the corresponding background regions of adjacent frames are weighted using a combined structural similarity (SSIM) and histogram similarity (Hist) algorithm to complete the comparison of adjacent-frame background similarity.
Referring to fig. 2, the specific calculation method of the structural similarity SSIM is as follows:
SSIM(x,y)=L(x,y)×C(x,y)×S(x,y)
wherein L(x,y), C(x,y) and S(x,y) are the luminance, contrast and structure comparison functions of the two images respectively, and SSIM(x,y) is the structural similarity of the two images.
The specific calculation formulas of L (x, y), C (x, y) and S (x, y) are as follows:
(1) L(x,y) = (2 × u_x × u_y + C_1) / (u_x^2 + u_y^2 + C_1)
wherein u_x and u_y denote the pixel mean values of images x and y respectively, with u_x = (1/N) × Σ_{i=1..N} x_i, where x_i is the i-th pixel value of image x and N is the number of pixels (u_y is computed in the same way); C_1 is a constant used to avoid a zero denominator, usually C_1 = (K_1 × L)^2 with K_1 = 0.01 and L = 255.
(2) C(x,y) = (2 × σ_x × σ_y + C_2) / (σ_x^2 + σ_y^2 + C_2)
wherein σ_x and σ_y denote the pixel standard deviations of images x and y, with σ_x = sqrt( (1/(N-1)) × Σ_{i=1..N} (x_i - u_x)^2 ) (σ_y is computed in the same way); C_2 = (K_2 × L)^2 with K_2 = 0.03 and L = 255.
(3) S(x,y) = (σ_xy + C_3) / (σ_x × σ_y + C_3)
wherein σ_xy denotes the pixel covariance of images x and y, σ_xy = (1/(N-1)) × Σ_{i=1..N} (x_i - u_x) × (y_i - u_y), u_y = (1/N) × Σ_{i=1..N} y_i is the pixel mean of image y, and C_3 is a small stabilizing constant, commonly taken as C_2/2.
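The three comparison functions above can be transcribed directly into code. The sketch below computes SSIM over a whole (equal-sized, grayscale) background patch rather than over sliding windows, which matches the per-region usage in this method; the choice C_3 = C_2/2 is the common convention and an assumption here:

```python
import numpy as np

K1, K2, L = 0.01, 0.03, 255.0
C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2
C3 = C2 / 2.0          # common convention; not spelled out in the text above

def ssim(x, y):
    """Structural similarity of two equal-sized grayscale patches x, y."""
    x = np.asarray(x, dtype=np.float64).ravel()
    y = np.asarray(y, dtype=np.float64).ravel()
    ux, uy = x.mean(), y.mean()                        # pixel means u_x, u_y
    sx, sy = x.std(ddof=1), y.std(ddof=1)              # standard deviations sigma_x, sigma_y
    sxy = ((x - ux) * (y - uy)).sum() / (x.size - 1)   # covariance sigma_xy
    luminance = (2 * ux * uy + C1) / (ux ** 2 + uy ** 2 + C1)
    contrast  = (2 * sx * sy + C2) / (sx ** 2 + sy ** 2 + C2)
    structure = (sxy + C3) / (sx * sy + C3)
    return luminance * contrast * structure
```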
the specific calculation formula of the histogram similarity Hist is as follows:
Figure BDA0002305777480000051
wherein the content of the first and second substances,
Figure BDA0002305777480000052
the ith number of the histogram of the image x, y is shown, and N is the number of all the numbers contained in the histogram.
When the structural similarity SSIM and the histogram similarity Hist are processed jointly, a weight value is first set for each background frame, the weighted averages of the two kinds of similarity over all background frames are then calculated, and the two weighted averages are combined to obtain the final similarity metric, namely the joint similarity.
The weight value w_i of each background frame is
w_i = A_i / Σ_{j=1..N} A_j
wherein A_i denotes the area of the i-th background frame and N is the number of background frames.
The joint similarity is
Sim = 2 × SSIM × Hist / (SSIM + Hist)
wherein SSIM = Σ_{i=1..N} w_i × SSIM_i and Hist = Σ_{i=1..N} w_i × Hist_i, and SSIM_i and Hist_i denote respectively the structural similarity and the histogram similarity of the i-th background frame between the two adjacent frame images.
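Putting the pieces together, a sketch of the joint similarity for one pair of adjacent frames, reusing the `ssim` and `hist_similarity` sketches above; the matched background patches are assumed to be cropped to identical sizes:

```python
import numpy as np

def joint_similarity(regions_prev, regions_next):
    """regions_prev/regions_next: matched background patches from two adjacent frames,
    given in the same order; each pair must have identical shapes for ssim()."""
    areas = np.array([p.shape[0] * p.shape[1] for p in regions_prev], dtype=np.float64)
    w = areas / areas.sum()                                   # w_i = A_i / sum_j A_j
    ssim_total = sum(wi * ssim(p, q)
                     for wi, p, q in zip(w, regions_prev, regions_next))
    hist_total = sum(wi * hist_similarity(p, q)
                     for wi, p, q in zip(w, regions_prev, regions_next))
    # harmonic-mean combination of the two weighted similarities
    return 2.0 * ssim_total * hist_total / (ssim_total + hist_total)
```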
S6: and (4) segmenting a video scene.
According to the result of the scene similarity comparison, if the similarity is lower than the threshold, the adjacent frame images are only weakly related and do not belong to the same scene class, and the video is cut at the frame position of the current adjacent frames, i.e. the video is segmented into different shot paragraphs.
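As a sketch of this segmentation decision, assuming a per-pair similarity function built from the steps above (the threshold value 0.6 is an assumption; the text only states that a preset threshold is used):

```python
def find_scene_cuts(frames, sim_fn, sim_thresh=0.6):
    """frames: sampled frame images; sim_fn(prev, nxt) returns their joint similarity.
    Returns the indices (in the sampled sequence) where a new scene starts."""
    cuts = []
    for k in range(len(frames) - 1):
        if sim_fn(frames[k], frames[k + 1]) < sim_thresh:
            cuts.append(k + 1)       # cut between sampled frame k and k+1
    return cuts
```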
Examples
The video scene segmentation method of the invention is applied in a video processing application and implements a video segmentation algorithm based on an improved Faster R-CNN network. Referring to FIG. 1, the specific implementation process is as follows:
S1: image preprocessing, i.e. converting the video data into frame images. In this embodiment most processed videos are short video files of 1.5 to 3 minutes; at 24 frames per second that is approximately 2160 to 4320 frames. To reduce the amount of computation and increase speed, this embodiment samples video frames at equal intervals with a stride of 5 frames. The frame count of a single video is thus reduced to 432-864, while the continuity of the original video is preserved and information loss from excessively large content changes between sampled frames is avoided.
S2: target identification, namely marking a target object in a video by using a Faster R-CNN algorithm;
the Faster R-CNN model is mainly composed of 4 parts.
Firstly, the convolution layer performs feature extraction on an input picture frame;
secondly, the extracted feature map enters an RPN (Region Proposal Network) Network to generate 300 candidate Region frames;
thirdly, converting the candidate region frame into a feature of fixed length through RoI (Region of Interest) pooling;
finally, regression and classification are carried out on each candidate region frame, and the object in the candidate region and the accurate coordinates of the region are output.
In this embodiment, a VGG-16 CNN is used for feature extraction, and the model is trained on the VOC2007 dataset, so that 21 classes (20 object classes plus background) can be distinguished. If a region frame contains detected objects, it is regarded as foreground and removed. A certain number of the remaining region frames are then kept as background candidate region frames (background frames); 20 region frames are kept in this embodiment.
S3: selecting key background regions. The background regions are evaluated by area and quantified by the region-overlap detection function; heavily overlapped frames and small-area frames are deleted to select effective background frames. Experiments show that the distribution of background region frames works best when the region area is larger than 800, so regions smaller than 800 are regarded as small regions. Meanwhile, if the overlapping area of two regions exceeds 70% of the area of the smaller region, the smaller region is removed.
S4: extracting candidate frame region features: the background regions of corresponding areas in adjacent frames are extracted according to the coordinates, and the regions are cropped from the two adjacent frame images.
S5: comparison of background frame similarity: the corresponding background regions of adjacent frames are weighted using a combined structural similarity (SSIM) and histogram similarity (Hist) algorithm to complete the comparison of adjacent-frame background similarity. For two adjacent frames, SSIM and histogram similarity are computed once for each pair of corresponding background regions. Each region is then assigned a weight according to its area ratio, and the two indices are weighted and summed separately to obtain the overall SSIM and histogram similarity of the two images. Finally, the two similarities are combined into a new similarity index by the harmonic mean, which is used to judge scene changes and segment the video.
The video scene segmentation algorithm based on background-region similarity uses the deep-learning Faster R-CNN model to select and extract the background content of video frames. Video segmentation tests were performed on 16 short videos of 4 types (sports, movies, news and daily life) collected from the web, with F-score used to judge segmentation accuracy. For sports videos, whose scene changes are complicated, the accuracy of the algorithm averages 80.4%, whereas a comparable method without deep learning reaches only 64.8%. The other three video types have simpler scenes and higher accuracy: 93.7% for movie videos, 93.0% for news videos and 98.1% for daily-life videos, compared with only 70.5%, 71.4% and 80.0% respectively when the deep learning model is not used. The experimental results show that selecting background content with deep learning and then comparing similarity effectively improves segmentation accuracy and has very good application prospects.
While the invention has been described with reference to specific embodiments, any feature disclosed in this specification may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise; all of the disclosed features, or all of the method or process steps, may be combined in any combination, except mutually exclusive features and/or steps.

Claims (4)

1. The video scene segmentation method based on deep learning is characterized by comprising the following steps of:
step S1: image preprocessing: converting video data to be segmented into frame images;
step S2: background candidate box identification:
determining a target object in the frame image by using a target detection algorithm, namely a Faster R-CNN algorithm, and further determining a background candidate frame of the frame image:
firstly, a CNN + RPN network (a convolutional neural network + a region generation network) is adopted to generate a candidate frame, namely a candidate region frame, and coordinate information of the candidate frame is labeled;
performing classification regression on the content features in the candidate frames, thereby realizing object target identification;
screening out candidate frames without target objects in the candidate frames to obtain coordinates of background candidate frames of the frame images;
and step S3: selecting key background candidate frames for the frame image:
step S31: screening out background candidate frames with the areas smaller than a preset area threshold;
step S32: and (3) screening out background candidate frames with high overlapping degree: when the overlapping degree of the two overlapped background candidate frames is larger than a preset overlapping degree threshold value, deleting the smaller one of the two overlapped background candidate frames;
wherein the overlap degree is calculated as:
Overlap(B-box_i, B-box_j) = Area(B-box_i ∩ B-box_j) / min(Area(B-box_i), Area(B-box_j))
wherein Area represents the area, and B-box_i and B-box_j represent the two overlapping background candidate frames, i and j being background candidate frame identifiers;
taking the current residual background candidate frame as a key background candidate frame;
and step S4: determining a background candidate frame corresponding to the position information on an adjacent subsequent image frame of the image frame where the key background candidate frame is located based on the position information of the key background candidate frame;
step S5: calculating the similarity of adjacent image frames:
taking the position area of the key background candidate frame or the background candidate frame as a background area;
taking the key background candidate frame of the previous image frame obtained in step S4 and the corresponding background candidate frame on the adjacent next image frame as similarity calculation objects of the background areas at the same position of the adjacent image frames;
calculating structural similarity and histogram similarity of the similarity calculation objects respectively;
setting a weight value w_i for each background region as:
w_i = A_i / Σ_{j=1..N} A_j
wherein A_i represents the area of the i-th background region and N represents the number of background regions contained in the frame image;
and calculating the joint similarity of adjacent image frames according to the formula
Sim = 2 × SSIM × Hist / (SSIM + Hist)
wherein SSIM = Σ_{i=1..N} w_i × SSIM_i and Hist = Σ_{i=1..N} w_i × Hist_i, and SSIM_i and Hist_i respectively denote the structural similarity and the histogram similarity of the i-th background region between the two adjacent frame images;
step S6: video scene segmentation:
and if the joint similarity is lower than a preset similarity threshold, performing video segmentation on the video data to be segmented based on the frame position of the current adjacent frame.
2. The method of claim 1, wherein in step S1, a segment of video data to be segmented is frame image sampled at regular intervals to obtain a sequence of frame images.
3. The method of claim 1, wherein the area threshold is set at 800.
4. The method of claim 1, wherein the overlap threshold is set at 70%.
CN201911239331.XA 2019-12-06 2019-12-06 Video scene segmentation method based on deep learning Active CN111008978B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911239331.XA CN111008978B (en) 2019-12-06 2019-12-06 Video scene segmentation method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911239331.XA CN111008978B (en) 2019-12-06 2019-12-06 Video scene segmentation method based on deep learning

Publications (2)

Publication Number Publication Date
CN111008978A CN111008978A (en) 2020-04-14
CN111008978B true CN111008978B (en) 2022-10-14

Family

ID=70114962

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911239331.XA Active CN111008978B (en) 2019-12-06 2019-12-06 Video scene segmentation method based on deep learning

Country Status (1)

Country Link
CN (1) CN111008978B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111950425B (en) * 2020-08-06 2024-05-10 北京达佳互联信息技术有限公司 Object acquisition method, device, client, server, system and storage medium
CN112601068B (en) * 2020-12-15 2023-01-24 山东浪潮科学研究院有限公司 Video data augmentation method, device and computer readable medium
CN112689200B (en) * 2020-12-15 2022-11-11 万兴科技集团股份有限公司 Video editing method, electronic device and storage medium
CN113709584A (en) * 2021-03-05 2021-11-26 腾讯科技(北京)有限公司 Video dividing method, device, server, terminal and storage medium
CN113923378B (en) * 2021-09-29 2024-03-19 北京字跳网络技术有限公司 Video processing method, device, equipment and storage medium
CN114372994B (en) * 2022-01-10 2022-07-22 北京中电兴发科技有限公司 Method for generating background image in video concentration

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7020351B1 (en) * 1999-10-08 2006-03-28 Sarnoff Corporation Method and apparatus for enhancing and indexing video and audio signals
CN104867161A (en) * 2015-05-14 2015-08-26 国家电网公司 Video-processing method and device
CN108537134A (en) * 2018-03-16 2018-09-14 北京交通大学 A kind of video semanteme scene cut and mask method

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6404925B1 (en) * 1999-03-11 2002-06-11 Fuji Xerox Co., Ltd. Methods and apparatuses for segmenting an audio-visual recording using image similarity searching and audio speaker recognition
CN100495438C (en) * 2007-02-09 2009-06-03 南京大学 Method for detecting and identifying moving target based on video monitoring
CN101577824B (en) * 2009-06-12 2011-01-19 西安理工大学 Method for extracting compressed domain key frame based on similarity of adjacent I frame DC image
CN102129688B (en) * 2011-02-24 2012-09-05 哈尔滨工业大学 Moving target detection method aiming at complex background
CN103400155A (en) * 2013-06-28 2013-11-20 西安交通大学 Pornographic video detection method based on semi-supervised learning of images
CN106683086B (en) * 2016-12-23 2018-02-27 深圳市大唐盛世智能科技有限公司 The background modeling method and device of a kind of intelligent video monitoring
CN106875406B (en) * 2017-01-24 2020-04-14 北京航空航天大学 Image-guided video semantic object segmentation method and device
CN107563345B (en) * 2017-09-19 2020-05-22 桂林安维科技有限公司 Human body behavior analysis method based on space-time significance region detection
CN110175591B (en) * 2019-05-31 2021-06-22 中科软科技股份有限公司 Method and system for obtaining video similarity
CN110427807B (en) * 2019-06-21 2022-11-15 诸暨思阔信息科技有限公司 Time sequence event action detection method

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7020351B1 (en) * 1999-10-08 2006-03-28 Sarnoff Corporation Method and apparatus for enhancing and indexing video and audio signals
CN104867161A (en) * 2015-05-14 2015-08-26 国家电网公司 Video-processing method and device
CN108537134A (en) * 2018-03-16 2018-09-14 北京交通大学 A kind of video semanteme scene cut and mask method

Also Published As

Publication number Publication date
CN111008978A (en) 2020-04-14

Similar Documents

Publication Publication Date Title
CN111008978B (en) Video scene segmentation method based on deep learning
CN108562589B (en) Method for detecting surface defects of magnetic circuit material
CN108846446B (en) Target detection method based on multi-path dense feature fusion full convolution network
CN109086777B (en) Saliency map refining method based on global pixel characteristics
CN113112519A (en) Key frame screening method based on interested target distribution
Asha et al. Content based video retrieval using SURF descriptor
CN108647703B (en) Saliency-based classification image library type judgment method
CN115240024A (en) Method and system for segmenting extraterrestrial pictures by combining self-supervised learning and semi-supervised learning
CN105825201A (en) Moving object tracking method in video monitoring
Yang et al. Edge computing-based real-time passenger counting using a compact convolutional neural network
CN108664968B (en) Unsupervised text positioning method based on text selection model
CN110765314A (en) Video semantic structural extraction and labeling method
CN106066887B (en) A kind of sequence of advertisements image quick-searching and analysis method
Li et al. An efficient self-learning people counting system
CN114758135A (en) Unsupervised image semantic segmentation method based on attention mechanism
CN115457620A (en) User expression recognition method and device, computer equipment and storage medium
Liu et al. [Retracted] Mean Shift Fusion Color Histogram Algorithm for Nonrigid Complex Target Tracking in Sports Video
Yu et al. Automatic image captioning system using integration of N-cut and color-based segmentation method
Chatur et al. A simple review on content based video images retrieval
Prabakaran et al. Key frame extraction analysis based on optimized convolution neural network (ocnn) using intensity feature selection (ifs)
CN110580503A (en) AI-based double-spectrum target automatic identification method
CN109800818A (en) A kind of image meaning automatic marking and search method and system
Mu et al. Automatic video object segmentation using graph cut
Hao et al. Video summarization based on sparse subspace clustering with automatically estimated number of clusters
Zhu et al. [Retracted] Basketball Object Extraction Method Based on Image Segmentation Algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant