CN107766838B - Video scene switching detection method - Google Patents

Video scene switching detection method

Info

Publication number
CN107766838B
CN107766838B (application CN201711089563.2A)
Authority
CN
China
Prior art keywords
video scene
detection model
scene switching
layer
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711089563.2A
Other languages
Chinese (zh)
Other versions
CN107766838A (en
Inventor
苏许臣
朱立松
黄建杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cntv Wuxi Co ltd
Original Assignee
Cntv Wuxi Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cntv Wuxi Co ltd filed Critical Cntv Wuxi Co ltd
Priority to CN201711089563.2A priority Critical patent/CN107766838B/en
Publication of CN107766838A publication Critical patent/CN107766838A/en
Application granted granted Critical
Publication of CN107766838B publication Critical patent/CN107766838B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/40: Scenes; Scene-specific elements in video content
    • G06V 20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/40: Scenes; Scene-specific elements in video content
    • G06V 20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames


Abstract

The invention discloses a video scene switching detection method in the technical field of multimedia information processing. Detection is completed through a video scene switching detection model, covering both the training of the model and its application. The method adopts a deep learning algorithm: the model's discrimination threshold is adjusted automatically to an optimum during training, so no threshold needs to be set manually; the model input includes the frame difference between the two frames, which speeds up convergence; and because the model uses batch normalization to prevent overfitting during training, its generalization ability is improved.

Description

Video scene switching detection method
Technical Field
The invention relates to a video detection method, in particular to a video scene switching detection method, and belongs to the technical field of multimedia information processing.
Background
A video is generally composed of multiple scenes, and each scene is composed of multiple video frames. Video scene detection means finding the frames of a video at which scene switching occurs, together with their positions. The detected positions can be used for fast, frame-accurate video editing, and the sequence of detected frames can serve as a rough summary of the whole video content.
At present, conventional video scene detection methods generally rely on manually designed features: computing the color-histogram similarity of adjacent frames, directly computing the frame difference, or detecting scene switching from the degree-of-change feature VH of the high-frequency subband coefficients of each frame in a video scene, where computing the high-frequency subband coefficients requires an algorithm such as the three-dimensional wavelet transform (see, for example, Chinese patent application No. 200810118534.9). All of these techniques compute a feature value and compare it with a threshold; a frame is judged to be a switch if the feature value exceeds (or falls below) the threshold. Adaptive-threshold variants of these techniques also exist, such as the adaptive-threshold video scene change detection method of Chinese patent application No. 201410466385.0, but the size of the sliding window and the preset value B still need to be set manually.
These traditional video scene detection methods extract features with classical mathematical algorithms. The design of such algorithms is complex, and their quality determines the final accuracy. In addition, traditional algorithms cannot avoid setting various thresholds, such as a similarity threshold or a sliding-window threshold. These thresholds must be chosen from experience, and how well they are set also determines the detection accuracy.
Disclosure of Invention
The main purpose of the invention is to provide a video scene switching detection method that trains a model on a large number of pre-prepared switching and non-switching frame pairs, extracts adjacent frames of the video to be detected and feeds them into the trained model in sequence, and finds the positions of all switching frames from the model output. No threshold needs to be specified, and the accuracy is high.
The purpose of the invention can be achieved by adopting the following technical scheme:
a video scene switching detection method is characterized in that detection is completed through a video scene switching detection model, and the detection comprises training of the video scene switching detection model and application of the video scene switching detection model.
Further, the training of the video scene change detection model includes the following steps:
step 11: defining parameters of a video scene switching detection model;
step 12: constructing a video scene switching detection model;
step 13: defining a loss function, and adopting cross entropy as the loss function;
step 14: defining an optimizer and adopting an Adam optimization algorithm;
step 15: defining an evaluation function to calculate the discrimination accuracy of the video scene switching detection model;
step 16: training and evaluating the video scene switching detection model, and saving the model parameters once every 20 training iterations.
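The patent names the training components (cross-entropy loss, an Adam optimizer, an evaluation function, checkpointing every 20 iterations) but does not give an implementation. A minimal numpy sketch of steps 13, 15 and 16, with all function names hypothetical, might look like:

```python
import numpy as np

def cross_entropy(probs, labels):
    """Step 13: mean cross-entropy loss over a batch.
    probs: (N, 2) softmax outputs; labels: (N,) integer class ids."""
    n = probs.shape[0]
    return -np.mean(np.log(probs[np.arange(n), labels] + 1e-12))

def accuracy(probs, labels):
    """Step 15: fraction of frame pairs whose argmax class matches the label."""
    return float(np.mean(np.argmax(probs, axis=1) == labels))

def should_checkpoint(step, interval=20):
    """Step 16: save model parameters once every `interval` training iterations."""
    return step > 0 and step % interval == 0
```

The optimizer itself (step 14, Adam) would come from whatever deep learning framework is used; the patent does not specify one.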
Further, the application of the video scene change detection model comprises the following steps:
step 21: sequentially reading each frame of the video to be detected and resizing it to 96×96;
step 22: inputting the current frame and the previous frame into the trained video scene switching detection model to obtain the output result of the video scene switching detection model;
step 23: and if the output result of the video scene switching detection model is a switching frame, outputting the current frame sequence number and storing the frame.
Further, the video scene switching detection model comprises a PAD layer, a plurality of convolution groups, a Reshape layer, a 512-unit fully connected layer, a 2-unit fully connected layer and a Softmax layer.
Further, the convolution groups include a convolution group of 9 × 9 × 32, a convolution group of 3 × 3 × 64, and a convolution group of 5 × 5 × 128.
Further, each convolution group comprises a convolution layer, a Relu layer, a pooling layer and a batch normalization layer.
Further, the convolution kernel size of the convolution group 9 × 9 × 32 is 9 × 9, and the output feature number is 32;
the convolution kernel of the convolution group 3 × 3 × 64 is 3 × 3, and the output feature number is 64;
the convolution kernel of the convolution group 5 × 5 × 128 is 5 × 5, and the output feature number is 128;
the step size of the pooling layer is 2×2.
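The shapes implied by the three convolution groups above can be checked with a small calculation: each group's 2×2 pooling halves the spatial size (assuming the convolutions themselves are shape-preserving, which the stated 96→48→24→12 progression requires), and the channel count becomes the group's output feature number:

```python
def conv_group_output_shape(h, w, channels_out, pool_stride=2):
    """Each convolution group halves the spatial size via its 2x2 pooling
    layer and sets the channel count to its output feature number."""
    return h // pool_stride, w // pool_stride, channels_out

shape = (96, 96, 9)  # PAD layer output: X1, X2 and X1-X2 stacked
for feats in (32, 64, 128):  # the three convolution groups in order
    shape = conv_group_output_shape(shape[0], shape[1], feats)

flattened = shape[0] * shape[1] * shape[2]  # Reshape layer: 12*12*128 = 18432
```

This reproduces the 12 × 12 × 128 output and the 1 × 18432 flattened vector described in the detailed embodiment.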
Further, the input of the video scene cut detection model is a pair of image frames, denoted X1 and X2, respectively, with the size of the image being 96 × 96 × 3.
Further, detection by the video scene switching detection model includes: inputting X1, X2 and X1−X2 into the PAD layer, where the three images are stacked together into a 96 × 96 × 9 matrix; passing this matrix through the 9 × 9 × 32 convolution group, whose output is a 48 × 48 × 32 matrix; and finally computing the probabilities of the switching-frame and non-switching-frame classes with the Softmax layer, taking the larger of the two as the final judgment output.
The invention has the beneficial technical effects that: the video scene switching detection method provided by the invention adopts a deep learning algorithm, and the discrimination threshold of the model is adjusted automatically to an optimum during training, so no threshold needs to be set; the model input includes the frame difference between the two frames, which speeds up convergence; and because the model uses batch normalization to prevent overfitting during training, its generalization ability is improved.
Drawings
Fig. 1 is a schematic diagram of a model structure of a preferred embodiment of a video scene change detection method according to the present invention;
FIG. 2 is a schematic diagram of a convolution group model in accordance with a preferred embodiment of the video scene change detection method of the present invention;
fig. 3 is a flowchart of a model application of a video scene change detection method according to a preferred embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention more clear and definite for those skilled in the art, the present invention is further described in detail below with reference to the examples and the accompanying drawings, but the embodiments of the present invention are not limited thereto.
As shown in fig. 1, fig. 2, and fig. 3, in the video scene switching detection method provided in this embodiment, detection is completed through a video scene switching detection model, which includes training of the video scene switching detection model and application of the video scene switching detection model; the training of the video scene switching detection model comprises the following steps:
step 11: defining parameters of a video scene switching detection model;
step 12: constructing a video scene switching detection model;
step 13: defining a loss function, and adopting cross entropy as the loss function;
step 14: defining an optimizer and adopting an Adam optimization algorithm;
step 15: defining an evaluation function to calculate the discrimination accuracy of the video scene switching detection model;
step 16: training and evaluating the video scene switching detection model, and saving the model parameters once every 20 training iterations.
Further, the application of the video scene change detection model comprises the following steps:
step 21: sequentially reading each frame of the video to be detected and resizing it to 96×96;
step 22: inputting the current frame and the previous frame into the trained video scene switching detection model to obtain the output result of the video scene switching detection model;
step 23: and if the output result of the video scene switching detection model is a switching frame, outputting the current frame sequence number and storing the frame.
Further, in this embodiment, as shown in fig. 1 and fig. 2, the video scene switching detection model includes a PAD layer, a plurality of convolution groups, a Reshape layer, a 512-unit fully connected layer, a 2-unit fully connected layer, and a Softmax layer; the convolution groups include a convolution group of 9 × 9 × 32, a convolution group of 3 × 3 × 64, and a convolution group of 5 × 5 × 128; each convolution group comprises a convolution layer, a Relu layer, a pooling layer and a batch normalization layer.
Further, in the present embodiment, as shown in fig. 1, the convolution kernel size of the convolution group 9 × 9 × 32 is 9 × 9, and the output feature number is 32;
the convolution kernel of the convolution group 3 × 3 × 64 is 3 × 3, and the output feature number is 64;
the convolution kernel of the convolution group 5 × 5 × 128 is 5 × 5, and the output feature number is 128;
the step size of the pooling layer is 2×2.
Further, in the present embodiment, the input of the video scene switching detection model is a pair of image frames, denoted X1 and X2 respectively, each of size 96 × 96 × 3. Detection by the model proceeds as follows: X1, X2 and X1−X2 are input into the PAD layer, where the three images are stacked together into a 96 × 96 × 9 matrix; this matrix passes through the 9 × 9 × 32 convolution group, whose output is a 48 × 48 × 32 matrix; and finally the Softmax layer computes the probabilities of the switching-frame and non-switching-frame classes, the larger of which is taken as the final judgment output.
Further, in the present embodiment, the composition of the model is described first. As shown in fig. 1, the input to the model is a pair of image frames, denoted X1 and X2, each of size 96 × 96 × 3 (3 is the number of channels). X1, X2 and X1−X2 are input into the PAD layer, where the three images are stacked together into a 96 × 96 × 9 matrix. This passes through the first convolution group, which comprises a convolution layer, a Relu layer, a max-pooling layer and a batch normalization layer; the convolution kernel size is 9 × 9, the output feature number is 32, and the pooling step size is 2 × 2, so the output after this convolution group is a 48 × 48 × 32 matrix. The second convolution group (convolution kernel 3 × 3, output feature number 64) then outputs a 24 × 24 × 64 matrix, and the third convolution group (convolution kernel 5 × 5, output feature number 128) outputs a 12 × 12 × 128 matrix. The Reshape layer flattens this into a one-dimensional matrix of size 1 × 18432 (18432 = 12 × 12 × 128), and two fully connected layers reduce the output to 1 × 2. Finally, the Softmax output layer computes the probabilities of the two classes, representing the switching-frame and non-switching-frame probabilities respectively; the larger of the two is taken as the final judgment output. For example, an output of [0.886, 0.114] indicates a switching frame.
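The two framework-independent pieces of the forward pass above, the PAD layer's channel stacking and the final softmax decision, can be sketched directly in numpy (function names are illustrative, not from the patent):

```python
import numpy as np

def pad_layer(x1, x2):
    """PAD layer: stack X1, X2 and their difference X1-X2 along the channel
    axis, turning two 96x96x3 frames into one 96x96x9 input tensor."""
    return np.concatenate([x1, x2, x1 - x2], axis=-1)

def softmax(logits):
    """Numerically stable softmax over the two output units."""
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

def decide(logits):
    """Take the larger of the two class probabilities as the judgment,
    e.g. probabilities [0.886, 0.114] indicate a switching frame."""
    probs = softmax(logits)
    label = "switch" if probs[0] > probs[1] else "no-switch"
    return label, probs
```

The convolution groups and fully connected layers between these two pieces would be built in whatever deep learning framework is used; the patent does not name one.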
In summary, the video scene switching detection method of this embodiment adopts a deep learning algorithm: the model's discrimination threshold is adjusted automatically to an optimum during training, so no threshold needs to be set; the model input includes the frame difference between the two frames, which speeds up convergence; and because the model uses batch normalization to prevent overfitting during training, its generalization ability is improved.
The above description is only for the purpose of illustrating the present invention and is not intended to limit its scope; any person skilled in the art may make substitutions or changes to the technical solution of the present invention and its conception within the scope of the present invention.

Claims (8)

1. A video scene switching detection method is characterized in that detection is completed through a video scene switching detection model, and the detection comprises training of the video scene switching detection model and application of the video scene switching detection model;
the training of the video scene switching detection model comprises the following steps:
step 11: defining parameters of a video scene switching detection model;
step 12: constructing a video scene switching detection model;
step 13: defining a loss function, and adopting cross entropy as the loss function;
step 14: defining an optimizer and adopting an Adam optimization algorithm;
step 15: defining an evaluation function to calculate the discrimination accuracy of the video scene switching detection model;
step 16: training and evaluating the video scene switching detection model, and storing the parameters once every 20 times of training.
2. The method according to claim 1, wherein the application of the video scene cut detection model comprises the following steps:
step 21: sequentially reading a frame of a video to be detected, and resizing it to 96×96;
step 22: inputting the current frame and the previous frame into the trained video scene switching detection model to obtain the output result of the video scene switching detection model;
step 23: and if the output result of the video scene switching detection model is a switching frame, outputting the current frame sequence number and storing the frame.
3. The method according to claim 1, wherein the video scene cut detection model comprises a PAD layer, a plurality of convolution groups, a Reshape layer, a full link layer 512, a full link layer 2, and a Softmax layer.
4. The method of claim 3, wherein said convolution group comprises convolution group 9 x 32, convolution group 3 x 64 and convolution group 5 x 128.
5. The method of claim 3, wherein each convolution group comprises a convolution layer, a Relu layer, a pooling layer, and a batch normalization layer.
6. The method of claim 5, wherein the convolution group has a convolution kernel size of 9 x 32 of 9 x9, and an output feature number of 32;
the convolution kernel of the convolution group 3 × 3 × 64 is 3 × 3, and the output feature number is 64;
the convolution kernel of the convolution group 5 × 5 × 128 is 5 × 5, and the output feature number is 128;
the step size of the pooling layer is 2×2.
7. A method as claimed in claim 3, wherein the input of the video scene cut detection model is a pair of image frames, denoted X1 and X2, respectively, the size of the image being 96X 3.
8. The method according to claim 1, wherein the detecting of the video scene cut detection model comprises: inputting X1, X2 and X1-X2 into a PAD layer, superposing three images together on the PAD layer to form a 96 × 96 × 9 matrix, and outputting the matrix after being subjected to a convolution group of 9 × 9 × 32 to form a 48 × 48 × 32 matrix; and finally, calculating the probability of outputting switching frames and non-switching frames by using a Softmax layer, and taking the larger one of the switching frames and the non-switching frames to represent the final judgment output result.
CN201711089563.2A 2017-11-08 2017-11-08 Video scene switching detection method Active CN107766838B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711089563.2A CN107766838B (en) 2017-11-08 2017-11-08 Video scene switching detection method


Publications (2)

Publication Number Publication Date
CN107766838A CN107766838A (en) 2018-03-06
CN107766838B true CN107766838B (en) 2021-06-01

Family

ID=61273831

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711089563.2A Active CN107766838B (en) 2017-11-08 2017-11-08 Video scene switching detection method

Country Status (1)

Country Link
CN (1) CN107766838B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110876143A (en) * 2018-08-31 2020-03-10 北京意锐新创科技有限公司 Method and device for preventing switching application system based on mobile payment equipment
CN110377794B (en) * 2019-06-12 2022-04-01 杭州当虹科技股份有限公司 Video feature description and duplicate removal retrieval processing method

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
SU654952A1 (en) * 1978-02-06 1979-03-30 Ставропольское высшее военное инженерное училище связи Device for teaching pupils to detect signals against the noise background
CN103458261A (en) * 2013-09-08 2013-12-18 华东电网有限公司 Video scene variation detection method based on stereoscopic vision
CN104615986A (en) * 2015-01-30 2015-05-13 中国科学院深圳先进技术研究院 Method for utilizing multiple detectors to conduct pedestrian detection on video images of scene change
CN105005772A (en) * 2015-07-20 2015-10-28 北京大学 Video scene detection method
CN105718890A (en) * 2016-01-22 2016-06-29 北京大学 Method for detecting specific videos based on convolution neural network
CN105930402A (en) * 2016-04-15 2016-09-07 乐视控股(北京)有限公司 Convolutional neural network based video retrieval method and system
CN106446930A (en) * 2016-06-28 2017-02-22 沈阳工业大学 Deep convolutional neural network-based robot working scene identification method


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Comparison of Scene Change Detection Algorithms for Videos;Bindu Reddy等;《2015 Fifth International Conference on Advanced Computing & Communication Technologies》;20150406;第84-89页 *
Hybrid approach for video compression based on scene change detection;Ankita P. Chauhan等;《 2013 IEEE International Conference on Signal Processing, Computing and Control (ISPCC)》;20131114;第1-5页 *
Scene Change Detection Using DCT Features in Transform Domain Video Indexing;S. Primechaev等;《 2007 14th International Workshop on Systems, Signals and Image Processing and 6th EURASIP Conference focused on Speech and Image Processing, Multimedia Communications and Services》;20071112;第369-372页 *
A low-complexity video scene switching detection method adapting to dynamic resolution changes (适配分辨率动态变化的低复杂度视频场景切换检测方法); 方宏俊 et al.; 《计算机科学》 (Computer Science); 20170228; vol. 44, no. 2; abstract *

Also Published As

Publication number Publication date
CN107766838A (en) 2018-03-06

Similar Documents

Publication Publication Date Title
CN109614922B (en) Dynamic and static gesture recognition method and system
CN108510485B (en) Non-reference image quality evaluation method based on convolutional neural network
CN109583340B (en) Video target detection method based on deep learning
CN110516536B (en) Weak supervision video behavior detection method based on time sequence class activation graph complementation
CN103366180B (en) A kind of cell image segmentation method based on automated characterization study
CN107145889B (en) Target identification method based on double CNN network with RoI pooling
CN106709453B (en) Sports video key posture extraction method based on deep learning
CN110287777B (en) Golden monkey body segmentation algorithm in natural scene
CN109740721B (en) Wheat ear counting method and device
CN109446922B (en) Real-time robust face detection method
CN111062278B (en) Abnormal behavior identification method based on improved residual error network
CN110533022B (en) Target detection method, system, device and storage medium
CN111079539B (en) Video abnormal behavior detection method based on abnormal tracking
CN107944354B (en) Vehicle detection method based on deep learning
CN112950477A (en) High-resolution saliency target detection method based on dual-path processing
CN110969164A (en) Low-illumination imaging license plate recognition method and device based on deep learning end-to-end
CN112418032A (en) Human behavior recognition method and device, electronic equipment and storage medium
CN107463932A (en) A kind of method that picture feature is extracted using binary system bottleneck neutral net
CN114445651A (en) Training set construction method and device of semantic segmentation model and electronic equipment
CN107766838B (en) Video scene switching detection method
CN105825234A (en) Superpixel and background model fused foreground detection method
CN109978858B (en) Double-frame thumbnail image quality evaluation method based on foreground detection
CN104268845A (en) Self-adaptive double local reinforcement method of extreme-value temperature difference short wave infrared image
CN112446417B (en) Spindle-shaped fruit image segmentation method and system based on multilayer superpixel segmentation
CN110781936B (en) Construction method of threshold learnable local binary network based on texture description and deep learning and remote sensing image classification method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant