CN110287876A - A content identification method based on video images - Google Patents

A content identification method based on video images

Info

Publication number
CN110287876A
Authority
CN
China
Prior art keywords
layer
video image
model
content
trained
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910556426.8A
Other languages
Chinese (zh)
Inventor
孙绍辉
曹勇
田云龙
孙绍光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Heilongjiang Electric Power Dispatching Industry Co Ltd
Original Assignee
Heilongjiang Electric Power Dispatching Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-06-25
Filing date
2019-06-25
Publication date
2019-09-27

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Abstract

A content identification method based on video images. The invention belongs to the field of artificial intelligence and specifically relates to a video image recognition method. Its purpose is to solve the poor real-time performance of existing recognition methods based on video content. The invention first constructs an image recognition network model and then extracts key frame images from the video. The key frame images are processed with the image recognition network model to determine the content objects in each image; at the same time, the optical flow field between two frames is computed by an optical flow method, and the features of the key frame are propagated to the other frames. The model is then trained to obtain the trained final recognition model, which is used to identify the content of video images. The invention is used for content recognition in video images.

Description

A content identification method based on video images
Technical field
The invention belongs to the field of artificial intelligence and specifically relates to a video image recognition method.
Background art
As science and technology advance, fields such as autonomous driving and robotics are developing ever faster, and the corresponding technologies are becoming increasingly mature. Whether in autonomous driving or in robotics, autonomous recognition and autonomous decision-making are typically based on image processing; in particular, autonomous recognition in autonomous driving and robotics (for example, collision avoidance during motion) is mostly based on processing video images.
However, current video image processing has certain drawbacks. The data volume of video is huge, which places high demands not only on the hardware for image acquisition and image processing but also on the software environment; as a result, existing hardware or software processes video too slowly to meet real-time requirements. Autonomous driving in particular places very high demands on real-time decisions: if the real-time requirement is not met, driving safety cannot be guaranteed; if image precision is sacrificed to meet it, the accuracy of content recognition drops or the false-alarm rate rises, which likewise poses a serious hazard to driving safety. The same problem also holds back the development of robotics and other fields with real-time requirements.
Summary of the invention
The purpose of the invention is to solve the poor real-time performance of existing recognition methods based on video content.
A content identification method based on video images, comprising the following steps:
Step 1, construct the image recognition network model:
The structure of the image recognition network model is: an input layer, a first convolutional layer, a first pooling layer, a second convolutional layer, a second pooling layer, a third convolutional layer, a third pooling layer, a first feature concatenation layer, a second feature concatenation layer, and an output layer. The first feature concatenation layer concatenates the feature map of the third pooling layer with the feature map of the second pooling layer; the result is fused by convolution, batch normalization, and ReLU activation, then processed by an attention mechanism, and the resulting feature information is fed to the second feature concatenation layer. The second feature concatenation layer concatenates the feature map received from the first feature concatenation layer with the feature map of the first pooling layer; the result is fused by convolution, batch normalization, and ReLU activation, then processed by an attention mechanism, and the resulting deep feature information is fed to the output layer;
Step 2, for the video, extract key frame images;
The key frame images are processed with the image recognition network model to determine the content objects in each image;
At the same time, the optical flow field between two frames is computed by an optical flow method, and the features of the key frame are propagated to the other frames;
Step 3, train the model of step 2 to obtain the trained final recognition model;
Step 4, identify the content of video images using the trained final recognition model.
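The four steps above can be read as a per-frame control flow: the recognition network runs only on key frames, and every other frame reuses the most recent key frame's features after they are moved along the optical flow. The sketch below illustrates that flow under stated assumptions. The function name and all callable parameters (`is_key_frame`, `extract_features`, `estimate_flow`, `warp`, `classify`) are hypothetical placeholders, not names from the patent, and the toy stand-ins only exercise the control flow.

```python
def recognize_video(frames, is_key_frame, extract_features, estimate_flow, warp, classify):
    """Label every frame, running the expensive feature extractor on key frames only."""
    labels = []
    key_frame = None
    key_features = None
    for index, frame in enumerate(frames):
        if key_features is None or is_key_frame(index, frame):
            # Key frame: run the recognition network and cache its features.
            key_frame = frame
            key_features = extract_features(frame)
            features = key_features
        else:
            # Non-key frame: move the cached key-frame features along the flow.
            flow = estimate_flow(frame, key_frame)
            features = warp(key_features, flow)
        labels.append(classify(features))
    return labels


# Toy stand-ins (assumptions for illustration only): frames are numbers,
# "features" equal the frame value, and "flow" is the difference to the key frame.
calls = {"extract": 0}

def toy_extract(frame):
    calls["extract"] += 1
    return frame

toy_labels = recognize_video(
    frames=[1, 2, 3, 10, 11],
    is_key_frame=lambda index, frame: index % 3 == 0,
    extract_features=toy_extract,
    estimate_flow=lambda frame, key_frame: frame - key_frame,
    warp=lambda features, flow: features + flow,
    classify=lambda features: "high" if features >= 10 else "low",
)
```

In this toy run the extractor is called for two of the five frames; the saving in a real system would depend on how often key frames occur and on the cost of the flow estimate.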
The beneficial effects of the invention are:
The number of parameters of the image recognition network model constructed by the invention can be kept within a reasonable range, and the invention processes key frames and non-key frames differently, which ensures real-time recognition of video content. At the same time, the content recognition accuracy of the invention can reach 90 percent, giving good recognition of video image content.
Brief description of the drawings
Fig. 1 is a schematic diagram of the constructed image recognition network model.
Specific embodiments
Embodiment 1:
A content identification method based on video images, comprising the following steps:
Step 1, as shown in Fig. 1, construct the image recognition network model:
The structure of the image recognition network model is: an input layer, a first convolutional layer, a first pooling layer, a second convolutional layer, a second pooling layer, a third convolutional layer, a third pooling layer, a first feature concatenation layer, a second feature concatenation layer, and an output layer. The first feature concatenation layer concatenates the feature map of the third pooling layer with the feature map of the second pooling layer; the result is fused by convolution, batch normalization, and ReLU activation, then processed by an attention mechanism, and the resulting feature information is fed to the second feature concatenation layer. The second feature concatenation layer concatenates the feature map received from the first feature concatenation layer with the feature map of the first pooling layer; the result is fused by convolution, batch normalization, and ReLU activation, then processed by an attention mechanism, and the resulting deep feature information is fed to the output layer;
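The two feature concatenation layers described above share one pattern: concatenate a deeper and a shallower feature map, fuse them with convolution, batch normalization, and ReLU, then apply attention. The NumPy sketch below illustrates that pattern only. The text does not state kernel sizes, channel counts, how the differing spatial sizes of the pooled maps are reconciled, or which attention mechanism is used, so the 1x1 convolution, the nearest-neighbor upsampling, the sigmoid channel gate, the omission of learned batch-norm scale and shift, and every name and shape here are assumptions.

```python
import numpy as np

def upsample_nearest(x, factor):
    """Repeat every pixel `factor` times along height and width; x has shape (N, C, H, W)."""
    return x.repeat(factor, axis=2).repeat(factor, axis=3)

def fuse(deep, shallow, conv_weight, gate_weight, eps=1e-5):
    """Concatenate two feature maps, then apply 1x1 conv, batch norm, ReLU and a channel gate."""
    factor = shallow.shape[2] // deep.shape[2]
    deep = upsample_nearest(deep, factor)              # assumption: match spatial size by upsampling
    x = np.concatenate([deep, shallow], axis=1)        # feature concatenation along channels
    x = np.einsum("oc,nchw->nohw", conv_weight, x)     # 1x1 convolution mixes channels
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x = (x - mean) / np.sqrt(var + eps)                # batch normalization without learned scale/shift
    x = np.maximum(x, 0.0)                             # ReLU
    pooled = x.mean(axis=(2, 3))                       # global average pooling, shape (N, C_out)
    gate = 1.0 / (1.0 + np.exp(-(pooled @ gate_weight.T)))  # sigmoid channel weights in (0, 1)
    return x * gate[:, :, None, None]                  # attention: rescale each channel

# Hypothetical pooled feature maps for a batch of two images.
rng = np.random.default_rng(0)
pool1 = rng.standard_normal((2, 16, 32, 32))
pool2 = rng.standard_normal((2, 32, 16, 16))
pool3 = rng.standard_normal((2, 64, 8, 8))

# First concatenation layer: third and second pooling outputs.
fused1 = fuse(pool3, pool2, rng.standard_normal((48, 96)), rng.standard_normal((48, 48)))
# Second concatenation layer: first fusion result and first pooling output.
fused2 = fuse(fused1, pool1, rng.standard_normal((32, 64)), rng.standard_normal((32, 32)))
print(fused1.shape, fused2.shape)
```

The weights are random here, so the outputs only demonstrate shapes and data flow; in a trained model they would be learned parameters.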
Step 2, for the video, extract key frame images. Key frames are extracted with an existing method; in this embodiment, a content-analysis-based method is used. This approach is simple and convenient and helps the overall algorithm meet the real-time requirement; in addition, the image content it selects is closer to the content objects that can be identified in the key frame images, which helps ensure the accuracy of the algorithm. The content-analysis-based method extracts key frames based on the color, texture, and similar properties of each frame, and determines key frames from the difference between frames and a set threshold.
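As a rough illustration of threshold-based key-frame selection, the sketch below compares color histograms only. The text also mentions texture and does not give the difference measure or the threshold, so the histogram features, the L1 distance, the comparison against the last key frame, the default threshold, and the function names are all assumptions.

```python
import numpy as np

def color_histogram(frame, bins=8):
    """Normalized per-channel intensity histogram of an H x W x 3 uint8 frame."""
    hists = [np.histogram(frame[..., c], bins=bins, range=(0, 256))[0] for c in range(3)]
    hist = np.concatenate(hists).astype(np.float64)
    return hist / hist.sum()

def select_key_frames(frames, threshold=0.3):
    """Return indices of frames whose histogram differs from the last key frame by more than threshold."""
    key_indices = [0]                        # the first frame always starts a new segment
    reference = color_histogram(frames[0])
    for i in range(1, len(frames)):
        current = color_histogram(frames[i])
        difference = np.abs(current - reference).sum()   # L1 distance, between 0 and 2
        if difference > threshold:
            key_indices.append(i)
            reference = current              # later frames are compared with the new key frame
    return key_indices

# Two dark frames followed by two bright frames.
dark = np.full((4, 4, 3), 10, dtype=np.uint8)
bright = np.full((4, 4, 3), 200, dtype=np.uint8)
key_indices = select_key_frames([dark, dark, bright, bright])
```

A lower threshold yields more key frames and therefore more full network passes, so the threshold trades accuracy against speed.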
The key frame images are processed with the image recognition network model to determine the content objects in each image;
At the same time, the optical flow field between two frames is computed by an optical flow (Optical Flow) method, and the features of the key frame are propagated to the other frames;
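One way to propagate key-frame features along a flow field is to warp the feature map. The sketch below uses nearest-neighbor sampling and assumes the flow stores, for each pixel of the current frame, the displacement (dx, dy) to its matching position in the key frame. The text does not specify the warping scheme or the flow direction, so both, along with the function name, are assumptions.

```python
import numpy as np

def warp_features(key_features, flow):
    """Sample key-frame features of shape (C, H, W) at positions shifted by flow of shape (2, H, W)."""
    channels, height, width = key_features.shape
    ys, xs = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    # Source coordinates in the key frame, rounded to the nearest pixel and clamped to the image.
    src_x = np.clip(np.rint(xs + flow[0]).astype(int), 0, width - 1)
    src_y = np.clip(np.rint(ys + flow[1]).astype(int), 0, height - 1)
    return key_features[:, src_y, src_x]

# A 1-channel 3x4 feature map and a flow that points one pixel to the right everywhere.
features = np.arange(12, dtype=np.float64).reshape(1, 3, 4)
flow = np.zeros((2, 3, 4))
flow[0] = 1.0
warped = warp_features(features, flow)
```

Bilinear sampling would handle sub-pixel displacements more smoothly; nearest-neighbor sampling is used here only to keep the example short.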
The optical flow in this embodiment is dense optical flow. The pseudocode for visualizing the optical flow is as follows:
When the optical flow is visualized: hue H is measured as an angle in the range 0° to 360°, counted counterclockwise starting from red, with red at 0°, green at 120°, and blue at 240°; saturation S ranges from 0.0 to 1.0; value (brightness) V ranges from 0.0 (black) to 1.0 (white). FlowNet sets V to 255; this function follows FlowNet, and the saturation S represents the magnitude of the pixel displacement.
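The mapping described above can be sketched for a single flow vector using only the Python standard library: the direction of the vector sets the hue, its length sets the saturation, and the value is fixed at its maximum (1.0, which corresponds to 255 in 8-bit terms). The function name, the normalization of the magnitude by a caller-supplied `max_magnitude`, and the use of `atan2(dy, dx)` for the angle are assumptions; the original pseudocode is not reproduced in this text.

```python
import colorsys
import math

def flow_vector_to_rgb(dx, dy, max_magnitude):
    """Map one flow vector to an 8-bit RGB color via HSV."""
    angle = math.degrees(math.atan2(dy, dx)) % 360.0    # hue as an angle in [0, 360)
    magnitude = math.hypot(dx, dy)
    # Saturation encodes displacement size, clipped to [0, 1].
    saturation = min(magnitude / max_magnitude, 1.0) if max_magnitude > 0 else 0.0
    value = 1.0                                         # maximum brightness
    r, g, b = colorsys.hsv_to_rgb(angle / 360.0, saturation, value)
    return tuple(round(255 * c) for c in (r, g, b))

rightward = flow_vector_to_rgb(1.0, 0.0, max_magnitude=1.0)   # full-speed motion along +x
still = flow_vector_to_rgb(0.0, 0.0, max_magnitude=1.0)       # no motion
```

With this convention a stationary pixel is white and faster motion gives a more saturated color. In image coordinates the y axis usually points down, so whether the angle appears counterclockwise on screen depends on the sign convention chosen for dy.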
Step 3, train the model of step 2 to obtain the final recognition model, and test it on a test set. If the final recognition model meets the recognition-rate requirement, it is taken as the trained final recognition model; otherwise, return to step 1 and readjust the model parameters.
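The train, test, and readjust cycle of step 3 amounts to a simple loop. The sketch below shows one possible shape of it; the function name, the idea of iterating over a list of candidate configurations, and the callables `build_and_train` and `evaluate` are hypothetical, and the accuracy figures are made-up toy values.

```python
def train_until_acceptable(candidate_configs, build_and_train, evaluate, required_accuracy):
    """Try configurations in order and return the first whose test accuracy is high enough."""
    for config in candidate_configs:
        model = build_and_train(config)      # steps 1 to 3: build the network and train it
        accuracy = evaluate(model)           # recognition rate on the test set
        if accuracy >= required_accuracy:
            return config, model, accuracy
    return None                              # no configuration met the requirement

# Toy stand-ins: the "model" is just the configuration name with a fixed accuracy.
toy_accuracy = {"small": 0.82, "medium": 0.91, "large": 0.95}
result = train_until_acceptable(
    candidate_configs=["small", "medium", "large"],
    build_and_train=lambda config: config,
    evaluate=lambda model: toy_accuracy[model],
    required_accuracy=0.90,
)
```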
The cross-entropy loss function is used throughout training, as shown in the following formula:
where N is the total number of training samples selected, k denotes the k-th sample selected during training, and j is the class index of the data set; p_k denotes the probability of the k-th sample, and p_j denotes the probability of the j-th class.
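The formula itself does not survive in this text. For orientation, a common form of the multi-class cross-entropy averaged over N samples is L = -(1/N) * sum over k of log p(k, y_k), where p(k, j) is the predicted probability of class j for sample k and y_k is the true class of sample k. The sketch below implements that standard form; it may differ from the patent's exact formula, and the function name, argument layout, and the symbol y_k are assumptions.

```python
import math

def cross_entropy(probabilities, labels):
    """Mean cross-entropy; probabilities[k][j] is the predicted probability of class j for sample k."""
    total = 0.0
    for sample_probabilities, label in zip(probabilities, labels):
        total -= math.log(sample_probabilities[label])   # penalize low probability on the true class
    return total / len(labels)

# Two samples, two classes; the true classes are 0 and 1.
loss = cross_entropy([[0.5, 0.5], [0.25, 0.75]], [0, 1])
```

The loss is zero only when every sample assigns probability 1 to its true class, and it grows without bound as that probability approaches zero.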
Step 4, identify the content of video images using the trained final recognition model.

Claims (5)

1. A content identification method based on video images, characterized by comprising the following steps:
Step 1, construct the image recognition network model:
The structure of the image recognition network model is: an input layer, a first convolutional layer, a first pooling layer, a second convolutional layer, a second pooling layer, a third convolutional layer, a third pooling layer, a first feature concatenation layer, a second feature concatenation layer, and an output layer. The first feature concatenation layer concatenates the feature map of the third pooling layer with the feature map of the second pooling layer; the result is fused by convolution, batch normalization, and ReLU activation, then processed by an attention mechanism, and the resulting feature information is fed to the second feature concatenation layer. The second feature concatenation layer concatenates the feature map received from the first feature concatenation layer with the feature map of the first pooling layer; the result is fused by convolution, batch normalization, and ReLU activation, then processed by an attention mechanism, and the resulting deep feature information is fed to the output layer;
Step 2, for the video, extract key frame images;
The key frame images are processed with the image recognition network model to determine the content objects in each image;
At the same time, the optical flow field between two frames is computed by an optical flow method, and the features of the key frame are propagated to the other frames;
Step 3, train the model of step 2 to obtain the trained final recognition model;
Step 4, identify the content of video images using the trained final recognition model.
2. The content identification method based on video images according to claim 1, characterized in that the activation function of the first convolutional layer, the second convolutional layer, and the third convolutional layer is ReLU.
3. The content identification method based on video images according to claim 1, characterized in that the key frame images are extracted using a content-analysis-based method.
4. The content identification method based on video images according to claim 1, 2 or 3, characterized in that the cross-entropy loss function is used throughout when the model of step 2 is trained, as shown in the following formula:
where N is the total number of training samples selected, k denotes the k-th sample selected during training, and j is the class index of the data set; p_k denotes the probability of the k-th sample, and p_j denotes the probability of the j-th class.
5. The content identification method based on video images according to claim 4, characterized in that, after the model of step 2 is trained to obtain the final recognition model, it is tested on a test set; if the final recognition model meets the recognition-rate requirement, it is taken as the trained final recognition model; otherwise, return to step 1 and readjust the model parameters.
CN201910556426.8A 2019-06-25 2019-06-25 A content identification method based on video images Pending CN110287876A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910556426.8A CN110287876A (en) 2019-06-25 2019-06-25 A content identification method based on video images

Publications (1)

Publication Number Publication Date
CN110287876A true CN110287876A (en) 2019-09-27

Family

ID=68005684

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910556426.8A Pending CN110287876A (en) 2019-06-25 2019-06-25 A content identification method based on video images

Country Status (1)

Country Link
CN (1) CN110287876A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107092883A (en) * 2017-04-20 2017-08-25 上海极链网络科技有限公司 Object identification method for tracing
CN109740419A (en) * 2018-11-22 2019-05-10 东南大学 A kind of video behavior recognition methods based on Attention-LSTM network
CN109871781A (en) * 2019-01-28 2019-06-11 山东大学 Dynamic gesture identification method and system based on multi-modal 3D convolutional neural networks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
俞璜悦等: "基于用户兴趣语义的视频关键帧提取", 《计算机应用》 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110672343A (en) * 2019-09-29 2020-01-10 电子科技大学 Rotary machine fault diagnosis method based on multi-attention convolutional neural network
CN110672343B (en) * 2019-09-29 2021-01-26 电子科技大学 Rotary machine fault diagnosis method based on multi-attention convolutional neural network
CN111652081A (en) * 2020-05-13 2020-09-11 电子科技大学 Video semantic segmentation method based on optical flow feature fusion
CN111652081B (en) * 2020-05-13 2022-08-05 电子科技大学 Video semantic segmentation method based on optical flow feature fusion
CN112446342A (en) * 2020-12-07 2021-03-05 北京邮电大学 Key frame recognition model training method, recognition method and device
CN115115822A (en) * 2022-06-30 2022-09-27 小米汽车科技有限公司 Vehicle-end image processing method and device, vehicle, storage medium and chip
CN115115822B (en) * 2022-06-30 2023-10-31 小米汽车科技有限公司 Vehicle-end image processing method and device, vehicle, storage medium and chip

Similar Documents

Publication Publication Date Title
CN110287876A (en) A content identification method based on video images
CN106952269B (en) The reversible video foreground object sequence detection dividing method of neighbour and system
CN109190475B (en) Face recognition network and pedestrian re-recognition network collaborative training method
CN110796018B (en) Hand motion recognition method based on depth image and color image
CN110033040B (en) Flame identification method, system, medium and equipment
CN109635728B (en) Heterogeneous pedestrian re-identification method based on asymmetric metric learning
CN110298297A (en) Flame identification method and device
CN104240256A (en) Image salient detecting method based on layering sparse modeling
CN107067015A (en) A kind of vehicle checking method and device based on multiple features deep learning
CN112487981A (en) MA-YOLO dynamic gesture rapid recognition method based on two-way segmentation
CN113780132A (en) Lane line detection method based on convolutional neural network
CN112712052A (en) Method for detecting and identifying weak target in airport panoramic video
CN113298024A (en) Unmanned aerial vehicle ground small target identification method based on lightweight neural network
CN110991412A (en) Face recognition method and device, storage medium and electronic equipment
CN113034506A (en) Remote sensing image semantic segmentation method and device, computer equipment and storage medium
CN115359406A (en) Post office scene figure interaction behavior recognition method and system
CN114202775A (en) Transformer substation dangerous area pedestrian intrusion detection method and system based on infrared image
CN106960188B (en) Weather image classification method and device
WO2022222036A1 (en) Method and apparatus for determining parking space
CN109977738A (en) A kind of video scene segmentation judgment method, intelligent terminal and storage medium
CN112686122A (en) Human body and shadow detection method, device, electronic device and storage medium
CN116912648A (en) Method, device, equipment and storage medium for generating material parameter identification model
CN110796008A (en) Early fire detection method based on video image
CN112200840B (en) Moving object detection system in visible light and infrared image combination
CN114998801A (en) Forest fire smoke video detection method based on contrast self-supervision learning network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20190927