CN110287876A

CN110287876A - Content recognition method based on video images

Info

Publication number: CN110287876A
Application number: CN201910556426.8A
Authority: CN
Inventors: 孙绍辉; 曹勇; 田云龙; 孙绍光
Original assignee: Heilongjiang Electric Power Dispatching Industry Co Ltd
Current assignee: Heilongjiang Electric Power Dispatching Industry Co Ltd
Priority date: 2019-06-25
Filing date: 2019-06-25
Publication date: 2019-09-27

Abstract

A content recognition method based on video images. The invention belongs to the field of artificial intelligence and relates in particular to a video image recognition method. Its purpose is to solve the poor real-time performance of existing recognition based on video content. The invention first constructs an image recognition network model and then, for the video, extracts key frame images; the key frame images are processed with the image recognition network model to determine the content objects of the image; at the same time, the optical flow field between two frames is computed by an optical flow method, and the features of the key frame are propagated to the other frames; the model is then trained to obtain the trained final recognition model; the content of the video images is recognized with the trained final recognition model. The invention is used for content recognition of video images.

Description

A content recognition method based on video images

Technical field

The invention belongs to the field of artificial intelligence and relates in particular to a video image recognition method.

Background art

With the steady progress of science and technology, fields such as autonomous driving and robotics are developing ever faster, and the underlying technology is becoming increasingly mature. Whether in autonomous driving or in robotics, autonomous recognition and autonomous decision-making are typically based on image processing. In particular, most autonomous recognition in autonomous driving and robotics (for example, collision avoidance during motion) is based on video image processing.

However, current video image processing has certain drawbacks. The data volume of video is huge, which places high demands not only on the hardware for image acquisition and image processing but also on the software environment used for processing. As a result, existing hardware or software processes video slowly and cannot meet real-time requirements. Autonomous driving in particular places very high demands on real-time decision-making: if the real-time requirement cannot be met, driving safety cannot be guaranteed; if image precision is sacrificed to meet the real-time requirement, the accuracy of content recognition drops or the false alarm rate rises, which likewise poses a major risk to driving safety. This also restricts the development of other fields with real-time requirements, such as robotics.

Summary of the invention

The purpose of the invention is to solve the poor real-time performance of existing recognition based on video content.

A content recognition method based on video images, comprising the following steps:

Step 1, construct the image recognition network model:

The structure of the image recognition network model is: input layer, first convolutional layer, first pooling layer, second convolutional layer, second pooling layer, third convolutional layer, third pooling layer, first feature splicing layer, second feature splicing layer, output layer. The first feature splicing layer concatenates the feature map of the third pooling layer with the feature map of the second pooling layer; the result is fused by convolution, batch normalization and ReLU activation, then processed by an attention mechanism, and the resulting feature information is input to the second feature splicing layer. The second feature splicing layer concatenates the feature map received from the first feature splicing layer with the feature map of the first pooling layer; the result is fused by convolution, batch normalization and ReLU activation, then processed by an attention mechanism, and the resulting deep feature information is input to the output layer;

Step 2, for the video, extract key frame images;

The key frame images are processed with the image recognition network model to determine the content objects of the image;

At the same time, the optical flow field between two frames is computed by an optical flow method, and the features of the key frame are propagated to the other frames;

Step 3, train the model of Step 2 to obtain the trained final recognition model;

Step 4, recognize the content of the video images using the trained final recognition model.
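The feature splicing described in Step 1 can be illustrated with a minimal sketch. It assumes that each pooling layer halves the spatial size, that the deeper feature map is upsampled by nearest-neighbour repetition before concatenation, and that the attention mechanism is a simple per-channel gate; none of these details are stated in the text, and the convolution, batch normalization and ReLU fusion is omitted for brevity. All function names are hypothetical.

```python
import math

def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a [C][H][W] nested list (assumed step)."""
    out = []
    for channel in fmap:
        rows = []
        for row in channel:
            wide = [v for v in row for _ in range(2)]  # repeat each column twice
            rows.append(wide)
            rows.append(list(wide))                    # repeat each row twice
        out.append(rows)
    return out

def concat_channels(a, b):
    """Feature splicing: stack two [C][H][W] maps along the channel axis."""
    assert len(a[0]) == len(b[0]) and len(a[0][0]) == len(b[0][0])
    return a + b

def channel_attention(fmap):
    """Assumed attention: scale each channel by sigmoid(global mean of channel)."""
    out = []
    for channel in fmap:
        count = len(channel) * len(channel[0])
        mean = sum(sum(row) for row in channel) / count
        weight = 1.0 / (1.0 + math.exp(-mean))
        out.append([[v * weight for v in row] for row in channel])
    return out

def splice(deep, shallow):
    """Upsample the deeper map, concatenate with the shallower one, apply attention.
    The conv / batch-norm / ReLU fusion named in the text is omitted here."""
    return channel_attention(concat_channels(upsample2x(deep), shallow))

# Toy single-channel feature maps standing in for the three pooling outputs.
pool3 = [[[1.0]]]
pool2 = [[[0.0, 0.0], [0.0, 0.0]]]
pool1 = [[[0.0] * 4 for _ in range(4)]]

first_splice = splice(pool3, pool2)          # first feature splicing layer
second_splice = splice(first_splice, pool1)  # second feature splicing layer
```

In this toy wiring the second splice carries channels from all three pooling stages at the resolution of the first pooling layer, which is the multi-scale fusion the step describes.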

The beneficial effects of the invention are:

The number of parameters of the image recognition network model constructed by the invention can be kept within a reasonable range, and the invention processes key frames and non-key frames differently, which ensures real-time recognition of video content. At the same time, the content recognition accuracy of the invention can reach 90 percent, giving good recognition of video image content.

Brief description of the drawings

Fig. 1 is a schematic diagram of the constructed image recognition network model.

Specific embodiment

Specific embodiment 1:

Step 1, construct the image recognition network model, as shown in Fig. 1;

Step 2, for the video, extract key frame images. Key frame extraction uses an existing method; in this embodiment, key frames are extracted by a content-analysis-based method. This approach is simple and convenient and helps the overall algorithm meet the real-time requirement. In addition, the image content selected by this method is closer to the content objects that can be recognized from the key frame images, which helps ensure the accuracy of the algorithm. The content-analysis-based method extracts key frames from the color, texture and similar properties of each frame, and determines key frames from the difference between frames and a set threshold.
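The threshold rule described above can be sketched as follows. This is only an illustration: it uses a grayscale intensity histogram as a stand-in for the color and texture features, and the bin count, the L1 histogram distance and the threshold value are all assumptions rather than values given in the text. The function names are hypothetical.

```python
def histogram(frame, bins=8):
    """Normalized intensity histogram of a frame given as rows of 0..255 ints."""
    counts = [0] * bins
    total = 0
    for row in frame:
        for value in row:
            counts[min(value * bins // 256, bins - 1)] += 1
            total += 1
    return [c / total for c in counts]

def extract_key_frames(frames, threshold=0.5):
    """Return indices of frames whose histogram differs from the last key
    frame's histogram by more than `threshold` (L1 distance, range 0..2)."""
    key_indices = []
    reference = None
    for index, frame in enumerate(frames):
        hist = histogram(frame)
        if reference is None:
            difference = None  # the first frame is always taken as a key frame
        else:
            difference = sum(abs(a - b) for a, b in zip(hist, reference))
        if difference is None or difference > threshold:
            key_indices.append(index)
            reference = hist
    return key_indices

# Two dark frames followed by two bright frames: a scene change at index 2.
dark = [[10] * 4 for _ in range(4)]
dark_again = [[12] * 4 for _ in range(4)]
bright = [[200] * 4 for _ in range(4)]
key_indices = extract_key_frames([dark, dark_again, bright, bright])
```

Comparing against the last key frame rather than the immediately preceding frame is a design choice in this sketch; it prevents slow drift from going undetected.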

At the same time, the optical flow field between two frames is computed by an optical flow method, and the features of the key frame are propagated to the other frames;
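Propagating key frame features along an optical flow field is commonly done by warping. The sketch below assumes a single-channel feature map, a flow field that gives, for every pixel of the current frame, the displacement (dx, dy) to its source position in the key frame, and nearest-neighbour sampling with clamping at the borders. These conventions are assumptions for illustration (practical systems typically use bilinear sampling), and `warp_features` is a hypothetical name.

```python
def warp_features(key_features, flow):
    """Warp a [H][W] key frame feature map to the current frame.

    flow[y][x] = (dx, dy) points from pixel (x, y) of the current frame
    to the matching location in the key frame (assumed convention).
    """
    height, width = len(key_features), len(key_features[0])
    warped = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx, dy = flow[y][x]
            # Nearest-neighbour lookup, clamped to the feature map borders.
            src_x = min(max(int(round(x + dx)), 0), width - 1)
            src_y = min(max(int(round(y + dy)), 0), height - 1)
            warped[y][x] = key_features[src_y][src_x]
    return warped

# Content that moved one pixel to the right between key frame and current frame.
key_features = [[1, 2, 3],
                [4, 5, 6]]
flow = [[(-1.0, 0.0)] * 3 for _ in range(2)]
warped = warp_features(key_features, flow)
```

With this approach the recognition network only has to run on key frames, while non-key frames reuse warped features, which is where the real-time gain claimed in the text would come from.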

The optical flow in this embodiment is dense optical flow; the pseudocode for visualizing the optical flow is as follows:

When the optical flow is visualized, hue H is measured as an angle in the range 0° to 360°, counted counterclockwise starting from red, so that red is 0°, green is 120° and blue is 240°; saturation S ranges from 0.0 to 1.0; value (brightness) V ranges from 0.0 (black) to 1.0 (white). FlowNet assigns V the value 255; this function follows FlowNet, and the saturation S represents the magnitude of the pixel displacement.
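The HSV mapping described above can be sketched with the Python standard library. The sketch assumes the flow is given as rows of (dx, dy) pairs and that saturation is normalized by the largest displacement in the field; the normalization and the function name `flow_to_rgb` are assumptions, and the sense of "counterclockwise" on screen depends on whether the y axis points up or down.

```python
import colorsys
import math

def flow_to_rgb(flow, max_magnitude=None):
    """Map a dense flow field to RGB: hue = direction, saturation = magnitude,
    value fixed at its maximum (255 in 8-bit terms, 1.0 in colorsys terms)."""
    magnitudes = [math.hypot(dx, dy) for row in flow for dx, dy in row]
    scale = max_magnitude or max(magnitudes) or 1.0  # avoid dividing by zero
    image = []
    for row in flow:
        out_row = []
        for dx, dy in row:
            hue = (math.degrees(math.atan2(dy, dx)) % 360.0) / 360.0
            saturation = min(math.hypot(dx, dy) / scale, 1.0)
            r, g, b = colorsys.hsv_to_rgb(hue, saturation, 1.0)
            out_row.append((round(r * 255), round(g * 255), round(b * 255)))
        image.append(out_row)
    return image

# A static pixel next to a pixel moving along the positive x axis.
rgb = flow_to_rgb([[(0.0, 0.0), (1.0, 0.0)]])
```

Static pixels come out white (zero saturation) and the fastest pixels come out fully saturated, so motion boundaries are easy to see at a glance.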

Step 3, train the model of Step 2 to obtain the final recognition model, and test it on a test set. If the final recognition model meets the recognition rate requirement, it is taken as the trained final recognition model; otherwise, return to Step 1 and readjust the model parameters.
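The accept-or-readjust loop of Step 3 can be sketched as below. The callbacks `train_fn` and `eval_fn`, the list of candidate parameter settings and the required recognition rate are all hypothetical placeholders; the text does not say how parameters are readjusted or what rate is required.

```python
def select_model(candidate_params, train_fn, eval_fn, required_rate):
    """Train with each candidate setting in turn and return the first model
    whose test-set recognition rate meets the requirement."""
    for params in candidate_params:
        model = train_fn(params)   # Steps 1 to 3: build and train the model
        rate = eval_fn(model)      # evaluate on the held-out test set
        if rate >= required_rate:
            return model, rate     # accepted as the final recognition model
    return None, None              # no candidate met the requirement

# Toy stand-ins: the "model" is just its parameter, and the rate grows with it.
model, rate = select_model(
    candidate_params=[1, 2, 3],
    train_fn=lambda params: params,
    eval_fn=lambda trained: trained / 4,
    required_rate=0.7,
)
```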

During training, the loss function is the cross-entropy loss, as shown below:

where N is the total number of training samples selected, k denotes the k-th sample selected during training, and j is the class index of the data set; p_k denotes the probability of the k-th sample, and p_j denotes the probability of the j-th class.
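For reference, the textbook form of the multi-class cross-entropy loss, written with the sample index k, class index j and sample count N defined above, is given below. The symbols y_{k,j} (one-hot label of sample k for class j), p_{k,j} (predicted probability that sample k belongs to class j) and C (number of classes) are assumptions introduced for illustration; this is the standard form, not a confirmed reproduction of the formula in the source.

```latex
L = -\frac{1}{N} \sum_{k=1}^{N} \sum_{j=1}^{C} y_{k,j} \, \log p_{k,j}
```

Because y_{k,j} is 1 only for the true class of sample k, the inner sum reduces to the negative log-probability assigned to the correct class, averaged over the N samples.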

Claims

1. A content recognition method based on video images, characterized by comprising the following steps:

Step 1, construct the image recognition network model:

The structure of the image recognition network model is: input layer, first convolutional layer, first pooling layer, second convolutional layer, second pooling layer, third convolutional layer, third pooling layer, first feature splicing layer, second feature splicing layer, output layer. The first feature splicing layer concatenates the feature map of the third pooling layer with the feature map of the second pooling layer; the result is fused by convolution, batch normalization and ReLU activation, then processed by an attention mechanism, and the resulting feature information is input to the second feature splicing layer. The second feature splicing layer concatenates the feature map received from the first feature splicing layer with the feature map of the first pooling layer; the result is fused by convolution, batch normalization and ReLU activation, then processed by an attention mechanism, and the resulting deep feature information is input to the output layer;

Step 2, for the video, extract key frame images;

At the same time, the optical flow field between two frames is computed by an optical flow method, and the features of the key frame are propagated to the other frames;

2. The content recognition method based on video images according to claim 1, characterized in that the activation function of the first convolutional layer, the second convolutional layer and the third convolutional layer is ReLU.

3. The content recognition method based on video images according to claim 1, characterized in that the key frame images are extracted by a content-analysis-based method.

4. The content recognition method based on video images according to claim 1, 2 or 3, characterized in that, when the model of Step 2 is trained, the loss function is the cross-entropy loss, as shown below:

where N is the total number of training samples selected, k denotes the k-th sample selected during training, and j is the class index of the data set; p_k denotes the probability of the k-th sample, and p_j denotes the probability of the j-th class.

5. The content recognition method based on video images according to claim 4, characterized in that, after the model of Step 2 is trained to obtain the final recognition model, it is tested on a test set; if the final recognition model meets the recognition rate requirement, it is taken as the trained final recognition model; otherwise, return to Step 1 and readjust the model parameters.