CN112434730A - GoogleNet-based video image quality abnormity classification method

Info

Publication number: CN112434730A
Application number: CN202011247921.XA
Authority: CN
Inventors: 林嘉鑫; 赖蔚蔚; 吴广财; 郑杰生; 郑颖龙; 周昉昉; 刘佳木
Original assignee: Guangdong Electric Power Information Technology Co Ltd
Current assignee: Guangdong Electric Power Information Technology Co Ltd
Priority date: 2020-11-10
Filing date: 2020-11-10
Publication date: 2021-03-02

Abstract

The invention discloses a GoogleNet-based video image quality abnormity classification method, which relates to the technical field of video image quality processing and comprises the following steps: acquiring original data in advance and preprocessing it, wherein the preprocessing comprises data labeling, data augmentation, data set splitting and TFRecord file generation; building and training a neural network model, wherein building the model comprises specifying pre-trained weights from the ImageNet image data set and specifying the convolutional base of the model and a densely connected classifier; and using the trained neural network model as the video image quality abnormity classification model and outputting a result. The method is simple to implement, accurate and fast, can detect and classify image quality abnormities in real time, and is convenient to operate and extend.

Description

GoogleNet-based video image quality abnormity classification method

Technical Field

The invention relates to the technical field of video image quality processing, in particular to a GoogleNet-based video image quality abnormity classification method.

Background

In the prior art, a video image acquired by a video monitoring system has a large number of moving objects, and in all the moving objects, two types of objects, namely people and vehicles, are generally taken as main attention objects. There is a clear distinction between the management requirements for these two types of objects, and there is therefore a need for classification of these two types of objects in video surveillance systems.

At present, most research and development work classifies targets with methods based on statistical training. However, such methods need a large number of image samples of vehicles and people, recognize slowly and place high demands on computing equipment. In the prior art, these problems have severely limited the application of this type of method to object recognition.

Invention patent CN101882217B discloses a method and a device for classifying objects in video images, comprising the following steps: after receiving a video image, filtering the foreground blocks obtained from the video image, and taking the foreground blocks that meet preset filtering conditions as moving targets; tracking each moving target with the mean shift algorithm, and extracting the moving target at the tracked position; normalizing the extracted moving target, and scanning the contour of the normalized moving target to obtain a feature statistic; and determining the type of the moving target according to the feature statistic. That method classifies targets by their contour features, which improves classification accuracy; it normalizes the size of the moving target by a scaling factor, which overcomes the inaccurate width-to-height ratio features produced by earlier normalization methods; and it computes the color histogram from a joint probability distribution, which reduces the data volume of the color histogram. It nevertheless still suffers from low accuracy and low efficiency.

An effective solution to the problems in the related art has not been proposed yet.

Disclosure of Invention

To address the problems in the related art, the invention provides a GoogleNet-based video image quality abnormity classification method that is simple to implement, accurate and fast, can detect and classify image quality abnormities in real time, is convenient to operate and extend, and thereby solves the technical problems of the prior art.

The technical scheme of the invention is realized as follows:

a video image quality abnormity classification method based on GoogleNet comprises the following steps:

acquiring original data in advance and preprocessing it, wherein the preprocessing comprises data labeling, data augmentation, data set splitting and TFRecord file generation;

building and training a neural network model, wherein building the model comprises specifying pre-trained weights from the ImageNet image data set and specifying the convolutional base of the model and a densely connected classifier;

and taking the trained neural network model as a video image quality abnormity classification model and outputting a result.

Further, the data labeling comprises setting classification labels, namely normal, color cast abnormal, brightness abnormal and blur abnormal.

Further, the data augmentation comprises mirroring, rotation, scaling, cropping, translation, Gaussian noise, brightness adjustment, saturation adjustment and contrast adjustment.

Further, the splitting the data set includes the following steps:

taking the sample set used to fit the model as the training set;

setting aside a separate sample set during training as the validation set, which is used to monitor whether the model is over-fitting, to tune the hyper-parameters of the model and to make a preliminary evaluation of the model's capability;

and evaluating the generalization capability of the model.
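The splitting steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the 80/10/10 proportions, the `split_dataset` helper and the file names are assumptions, and the third subset is assumed to be a held-out test set used to evaluate generalization, which the text implies but does not name.

```python
import random


def split_dataset(samples, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle the samples and split them into train / validation / test lists.

    The fractions are illustrative assumptions; the source gives no proportions.
    """
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave a non-empty test share")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]                 # used to fit the model
    val = shuffled[n_train:n_train + n_val]    # used to watch over-fitting, tune hyper-parameters
    test = shuffled[n_train + n_val:]          # used once, to estimate generalization
    return train, val, test


# Hypothetical (file name, class index) pairs standing in for labeled frames.
samples = [("frame_%03d.jpg" % i, i % 4) for i in range(100)]
train_set, val_set, test_set = split_dataset(samples)
print(len(train_set), len(val_set), len(test_set))
```

Keeping the test subset untouched until the end is what makes its score a fair estimate of generalization; the validation subset absorbs all the tuning decisions.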

Further, specifying the convolutional base of the model and the densely connected classifier comprises the following steps:

running the convolutional base on the data set, and saving the output as a NumPy array on the hard disk;

feeding the saved data as input into a separate densely connected classifier.

Further, neural network model optimization is also included, including using a larger learning rate to quickly obtain a better solution, and gradually reducing the learning rate with the continuation of iteration, so that the model is more stable at the later stage of training, which is expressed as:

decay_learning_rate = learning_rate × decay_rate^(global_step/decay_step)

wherein decay_learning_rate is the learning rate used in each round of optimization, learning_rate is the preset initial learning rate, decay_rate is the decay coefficient, and decay_step is the decay speed.
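As a worked illustration of the decay formula, the sketch below evaluates it at a few training steps. The numeric values (initial rate 0.1, decay coefficient 0.96, decay step 1000) are assumptions chosen for the example, and global_step is assumed to be the number of training steps completed so far, which the text does not define explicitly.

```python
def decayed_learning_rate(learning_rate, decay_rate, global_step, decay_step):
    """Exponential decay: learning_rate * decay_rate ** (global_step / decay_step)."""
    return learning_rate * decay_rate ** (global_step / decay_step)


# Hypothetical settings, not taken from the source.
initial_rate = 0.1
decay_rate = 0.96
decay_step = 1000

for step in (0, 1000, 5000, 10000):
    rate = decayed_learning_rate(initial_rate, decay_rate, step, decay_step)
    print(step, rate)
```

With these values the rate is multiplied by 0.96 every 1000 steps, so early training takes large steps and later training takes progressively smaller ones.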

The invention has the beneficial effects that:

According to the GoogleNet-based video image quality abnormity classification method, the original data is acquired in advance and preprocessed, the neural network model is built and trained, and the trained neural network model is used as the video image quality abnormity classification model to output the result. The method is simple to implement, accurate and fast, can detect and classify image quality abnormities in real time, and is convenient to operate and extend.

Drawings

In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly described below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic flowchart of a GoogleNet-based video image quality anomaly classification method according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of a dense connection classifier of a GoogleNet-based video image quality anomaly classification method according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of a convolution basis of a GoogleNet-based video image quality anomaly classification method according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments that can be derived by one of ordinary skill in the art from the embodiments given herein are intended to be within the scope of the present invention.

According to the embodiment of the invention, a video image quality abnormity classification method based on GoogleNet is provided.

As shown in fig. 1 to fig. 3, the method for classifying video image quality abnormality based on GoogleNet according to the embodiment of the present invention includes the following steps:

step S1, acquiring original data in advance and preprocessing it, wherein the preprocessing comprises data labeling, data augmentation, data set splitting and TFRecord file generation;

step S2, building and training a neural network model, wherein building the model comprises specifying pre-trained weights from the ImageNet image data set and specifying the convolutional base of the model and a densely connected classifier;

and step S3, taking the trained neural network model as a video image quality abnormity classification model and outputting the result.

Wherein the data labeling comprises setting classification labels, namely normal, color cast abnormal, brightness abnormal and blur abnormal.

Wherein the data augmentation comprises mirroring, rotation, scaling, cropping, translation, Gaussian noise, brightness adjustment, saturation adjustment and contrast adjustment.
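A few of the listed augmentations can be sketched in plain Python on a small grayscale image stored as a list of pixel rows. This is only an illustration under stated assumptions: the function names, the parameter ranges and the `augment` pipeline are hypothetical, and a real pipeline would more likely operate on image arrays or tensors.

```python
import random


def mirror(image):
    """Flip the image horizontally (left-right mirroring)."""
    return [row[::-1] for row in image]


def rotate90(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*image[::-1])]


def adjust_brightness(image, delta):
    """Add delta to every pixel, clipping the result to the 0-255 range."""
    return [[min(255, max(0, p + delta)) for p in row] for row in image]


def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma."""
    return [[min(255, max(0, round(p + rng.gauss(0.0, sigma)))) for p in row]
            for row in image]


def augment(image, rng):
    """Apply a random combination of the transforms above (assumed ranges)."""
    if rng.random() < 0.5:
        image = mirror(image)
    if rng.random() < 0.5:
        image = rotate90(image)
    image = adjust_brightness(image, rng.randint(-30, 30))
    return add_gaussian_noise(image, sigma=5.0, rng=rng)


frame = [[10, 20, 30],
         [40, 50, 60]]
rng = random.Random(42)
augmented = augment(frame, rng)
```

One design point the text does not discuss: because the label set itself includes brightness and color cast abnormalities, photometric transforms applied to "normal" samples would presumably need to stay mild enough not to change the correct label.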

Wherein splitting the data set comprises the following steps:

taking a sample data set of model fitting as a training set;

and evaluating the generalization capability of the model.

Wherein specifying the convolutional base of the model and the densely connected classifier comprises the following steps:

The method also comprises neural network model optimization, wherein the neural network model optimization comprises the steps of rapidly obtaining a better solution by using a larger learning rate, and gradually reducing the learning rate along with the continuation of iteration so that the model is more stable in the later training stage and is represented as follows:

decay_learning_rate = learning_rate × decay_rate^(global_step/decay_step)

By means of this scheme, the original data is acquired in advance and preprocessed, the neural network model is built and trained, and the trained neural network model is used as the video image quality abnormity classification model to output the result. The method is simple to implement, accurate and fast, can detect and classify image quality abnormities in real time, and is convenient to operate and extend.

Specifically, regarding the above-described generation of TFRecord files: TFRecord is a binary file format built into TensorFlow (TF), although generating TFRecord files is not mandatory. It has the following advantages: the data and the labels are stored in the same file; memory is fully utilized, and the files are convenient to copy and move; and the handling of various input files is unified.

In addition, a convolutional neural network for image classification contains two parts: first a series of convolutional and pooling layers, and finally a densely connected classifier. The first part is called the convolutional base of the model. For convolutional neural networks, feature extraction means taking the convolutional base of a previously trained network, running new data through it, and then training a new classifier on top of its output.

In addition, as shown in Fig. 3, where InputLayer represents the convolutional base, the feature maps of the convolutional neural network indicate whether generic concepts are present in the image, and such feature maps may be useful in any computer vision problem. The representations learned by the densely connected layers do not contain position information about objects in the input image. Densely connected layers discard the notion of space, so if object position matters to the problem, the features of the densely connected layers are largely useless.

After the convolutional base, a densely connected classifier is added, and either of two methods can be chosen:

(1) Run the convolutional base on the data set, save the output as a NumPy array on the hard disk, and then use this data as input to an independent densely connected classifier. This method is fast and computationally inexpensive, but it cannot use data augmentation.

(2) Add Dense layers on top to extend the existing model (i.e., the convolutional base) and run the entire model end-to-end on the input data. Data augmentation can be used because every input image passes through the convolutional base each time it enters the model, but this method is computationally expensive. That is, first freeze the convolutional base, add a new classifier, and then train.
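The first method described above (run the frozen base once, save its output, train a separate classifier on the saved features) can be sketched as follows. Everything here is a stand-in chosen to keep the example self-contained: `conv_base` is a hypothetical two-feature extractor rather than a pre-trained GoogleNet, the cache is a JSON file rather than a NumPy array, the classifier is a single sigmoid unit, and the toy data only distinguishes "normal" from "abnormally dark" images.

```python
import json
import math
import os
import random
import tempfile


def conv_base(image):
    """Hypothetical stand-in for a frozen, pre-trained convolutional base.

    Maps a grayscale image (rows of 0-255 pixels) to two features:
    mean intensity and intensity standard deviation, scaled to [0, 1].
    """
    pixels = [p / 255.0 for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    return [mean, math.sqrt(var)]


def cache_features(images, labels, path):
    """Run the frozen base once over the data set and save its output to disk."""
    with open(path, "w") as fh:
        json.dump({"features": [conv_base(im) for im in images],
                   "labels": labels}, fh)


def load_features(path):
    with open(path) as fh:
        data = json.load(fh)
    return data["features"], data["labels"]


def train_classifier(features, labels, epochs=500, lr=0.5):
    """Train one dense unit with a sigmoid output by stochastic gradient descent."""
    w, b = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            grad = 1.0 / (1.0 + math.exp(-z)) - y  # d(cross-entropy)/dz
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b


def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


# Toy data: label 0 = normal brightness, label 1 = abnormally dark.
rng = random.Random(0)


def toy_image(lo, hi):
    return [[rng.randint(lo, hi) for _ in range(4)] for _ in range(4)]


images = [toy_image(120, 200) for _ in range(10)] + [toy_image(0, 20) for _ in range(10)]
labels = [0] * 10 + [1] * 10

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "features.json")
    cache_features(images, labels, path)           # expensive step, done once
    features, cached_labels = load_features(path)  # the classifier never sees raw images

w, b = train_classifier(features, cached_labels)
accuracy = sum(predict(w, b, x) == y
               for x, y in zip(features, cached_labels)) / len(features)
print("training accuracy:", accuracy)
```

The sketch also shows why this route rules out data augmentation: the base sees each image exactly once, so randomly transformed variants generated during training would never reach the classifier.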

In addition, when a neural network is trained with the stochastic gradient descent algorithm, using a moving average model can improve the performance of the final model on test data to a certain extent. TF provides tf.train.ExponentialMovingAverage to implement the moving average model. At initialization, a decay rate must be supplied to control how fast the model is updated. ExponentialMovingAverage maintains a shadow variable for each variable; the initial value of the shadow variable is the initial value of the corresponding variable. Each time the variable is updated, the value of the shadow variable is updated as follows:

shadow_variable = decay × shadow_variable + (1 - decay) × variable

wherein shadow_variable is the shadow variable, variable is the variable to be updated, and decay is the decay rate. decay determines how fast the model is updated: the larger decay is, the more stable the model tends to be. In practice, decay is typically set to a number very close to 1, e.g., 0.999.
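The shadow-variable update can be illustrated with a small plain-Python class. This is a sketch of the update rule only, not the TensorFlow API: the class name `ShadowAverager`, its methods and the variable name are hypothetical, and a decay of 0.9 is used instead of 0.999 so that the effect is visible after two updates.

```python
class ShadowAverager:
    """Keeps an exponential moving average (shadow value) per named variable."""

    def __init__(self, decay):
        self.decay = decay
        self.shadow = {}

    def register(self, name, initial_value):
        # The shadow variable starts at the variable's initial value.
        self.shadow[name] = initial_value

    def update(self, name, value):
        # shadow = decay * shadow + (1 - decay) * variable
        self.shadow[name] = self.decay * self.shadow[name] + (1 - self.decay) * value
        return self.shadow[name]


ema = ShadowAverager(decay=0.9)      # illustrative decay, assumed for the example
ema.register("weight", 0.0)          # variable initialized to 0.0
first = ema.update("weight", 10.0)   # variable jumps to 10.0
second = ema.update("weight", 10.0)  # variable stays at 10.0
print(first, second)
```

Even though the variable jumps straight to 10.0, the shadow value approaches it only gradually, which is why a decay close to 1 gives a smoother, more stable set of weights for evaluation.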

In summary, according to the technical scheme of the invention, the original data information is obtained in advance and is subjected to data preprocessing, the neural network model is built and trained, and the trained neural network model is used as the video image quality abnormity classification model and outputs a result.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims

1. A video image quality abnormity classification method based on GoogleNet is characterized by comprising the following steps:

2. The GoogleNet-based video image quality abnormality classification method according to claim 1, wherein the labeling of data information includes setting classification labels including normal, color cast abnormality, brightness abnormality and blur abnormality.

3. The GoogleNet-based video image quality anomaly classification method according to claim 2, wherein the data augmentation includes mirroring, rotation, scaling, cropping, translation, Gaussian noise, brightness adjustment, saturation adjustment, and contrast adjustment.

4. The GoogleNet-based video image quality anomaly classification method according to claim 1, wherein the splitting of the data set comprises the steps of:

taking a sample data set of model fitting as a training set;

and evaluating the generalization capability of the model.

5. The GoogleNet-based video image quality anomaly classification method according to claim 1, wherein the convolutional basis and dense connection classifier of the calibration model comprises the steps of:

6. The GoogleNet-based video image quality anomaly classification method according to claim 1, further comprising neural network model optimization, including using a larger learning rate to obtain a better solution quickly, and gradually reducing the learning rate as the iteration continues, so that the model is more stable in the late training stage, and expressed as:

decay_learning_rate = learning_rate × decay_rate^(global_step/decay_step)