CN112330650A - Retrieval video quality evaluation method - Google Patents
- Publication number
- CN112330650A (application CN202011263356.6A)
- Authority
- CN
- China
- Prior art keywords
- video
- retrieval
- evaluation method
- calculating
- retrieved
- Prior art date
- Legal status
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20048—Transform domain processing
- G06T2207/20064—Wavelet transform [DWT]
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Quality & Reliability (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a retrieval video quality evaluation method, which comprises the following steps: calculating an environment index of the retrieval video; calculating target physical characteristics of the retrieval video; performing a discrete wavelet transform on the retrieval video; and inputting the retrieval video, the environment index, the target physical characteristics and the discrete-wavelet-transformed image into feature layers of different granularity in a neural network, finally forming, through learning on mass data, a model that can classify videos by quality. The invention combines traditional digital image processing with deep learning to analyze and classify the intrinsic characteristics of the video to be retrieved, so as to improve the quality of different types of retrieval videos and to provide parameter-tuning suggestions for a video retrieval and comparison system.
Description
Technical Field
The invention relates to video processing and analysis, and in particular to a method for evaluating the quality of videos used for retrieval.
Background
With the spread of urban video surveillance systems, the way public security departments investigate and solve crimes has changed greatly, and solving cases with on-scene video (video investigation) has been widely developed and applied. In video investigation, retrieving and comparing suspected targets and their behavior is an important requirement.
Research on video retrieval has made great progress: many retrieval models have been proposed and continuously improved and verified in practice, which to some extent makes it much easier for users to find satisfactory targets. However, most retrieval systems still have serious robustness problems. For some user queries the retrieval results are of high quality, while for others the results contain many targets unrelated to the query. Even systems generally recognized as having good average retrieval performance may return unsatisfactory results for certain queries. In short, there is often a large difference between the retrieval results for different queries.
The reason is that existing retrieval and comparison technologies study only the accuracy of retrieval and comparison; they neglect evaluating the environment of the input video and the physical characteristics of the targets in it, and they lack any analysis and classification of the quality of the video to be retrieved, or any means of improving that quality. If an arbitrary, uncharacterized video is fed into the retrieval and comparison system, an undesirable or unpredictable retrieval result is inevitable, and in many complex cases the retrieval effect is poor.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a retrieval video quality evaluation method that combines traditional digital image processing with deep learning to analyze and classify the characteristics of the video to be retrieved, so as to improve the quality of different types of retrieval videos and provide parameter-tuning suggestions for a video retrieval and comparison system.
The technical scheme adopted for realizing the aim of the invention is a retrieval video quality evaluation method, which comprises the following steps:
Calculating the environment index of the retrieval video. Because low sharpness, abnormal brightness, low contrast and similar defects of the video picture cause the retrieval system to output unsatisfactory results, the invention first uses digital image processing to compute environment quality indexes such as sharpness, brightness anomaly and contrast for a video segment, and combines the three indexes into an environment index score for the video.
Calculating the target physical characteristics of the retrieval video. Targets that are too small in the video frame, or that move too fast, are difficult for a retrieval system to detect. A general object detection algorithm based on a deep neural network is therefore used to detect general targets in the video sequence frames, and the target size in each frame is analyzed from the size of its bounding box; a multi-object tracking algorithm is then used to obtain physical-characteristic quality indexes such as target motion speed. The target size and motion speed are combined to give the physical-characteristic index score of the video targets.
Performing a discrete wavelet transform on the retrieval video. Video captured with a device such as a mobile phone by filming a display such as a computer monitor is recaptured video. For this special kind of video, environment indexes such as sharpness and physical-characteristic indexes such as target size are no different from those of normal video, but it may contain noise such as moiré patterns, and recaptured video often shakes, which greatly reduces the accuracy of the retrieval system. The wavelet transform supports multi-resolution analysis and can focus on arbitrary details of a signal for time-frequency analysis at multiple resolutions. The input image is converted with a discrete wavelet transform (DWT); the transformed image highlights noise such as moiré much better and thus yields more discriminative features.
Inputting the retrieval video, the environment index, the target physical characteristics and the discrete-wavelet-transformed image into feature layers of different granularity in a neural network, finally forming, through learning on mass data, a model that can classify videos by quality.
In the above technical solution, calculating the environment index of the retrieval video includes calculating sharpness, calculating brightness anomaly, and calculating contrast; the three indexes are calculated as follows:
the calculation of the sharpness includes:
where DR is the sharpness (a larger DR means a sharper image), x and y are the horizontal and vertical coordinates, p(x,y) is the pixel value at coordinate point (x, y), and w and h are the width and height of the image block, respectively.
The luminance abnormality calculation includes:
In the formula, CAST is the deviation value: CAST less than 1 indicates normal brightness and CAST greater than 1 indicates abnormal brightness; when CAST indicates an anomaly, DA greater than 0 means the picture is too bright and DA less than 0 means it is too dark. p(x,y) is the pixel value at coordinate point (x, y), w and h are the width and height of the image block, Mean is the mean brightness, and Hist is the gray-level histogram of the image block.
The calculation of the contrast includes:
where Contrast is the contrast value (a larger Contrast means better contrast in the image block), x and y are the horizontal and vertical coordinates, p(x,y) is the pixel value at coordinate point (x, y), and w and h are the width and height of the image block, respectively.
In the above technical solution, the discrete wavelet transform performed on the retrieval video expands the signal as

x(t) = Σk c0[k]·φ0,k(t) + Σj Σk dj[k]·ψj,k(t)

where x(t) is the transformed result, the c0[k] and dj[k] are the expansion coefficients, φ0,k(t) is the scale function, and the ψj,k(t) are the wavelet functions.
The method adopts a multi-input deep convolutional neural network. The original video-sequence image, the environment quality index, the physical-characteristic index and the DWT-transformed image are used as four inputs, which are fed or fused into feature layers of different granularity in the neural network; through learning on mass data, the network finally forms a model that can classify videos by quality.
With the surge in the amount of surveillance video, video scenes and types have become more and more complex; classifying and analyzing the quality of videos to be retrieved, and improving that quality, greatly helps to improve video retrieval quality. The invention innovatively proposes a multi-input convolutional neural network that combines traditional digital image processing with deep learning, analyzes and classifies the video to be retrieved from all angles, and solves the problem that a single network or method cannot accurately assess the quality of complex video. The invention classifies the videos to be retrieved; different classes have different quality attributes. After videos with different quality attributes are evaluated against the actual retrieval requirements, videos of the classes that meet the conditions are selected as input to the retrieval and comparison system, improving retrieval efficiency.
Drawings
Fig. 1 is a flowchart of a retrieval video quality evaluation method according to the present invention.
FIG. 2 is a schematic diagram of the structure of the multi-input deep convolutional neural network of the present invention.
Detailed Description
The invention is described in further detail below with reference to the figures and the specific embodiments.
S1, calculating the environment index of the retrieval video
So that the indexes can be fed into the deep convolutional neural network, when calculating the video environment indexes the original image is divided into 16 × 16 image blocks; each image block produces one index value, and the indexes are input into the model as a feature map (1/16 the size of the original image).
The sharpness is calculated from the differences between adjacent pixels in the X and Y directions of the video image. The formula is as follows:
where DR is the sharpness (a larger DR means a sharper image), x and y are the horizontal and vertical coordinates, p(x,y) is the pixel value at coordinate point (x, y), and w and h are the width and height of the image block, respectively.
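As a concrete illustration of the block-wise sharpness index, the sketch below computes one DR value per 16 × 16 block, yielding the 1/16-size feature map described above. The exact patent formula is only shown as an image, so the mean absolute difference of horizontally and vertically adjacent pixels is an assumed stand-in.

```python
import numpy as np

def block_sharpness(img, bs=16):
    """Per-block sharpness map from adjacent-pixel differences.

    img: 2-D grayscale array. Returns an (H/bs, W/bs) feature map with
    one DR value per bs x bs block (bs=16 as in the patent). Assumed
    form: mean absolute X- and Y-direction neighbour differences.
    """
    h, w = img.shape
    hb, wb = h // bs, w // bs
    dr = np.zeros((hb, wb))
    for i in range(hb):
        for j in range(wb):
            blk = img[i*bs:(i+1)*bs, j*bs:(j+1)*bs].astype(float)
            dx = np.abs(np.diff(blk, axis=1)).sum()   # X-direction differences
            dy = np.abs(np.diff(blk, axis=0)).sum()   # Y-direction differences
            dr[i, j] = (dx + dy) / (bs * bs)
    return dr
```

A flat block scores 0, while a block with strong edges scores high, matching "the larger DR is, the sharper the image".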
Brightness-anomaly calculation: whether the brightness is abnormal is determined from the gray-level histogram of the video image. The formula is as follows:
In the formula, CAST is the deviation value: CAST less than 1 indicates normal brightness and CAST greater than 1 indicates abnormal brightness; when CAST indicates an anomaly, DA greater than 0 means the picture is too bright and DA less than 0 means it is too dark. p(x,y) is the pixel value at coordinate point (x, y), w and h are the width and height of the image block, Mean is the mean brightness, and Hist is the gray-level histogram of the image block.
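A minimal sketch of the brightness-anomaly check, assuming the common mean-shift / histogram-deviation formulation around mid-gray (128); the patent's own formula is only shown as an image, so this exact form is an assumption.

```python
import numpy as np

def brightness_anomaly(block):
    """Brightness-anomaly indexes from a block's gray-level histogram.

    Returns (CAST, DA). As described: CAST > 1 flags abnormal
    brightness; then DA > 0 means too bright, DA < 0 means too dark.
    """
    blk = block.astype(float)
    n = blk.size
    da = (blk - 128.0).sum() / n                 # signed shift of mean from mid-gray
    hist, _ = np.histogram(blk, bins=256, range=(0, 256))
    # histogram-weighted spread around the shifted mean
    m = (hist * np.abs(np.arange(256) - 128.0 - da)).sum() / n
    if m == 0:
        cast = float('inf') if da != 0 else 0.0  # uniform block: anomalous iff shifted
    else:
        cast = abs(da) / m
    return cast, da
```

A uniformly bright block yields a large CAST with DA > 0; a block balanced around mid-gray yields CAST < 1.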
The contrast is calculated from the variance of adjacent pixels in the video image. The formula is as follows:
where Contrast is the contrast value (a larger Contrast means better contrast in the image block), x and y are the horizontal and vertical coordinates, p(x,y) is the pixel value at coordinate point (x, y), and w and h are the width and height of the image block, respectively.
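For illustration, the contrast index can be sketched as the pixel variance over a block; the patent's exact adjacent-pixel-variance formula is only shown as an image, so plain block variance is an assumed simplification.

```python
import numpy as np

def block_contrast(block):
    """Contrast of an image block as the variance of its pixel values.

    A larger value indicates better contrast, matching the description
    above. Assumed form: mean squared deviation from the block mean.
    """
    blk = block.astype(float)
    return float(((blk - blk.mean()) ** 2).mean())
```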
S2, calculating the target physical characteristics of the search video
An open-source general object detection and tracking algorithm is used to detect and track the general targets in the video sequence and obtain the target size and target motion speed.
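Given the per-frame bounding boxes that such a detector-plus-tracker pipeline produces, the two physical-characteristic indexes can be derived as below. The track format, units, and function name are illustrative assumptions, not the patent's specification.

```python
def target_physical_features(track, fps=25.0):
    """Target size and motion speed from one target's tracked boxes.

    track: list of (frame_idx, x, y, w, h) bounding boxes, as an
    open-source multi-object tracker would emit for one target.
    Returns (mean_area_px, mean_speed_px_per_s).
    """
    areas = [w * h for (_, _, _, w, h) in track]
    mean_area = sum(areas) / len(areas)
    speeds = []
    for (f0, x0, y0, w0, h0), (f1, x1, y1, w1, h1) in zip(track, track[1:]):
        # displacement of box centres between consecutive detections
        dx = (x1 + w1 / 2) - (x0 + w0 / 2)
        dy = (y1 + h1 / 2) - (y0 + h0 / 2)
        dt = (f1 - f0) / fps
        speeds.append((dx * dx + dy * dy) ** 0.5 / dt)
    mean_speed = sum(speeds) / len(speeds) if speeds else 0.0
    return mean_area, mean_speed
```

Small areas and high speeds would then lower the physical-characteristic index score described above.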
S3, discrete wavelet transform is carried out on the retrieval video
The original image is converted with a wavelet transform algorithm; the formula is as follows:
where x(t) is the transformed result, c0[k], d0[k], d1[k], etc. are the expansion coefficients, φ0,k(t) is the scale function, and ψ0,k(t), etc. are the wavelet functions.
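A one-level 2-D DWT can be sketched directly in NumPy. The patent does not name a wavelet, so Haar is an assumed minimal choice; the high-frequency subbands are where moiré-like noise in recaptured video stands out.

```python
import numpy as np

def haar_dwt2(img):
    """One level of a 2-D discrete wavelet transform (Haar basis).

    Returns the (LL, LH, HL, HH) subbands, each half the input size in
    both dimensions (H and W must be even).
    """
    a = img.astype(float)
    # rows: average / difference of horizontally adjacent pixel pairs
    lo_r = (a[:, 0::2] + a[:, 1::2]) / 2.0
    hi_r = (a[:, 0::2] - a[:, 1::2]) / 2.0
    # columns: the same split applied to both row outputs
    ll = (lo_r[0::2, :] + lo_r[1::2, :]) / 2.0
    lh = (lo_r[0::2, :] - lo_r[1::2, :]) / 2.0
    hl = (hi_r[0::2, :] + hi_r[1::2, :]) / 2.0
    hh = (hi_r[0::2, :] - hi_r[1::2, :]) / 2.0
    return ll, lh, hl, hh
```

High-frequency vertical stripes (a crude stand-in for moiré) land in the HL subband while LH and HH stay flat, illustrating how the transform separates such noise into discriminative channels.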
S4, multi-input convolutional neural network for video quality classification
The core of the invention is as follows: a multi-input deep convolutional neural network takes the original video-sequence image, the environment quality index, the physical-characteristic index and the discrete-wavelet-transformed image as four inputs, feeds or fuses them into feature layers of different granularity in the network, and finally forms, through learning on mass data, a model that can classify videos by quality.
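The fusion of the four inputs at different granularities can be sketched at the shape level: each auxiliary input is concatenated channel-wise at the feature layer whose spatial resolution it matches (the DWT image at 1/2 resolution, the 16 × 16-block index maps at 1/16 resolution). Average pooling stands in for the convolution stages here; the real network architecture is not specified in the patent, so this is purely an assumed illustration.

```python
import numpy as np

def avg_pool(x, k):
    """k x k average pooling on a channel-last (H, W, C) array."""
    h, w = x.shape[:2]
    return x.reshape(h // k, k, w // k, k, -1).mean(axis=(1, 3))

def fuse_multi_input(frame, dwt_hh, env_map, phys_map):
    """Shape-level sketch of the multi-granularity input fusion.

    frame:    (H, W) original grayscale frame
    dwt_hh:   (H/2, W/2) DWT subband image
    env_map:  (H/16, W/16) environment-index feature map
    phys_map: (H/16, W/16) physical-characteristic feature map
    Returns the fused 1/16-resolution feature tensor that would feed
    the classifier head (five quality classes).
    """
    f = frame[..., None]                                     # add channel dim
    feat2 = avg_pool(f, 2)                                   # 1/2-res "features"
    feat2 = np.concatenate([feat2, dwt_hh[..., None]], axis=-1)
    feat16 = avg_pool(feat2, 8)                              # 1/16-res "features"
    feat16 = np.concatenate(
        [feat16, env_map[..., None], phys_map[..., None]], axis=-1)
    return feat16
```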
In the invention, the video quality classification result is used as the video quality evaluation result. In the specific implementation, the output of the multi-input convolutional neural network (i.e. the video quality analysis result) falls into five classes, as follows:
1. Videos that cannot be analyzed, such as garbled video and black-screen video, are class 1; these videos cannot be retrieved or analyzed.
2. Recaptured video is class 2; such video has strong moiré noise and some of it shakes, and it can be retrieved and analyzed after optimization.
3. Videos with very poor illumination, contrast and similar conditions, in which the target can hardly be seen, are class 3; such videos can hardly be retrieved or analyzed.
4. Videos with moderate illumination, contrast and similar conditions are class 4; these videos are blurry and can be retrieved and analyzed after optimization.
5. Videos with good environmental conditions are class 5; these videos are clear and can be retrieved and analyzed directly.
These five classes correspond to different video qualities. A model that classifies videos by quality is finally formed; once the model assigns a video to one of the five classes, the video's quality has been both evaluated and classified.
Claims (6)
1. A retrieval video quality evaluation method is characterized by comprising the following steps:
calculating an environment index of the retrieval video;
calculating target physical characteristics of the retrieval video;
performing discrete wavelet transform on the retrieval video;
and inputting the retrieval video, the environment index, the target physical characteristics and the discrete-wavelet-transformed image into feature layers of different granularity in a neural network, and finally forming, through learning on mass data, a model capable of classifying videos by quality.
2. The retrieval video quality evaluation method according to claim 1, wherein calculating the environment index of the retrieval video comprises calculating sharpness, calculating brightness anomaly, and calculating contrast.
3. The retrieved video quality evaluation method according to claim 2, wherein the calculating of the sharpness comprises:
wherein DR is the sharpness, x and y are the horizontal and vertical coordinates, p(x,y) is the pixel value at coordinate point (x, y), and w and h are the width and height of the image block, respectively.
4. The retrieved video quality evaluation method according to claim 2, wherein the luminance anomaly calculation includes:
wherein CAST is the deviation value: less than 1 indicates normal brightness and greater than 1 indicates abnormal brightness; when CAST indicates an anomaly, DA greater than 0 means too bright and DA less than 0 means too dark; p(x,y) is the pixel value at coordinate point (x, y), w and h are the width and height of the image block, Mean is the mean brightness, and Hist is the gray-level histogram of the image block.
5. The retrieved video quality evaluation method according to claim 2, wherein the calculating of the contrast comprises:
wherein Contrast is the contrast value, x and y are the horizontal and vertical coordinates, p(x,y) is the pixel value at coordinate point (x, y), and w and h are the width and height of the image block, respectively.
6. The retrieved video quality evaluation method according to claim 1 or 2, wherein the performing discrete wavelet transform on the retrieved video comprises:
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011263356.6A CN112330650A (en) | 2020-11-12 | 2020-11-12 | Retrieval video quality evaluation method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112330650A true CN112330650A (en) | 2021-02-05 |
Family
ID=74319069
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011263356.6A Pending CN112330650A (en) | 2020-11-12 | 2020-11-12 | Retrieval video quality evaluation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112330650A (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102421008A (en) * | 2011-12-07 | 2012-04-18 | 浙江捷尚视觉科技有限公司 | Intelligent video quality detecting system |
CN107636690A (en) * | 2015-06-05 | 2018-01-26 | 索尼公司 | Full reference picture quality evaluation based on convolutional neural networks |
US20180032846A1 (en) * | 2016-08-01 | 2018-02-01 | Nvidia Corporation | Fusing multilayer and multimodal deep neural networks for video classification |
CN107679462A (en) * | 2017-09-13 | 2018-02-09 | 哈尔滨工业大学深圳研究生院 | A kind of depth multiple features fusion sorting technique based on small echo |
CN108055501A (en) * | 2017-11-22 | 2018-05-18 | 天津市亚安科技有限公司 | A kind of target detection and the video monitoring system and method for tracking |
CN109191437A (en) * | 2018-08-16 | 2019-01-11 | 南京理工大学 | Clarity evaluation method based on wavelet transformation |
US20200226740A1 (en) * | 2019-03-27 | 2020-07-16 | Sharif University Of Technology | Quality assessment of a video |
- 2020-11-12: CN202011263356.6A patent/CN112330650A/en, active, Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2022099598A1 (en) | Video dynamic target detection method based on relative statistical features of image pixels | |
CN108764085B (en) | Crowd counting method based on generation of confrontation network | |
US20110087677A1 (en) | Apparatus for displaying result of analogous image retrieval and method for displaying result of analogous image retrieval | |
CN108564052A (en) | Multi-cam dynamic human face recognition system based on MTCNN and method | |
CN113011329A (en) | Pyramid network based on multi-scale features and dense crowd counting method | |
CN111723693A (en) | Crowd counting method based on small sample learning | |
CN113592911B (en) | Apparent enhanced depth target tracking method | |
CN112766218B (en) | Cross-domain pedestrian re-recognition method and device based on asymmetric combined teaching network | |
CN111738054A (en) | Behavior anomaly detection method based on space-time self-encoder network and space-time CNN | |
CN116402850A (en) | Multi-target tracking method for intelligent driving | |
CN108257148B (en) | Target suggestion window generation method of specific object and application of target suggestion window generation method in target tracking | |
Khan et al. | Foreground detection using motion histogram threshold algorithm in high-resolution large datasets | |
CN112330650A (en) | Retrieval video quality evaluation method | |
CN110830734B (en) | Abrupt change and gradual change lens switching identification method and system | |
Panchal et al. | Multiple forgery detection in digital video based on inconsistency in video quality assessment attributes | |
Xiang et al. | Quality-distinguishing and patch-comparing no-reference image quality assessment | |
Jöchl et al. | Deep Learning Image Age Approximation-What is More Relevant: Image Content or Age Information? | |
Yin et al. | Flue gas layer feature segmentation based on multi-channel pixel adaptive | |
Hou et al. | Improved Multi-sampling Kernelized Correlation Filter Target Tracking Algorithm | |
CN116912783B (en) | State monitoring method and system of nucleic acid detection platform | |
Chauhan et al. | Smart surveillance based on video summarization: a comprehensive review, issues, and challenges | |
Wang et al. | A Survey of Crowd Counting Algorithm Based on Domain Adaptation | |
CN116309350B (en) | Face detection method and system | |
Zhu et al. | Recaptured image detection through multi-resolution residual-based correlation coefficients | |
CN117315723B (en) | Digital management method and system for mold workshop based on artificial intelligence |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||