CN112330650B - Retrieval video quality evaluation method - Google Patents

Retrieval video quality evaluation method

Info

Publication number
CN112330650B
CN112330650B (application CN202011263356.6A)
Authority
CN
China
Prior art keywords
video
search
calculation
evaluation method
quality evaluation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011263356.6A
Other languages
Chinese (zh)
Other versions
CN112330650A (en)
Inventor
李庆春
严国建
李志强
王彬
曾璐
梁瑞凡
许璐
谢兰迟
晏于文
槐森
赵明磊
于晏平
潘培培
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WUHAN DAQIAN INFORMATION TECHNOLOGY CO LTD
Original Assignee
WUHAN DAQIAN INFORMATION TECHNOLOGY CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by WUHAN DAQIAN INFORMATION TECHNOLOGY CO LTD
Priority to CN202011263356.6A
Publication of CN112330650A
Application granted
Publication of CN112330650B
Legal status: Active
Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20048Transform domain processing
    • G06T2207/20064Wavelet transform [DWT]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a retrieval video quality evaluation method comprising the following steps: calculating environment indexes of the video to be retrieved; calculating physical characteristics of the targets in the video; performing a discrete wavelet transform on the video; and feeding the original video, the environment indexes, the target physical characteristics, and the wavelet-transformed images into feature layers of different granularity in a neural network, which, after training on a large amount of data, yields a model that classifies videos by quality. By combining traditional digital image processing with deep learning, the invention analyzes and classifies the characteristics of videos to be retrieved, so that retrieval videos of different types can be improved accordingly and a video retrieval and comparison system is given a reference for selecting its input.

Description

Retrieval video quality evaluation method
Technical Field
The invention relates to video processing and analysis, and in particular to a retrieval video quality evaluation method.
Background
With the popularization of urban video surveillance systems, the criminal investigation and case-solving methods of public security departments have changed greatly, and the use of scene video for investigation (video investigation) has been widely developed and applied. In video investigation, retrieval and comparison of suspected targets and their behavior is an important requirement.
Research on video retrieval technology has made great progress: many retrieval models have been proposed, and they are continually refined and validated in practice, so that users can to some extent find satisfactory targets. Most retrieval systems, however, still have serious robustness problems. For some queries the results are of high quality, while for others they contain many targets irrelevant to the query; even systems whose average retrieval performance is well regarded return unsatisfactory results for some queries. In short, retrieval quality often varies greatly from query to query.
The reason is that existing retrieval and comparison techniques study only the accuracy of retrieval, while neglecting evaluation of the environment of the input video and of the physical characteristics of the targets it contains; they lack any analysis and classification of the quality of the video to be retrieved, and any scheme for improving it. If an arbitrary, unknown video is fed into a retrieval and comparison system, poor or unpredictable retrieval results are unavoidable, especially in complex scenes.
Disclosure of Invention
The purpose of the invention is to overcome the defects of the prior art and provide a retrieval video quality evaluation method. The invention combines traditional digital image processing with deep learning to analyze and classify the characteristics of the video to be retrieved, so that retrieval videos of different types can be improved accordingly and a video retrieval and comparison system is given a reference for selecting its input.
The technical scheme adopted to realize this purpose is a retrieval video quality evaluation method comprising the following steps.
Calculating environment indexes of the retrieval video. Low definition, abnormal brightness, low contrast and similar conditions in the video picture cause the retrieval system to output poor results, so the invention first uses digital image processing to compute environment quality indexes such as definition, brightness anomaly, and contrast for the video segment, and combines these three indexes into an environment index score for the video.
Calculating physical characteristics of targets in the retrieval video. A target that is too small in the picture, or moves too fast, is hard for the retrieval system to detect. A general-purpose object detection algorithm based on a deep neural network detects objects in the video frames, and the size of each target in the picture is obtained from the size of its bounding box; a multi-target tracking algorithm then yields physical quality indexes such as target movement speed. Target size and movement speed are combined into a physical-characteristic index score for the video targets.
Performing a discrete wavelet transform on the retrieval video. Re-captured ("flip") video may show no difference from normal video in environment indexes such as definition or in physical indexes such as target size, yet it contains noise such as moiré, and its jitter greatly reduces retrieval accuracy. The wavelet transform supports multi-resolution analysis and can focus on any detail of a signal in the time-frequency domain; applying a discrete wavelet transform (DWT) to the input image makes noise such as moiré stand out, producing more discriminative features.
Feeding the retrieval video, the environment indexes, the target physical characteristics, and the wavelet-transformed images into feature layers of different granularity in a neural network, which, after training on a large amount of data, yields a model that classifies videos by quality.
In the above technical scheme, calculating the environment indexes of the retrieval video comprises definition calculation, brightness-anomaly calculation, and contrast calculation, as follows.
The calculation of the definition includes:
where DR is the definition (a larger DR means a sharper image), x and y are the horizontal and vertical coordinates, p(x,y) is the pixel value at coordinate (x, y), and w and h are the width and height of the image block.
The above luminance anomaly calculation includes:
where CAST is an offset value: less than 1 indicates normal brightness, and more than 1 indicates abnormal brightness; when CAST is abnormal, DA > 0 indicates too bright and DA < 0 indicates too dark. p(x,y) is the pixel value at coordinate (x, y), w and h are the width and height of the image block, Mean is the mean luminance offset, and Hist is the gray-level histogram of the image block.
The calculation of the contrast includes:
where Contrast is the contrast (a larger Contrast means better contrast in the image block), x and y are the horizontal and vertical coordinates, p(x,y) is the pixel value at coordinate (x, y), and w and h are the width and height of the image block.
In the above technical scheme, performing the discrete wavelet transform on the retrieval video comprises:
x(t) = Σ_k c_0[k] φ_{0,k}(t) + Σ_j Σ_k d_j[k] ψ_{j,k}(t)
where x(t) is the transform result, c_0[k], d_0[k], d_1[k], … are expansion coefficients, φ_{0,k}(t) is the scale function, and ψ_{0,k}(t), … are wavelet functions.
The invention adopts a multi-input deep convolutional neural network that takes the original images of the video sequence, the environment quality indexes, the physical-characteristic indexes, and the DWT-transformed images as four inputs, feeding or fusing them into feature layers of different granularity; after training on a large amount of data it forms a model that classifies videos by quality.
Because the amount of surveillance video has grown rapidly, video scenes and types are increasingly complex, and classifying, analyzing, and improving the quality of video to be retrieved greatly benefits retrieval quality. The invention creatively combines traditional digital image processing with deep learning to design a multi-input convolutional neural network that analyzes and classifies the video to be retrieved from all aspects, solving the problem that a single network or method cannot accurately analyze the quality of complex video. Retrieval videos are divided into categories with different quality attributes; after the categories are evaluated against the actual retrieval requirement, videos in the categories that meet the conditions are selected as input to the retrieval and comparison system, improving retrieval efficiency.
Drawings
Fig. 1 is a flowchart of a method for evaluating the quality of a search video according to the present invention.
Fig. 2 is a schematic structural diagram of a multi-input deep convolutional neural network according to the present invention.
Detailed Description
The invention will now be described in further detail with reference to the drawings and specific embodiments.
S1, calculating environmental indexes of search videos
To integrate the indexes into the deep convolutional neural network, the original image is divided into 16 x 16 image blocks when computing the video environment indexes; each image block yields one index value, and each index is input into the model as a feature image at 1/16 of the original resolution.
Definition calculation uses the differences between adjacent pixels in the X and Y directions of the video image. The formula is as follows:
where DR is the definition (a larger DR means a sharper image), x and y are the horizontal and vertical coordinates, p(x,y) is the pixel value at coordinate (x, y), and w and h are the width and height of the image block.
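A minimal sketch of this per-block definition calculation could look as follows. The exact formula image is not reproduced in this text, so the adjacent-pixel-difference form below is an illustrative implementation consistent with the description, not the patent's literal formula; the function name `sharpness_map` is likewise an assumption.

```python
import numpy as np

def sharpness_map(gray, block=16):
    """Per-block definition (DR) map from adjacent-pixel differences.

    Returns an (H//block, W//block) map, i.e. a feature image at 1/16
    of the original resolution for block=16, as the description requires.
    A larger value means a sharper block.
    """
    g = gray.astype(np.float64)
    grad = np.zeros_like(g)
    grad[:, :-1] += np.abs(np.diff(g, axis=1))  # X-direction differences
    grad[:-1, :] += np.abs(np.diff(g, axis=0))  # Y-direction differences
    H = g.shape[0] - g.shape[0] % block
    W = g.shape[1] - g.shape[1] % block
    blocks = grad[:H, :W].reshape(H // block, block, W // block, block)
    return blocks.mean(axis=(1, 3))
```

A uniform image yields DR = 0 everywhere, while textured or noisy blocks score higher.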
Brightness-anomaly calculation uses the gray-level histogram of the video image to determine whether the brightness is abnormal. The formula is as follows:
where CAST is an offset value: less than 1 indicates normal brightness, and more than 1 indicates abnormal brightness; when CAST is abnormal, DA > 0 indicates too bright and DA < 0 indicates too dark. p(x,y) is the pixel value at coordinate (x, y), w and h are the width and height of the image block, Mean is the mean luminance offset, and Hist is the gray-level histogram of the image block.
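A hedged sketch of the brightness-anomaly check: the patent's exact formula is not reproduced in this text, so the common histogram-deviation formulation below (mean offset from mid-gray, divided by the histogram spread around that offset) is an assumption consistent with the CAST/DA semantics described above.

```python
import numpy as np

def brightness_anomaly(gray):
    """Brightness-anomaly indexes from the gray-level histogram.

    Returns (CAST, DA). CAST > 1 flags abnormal brightness; when it is
    abnormal, DA > 0 means too bright and DA < 0 means too dark.
    """
    g = gray.astype(np.float64)
    n = g.size
    da = (g - 128).sum() / n                 # DA: mean offset from mid-gray
    hist, _ = np.histogram(g, bins=256, range=(0, 256))
    levels = np.arange(256)
    d = (np.abs(levels - 128 - da) * hist).sum() / n  # spread around the offset
    cast = abs(da) / d if d > 0 else 0.0     # CAST: offset relative to spread
    return cast, da
```

A picture clustered near mid-gray gives CAST < 1; a uniformly over-bright picture gives CAST > 1 with DA > 0.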
Contrast calculation uses the variance of adjacent pixels in the video image. The formula is as follows:
where Contrast is the contrast (a larger Contrast means better contrast in the image block), x and y are the horizontal and vertical coordinates, p(x,y) is the pixel value at coordinate (x, y), and w and h are the width and height of the image block.
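A sketch of the per-block contrast index, using plain pixel-value variance within each 16 x 16 block as an illustrative stand-in for the formula image that is missing from this text:

```python
import numpy as np

def contrast_map(gray, block=16):
    """Per-block contrast map: variance of pixel values in each block."""
    g = gray.astype(np.float64)
    H = g.shape[0] - g.shape[0] % block
    W = g.shape[1] - g.shape[1] % block
    b = g[:H, :W].reshape(H // block, block, W // block, block)
    b = b.swapaxes(1, 2).reshape(H // block, W // block, -1)
    return b.var(axis=-1)  # larger variance = better contrast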
S2, calculating target physical characteristics of the search video
General-purpose object detection and tracking use open-source algorithms to detect and track objects in the video sequence, obtaining the size and movement speed of each target.
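Given detector-and-tracker output, the physical-characteristic indexes can be sketched as below. The track format and function name are assumptions for illustration; any detector/tracker producing per-frame bounding boxes would do.

```python
import numpy as np

def target_physical_indexes(tracks, frame_size, fps=25.0):
    """Size / speed indexes from detection-and-tracking output.

    tracks: {track_id: [(frame_idx, x, y, w, h), ...]} in pixels.
    Returns {track_id: (relative_size, speed_px_per_s)}: mean box area
    relative to the frame area, and mean speed of the box origin.
    """
    W, H = frame_size
    out = {}
    for tid, boxes in tracks.items():
        boxes = sorted(boxes)  # order by frame index
        rel_size = float(np.mean([(w * h) / (W * H) for _, _, _, w, h in boxes]))
        speeds = []
        for (f0, x0, y0, *_), (f1, x1, y1, *_) in zip(boxes, boxes[1:]):
            dt = (f1 - f0) / fps
            if dt > 0:
                speeds.append(float(np.hypot(x1 - x0, y1 - y0)) / dt)
        out[tid] = (rel_size, float(np.mean(speeds)) if speeds else 0.0)
    return out
```

A very small relative size or a very high speed would then lower the physical-characteristic index score described above.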
S3, discrete wavelet transformation is carried out on the search video
The original image is transformed with the wavelet transform; the formula is as follows:
x(t) = Σ_k c_0[k] φ_{0,k}(t) + Σ_j Σ_k d_j[k] ψ_{j,k}(t)
where x(t) is the transform result, c_0[k], d_0[k], d_1[k], … are expansion coefficients, φ_{0,k}(t) is the scale function, and ψ_{0,k}(t), … are wavelet functions.
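A one-level 2-D Haar DWT, written out in plain numpy, illustrates the transform step (the patent does not fix a wavelet basis; Haar is an assumption, and a library such as PyWavelets would serve equally):

```python
import numpy as np

def haar_dwt2(gray):
    """One-level 2-D Haar DWT: returns (LL, LH, HL, HH) sub-bands.

    The high-frequency sub-bands (LH/HL/HH) emphasize moiré-like noise
    in re-captured ("flip") video, as the description notes.
    """
    g = gray.astype(np.float64)
    g = g[: g.shape[0] // 2 * 2, : g.shape[1] // 2 * 2]  # even dimensions
    a, b = g[0::2, :], g[1::2, :]        # vertical pixel pairs
    lo, hi = (a + b) / 2, (a - b) / 2    # vertical average / difference
    ll = (lo[:, 0::2] + lo[:, 1::2]) / 2
    lh = (lo[:, 0::2] - lo[:, 1::2]) / 2
    hl = (hi[:, 0::2] + hi[:, 1::2]) / 2
    hh = (hi[:, 0::2] - hi[:, 1::2]) / 2
    return ll, lh, hl, hh
```

On a smooth frame the LH/HL/HH bands are near zero; moiré patterns leave strong responses there, which is what makes the transformed image more discriminative.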
S4, multiple-input convolutional neural network for video quality classification
The core of the invention is a multi-input deep convolutional neural network that takes the original images of the video sequence, the environment quality indexes, the physical-characteristic indexes, and the discrete-wavelet-transformed images as four inputs, feeding or fusing them into feature layers of different granularity; after training on a large amount of data it forms a model that classifies videos by quality.
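The fusion idea above can be sketched by grouping the four inputs according to the feature-layer resolution at which a CNN could ingest them. The layer names and specific fusion points below are assumptions; the patent does not fix a concrete architecture.

```python
import numpy as np

def assemble_inputs(frame, env_maps, phys_maps, dwt_bands):
    """Group the four inputs by the resolution at which they would be
    fed into a multi-input CNN (illustrative fusion points):

    frame:       (H, W)        -> network stem, full resolution
    dwt_bands:   4 x (H/2, W/2)  -> fused after the first downsampling
    env_maps / phys_maps: (H/16, W/16) index images -> fused at the
                                   1/16-resolution feature layer
    """
    H, W = frame.shape
    layers = {
        "stem": frame[None],                 # 1 x H x W
        "stride2": np.stack(dwt_bands),      # 4 x H/2 x W/2
        "stride16": np.stack(list(env_maps) + list(phys_maps)),
    }
    assert layers["stride2"].shape[1:] == (H // 2, W // 2)
    assert layers["stride16"].shape[1:] == (H // 16, W // 16)
    return layers
```

Each group would be concatenated channel-wise with the feature maps of matching spatial size inside the network.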
The invention takes the video-quality classification result as the quality evaluation result. In this implementation, the output of the multi-input convolutional neural network (i.e. the video-quality analysis result) is divided into five main classes:
1. Video that cannot be analyzed, such as garbled streams or black screens, is taken as class 1; this class cannot be retrieved or analyzed.
2. Re-captured ("flip") video is taken as class 2; it shows noticeable moiré noise and partial jitter, and can be retrieved and analyzed after optimization.
3. Video with very poor lighting, contrast, and similar conditions, in which the target can hardly be seen, is taken as class 3; it cannot be retrieved or analyzed.
4. Video with medium lighting, contrast, and similar conditions is taken as class 4; it is blurred, and can be retrieved and analyzed after optimization.
5. Video with good environmental conditions is taken as class 5; it is clear and can be retrieved and analyzed directly.
The five classes correspond to different levels of video quality. The invention finally forms a model that classifies videos by quality; classifying a video with this model into one of the five classes constitutes the evaluation and classification of its quality.
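The five classes above map directly onto actions for the retrieval system. The mapping below restates the class descriptions as code; the action names themselves are illustrative assumptions.

```python
# Class numbers follow the five categories described above;
# action names are assumptions for illustration.
ACTION = {
    1: "reject",                # garbled video / black screen
    2: "optimize_then_search",  # re-captured video: moiré noise, jitter
    3: "reject",                # extremely poor lighting / contrast
    4: "optimize_then_search",  # medium conditions, blurred
    5: "search_directly",       # clear video, good conditions
}

def usable_for_retrieval(cls):
    """True if a video of this class can be fed to the retrieval system,
    directly or after optimization."""
    return ACTION[cls] != "reject"
```

Only videos whose class meets the actual retrieval requirement are then passed to the retrieval and comparison system, which is how the method improves retrieval efficiency.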

Claims (5)

1. A retrieval video quality evaluation method, comprising:
calculating the environment index of the search video;
Calculating target physical characteristics of the search video;
performing a discrete wavelet transform on the retrieval video, wherein the discrete wavelet transform comprises:
x(t) = Σ_k c_0[k] φ_{0,k}(t) + Σ_j Σ_k d_j[k] ψ_{j,k}(t)
where x(t) is the transform result, c_0[k], d_j[k] are expansion coefficients, φ_{0,k}(t) is the scale function, and ψ_{0,k}(t) is a wavelet function;
and feeding the retrieval video, the environment indexes, the target physical characteristics, and the wavelet-transformed images into feature layers of different granularity in a neural network, which, after training on a large amount of data, yields a model that classifies videos by quality.
2. The retrieval video quality evaluation method according to claim 1, wherein calculating the environment indexes of the retrieval video comprises definition calculation, brightness-anomaly calculation, and contrast calculation.
3. The retrieval video quality evaluation method according to claim 2, wherein the definition calculation comprises:
where DR is the definition, x and y are the horizontal and vertical coordinates, p(x,y) is the pixel value at coordinate (x, y), and w and h are the width and height of the image block.
4. The retrieval video quality evaluation method according to claim 2, wherein the brightness-anomaly calculation comprises:
where CAST is an offset value: less than 1 indicates normal brightness, and more than 1 indicates abnormal brightness; when CAST is abnormal, DA > 0 indicates too bright and DA < 0 indicates too dark; p(x,y) is the pixel value at coordinate (x, y), w and h are the width and height of the image block, Mean is the mean luminance offset, and Hist is the gray-level histogram of the image block.
5. The retrieval video quality evaluation method according to claim 2, wherein the contrast calculation comprises:
where Contrast is the contrast, x and y are the horizontal and vertical coordinates, p(x,y) is the pixel value at coordinate (x, y), and w and h are the width and height of the image block.
CN202011263356.6A 2020-11-12 2020-11-12 Retrieval video quality evaluation method Active CN112330650B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011263356.6A CN112330650B (en) 2020-11-12 2020-11-12 Retrieval video quality evaluation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011263356.6A CN112330650B (en) 2020-11-12 2020-11-12 Retrieval video quality evaluation method

Publications (2)

Publication Number Publication Date
CN112330650A CN112330650A (en) 2021-02-05
CN112330650B (en) 2024-06-28

Family

ID=74319069

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011263356.6A Active CN112330650B (en) 2020-11-12 2020-11-12 Retrieval video quality evaluation method

Country Status (1)

Country Link
CN (1) CN112330650B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102421008A (en) * 2011-12-07 2012-04-18 浙江捷尚视觉科技有限公司 Intelligent video quality detecting system
CN107679462A (en) * 2017-09-13 2018-02-09 哈尔滨工业大学深圳研究生院 A kind of depth multiple features fusion sorting technique based on small echo

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9741107B2 (en) * 2015-06-05 2017-08-22 Sony Corporation Full reference image quality assessment based on convolutional neural network
US10402697B2 (en) * 2016-08-01 2019-09-03 Nvidia Corporation Fusing multilayer and multimodal deep neural networks for video classification
CN108055501A (en) * 2017-11-22 2018-05-18 天津市亚安科技有限公司 A kind of target detection and the video monitoring system and method for tracking
CN109191437A (en) * 2018-08-16 2019-01-11 南京理工大学 Clarity evaluation method based on wavelet transformation
US11176654B2 (en) * 2019-03-27 2021-11-16 Sharif University Of Technology Quality assessment of a video


Also Published As

Publication number Publication date
CN112330650A (en) 2021-02-05


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant