Quality evaluation method and device for video stream
Technical Field
The present invention relates to multimedia communication technologies in the field of communications, and in particular, to a method and an apparatus for evaluating quality of a video stream.
Background
At present, with the rapid development of the Internet and mobile communication networks, demand for video services such as video surveillance, video conferencing, and online video playback keeps growing. A feature common to all video processing and transmission systems is that a video stream generated at one node is transmitted to another node over a network. The links that affect the quality of the video stream span the generation, transmission, and reconstruction stages of the stream. Because the raw video source data volume is huge, lossy compression standards are adopted in the generation stage to greatly reduce the amount of data to be transmitted, at the cost of reduced video quality; network packet loss, delay, jitter, and similar problems further degrade quality during transmission; and in the reconstruction stage, factors such as the display quality of the terminal device and the lighting environment also affect video quality. Because these degradation factors differ in kind, video quality evaluation at the receiving end is very difficult.
Current video quality evaluation methods are classified, according to whether the evaluation conclusion is given by human observation, into video subjective quality assessment (VSQA) methods and video objective quality assessment (VOQA) methods.
In video subjective quality assessment, a series of test video sequences is played in a test environment specified by international standards (such as ITU-R BT.500), and assessors give subjective scores for the quality of the test video sequences. Because the subjective scores are perceived values of the test video under human vision, the results of subjective evaluation are considered accurate. However, the subjective evaluation process is cumbersome and time-consuming, and its results do not generalize, so the method cannot be used in fields with strict real-time requirements.
Video objective quality assessment methods, by contrast, are simple and fast to run, can meet real-time requirements, and are therefore widely applied.
Objective methods are further classified into full-reference, partial-reference, and no-reference video quality assessment methods. Full-reference and partial-reference methods generally need all or part of the information of the original video sequence, which the receiving end often cannot obtain in practice. No-reference methods need no information about the original video sequence and estimate the degree of video distortion directly from distortion characteristics of the bitstream received at the receiving end; however, methods of this type are still at the research stage, cannot accurately measure the true distortion, and are therefore of limited applicability.
What the practical deployment of network video services needs, therefore, is an evaluation method that is easy to configure, simple, and efficient, and that can accurately measure the objective quality of video separately in the generation, transmission, and reconstruction stages of the video stream.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide a method and an apparatus for evaluating the quality of a video stream that enable full-reference evaluation of objective video quality at the different processing stages of the video stream without increasing the network transmission load.
The quality evaluation method of the video stream comprises the following steps:
acquiring a first compressed video stream generated by processing an original video sequence; the first compressed video stream carries a sequence identifier and an image sequence number corresponding to the original video sequence;
acquiring the original video sequence corresponding to the first compressed video stream according to the sequence identifier;
acquiring an original compressed video stream corresponding to the original video sequence according to the sequence identifier and the image sequence number;
and evaluating the video quality of the first compressed video stream according to the first compressed video stream and the original compressed video stream.
The invention also provides a quality evaluation method of the video stream, which comprises the following steps:
acquiring an original video sequence, a sequence identifier corresponding to the original video sequence and an image sequence number corresponding to the original video sequence;
and generating an original compressed video stream corresponding to the original video sequence according to the sequence identifier and the image sequence number.
The present invention also provides a quality evaluation device for video streams, comprising:
a first acquisition unit, configured to acquire a first compressed video stream generated by processing an original video sequence, wherein the first compressed video stream carries a sequence identifier and an image sequence number corresponding to the original video sequence;
a second acquisition unit, configured to acquire the original video sequence corresponding to the first compressed video stream according to the sequence identifier;
a third acquisition unit, configured to acquire an original compressed video stream corresponding to the original video sequence according to the sequence identifier and the image sequence number; and
a first evaluation unit, configured to evaluate the video quality of the first compressed video stream according to the first compressed video stream and the original compressed video stream.
The present invention also provides a quality evaluation device for video streams, comprising:
an acquisition unit, configured to acquire an original video sequence, a sequence identifier corresponding to the original video sequence, and an image sequence number corresponding to the original video sequence; and
a generating unit, configured to generate an original compressed video stream corresponding to the original video sequence according to the sequence identifier and the image sequence number.
The technical scheme of the invention has the following beneficial effects:
Without increasing the network transmission burden, the invention can identify the corresponding compressed video bitstream and original video sequence at the receiving end, thereby enabling full-reference evaluation of objective video quality separately for the different processing stages of the video stream. The scheme is flexible to apply, relatively accurate, and objective, and can be widely used in the video field.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the present invention is further described below with reference to the accompanying drawings and the embodiments. Obviously, the described embodiments of the present invention are some of the embodiments of the present invention, and all other embodiments obtained by those skilled in the art without any inventive work are within the scope of the present invention based on the described embodiments of the present invention.
Fig. 1 is a schematic flow chart of a method for evaluating the quality of a video stream according to an embodiment of the present invention;
fig. 2 is a schematic connection diagram of a quality evaluation apparatus for video streams according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an apparatus for objective quality assessment of video streams according to an embodiment of the present invention;
fig. 4 is a schematic diagram of a method of a video stream generation unit according to an embodiment of the present invention;
fig. 5 is a schematic diagram of a method of a video capture and frame information identification unit according to an embodiment of the present invention;
fig. 6 is a schematic diagram of a method of an objective quality calculation unit for a video stream according to an embodiment of the present invention;
fig. 7 is a schematic view of an application scenario for video streaming quality assessment according to an embodiment of the present invention;
fig. 8 is a schematic view of an application scenario for compression quality evaluation of a video encoding apparatus according to an embodiment of the present invention.
Fig. 9 is a schematic view of an application scenario for reconstructing quality assessment of a video stream according to an embodiment of the present invention.
Detailed Description
In order to make the technical problems, technical solutions and advantages of the present invention more apparent, the following detailed description is given with reference to the accompanying drawings and specific embodiments.
As shown in fig. 1, the present invention provides a method for evaluating the quality of a video stream, which includes:
step 11, acquiring a first compressed video stream generated by processing an original video sequence, wherein the first compressed video stream carries the sequence identifier and the image sequence number corresponding to the original video sequence. Each video sequence consists of a series of video images, which form a moving picture when played at a certain frame rate. The invention uses multiple original video sequences, and different original video sequences correspond to different sequence identifiers. Within the same original video sequence, different video images correspond to different image sequence numbers.
The processing of the original video sequence comprises: encoding the original video sequence; transmitting the original compressed video stream generated from the original video sequence; or decoding the received original compressed video stream generated from the original video sequence. Correspondingly, step 11 is one of: step 11A, receiving the original compressed video stream after transmission processing as the first compressed video stream; step 11B, receiving the original compressed video stream after decoding processing as the first compressed video stream; or step 11C, capturing the displayed video of the original compressed video stream as the first compressed video stream. Taking transmission processing as an example, the original video sequence and the corresponding original compressed video stream reside at the transmitting end, while the first video sequence and the corresponding first compressed video stream reside at the receiving end.
Step 12, obtaining the original video sequence corresponding to the first compressed video stream according to the sequence identifier;
step 13, obtaining an original compressed video stream corresponding to the original video sequence according to the sequence identifier and the image sequence number;
and step 14, evaluating the video quality of the first compressed video stream according to the first compressed video stream and the original compressed video stream.
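Steps 11 through 14 can be sketched as a small pipeline. The following is a minimal illustration only: the callables `extract_ids`, `lookup_sequence`, `lookup_stream`, and `compare` are hypothetical placeholders for the identifier extraction, lookup, and evaluation operations described above, not functions defined by the invention.

```python
def evaluate_stream(first_stream, extract_ids, lookup_sequence, lookup_stream, compare):
    """Steps 11-14: the identifiers carried inside the received stream are
    used to look up the original material, which is then compared against
    the received stream."""
    seq_id, poc = extract_ids(first_stream)        # step 11: IDs carried in-stream
    original_sequence = lookup_sequence(seq_id)    # step 12: original video sequence
    original_stream = lookup_stream(seq_id, poc)   # step 13: original compressed stream
    return compare(first_stream, original_stream)  # step 14: quality evaluation
```

The original video sequence retrieved in step 12 is used later (steps 15 and 16) for pixel-domain comparison after decoding.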
The method further comprises the following steps:
step 15, acquiring a first video sequence corresponding to the first compressed video stream; the step 15 specifically comprises:
step 15A, decoding the first compressed video stream to generate a first video sequence; or
step 15B, capturing the displayed video of the original compressed video stream to generate the first video sequence.
And step 16, evaluating the video quality of the first video sequence according to the first video sequence and the original video sequence.
Optionally, before step 16, the method further includes:
step 16A, determining whether the first video sequence has lost frames;
step 16B, if frames are lost, filling in the lost video frames to generate a frame-filled first video sequence;
step 16 then specifically comprises: evaluating the video quality of the frame-filled first video sequence according to the frame-filled first video sequence and the original video sequence.
In one embodiment, step 12 comprises:
step 121A, extracting a sequence identifier carried by the first compressed video stream;
and step 122A, acquiring an original video sequence corresponding to the sequence identifier according to the corresponding relationship between the sequence identifier and the original video sequence.
In another embodiment, step 12 comprises:
step 121B, extracting a sequence identifier carried by the first compressed video stream;
and step 122B, generating an original video sequence corresponding to the sequence identifier according to the sequence identifier.
In one embodiment, step 13 comprises:
step 131A, extracting a sequence identifier and an image sequence number carried by the first compressed video stream;
step 132A, obtaining the original compressed video stream corresponding to the sequence identifier and the image sequence number according to the corresponding relationship between the sequence identifier and the image sequence number and the original compressed video stream.
In another embodiment, step 13 comprises:
step 131B, extracting sequence identifiers and image sequence numbers carried by the first compressed video stream;
step 132B, superimposing the sequence identifier and the image sequence number on the frames of the original video sequence to generate the original compressed video stream.
Step 14 specifically comprises the following steps: step 141, calculating a video packet loss ratio of the first compressed video stream relative to the original compressed video stream according to the first compressed video stream and the original compressed video stream.
Or, the step 14 is specifically a step 142 of calculating a video source error rate of the first compressed video stream relative to the original compressed video stream according to the first compressed video stream and the original compressed video stream.
Step 141 specifically includes:
step 142 specifically comprises:
The number of error bits is calculated by directly comparing the first compressed video stream with the original compressed video stream; the comparison counts the number of bits in which the two binary files differ.
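The bit comparison just described can be sketched as follows; this is a minimal illustration assuming the two bitstreams are available as byte strings of equal length, and the function names are illustrative.

```python
def bit_error_count(original: bytes, received: bytes) -> int:
    # XOR each byte pair; the number of set bits in the XOR result is the
    # number of differing bits at that byte position.
    return sum(bin(a ^ b).count("1") for a, b in zip(original, received))

def bit_error_rate(original: bytes, received: bytes) -> float:
    # Error bits divided by the total number of bits in the original stream.
    return bit_error_count(original, received) / (len(original) * 8)
```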
Step 16 specifically comprises: a step 161 of calculating a mean square error of the first video sequence relative to the original video sequence based on the first video sequence and the original video sequence;
alternatively, step 16 specifically includes: step 162, calculating a peak signal-to-noise ratio of the first video sequence relative to the original video sequence according to the first video sequence and the original video sequence;
alternatively, step 16 specifically includes: step 163, calculating a structural similarity mean of the first video sequence with respect to the original video sequence according to the first video sequence and the original video sequence.
Step 161 specifically comprises:
MSE = \frac{1}{NM} \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} \left[ X(i,j) - Y(i,j) \right]^2    (3);
wherein MSE is the mean square error; the frame resolution of the first compressed video stream is M pixels × N pixels; X(i,j) is the pixel value at point (i,j) of one frame image of the original video sequence; Y(i,j) is the pixel value at point (i,j) of the corresponding frame image in the first video sequence; the pixel values may be gray-scale values or color differences.
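Formula (3) can be computed directly; a minimal sketch, assuming frames are given as M×N nested lists of numeric pixel values:

```python
def mse(x, y):
    """Mean square error between two frames per formula (3):
    the average of the squared pixel differences."""
    m, n = len(x), len(x[0])
    return sum((x[i][j] - y[i][j]) ** 2
               for i in range(m) for j in range(n)) / (m * n)
```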
Step 162 specifically comprises:
PSNR = 10 \log \left[ \frac{N \times M \times E^2}{\sum_{i=0}^{M-1} \sum_{j=0}^{N-1} \left[ X(i,j) - Y(i,j) \right]^2} \right]    (4)
wherein PSNR is the peak signal-to-noise ratio; the frame resolution of the first compressed video stream is M pixels × N pixels; X(i,j) is the pixel value at point (i,j) of one frame image of the original video sequence; Y(i,j) is the pixel value at point (i,j) of the corresponding frame image in the first video sequence; and E is the peak amplitude of the first compressed video stream under the sampling condition of a predetermined bit depth (for 8-bit samples, E = 255).
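Formula (4) reduces to 10·log(E²/MSE). A minimal sketch follows, assuming (as is conventional) a base-10 logarithm and E = 255 for 8-bit video; identical frames are reported as infinite PSNR since the denominator is zero.

```python
import math

def psnr(x, y, peak=255):
    """Peak signal-to-noise ratio per formula (4); peak is E,
    255 for 8-bit samples."""
    m, n = len(x), len(x[0])
    sse = sum((x[i][j] - y[i][j]) ** 2
              for i in range(m) for j in range(n))
    if sse == 0:
        return float("inf")  # identical frames: no noise
    return 10 * math.log10(m * n * peak ** 2 / sse)
```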
step 163 specifically comprises:
MSSIM = \frac{1}{M} \sum_{k=1}^{M} SSIM(x_k, y_k)    (5);
wherein SSIM is the structural similarity parameter; k is the index of a local window within one frame image of the first video sequence; M is the total number of local windows in one frame image of the first video sequence; and x_k and y_k are the video-frame content of the k-th local window, where "content" refers collectively to all the digital image samples within the window.
SSIM is calculated according to the following formula;
SSIM=[l(x,y)]α·[c(x,y)]β·[s(x,y)]γ (6);
wherein α, β, γ > 0 are the weighting exponents of the luminance comparison function l(x,y), the contrast comparison function c(x,y), and the structure comparison function s(x,y), respectively;
l(x,y) = \frac{2\mu_x \mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}, \quad C_1 = (K_1 L)^2    (7);
c(x,y) = \frac{2\sigma_x \sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}, \quad C_2 = (K_2 L)^2    (8);
s(x,y) = \frac{\sigma_{xy} + C_3}{\sigma_x \sigma_y + C_3}, \quad C_3 = C_2 / 2    (9);
wherein μ_x represents the average luminance of the original video sequence; μ_y represents the average luminance of the first video sequence; σ_x represents the standard deviation of the original video sequence; σ_y represents the standard deviation of the first video sequence; σ_xy represents the covariance of the original video sequence and the first video sequence; C_1, C_2 and C_3 are constants; K_1, K_2 << 1; L represents the dynamic range of the pixel values; and N is the total number of frames of the original video sequence.
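Formulas (5) through (9) can be combined in code. The sketch below is illustrative only: it takes a window as a flat list of luminance samples, uses the common choice α = β = γ = 1 in formula (6), and assumes the widely used defaults K_1 = 0.01, K_2 = 0.03, L = 255 (these defaults are not stated in the text above).

```python
def ssim_window(x, y, L=255, K1=0.01, K2=0.03):
    """SSIM of one local window per formulas (6)-(9),
    with weighting exponents alpha = beta = gamma = 1."""
    n = len(x)
    mu_x = sum(x) / n                                         # mean luminance
    mu_y = sum(y) / n
    sigma_x = (sum((v - mu_x) ** 2 for v in x) / n) ** 0.5    # std deviations
    sigma_y = (sum((v - mu_y) ** 2 for v in y) / n) ** 0.5
    sigma_xy = sum((a - mu_x) * (b - mu_y)
                   for a, b in zip(x, y)) / n                 # covariance
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2
    C3 = C2 / 2
    l = (2 * mu_x * mu_y + C1) / (mu_x ** 2 + mu_y ** 2 + C1)            # (7)
    c = (2 * sigma_x * sigma_y + C2) / (sigma_x ** 2 + sigma_y ** 2 + C2)  # (8)
    s = (sigma_xy + C3) / (sigma_x * sigma_y + C3)                       # (9)
    return l * c * s                                                     # (6)

def mssim(windows_x, windows_y):
    """Formula (5): mean SSIM over all M local windows."""
    return sum(ssim_window(x, y)
               for x, y in zip(windows_x, windows_y)) / len(windows_x)
```

For identical windows each of the three comparison functions evaluates to 1, so SSIM is 1; for dissimilar windows it falls toward 0.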
As shown in fig. 2, the present invention further provides a quality evaluation apparatus for a video stream, comprising:
a first acquisition unit 21, configured to acquire a first compressed video stream generated by processing an original video sequence, wherein the first compressed video stream carries a sequence identifier and an image sequence number corresponding to the original video sequence;
a second acquisition unit 22, configured to acquire the original video sequence corresponding to the first compressed video stream according to the sequence identifier;
a third acquisition unit 23, configured to acquire an original compressed video stream corresponding to the original video sequence according to the sequence identifier and the image sequence number; and
a first evaluation unit 24, configured to evaluate the video quality of the first compressed video stream according to the first compressed video stream and the original compressed video stream.
The apparatus further comprises:
a fourth acquisition unit 25, configured to acquire a first video sequence corresponding to the first compressed video stream; and
a second evaluation unit 26, configured to evaluate the video quality of the first video sequence according to the first video sequence and the original video sequence.
The first acquisition unit 21 comprises one of:
a transmission module, configured to receive the original compressed video stream after transmission processing as the first compressed video stream;
a decoding module, configured to receive the original compressed video stream after decoding processing as the first compressed video stream; or
a first acquisition module, configured to capture the displayed video of the original compressed video stream as the first compressed video stream.
The fourth acquisition unit 25 comprises:
a decoding module, configured to decode the first compressed video stream to generate the first video sequence; or
a second acquisition module, configured to capture the displayed video of the original compressed video stream to generate the first video sequence.
The second acquisition unit 22 comprises:
a first extraction module, configured to extract the sequence identifier carried by the first compressed video stream; and
a first acquisition module, configured to acquire the original video sequence corresponding to the sequence identifier according to the correspondence between sequence identifiers and original video sequences.
Optionally, the second acquisition unit 22 comprises:
a second extraction module, configured to extract the sequence identifier carried by the first compressed video stream; and
a first generation module, configured to generate the original video sequence corresponding to the sequence identifier according to the sequence identifier.
The third acquisition unit 23 comprises:
a third extraction module, configured to extract the sequence identifier and the image sequence number carried by the first compressed video stream; and
a second acquisition module, configured to acquire the original compressed video stream corresponding to the sequence identifier and the image sequence number according to the correspondence between sequence identifiers and image sequence numbers and original compressed video streams.
Optionally, the third acquisition unit 23 comprises:
a fourth extraction module, configured to extract the sequence identifier and the image sequence number carried by the first compressed video stream; and
a second generation module, configured to superimpose the sequence identifier and the image sequence number on the frames of the original video sequence to generate the original compressed video stream.
Through different configuration modes, the embodiments of the present invention can identify the corresponding compressed video bitstream and original video sequence at the receiving end without increasing the network transmission burden, thereby enabling full-reference evaluation of objective video quality separately at the generation, transmission, and reconstruction stages of the video stream. The scheme is flexible to apply, relatively accurate, and objective, and can be widely applied in the video field.
The following describes an application scenario of the method of the present invention.
As shown in fig. 4, the present invention provides a method for generating a test-specific video stream (equivalent to the original compressed video stream described above), including:
First, the original video test sequences (equivalent to the original video sequences described above) are divided into four categories according to camera motion characteristics: still, fast motion, slow motion, and zooming, and every sequence is assigned a letter number (equivalent to the sequence identifier described above). The number of test sequences and the encoding control parameters can be determined according to the specific test environment.
Then, according to the selected video compression scheme and reference frame structure, the pictures of a single test sequence are assigned Picture Order Count (POC) numbers (equivalent to the image sequence numbers described above), and the sequence letter number and the POC number are superimposed at specific positions of each original video frame (the positions can be chosen according to the actual situation).
Finally, the required video compression standard is selected, and the test-specific compressed video stream is generated according to the coding parameter table recommended by the standard (mainly comprising encoding control parameters such as profile/level, quantization step size, and a preset bit rate); the average peak signal-to-noise ratio and the bit rate of the generated video stream are recorded at the same time.
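The superimposition of the sequence letter and POC number onto each frame can be sketched as follows. This is a hypothetical illustration only: the encoding used here (writing character codes as pixel values in a corner block of the luminance plane) is an assumption for demonstration, not the marking scheme specified above, and the function names are illustrative.

```python
def stamp_frame(frame, seq_letter, poc, x0=0, y0=0):
    """Superimpose a sequence letter and a 4-digit POC number at a fixed
    position in a frame (given as nested lists of pixel values), so the
    receiving end can recover them from the decoded/displayed image."""
    label = f"{seq_letter}{poc:04d}"
    for k, ch in enumerate(label):
        frame[y0][x0 + k] = ord(ch)   # one character code per pixel (assumed scheme)
    return frame

def read_stamp(frame, length=5, x0=0, y0=0):
    """Recover the identifiers from the marked region of a received frame."""
    return "".join(chr(frame[y0][x0 + k]) for k in range(length))
```

In a real system the mark would need to survive lossy compression (for example, by using large high-contrast blocks rather than single pixels); this sketch only shows the write/read symmetry.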
An embodiment of the invention further provides a full-reference evaluation apparatus for the objective quality of a video bitstream, which can be used to test two user sides that need to carry out video communication with each other. The apparatus comprises: a video stream generation unit, a video stream receiving and parsing unit, a video stream reconstruction and display unit, a video capture and frame information identification unit, and a video objective quality calculation unit. The video stream generation unit is connected to the video stream reconstruction and display unit and to the video objective quality calculation unit; the video stream receiving and parsing unit is connected to the video stream reconstruction and display unit; the video stream reconstruction and display unit is connected to the video capture and frame information identification unit; the video capture and frame information identification unit is connected to the video objective quality calculation unit; and the video objective quality calculation unit is connected to the video stream generation unit.
The video stream generation unit is deployed at both the sending end and the receiving end to generate the test-specific compressed video stream;
the video stream receiving and parsing unit is configured to, after the receiving end receives the video stream, extract and parse parameters carried in the Network Abstraction Layer Units (NALUs) of the video stream, such as the coding profile/level, the quantization step size, the preset bit-rate control parameters, and the POC, and to synchronously generate, from the received parameters, a compressed video stream consistent with the currently received video stream for use by the video objective quality calculation unit;
the video stream reconstruction and display unit completes the decoding and display functions of the video stream;
the video capture and frame information identification unit is configured to collect the locally captured video sequence and, by capturing the output of the video display unit and recognizing the image signal in the designated area, identify the currently received video sequence number and POC number for use by the video objective quality calculation unit;
the video objective quality calculation unit is configured to detect and identify the currently received video stream so that the video stream generation unit at the receiving end synchronously generates a consistent video stream for objective quality evaluation and calculation; or to directly use the consistent video stream and the original video data stored at the receiving end for objective quality evaluation and calculation. At the same time, the consistent video bitstream can be used to calculate objective quality indicators of the video bitstream during channel transmission, such as the packet loss rate and the bit error rate.
The invention has the following beneficial effects:
First, the invention superimposes the type information of the video sequence and the corresponding image sequence number directly onto the test video sequence images, which makes the video information easy to identify at the receiving end; at the same time, objective quality detection of the video stream under the full-reference condition is achieved without separately transmitting the original video stream and without increasing the network transmission load.
Second, the apparatus of the invention can output the original video stream, output the compressed video stream, and receive an input video stream, so a single video communication node can be tested independently, or multiple video communication nodes can be tested simultaneously; the test environment can be configured separately for the generation, transmission, and reconstruction stages of the video stream, providing a flexible configuration mode.
On the other hand, the method can objectively and quantitatively analyze the video quality degradation introduced in the video compression, transmission and reconstruction processes, and avoids the subjectivity introduced by a subjective test method; meanwhile, the authoritative objective evaluation index can be adopted to accurately describe the video stream quality and the reliability of the video transmission system. The invention is not only suitable for various video communication systems, but also can be used for equipment evaluation of video acquisition and coding systems.
Embodiments of the present invention are described below.
The first embodiment is as follows:
the embodiment is a method for accurately detecting the objective quality of a video stream transmitted over a network. The hardware system used by the method comprises: at the sending end, the video code stream objective quality evaluation device according to the embodiment of the present invention is connected to a video code stream sending device, as shown in fig. 7; at the receiving end, the video stream receiving apparatus is connected to the video code stream objective quality evaluation device according to the embodiment of the present invention, as shown in fig. 7.
As shown in fig. 7 and fig. 4, the basic principle of the method of this embodiment is:
at a sending end, firstly, determining the video category and the specific sequence used by the current test, and simultaneously selecting corresponding coding control parameters according to the selected video compression method;
then, the video stream generating unit of the video code stream objective quality evaluation device selects the corresponding original video sequence, generates a video sequence number and an image serial number, superposes them onto the original video sequence, and compresses the result with the corresponding video encoder into a compressed video stream dedicated to testing.
To save video stream generation time, each type of original video sequence can be compressed in advance into the corresponding video streams with the determined parameters and stored; in use, the corresponding video stream is selected directly for output according to the video coding control parameters. Finally, the video stream transmitting device cyclically sends the dedicated test compressed video stream, of no less than 10 s, into the network. The cycle time may be determined according to the specific test requirements, and the transport format and protocol may be determined according to the specific network physical-layer transport protocol.
At the receiving end, the method operates as follows: first, the video stream receiving device receives the video stream and, according to the specific network transmission protocol, restores a video stream sequence that the video decoder can process;
then, the video code stream is sent to the video code stream objective quality evaluation device for processing. The video stream receiving and analyzing unit extracts the NALU units of the video code stream and parses the basic parameters of the extracted video stream, such as frame type, POC number, video resolution and quantization step size, for subsequent analysis;
the video stream reconstruction and display unit decodes and stores the received compressed video stream and transmits the compressed video stream to the display equipment for playing;
the video acquisition and frame information identification unit acquires and stores the decoded video played on the display device, and extracts the information characters at the designated position to obtain the type and POC number of the decoded video for subsequent analysis;
the video objective quality calculation unit extracts the corresponding original video sequence and compressed video stream from the video stream generating unit according to the video stream parameters and the various video sequences output by the above units, and calculates the objective quality evaluation result of the output video stream.
As shown in fig. 5, two methods are available for extracting the corresponding original video sequence and compressed video stream: binary code stream comparison and video frame information extraction. Depending on the configuration of the detection environment, the two methods may be used separately or together to cross-validate each other.
The binary code stream comparison method requires that each type of original video sequence be compressed in advance into the corresponding video streams and stored at the receiving end; alternatively, after the type and coding parameters of the received video stream are obtained, the receiving-end video stream generating unit is invoked to generate the corresponding compressed video stream. Specifically, the method uses the parameters extracted by the video stream receiving and analyzing unit to narrow the analysis range of the compressed video streams at the receiving end. After the code stream structure of a dedicated test compressed video stream of no less than 10 s is obtained, the video objective quality calculation unit compares each NALU unit of the code stream with the compressed video streams stored at the receiving end using a binary comparator, finds the stored compressed video stream that matches the received code stream, and extracts the corresponding original video sequence and compressed video stream for objective video quality calculation.
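A minimal sketch of this NALU-level comparison, assuming simplified three-byte Annex-B start codes (`00 00 01`) and a hypothetical dictionary mapping stream labels to pre-stored candidate byte streams; the patent does not specify the binary comparator's internals, so matching here is simply the count of shared NALU payloads:

```python
def split_nalus(bitstream: bytes):
    """Split an Annex-B byte stream into NALU payloads on 00 00 01 start codes
    (simplified: four-byte start codes are handled by stripping trailing zeros)."""
    nalus, start, i = [], None, 0
    while i < len(bitstream) - 2:
        if bitstream[i:i + 3] == b"\x00\x00\x01":
            if start is not None:
                nalus.append(bitstream[start:i].rstrip(b"\x00"))
            i += 3
            start = i
        else:
            i += 1
    if start is not None:
        nalus.append(bitstream[start:])
    return nalus

def match_stored_stream(received: bytes, stored_streams: dict):
    """Return the label of the stored stream sharing the most NALUs with
    the received code stream (a stand-in for the patent's binary comparator)."""
    recv_nalus = set(split_nalus(received))
    best_key, best_hits = None, -1
    for key, stream in stored_streams.items():
        hits = sum(1 for n in split_nalus(stream) if n in recv_nalus)
        if hits > best_hits:
            best_key, best_hits = key, hits
    return best_key
```

The matched label would then index both the stored compressed stream and its original video sequence.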
The video frame information extraction method extracts the video stream type and picture sequence number directly from the decoded video sequence. Specifically, it extracts the sequence-number character images superposed at designated positions on the video frames and correlates them with the character features in a character feature library to determine the character information and the corresponding sequence number, according to which the corresponding original video sequence and compressed video stream are extracted for objective video quality calculation. This method is susceptible to video degradation caused by network packet loss; if the reading is inaccurate, an accurate identification result can be obtained by extending the observation time until the network becomes stable again.
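The character correlation step can be sketched as normalized cross-correlation of the designated image region against each template in a hypothetical feature library (a `dict` mapping character names to same-size template arrays); the patent does not fix the matching metric, so this is one plausible choice:

```python
import numpy as np

def match_character(patch: np.ndarray, templates: dict):
    """Identify the character overlaid in a designated frame region by
    picking the template with the highest normalized correlation."""
    def norm(a):
        # zero-mean, unit-norm version of the image block
        a = a.astype(np.float64) - a.mean()
        n = np.linalg.norm(a)
        return a / n if n else a

    p = norm(patch)
    scores = {name: float((p * norm(t)).sum()) for name, t in templates.items()}
    return max(scores, key=scores.get)
```

Under packet loss, low correlation scores across all templates would signal an unreliable reading, motivating the extended observation time mentioned above.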
Wherein the step of calculating the objective video quality from the received compressed video stream and the extracted corresponding original video sequence comprises:
decoding a received compressed video stream unit of no less than 10 s to generate a video sequence;
determining, from the POC numbers extracted by the parsing unit, whether packet loss has occurred in the compressed video stream;
if packet loss has occurred, the lost video frames must be filled into the decoded video sequence. The filling may use either frame copying or motion compensation. Frame copying directly copies the previous video frame into the position of the currently lost frame; motion compensation reconstructs the content at the lost frame position from the motion-compensation relationship between the previous and following frames.
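The simpler frame-copy concealment can be sketched as follows, assuming decoded frames arrive as a `dict` keyed by POC number and the expected POC sequence is known (function and parameter names are illustrative, not from the patent):

```python
import numpy as np

def fill_lost_frames(frames: dict, expected_pocs: list):
    """Frame-copy concealment: each lost POC position reuses the most
    recent correctly received frame so the sequence length matches the
    original for per-frame comparison."""
    filled, last = [], None
    for poc in expected_pocs:
        if poc in frames:
            last = frames[poc]
        # if the very first frames are lost, substitute a black frame
        filled.append(last if last is not None
                      else np.zeros_like(next(iter(frames.values()))))
    return filled
```

Motion-compensated concealment would replace the copy with an interpolation between the surrounding frames, at higher cost.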
As shown in FIG. 6, after the decoded video sequence has been aligned, the objective error between the decoded video sequence and the corresponding original video sequence can be calculated. Full-reference video objective quality evaluation parameters such as Mean Squared Error (MSE), Peak Signal-to-Noise Ratio (PSNR), and Mean Structural Similarity (MSSIM) may be used.
Assuming that the video frame resolution is M × N pixels, let X denote the original video sequence and Y the decoded video sequence; the evaluation parameters above can then be calculated as follows:
the MSE can be obtained as equation (10):
MSE = \frac{1}{MN} \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} [X(i,j) - Y(i,j)]^2    (10)
where X(i, j) is the pixel value of a frame of the original video sequence at point (i, j), and Y(i, j) is the pixel value of the corresponding frame of the decoded video sequence at point (i, j).
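Equation (10) amounts to a single vectorized operation per frame pair; a minimal sketch:

```python
import numpy as np

def mse(x: np.ndarray, y: np.ndarray) -> float:
    """Mean squared error between an original frame X and a decoded
    frame Y of the same M x N resolution, per equation (10)."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    return float(np.mean((x - y) ** 2))
```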
PSNR can be obtained as in equation (11):
PSNR = 10 \log_{10} \left[ \frac{M \times N \times 255^2}{\sum_{i=0}^{M-1} \sum_{j=0}^{N-1} [X(i,j) - Y(i,j)]^2} \right]    (11)
where X(i, j) is the pixel value of a frame of the original video sequence at point (i, j), Y(i, j) is the pixel value of the corresponding frame of the decoded video sequence at point (i, j), and 255 is the peak amplitude of the video signal under 8-bit sampling.
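Since the ratio in equation (11) simplifies to 255² over the per-pixel MSE, PSNR can be sketched as:

```python
import numpy as np

def psnr(x: np.ndarray, y: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB per equation (11); the default
    peak of 255 assumes 8-bit samples."""
    err = np.mean((x.astype(np.float64) - y.astype(np.float64)) ** 2)
    if err == 0:
        return float("inf")  # identical frames: error is zero
    return float(10.0 * np.log10(peak ** 2 / err))
```

In practice PSNR is averaged over all frames of the (no less than 10 s) test unit.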
The MSSIM can be obtained according to equations (12), (13), (14), (15), (16):
first, the Structural Similarity Index (SSIM) parameter is calculated as:
SSIM = [l(x,y)]^α · [c(x,y)]^β · [s(x,y)]^γ    (12)
where α, β, γ > 0 are the weighting exponents of the luminance comparison function l(x, y), the contrast comparison function c(x, y), and the structure comparison function s(x, y), respectively. These three functions are computed as follows:
l(x,y) = \frac{2\mu_x \mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}, \quad C_1 = (K_1 L)^2    (13)
c(x,y) = \frac{2\sigma_x \sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}, \quad C_2 = (K_2 L)^2    (14)
s(x,y) = \frac{\sigma_{xy} + C_3}{\sigma_x \sigma_y + C_3}, \quad C_3 = C_2 / 2    (15)
where \mu_x = \frac{1}{N}\sum_{i=1}^{N} x_i denotes the average luminance of the original video sequence and \mu_y = \frac{1}{N}\sum_{i=1}^{N} y_i the average luminance of the decoded video sequence; \sigma_x = \left(\frac{1}{N-1}\sum_{i=1}^{N}(x_i - \mu_x)^2\right)^{1/2} denotes the standard deviation of the original video sequence and \sigma_y = \left(\frac{1}{N-1}\sum_{i=1}^{N}(y_i - \mu_y)^2\right)^{1/2} the standard deviation of the decoded video sequence; \sigma_{xy} = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \mu_x)(y_i - \mu_y) denotes the covariance of the original and decoded video sequences. C_1, C_2, C_3 are constants, with K_1, K_2 << 1; L denotes the dynamic range of the pixel values, and L = 255 for 8-bit images.
The quality evaluation value of an entire video frame is then expressed by the Mean Structural Similarity (MSSIM):
MSSIM = \frac{1}{M} \sum_{k=1}^{M} SSIM(x_k, y_k)    (16)
where M is the number of local windows in a video frame (each local window is 8×8), and x_k and y_k are the contents of the k-th local window in the original and decoded video frames, respectively.
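Equations (12)-(16) can be sketched as follows, taking α = β = γ = 1 (so that the l·c·s product collapses to the usual two-factor SSIM form) and non-overlapping 8×8 windows; K1 = 0.01 and K2 = 0.03 are conventional defaults, not values fixed by the patent:

```python
import numpy as np

def ssim_window(x: np.ndarray, y: np.ndarray,
                K1: float = 0.01, K2: float = 0.03, L: float = 255.0) -> float:
    """SSIM of one local window per equations (12)-(15), with alpha=beta=gamma=1."""
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2
    mx, my = x.mean(), y.mean()
    sx, sy = x.std(ddof=1), y.std(ddof=1)
    # sample covariance (divisor N-1), matching the sigma_xy definition above
    sxy = ((x - mx) * (y - my)).mean() * x.size / (x.size - 1)
    return ((2 * mx * my + C1) * (2 * sxy + C2)) / \
           ((mx ** 2 + my ** 2 + C1) * (sx ** 2 + sy ** 2 + C2))

def mssim(x: np.ndarray, y: np.ndarray, win: int = 8) -> float:
    """Mean SSIM over non-overlapping win x win windows, per equation (16)."""
    scores = []
    for i in range(0, x.shape[0] - win + 1, win):
        for j in range(0, x.shape[1] - win + 1, win):
            scores.append(ssim_window(x[i:i + win, j:j + win].astype(np.float64),
                                      y[i:i + win, j:j + win].astype(np.float64)))
    return float(np.mean(scores))
```

Production implementations typically use a sliding Gaussian window instead of disjoint blocks; the block form above matches the 8×8 window stated in the text.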
The step of calculating the video network transmission quality from the received compressed video stream and the extracted corresponding locally stored compressed video stream specifically comprises: analyzing, from the POC numbers extracted by the parsing unit, whether packet loss has occurred in a received compressed video stream unit of no less than 10 s. The packet loss rate in video network transmission can then be calculated according to equation (17):
Meanwhile, the bit error rate of the compressed video source during network transmission can be calculated according to equation (18):
Because the network state is unstable during network transmission, a large number of tests can be run cyclically to ensure the accuracy of the test results; the objective quality evaluation results of the video sequences decoded from the many compressed video stream units of no less than 10 s are arithmetically averaged before output.
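Formulas (17) and (18) themselves are not reproduced in this text. Purely as an illustration of the POC-based detection step, the following sketch assumes the common definition of loss rate as missing frames over expected frames; this is an assumption, not the patent's stated formula:

```python
def packet_loss_rate(received_pocs, expected_count: int) -> float:
    """Infer the frame loss rate from gaps in the received POC numbers,
    assuming loss rate = (expected - received) / expected. The patent's
    own formula (17) may differ."""
    received = len(set(received_pocs))  # de-duplicate retransmitted frames
    return (expected_count - received) / expected_count
```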
Example two:
the embodiment is an evaluation method for accurately detecting the compression quality of a video encoder in a video communication system. The hardware system used by the method comprises: the video code stream objective quality evaluation device according to the embodiment of the present invention connected to a video encoder device, as shown in fig. 8.
As shown in fig. 8 and 3, the method of the embodiment includes:
firstly, generating an original video sequence by a video stream generating unit in the video code stream objective quality evaluation device according to the input test video category and parameters;
then, the sequence is sent to a video coding device to be evaluated to generate a compressed video stream;
and then, the compressed video stream is sent back to the video code stream objective quality evaluation device and processed by the video stream receiving and analyzing unit and the video stream reconstruction and display unit to generate a decoded video sequence.
And finally, the video objective quality calculation unit, referring to the original video sequence and the decoded video sequence, calculates full-reference video objective quality evaluation parameters such as MSE, PSNR and MSSIM.
The method conveniently enables evaluation of the compression quality of the video encoder in the video stream generation stage of a video communication system.
Example three:
the embodiment is an evaluation method for accurately detecting the video reconstruction quality of video playing and display equipment in the video code stream reconstruction stage of a video communication system. The hardware system used by the method comprises: the video code stream objective quality evaluation device according to the embodiment of the invention connected to an independent playing and display device, as shown in fig. 9.
As shown in fig. 9 and fig. 3, the method of this embodiment includes:
firstly, a video stream generating unit in the video code stream objective quality evaluation device generates an original video sequence according to the input test video category and parameters;
then, the sequence is sent to a video stream reconstruction and display unit, and the unit controls an external independent video playing and display device to play the original video sequence;
then, a video acquisition and frame information identification unit is used for acquiring, identifying and storing a video sequence acquired from an external independent video playing and displaying device;
and finally, the video objective quality calculation unit, referring to the original video sequence and the stored played-back video sequence, calculates full-reference video objective quality evaluation parameters such as MSE, PSNR and MSSIM.
Because the video sequence number and the image serial number are superposed during generation of the original video sequence, the collected video sequence can be identified and aligned with the original video sequence during objective quality evaluation, which conveniently enables quality evaluation of the video reconstruction stage in a video communication system.
The invention also provides a quality evaluation method of the video stream, which can be applied to a sending end and comprises the following steps:
acquiring an original video sequence, a sequence identifier corresponding to the original video sequence and an image sequence number corresponding to the original video sequence;
and generating an original compressed video stream corresponding to the original video sequence according to the sequence identifier and the image sequence number.
The step of generating the original compressed video stream corresponding to the original video sequence according to the sequence identifier and the image sequence number specifically includes:
and superposing the sequence identification and the image sequence number on the frames of the original video sequence to generate an original compressed video stream.
The invention also provides a quality evaluation device of video stream, which can be arranged at a sending end and comprises:
the device comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring an original video sequence, a sequence identifier corresponding to the original video sequence and an image serial number corresponding to the original video sequence;
and the generating unit is used for generating an original compressed video stream corresponding to the original video sequence according to the sequence identification and the image sequence number.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.