CN113873226A - Encoding and decoding quality testing method and device, computer equipment and storage medium

Info

Publication number: CN113873226A
Application number: CN202111068560.7A
Authority: CN
Inventors: 文锐烽; 于雪松; 熊磊
Original assignee: Shenzhen Huantai Technology Co Ltd
Current assignee: Shenzhen Huantai Technology Co Ltd
Priority date: 2021-09-13
Filing date: 2021-09-13
Publication date: 2021-12-31

Abstract

The application relates to a coding and decoding quality testing method and device, computer equipment and a storage medium. The method comprises the following steps: in response to a test request for a product to be tested, acquiring audio data to be analyzed and video data to be analyzed which are output by the product to be tested; extracting frames from the audio data to be analyzed and the video data to be analyzed to obtain at least one timestamped video frame to be tested and at least one timestamped audio frame to be tested; extracting from sample data a sample video frame with the same timestamp as the video frame to be tested, and comparing the two to obtain an image similarity; extracting from the sample data a sample audio frame with the same timestamp as the audio frame to be tested, and comparing the two to obtain an audio similarity; and if the image similarity reaches the image similarity threshold and the audio similarity reaches the first audio similarity threshold, determining that the encoding and decoding quality of the product to be tested is qualified. The method can improve the efficiency of testing the coding and decoding quality.

Description

Encoding and decoding quality testing method and device, computer equipment and storage medium

Technical Field

The present application relates to the field of multimedia technologies, and in particular, to a method and an apparatus for testing encoding and decoding quality, a computer device, and a storage medium.

Background

In a conventional codec quality test, a person is arranged to watch the playing condition of an output signal of a product to be tested, so as to judge the codec quality (including coding quality or decoding quality) of the product to be tested. However, during manual testing, each person can only test one product to be tested at a time, and the efficiency is low.

Disclosure of Invention

The application provides a coding and decoding quality testing method, a coding and decoding quality testing device, computer equipment and a storage medium, which can improve coding and decoding quality testing efficiency.

A method of codec quality testing, the method comprising:

responding to a test request aiming at a product to be tested, and acquiring audio data to be analyzed and video data to be analyzed which are correspondingly output by the product to be tested based on input audio and video composite data;

extracting frames from the audio data to be analyzed and the video data to be analyzed to obtain at least one video frame to be tested with a timestamp and at least one audio frame to be tested with a timestamp;

extracting a sample video frame with the same timestamp as the video frame to be detected from the sample data, and comparing the sample video frame with the video frame to be detected to obtain image similarity; the sample data is data obtained by performing standard decoding on the audio and video composite data;

extracting a sample audio frame with the same time stamp as the audio frame to be detected from the sample data, and comparing the sample audio frame with the audio frame to be detected to obtain audio similarity; and if the image similarity reaches an image similarity threshold and the audio similarity reaches a first audio similarity threshold, determining that the encoding and decoding quality of the product to be tested is qualified.

An apparatus for codec quality testing, the apparatus comprising:

a to-be-analyzed data acquisition module, used for responding to a test request for a product to be tested and acquiring audio data to be analyzed and video data to be analyzed, which are correspondingly output by the product to be tested based on input audio and video composite data;

the frame extracting module is used for extracting frames from the audio data to be analyzed and the video data to be analyzed to obtain at least one video frame to be detected with a timestamp and at least one audio frame to be detected with a timestamp;

the image comparison module is used for extracting a sample video frame with the same timestamp as the video frame to be detected from the sample data and comparing the sample video frame with the video frame to be detected to obtain the image similarity; the sample data is data obtained by performing standard decoding on the audio and video composite data;

the audio comparison module is used for extracting a sample audio frame with the same time stamp as the audio frame to be detected from the sample data and comparing the sample audio frame with the audio frame to be detected to obtain audio similarity;

and the evaluation module is used for determining that the coding and decoding quality of the product to be tested is qualified when the image similarity reaches an image similarity threshold and the audio similarity reaches a first audio similarity threshold.

A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the steps of the above method when executing the computer program.

A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of the above method.

With the above coding and decoding quality testing method, device, computer equipment and storage medium, audio data to be analyzed and video data to be analyzed, which the product to be tested outputs based on input audio and video composite data, are acquired in response to a test request. Frames are extracted from both to obtain at least one timestamped video frame to be tested and at least one timestamped audio frame to be tested. A sample video frame with the same timestamp as the video frame to be tested is extracted from the sample data and compared with the video frame to be tested to obtain the image similarity; a sample audio frame with the same timestamp as the audio frame to be tested is extracted from the sample data and compared with the audio frame to be tested to obtain the audio similarity. If the image similarity reaches the image similarity threshold and the audio similarity reaches the first audio similarity threshold, the encoding and decoding quality of the product to be tested is determined to be qualified. This realizes an automatic and objective encoding and decoding quality test, and because only the extracted frames are analyzed, the amount of data to be analyzed is reduced and the test efficiency is improved.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments or the conventional technologies of the present application, the drawings used in the descriptions of the embodiments or the conventional technologies are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a diagram illustrating an exemplary embodiment of an application environment of a codec quality testing method;

FIG. 2 is a flowchart illustrating a method for testing encoding and decoding quality according to an embodiment;

FIG. 3 is a second flowchart of the codec quality testing method according to an embodiment;

FIG. 4 is a third flowchart illustrating a method for testing codec quality according to an embodiment;

FIG. 5 is a fourth flowchart illustrating a method for testing encoding and decoding quality according to an embodiment;

FIG. 6 is a fifth flowchart illustrating a method for testing encoding and decoding quality according to an embodiment;

FIG. 7 is a sixth flowchart illustrating a method for codec quality testing according to an embodiment;

FIG. 8 is a seventh flowchart illustrating a codec quality testing method according to an embodiment;

FIG. 9 is an eighth flowchart illustrating a codec quality testing method according to an embodiment;

FIG. 10 is a block diagram of an exemplary codec quality testing apparatus;

FIG. 11 is a block diagram of an apparatus for testing codec quality in another embodiment;

FIG. 12 is a block diagram of an embodiment of an image detection module;

FIG. 13 is a block diagram showing the structure of an apparatus for testing the quality of encoding and decoding in still another embodiment;

FIG. 14 is a block diagram of an embodiment of an audio detection module;

FIG. 15 is a block diagram of an audio detection module according to another embodiment;

FIG. 16 is a diagram illustrating an internal structure of a computer device according to an embodiment.

Detailed Description

To facilitate an understanding of the present application, the present application will now be described more fully with reference to the accompanying drawings. Embodiments of the present application are set forth in the accompanying drawings. This application may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.

It will be understood that the terms "first," "second," and the like may be used herein to describe various features, but these features are not limited by these terms. These terms are only used to distinguish one feature from another.

Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.

The embodiment of the application provides a coding and decoding quality testing method, which is applied to the application environment shown in FIG. 1. The test terminal 101 interacts with the product 102 to be tested, obtains the audio data to be analyzed and the video data to be analyzed that the product 102 to be tested outputs based on input audio and video composite data, performs data analysis, and judges the encoding and decoding quality of the product to be tested. In one embodiment, the product 102 to be tested may be a multimedia codec, a mobile phone, a tablet computer, a notebook computer, a palm computer, a Mobile Internet Device (MID), a wearable device (e.g., a smart watch, a smart bracelet, etc.), or another electronic device with a multimedia coding function and/or a decoding function.

As shown in fig. 2, the present application provides a method for testing coding and decoding quality, which includes steps 201 to 208:

step 201, in response to a test request for a product to be tested, obtaining audio data to be analyzed and video data to be analyzed, which are output by the product to be tested based on input audio and video composite data.

The encoding and decoding quality test comprises an encoding quality test and a decoding quality test, and when the test request is the encoding quality test request, the finally output test result is the encoding quality test result; and when the test request is a decoding quality test request, the finally output test result is the decoding quality test result. The audio and video composite data is the coded data of the audio and video. The audio and video composite data used for testing are coded/decoded by the product to be tested, then audio data to be analyzed and video data to be analyzed are output, the test terminal responds to a test request aiming at the product to be tested, and the audio data to be analyzed and the video data to be analyzed which are output by the product to be tested are obtained and are used for coding and decoding quality analysis.

Step 202, performing frame extraction on the audio data to be analyzed and the video data to be analyzed, and acquiring at least one video frame to be detected with a timestamp and at least one audio frame to be detected with a timestamp.

Frames are extracted separately from the audio data to be analyzed and the video data to be analyzed; that is, at least one timestamped audio frame to be tested is extracted from the audio data to be analyzed, and at least one timestamped video frame to be tested is extracted from the video data to be analyzed. In one embodiment, frame extraction may jump randomly (fast forward or fast rewind) to a certain timestamp in the audio data to be analyzed or the video data to be analyzed, and extract the frame at that timestamp as the audio frame to be tested or the video frame to be tested.
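The random-jump extraction and the later lookup of a sample frame with the same timestamp can be sketched as follows. This is a minimal illustration, not the implementation of the application: it assumes that decoded frames are already available as a dictionary keyed by timestamp in milliseconds, and the names `pick_random_frames` and `pair_with_samples` are hypothetical.

```python
import random


def pick_random_frames(frames, count, seed=None):
    """Pick `count` frames at random timestamps.

    `frames` maps a timestamp (assumed here to be in ms) to a frame payload.
    Returns a list of (timestamp, frame) pairs sorted by timestamp.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(frames), min(count, len(frames)))
    return [(ts, frames[ts]) for ts in sorted(chosen)]


def pair_with_samples(picked, sample_frames):
    """Pair each picked frame with the sample frame that has the same timestamp."""
    pairs = []
    for ts, frame in picked:
        if ts not in sample_frames:
            raise KeyError(f"no sample frame with timestamp {ts}")
        pairs.append((ts, frame, sample_frames[ts]))
    return pairs
```

Each resulting triple holds the timestamp, the frame to be tested, and the sample frame it should be compared with.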

Step 203, extracting a sample video frame with the same time stamp as the video frame to be detected from the sample data, and comparing the sample video frame with the video frame to be detected to obtain the image similarity.

The sample data is data obtained by performing standard decoding on the audio and video composite data. Standard decoding refers to decoding performed by equipment with qualified decoding quality, and the sample data comprises sample audio data and sample video data.

A sample video frame with the same timestamp as the video frame to be tested obtained by frame extraction is extracted from the sample data, and the two frames are compared for image similarity to obtain the image similarity. In one embodiment, the image similarity comparison may be implemented by using algorithms such as a hash algorithm, a feature point detection algorithm, a peak signal-to-noise ratio (PSNR) algorithm, a structural similarity (SSIM) algorithm, and the like.
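As an illustration of one of the listed options, the following is a minimal sketch of a PSNR comparison between two frames. It assumes that each frame is an 8-bit grayscale image stored as a list of rows; the function name `psnr` and this frame representation are assumptions made for the example, and how a PSNR value in dB would be mapped onto the image similarity threshold is not specified by the application.

```python
import math


def psnr(frame_a, frame_b, max_value=255):
    """Peak signal-to-noise ratio (dB) between two equally sized grayscale frames."""
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must have the same height")
    squared_error = 0
    count = 0
    for row_a, row_b in zip(frame_a, frame_b):
        if len(row_a) != len(row_b):
            raise ValueError("frames must have the same width")
        for a, b in zip(row_a, row_b):
            squared_error += (a - b) ** 2
            count += 1
    if count == 0:
        raise ValueError("frames must not be empty")
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(max_value ** 2 / mse)
```

A higher value means the frame to be tested is closer to the sample frame; identical frames give infinity.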

Step 204, extracting a sample audio frame with the same time stamp as the audio frame to be detected from the sample data, and comparing the sample audio frame with the audio frame to be detected to obtain the audio similarity.

A sample audio frame with the same timestamp as the audio frame to be tested obtained by frame extraction is extracted from the sample data, and the two frames are compared for audio similarity to obtain the audio similarity. In one embodiment, the audio similarity comparison may be implemented by using algorithms such as voiceprint comparison, time domain-based similarity comparison, frequency domain-based similarity comparison, and the like.
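A time domain-based comparison could, for example, be a cosine similarity over the PCM samples of the two audio frames. The sketch below is one possible choice under that assumption, not the method prescribed by the application; the function name `cosine_similarity` and the handling of silent frames are illustrative decisions.

```python
import math


def cosine_similarity(samples_a, samples_b):
    """Cosine similarity between two equally long sequences of PCM samples."""
    if len(samples_a) != len(samples_b):
        raise ValueError("audio frames must contain the same number of samples")
    dot = sum(a * b for a, b in zip(samples_a, samples_b))
    norm_a = math.sqrt(sum(a * a for a in samples_a))
    norm_b = math.sqrt(sum(b * b for b in samples_b))
    if norm_a == 0 and norm_b == 0:
        return 1.0  # assumption: two silent frames count as identical
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: silence versus sound counts as dissimilar
    return dot / (norm_a * norm_b)
```

Note that cosine similarity ignores a uniform change in volume, so a test that must also catch gain errors would need an additional amplitude check.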

Step 205, comparing whether the image similarity reaches an image similarity threshold.

Step 206, comparing whether the audio similarity reaches a first audio similarity threshold.

The image similarity threshold is a preset reference value used for evaluating whether the similarity between the video frame to be tested and the sample video frame meets the encoding and decoding quality qualification requirement or not; the first audio similarity threshold is a preset reference value used for evaluating whether the similarity between the audio frame to be tested and the sample audio frame meets the qualified requirement of the encoding and decoding quality.

The image similarity of each video frame to be tested is compared with the image similarity threshold; if every acquired image similarity reaches the image similarity threshold, the video encoding and decoding quality is judged to be qualified. Likewise, the audio similarity of each audio frame to be tested is compared with the first audio similarity threshold; if every acquired audio similarity reaches the first audio similarity threshold, the audio encoding and decoding quality is judged to be qualified.

Step 207, if the image similarity reaches the image similarity threshold and the audio similarity reaches the first audio similarity threshold, determining that the encoding and decoding quality of the product to be tested is qualified.

When the image similarity corresponding to each video frame to be tested reaches the image similarity threshold value and the audio similarity corresponding to each audio frame to be tested reaches the first audio similarity threshold value, that is, the encoding and decoding quality of the product to be tested for the video and the audio is qualified, the encoding and decoding quality of the product to be tested can be determined to be qualified.

Step 208, if the image similarity does not reach the image similarity threshold and/or the audio similarity does not reach the first audio similarity threshold, determining that the encoding and decoding quality of the product to be tested is unqualified.

When the image similarity corresponding to any video frame to be tested does not reach the image similarity threshold and/or the audio similarity corresponding to any audio frame to be tested does not reach the first audio similarity threshold, the encoding and decoding quality of the product to be tested is determined to be unqualified.
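The decision in steps 205 to 208 can be summarized in a short sketch. It assumes that "reaches the threshold" means greater than or equal to, and that similarities are plain numbers on whatever scale the chosen comparison algorithm produces; the function name `codec_quality_verdict` and the returned dictionary layout are hypothetical.

```python
def codec_quality_verdict(image_similarities, audio_similarities,
                          image_threshold, audio_threshold):
    """Return per-part and overall verdicts for one product to be tested."""
    if not image_similarities or not audio_similarities:
        # The method extracts at least one video frame and one audio frame.
        raise ValueError("at least one video and one audio similarity is required")
    # Assumption: "reaches the threshold" is interpreted as >=.
    video_ok = all(s >= image_threshold for s in image_similarities)
    audio_ok = all(s >= audio_threshold for s in audio_similarities)
    return {
        "video_qualified": video_ok,
        "audio_qualified": audio_ok,
        "qualified": video_ok and audio_ok,
    }
```

A single frame below its threshold is enough to make the overall result unqualified, which matches step 208.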

In the above coding and decoding quality testing method, audio data to be analyzed and video data to be analyzed, which the product to be tested outputs based on input audio and video composite data, are acquired in response to a test request. Frames are extracted from both to obtain at least one timestamped video frame to be tested and at least one timestamped audio frame to be tested. A sample video frame with the same timestamp as the video frame to be tested is extracted from the sample data and compared with the video frame to be tested to obtain the image similarity; a sample audio frame with the same timestamp as the audio frame to be tested is extracted from the sample data and compared with the audio frame to be tested to obtain the audio similarity. If the image similarity reaches the image similarity threshold and the audio similarity reaches the first audio similarity threshold, the encoding and decoding quality of the product to be tested is determined to be qualified. An automatic and objective encoding and decoding quality test is thus realized, and because only the extracted frames are analyzed, the amount of data to be analyzed is reduced and the test efficiency is improved.

In one embodiment, frames are extracted from the video data to be analyzed at intervals, and each extracted video frame to be tested is compared with its corresponding sample video frame for image similarity; if the image similarity of every video frame to be tested reaches the image similarity threshold, the encoding and decoding quality of the product to be tested for the video part is determined to be qualified. Likewise, frames are extracted from the audio data to be analyzed at intervals, and each extracted audio frame to be tested is compared with its corresponding sample audio frame for audio similarity; if the audio similarity of every audio frame to be tested reaches the first audio similarity threshold, the encoding and decoding quality of the product to be tested for the audio part is determined to be qualified. When the encoding and decoding quality for both the video part and the audio part is qualified, the encoding and decoding quality of the product to be tested is determined to be qualified. The interval frame extraction may be equal-interval frame extraction, for example extracting one frame every 5 frames as an audio frame to be tested or a video frame to be tested. It may also be non-equal-interval frame extraction: for example, the second extracted frame is 2 frames apart from the first, the third extracted frame is 5 frames apart from the second, and the interval used for each extraction may be larger or smaller than the previous one.
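The two interval strategies can be illustrated by computing which frame indices would be extracted. This sketch assumes that "every 5 frames" means a step of 5 between extracted indices and that extraction starts at frame 0; the function names are hypothetical.

```python
def equal_interval_indices(total_frames, step):
    """Indices extracted at a fixed step: 0, step, 2*step, ..."""
    if step <= 0:
        raise ValueError("step must be positive")
    return list(range(0, total_frames, step))


def variable_interval_indices(total_frames, gaps, start=0):
    """Indices extracted with a different gap before each subsequent frame.

    `gaps[i]` is the distance in frames between extraction i and extraction i+1.
    """
    indices = [start] if start < total_frames else []
    position = start
    for gap in gaps:
        if gap <= 0:
            raise ValueError("gaps must be positive")
        position += gap
        if position >= total_frames:
            break
        indices.append(position)
    return indices
```

For example, with gaps of 2 and then 5, the second extracted frame is 2 frames after the first and the third is 5 frames after the second, as in the non-equal-interval example above.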

In one embodiment, if the test request is a decoding quality test request, the audio data to be analyzed and the video data to be analyzed are output data after the audio and video composite data are directly decoded by a product to be tested; and if the test request is a coding quality test request, the audio data to be analyzed and the video data to be analyzed are output data after the audio and video composite data are decoded, coded and decoded sequentially by the product to be tested.

For a product to be tested that undergoes the decoding quality test, the product to be tested directly decodes the audio and video composite data and outputs the audio data to be analyzed and the video data to be analyzed; the final result of the encoding and decoding quality test is the decoding quality of the product to be tested.

For the product to be tested for testing the coding quality, the product to be tested decodes, codes and decodes the audio and video composite data in sequence to output audio data to be analyzed and video data to be analyzed. Because the audio and video composite data are coded data and need to be decoded to be used as an input resource for coding processing, in order to avoid the influence of the decoding processing on a coding quality test result, the first decoding and the second decoding are both carried out by the product to be tested, the consistency of decoding is ensured, and finally output audio data to be analyzed and video data to be analyzed can objectively reflect the coding quality of the product to be tested.
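The two processing chains can be sketched as a small dispatcher. The `decode` and `encode` callables stand in for whatever interface the product to be tested exposes; they, the request type strings, and the function name `produce_output` are assumptions made for illustration.

```python
def produce_output(composite_data, request_type, decode, encode):
    """Run the processing chain that matches the test request type.

    `decode` and `encode` are hypothetical callables backed by the product
    to be tested.
    """
    if request_type == "decode":
        # Decoding quality test: decode the composite data directly.
        return decode(composite_data)
    if request_type == "encode":
        # Coding quality test: decode, encode, then decode again, all on the
        # product to be tested so that both decoding passes are consistent.
        raw = decode(composite_data)
        re_encoded = encode(raw)
        return decode(re_encoded)
    raise ValueError(f"unknown test request type: {request_type!r}")
```

Because the same decoder is used before and after encoding, any remaining difference from the sample data can be attributed to the encoding step.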

As shown in fig. 3, in one embodiment, the codec quality testing method includes steps 301 to 307:

step 301, in response to a test request for a product to be tested, obtaining audio data to be analyzed and video data to be analyzed, which are output by the product to be tested based on input audio and video composite data;

step 302, performing frame extraction on audio data to be analyzed and video data to be analyzed to obtain at least one video frame to be detected with a timestamp and at least one audio frame to be detected with a timestamp;

step 303, extracting a sample video frame with the same time stamp as the video frame to be detected from the sample data, and comparing the sample video frame with the video frame to be detected to obtain image similarity; the sample data is data obtained by standard decoding of the audio and video composite data;

step 304, extracting a sample audio frame with the same time stamp as the audio frame to be detected from the sample data, and comparing the sample audio frame with the audio frame to be detected to obtain audio similarity;

step 305, comparing whether the image similarity reaches an image similarity threshold.

Step 306, if the image similarity reaches the image similarity threshold and the audio similarity reaches the first audio similarity threshold, determining that the encoding and decoding quality of the product to be tested is qualified.

Step 307, if the image similarity does not reach the image similarity threshold, performing image quality problem detection on the video frame to be detected.

When the image similarity corresponding to any video frame to be detected does not reach the image similarity threshold, namely the video coding and decoding quality of the product to be detected is unqualified, the image quality problem detection can be carried out on the video frame to be detected, of which the image similarity does not reach the image similarity threshold, so as to determine the coding and decoding quality problem type of the product to be detected.

As shown in fig. 4, in one embodiment, the codec quality testing method includes steps 401 to 407:

step 401, in response to a test request for a product to be tested, acquiring audio data to be analyzed and video data to be analyzed, which are output by the product to be tested based on input audio and video composite data;

step 402, performing frame extraction on audio data to be analyzed and video data to be analyzed to obtain at least one video frame to be detected with a timestamp and at least one audio frame to be detected with a timestamp;

step 403, extracting a sample video frame with the same timestamp as the video frame to be detected from the sample data, and comparing the sample video frame with the video frame to be detected to obtain image similarity; the sample data is data obtained by standard decoding of the audio and video composite data;

step 404, extracting a sample audio frame with the same time stamp as the audio frame to be detected from the sample data, and comparing the sample audio frame with the audio frame to be detected to obtain audio similarity;

step 405, comparing whether the image similarity reaches an image similarity threshold.

Step 406, if the image similarity reaches the image similarity threshold and the audio similarity reaches the first audio similarity threshold, determining that the encoding and decoding quality of the product to be tested is qualified.

Step 407, if the image similarity does not reach the image similarity threshold, performing pixel point detection on the video frame to be detected, and if the proportion of black pixel points in the video frame to be detected is higher than the pixel point proportion threshold, determining that the video frame to be detected is a black screen frame.

Pixel point detection identifies whether each pixel point is a black pixel point by checking its pixel value, gray value or RGB value, and records the number of black pixel points in the image. The number of black pixel points is then compared with the total number of pixel points in the image; if the proportion of black pixel points is higher than a preset pixel point proportion threshold, the video frame to be detected is determined to be a black screen frame, that is, a black screen problem has occurred in that frame.

In this embodiment, when the video frame to be detected does not meet the requirement for qualified encoding and decoding quality, a quality problem of the black screen type can be identified.
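The black screen check can be sketched as follows, using gray values. The frame is assumed to be an 8-bit grayscale image stored as a list of rows; the cut-off `black_level=16` for what counts as a black pixel and the default `ratio_threshold=0.95` are illustrative values, not values given in the application.

```python
def black_pixel_ratio(gray_frame, black_level=16):
    """Fraction of pixels whose gray value is at or below `black_level`."""
    total = sum(len(row) for row in gray_frame)
    if total == 0:
        raise ValueError("frame must not be empty")
    black = sum(1 for row in gray_frame for value in row if value <= black_level)
    return black / total


def is_black_screen(gray_frame, ratio_threshold=0.95, black_level=16):
    """True if the proportion of black pixels is higher than the threshold."""
    return black_pixel_ratio(gray_frame, black_level) > ratio_threshold
```

A dark but legitimate scene can also contain many near-black pixels, which is why the check is only applied to frames that have already failed the image similarity comparison.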

As shown in fig. 5, in one embodiment, the codec quality testing method includes steps 501 to 509:

step 501, responding to a test request aiming at a product to be tested, and acquiring audio data to be analyzed and video data to be analyzed which are correspondingly output by the product to be tested based on input audio and video composite data;

step 502, performing frame extraction on audio data to be analyzed and video data to be analyzed to obtain at least one video frame to be detected with a timestamp and at least one audio frame to be detected with a timestamp;

step 503, extracting a sample video frame with the same time stamp as the video frame to be detected from the sample data, and comparing the sample video frame with the video frame to be detected to obtain image similarity; the sample data is data obtained by standard decoding of the audio and video composite data;

step 504, extracting a sample audio frame with the same time stamp as the audio frame to be detected from the sample data, and comparing the sample audio frame with the audio frame to be detected to obtain audio similarity;

Step 505, comparing whether the image similarity reaches an image similarity threshold.

Step 506, if the image similarity reaches the image similarity threshold and the audio similarity reaches the first audio similarity threshold, determining that the encoding and decoding quality of the product to be tested is qualified.

Step 507, if the image similarity does not reach the image similarity threshold, the video frame to be detected is divided into a plurality of image sub-regions, and the sample video frame with the same timestamp as the video frame to be detected is divided into the sample image sub-regions with the same number as the image sub-regions.

Step 508, comparing the similarity of each image sub-region with the corresponding sample image sub-region to obtain the region similarity.

Step 509, if the number of image sub-regions whose region similarity is lower than the image similarity threshold is lower than the total number of image sub-regions, determining that the video frame to be detected is a screen-splash frame, and that the image sub-regions whose region similarity is lower than the image similarity threshold are screen-splash regions.

When the number of image sub-regions whose region similarity is lower than the image similarity threshold is lower than the total number of image sub-regions, some image sub-regions of the video frame to be detected still reach the image similarity threshold against their corresponding sample image sub-regions, that is, only part of the picture is abnormal. It can therefore be determined that the video frame to be detected has a screen-splash problem: the frame is a screen-splash frame, and the image sub-regions whose region similarity is lower than the image similarity threshold are the screen-splash regions within it.

According to the method and the device, when the video frame to be detected does not reach the qualified coding and decoding quality requirement, the condition that the type of the quality problem is the screen splash can be identified, and the area where the screen splash specifically appears is located.
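Steps 507 to 509 can be sketched as a grid comparison. The sketch assumes 8-bit grayscale frames stored as lists of rows, a rows-by-columns grid that divides the frame evenly, and a simple region similarity of one minus the normalized mean absolute difference; the application does not fix the region similarity measure, so this choice and all function names are assumptions. The check that at least one region fails is also an addition, reflecting that the frame has already failed the whole-frame comparison.

```python
def split_into_regions(frame, rows, cols):
    """Split a frame into rows*cols equally sized sub-regions, row by row."""
    height, width = len(frame), len(frame[0])
    if height % rows or width % cols:
        raise ValueError("frame size must be divisible by the grid size")
    rh, cw = height // rows, width // cols
    return [
        [line[c * cw:(c + 1) * cw] for line in frame[r * rh:(r + 1) * rh]]
        for r in range(rows)
        for c in range(cols)
    ]


def region_similarity(region, sample_region, max_value=255):
    """Assumed measure: 1 minus the normalized mean absolute difference."""
    diffs = [abs(a - b)
             for row_a, row_b in zip(region, sample_region)
             for a, b in zip(row_a, row_b)]
    return 1.0 - sum(diffs) / (len(diffs) * max_value)


def find_splash_regions(frame, sample_frame, rows, cols, threshold):
    """Return (is_splash_frame, indices of sub-regions below the threshold)."""
    regions = split_into_regions(frame, rows, cols)
    samples = split_into_regions(sample_frame, rows, cols)
    failed = [i for i, (reg, smp) in enumerate(zip(regions, samples))
              if region_similarity(reg, smp) < threshold]
    # Splash frame: some, but not all, sub-regions fall below the threshold.
    return 0 < len(failed) < len(regions), failed
```

The returned indices identify where in the grid the screen-splash regions are, counting sub-regions from left to right and top to bottom.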

As shown in fig. 6, in one embodiment, the codec quality testing method includes steps 601-607:

Step 601, responding to a test request aiming at a product to be tested, and acquiring audio data to be analyzed and video data to be analyzed which are correspondingly output by the product to be tested based on input audio and video composite data;

step 602, performing frame extraction on audio data to be analyzed and video data to be analyzed to obtain at least one video frame to be tested with a timestamp and at least one audio frame to be tested with a timestamp;

step 603, extracting a sample video frame with the same time stamp as the video frame to be detected from the sample data, and comparing the sample video frame with the video frame to be detected to obtain image similarity; the sample data is data obtained by standard decoding of the audio and video composite data;

step 604, extracting a sample audio frame with the same time stamp as the audio frame to be detected from the sample data, and comparing the sample audio frame with the audio frame to be detected to obtain audio similarity;

Step 605, comparing whether the audio similarity reaches the first audio similarity threshold.

Step 606, if the image similarity reaches the image similarity threshold and the audio similarity reaches the first audio similarity threshold, determining that the encoding and decoding quality of the product to be tested is qualified.

Step 607, if the audio similarity does not reach the first audio similarity threshold, performing audio quality problem detection on the audio frame to be detected.

When the audio similarity corresponding to any audio frame to be tested does not reach the first audio similarity threshold, the audio encoding and decoding quality of the product to be tested is unqualified. In this case, audio quality problem detection can be performed on each audio frame to be tested whose audio similarity does not reach the first audio similarity threshold, so as to determine the type of encoding and decoding quality problem of the product to be tested.
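The comparison and decision of steps 603 to 607 can be sketched as follows. This is a minimal sketch under stated assumptions: frames are assumed to be stored in dictionaries keyed by timestamp, every timestamp under test is assumed to exist in the sample data, and the similarity metrics are passed in as callables because the text above does not fix them. The name `evaluate_codec_quality` and the shape of the returned dictionary are hypothetical.

```python
from typing import Any, Callable, Dict

Similarity = Callable[[Any, Any], float]


def evaluate_codec_quality(
    test_video: Dict[int, Any],
    sample_video: Dict[int, Any],
    test_audio: Dict[int, Any],
    sample_audio: Dict[int, Any],
    image_similarity: Similarity,
    audio_similarity: Similarity,
    image_threshold: float,
    first_audio_threshold: float,
) -> Dict[str, Any]:
    """Compare frames that share a timestamp and decide whether quality is qualified."""
    # Steps 603 and 604: look up the sample frame with the same timestamp and compare.
    failed_video = [
        ts
        for ts in sorted(test_video)
        if image_similarity(sample_video[ts], test_video[ts]) < image_threshold
    ]
    failed_audio = [
        ts
        for ts in sorted(test_audio)
        if audio_similarity(sample_audio[ts], test_audio[ts]) < first_audio_threshold
    ]
    return {
        # Step 606: qualified only if both thresholds are reached for every frame.
        "qualified": not failed_video and not failed_audio,
        # Step 607: these timestamps are the candidates for quality problem detection.
        "failed_video": failed_video,
        "failed_audio": failed_audio,
    }
```

The timestamps returned in `failed_audio` identify the audio frames to be tested that would be passed on to audio quality problem detection.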

As shown in fig. 7, in one embodiment, the codec quality testing method includes steps 701 to 708:

step 701, responding to a test request aiming at a product to be tested, and acquiring audio data to be analyzed and video data to be analyzed which are correspondingly output by the product to be tested based on input audio and video composite data;

step 702, performing frame extraction on audio data to be analyzed and video data to be analyzed to obtain at least one video frame to be detected with a timestamp and at least one audio frame to be detected with a timestamp;

Step 703, extracting a sample video frame with the same time stamp as the video frame to be detected from the sample data, and comparing the sample video frame with the video frame to be detected to obtain image similarity; the sample data is data obtained by standard decoding of the audio and video composite data;

step 704, extracting a sample audio frame with the same time stamp as the audio frame to be detected from the sample data, and comparing the sample audio frame with the audio frame to be detected to obtain audio similarity;

step 705, comparing whether the audio similarity reaches a first audio similarity threshold.

Step 706, if the image similarity reaches the image similarity threshold and the audio similarity reaches the first audio similarity threshold, determining that the encoding and decoding quality of the product to be tested is qualified.

In step 707, if the audio similarity does not reach the first audio similarity threshold, comparing whether the audio similarity reaches a second audio similarity threshold.

In step 708, if the audio similarity reaches the second audio similarity threshold, it is determined that the audio frame to be tested is a noise-doped audio frame.

The second audio similarity threshold is lower than the first audio similarity threshold and is a preset reference value used for judging whether the audio frame to be detected is correlated with the sample audio frame. If the audio similarity between the audio frame to be detected and the sample audio frame does not reach the first audio similarity threshold but reaches the second audio similarity threshold, the two frames are correlated, but the audio frame to be detected is doped with noise, which keeps its similarity to the sample audio frame below the first audio similarity threshold.
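The two-threshold decision above can be illustrated with a short sketch. The text does not specify how audio similarity is computed, so the cosine similarity of the raw sample values used here is purely an assumption, as are the function names and the label strings.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed similarity metric: cosine of the angle between two sample vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def classify_audio_frame(
    similarity: float, first_threshold: float, second_threshold: float
) -> str:
    """Classify one audio frame under test from its similarity to the sample frame."""
    if second_threshold >= first_threshold:
        raise ValueError("the second threshold must be lower than the first threshold")
    if similarity >= first_threshold:
        return "qualified"
    if similarity >= second_threshold:
        # Correlated with the sample frame, but degraded: noise-doped audio frame.
        return "noise_doped"
    # Not correlated with the sample frame: other checks are needed.
    return "needs_further_detection"
```

A frame that falls below the second threshold is not classified here; the embodiments that follow handle that case.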

As shown in fig. 8, in one embodiment, the codec quality testing method includes steps 801-810:

step 801, responding to a test request aiming at a product to be tested, and acquiring audio data to be analyzed and video data to be analyzed which are correspondingly output by the product to be tested based on input audio and video composite data;

step 802, performing frame extraction on audio data to be analyzed and video data to be analyzed to obtain at least one video frame to be detected with a timestamp and at least one audio frame to be detected with a timestamp;

step 803, extracting a sample video frame with the same time stamp as the video frame to be detected from the sample data, and comparing the sample video frame with the video frame to be detected to obtain image similarity; the sample data is data obtained by standard decoding of the audio and video composite data;

step 804, extracting a sample audio frame with the same time stamp as the audio frame to be detected from the sample data, and comparing the sample audio frame with the audio frame to be detected to obtain audio similarity;

Step 805, comparing whether the audio similarity reaches a first audio similarity threshold.

Step 806, if the image similarity reaches the image similarity threshold and the audio similarity reaches the first audio similarity threshold, determining that the encoding and decoding quality of the product to be tested is qualified.

In step 807, if the audio similarity does not reach the first audio similarity threshold, it is compared whether the audio similarity reaches the second audio similarity threshold.

Step 808, if the audio similarity does not reach the second audio similarity threshold, comparing the volume value of the audio frame to be tested with the volume threshold.

Step 809, if the volume value of the audio frame to be tested reaches the volume threshold, determining that the audio frame to be tested is a pure noise audio frame.

In step 810, if the volume value of the audio frame to be tested does not reach the volume threshold, it is determined that the audio frame to be tested is a mute audio frame.

The second audio similarity threshold is lower than the first audio similarity threshold, and the volume threshold is a preset reference value for judging whether the audio frame to be detected is silent. When the audio similarity of the audio frame to be detected does not reach the second audio similarity threshold, the audio frame to be detected may be either a pure noise audio frame or a mute audio frame, and the type of quality problem can be determined by comparing the volume value of the audio frame to be detected with the volume threshold. If the volume value does not reach the volume threshold, the audio frame to be detected is determined to be a mute audio frame; if the volume value reaches the volume threshold, the audio frame to be detected is determined to be a pure noise audio frame.
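The volume check can be sketched as follows. How the volume value is measured is not stated above, so the root-mean-square amplitude used here is an assumption, and `rms_volume` and `classify_uncorrelated_frame` are hypothetical names.

```python
import math
from typing import Sequence


def rms_volume(frame: Sequence[float]) -> float:
    """Assumed volume measure: root-mean-square amplitude of the frame's samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def classify_uncorrelated_frame(frame: Sequence[float], volume_threshold: float) -> str:
    """For a frame below the second audio similarity threshold, apply steps 809 and 810."""
    if rms_volume(frame) >= volume_threshold:
        return "pure_noise"  # audible but unrelated to the sample audio frame
    return "mute"  # volume does not reach the volume threshold
```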

As shown in fig. 9, in one embodiment, the codec quality testing method includes steps 901 to 909:

step 901, responding to a test request aiming at a product to be tested, and acquiring audio data to be analyzed and video data to be analyzed which are correspondingly output by the product to be tested based on input audio and video composite data;

step 902, performing frame extraction on audio data to be analyzed and video data to be analyzed to obtain at least one video frame to be detected with a timestamp and at least one audio frame to be detected with a timestamp;

step 903, extracting a sample video frame with the same time stamp as the video frame to be detected from the sample data, and comparing the sample video frame with the video frame to be detected to obtain image similarity; the sample data is data obtained by standard decoding of the audio and video composite data;

step 904, extracting a sample audio frame with the same time stamp as the audio frame to be detected from the sample data, and comparing the sample audio frame with the audio frame to be detected to obtain audio similarity;

step 905, comparing whether the audio similarity reaches a first audio similarity threshold.

Step 906, if the image similarity reaches the image similarity threshold and the audio similarity reaches the first audio similarity threshold, determining that the encoding and decoding quality of the product to be tested is qualified.

In step 907, if the audio similarity does not reach the first audio similarity threshold, it is compared whether the audio similarity reaches the second audio similarity threshold.

Step 908, if the audio similarity does not reach the second audio similarity threshold, comparing the adjacent frame set of the sample audio frame with the audio frame to be tested;

in step 909, if the similarity between any frame in the adjacent frame set and the audio frame to be analyzed reaches the first audio similarity threshold, it is determined that audio offset occurs in the audio data to be analyzed.

The second audio similarity threshold is lower than the first audio similarity threshold. The adjacent frame set of the sample audio frame comprises, taking the sample audio frame as the starting point, a plurality of successively adjacent audio frames before it and/or a plurality of successively adjacent audio frames after it. When the audio similarity between the audio frame to be analyzed and the corresponding sample audio frame reaches neither the first audio similarity threshold nor the second audio similarity threshold, a possible cause is that audio offset has occurred in the audio data to be analyzed, that is, the timestamp of each audio frame has shifted forward or backward; such an offset causes the audio and the image to be out of synchronization. To judge whether the unqualified encoding and decoding quality is caused by audio offset, the adjacent frame set of the sample audio frame can be compared with the audio frame to be analyzed. If the similarity between any frame in the adjacent frame set and the audio frame to be analyzed reaches the first audio similarity threshold, the timestamp of that frame in the adjacent frame set is the correct timestamp of the audio frame to be analyzed, and it can be determined that audio offset has occurred in the audio data to be analyzed. Based on the order of that frame relative to the sample audio frame and the difference between their timestamps, it can further be determined whether the audio data to be analyzed is shifted forward or backward, and by how much time.
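The adjacent-frame comparison can be sketched as follows. This is an illustrative sketch only: the size of the adjacent frame set (`window`), the search order (nearest neighbours first), the sign convention of the returned offset and the name `detect_audio_offset` are all assumptions, and the similarity metric is passed in as a callable because it is not fixed by the text.

```python
from typing import Any, Callable, List, Optional, Tuple


def detect_audio_offset(
    test_timestamp: int,
    test_frame: Any,
    sample_frames: List[Tuple[int, Any]],
    similarity: Callable[[Any, Any], float],
    first_threshold: float,
    window: int = 2,
) -> Optional[int]:
    """Search the adjacent frame set of the sample frame for a matching frame.

    sample_frames is a list of (timestamp, frame) pairs sorted by timestamp and is
    assumed to contain test_timestamp. The return value is test_timestamp minus the
    timestamp of the matching sample frame (assumed convention: positive means the
    audio under test appears later than it should), or None if no neighbour matches.
    """
    timestamps = [ts for ts, _ in sample_frames]
    center = timestamps.index(test_timestamp)
    start = max(0, center - window)
    stop = min(len(sample_frames), center + window + 1)
    # Visit neighbours before and after the sample frame, nearest first.
    for index in sorted(range(start, stop), key=lambda i: abs(i - center)):
        if index == center:
            continue  # the same-timestamp frame has already failed the comparison
        correct_timestamp, candidate = sample_frames[index]
        if similarity(candidate, test_frame) >= first_threshold:
            return test_timestamp - correct_timestamp
    return None
```

The sign of the returned value indicates the direction of the shift and its magnitude gives the offset time, under the convention assumed in the docstring.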

It should be understood that although the steps in the flow charts of fig. 2-9 are shown in the order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated otherwise herein, the order of the steps is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in fig. 2-9 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

As shown in fig. 10, an embodiment of the present application further provides an encoding and decoding quality testing apparatus 1000, including:

the to-be-analyzed data acquisition module 1010 is configured to respond to a test request for a product to be tested, and acquire audio data to be analyzed and video data to be analyzed, which are output by the product to be tested based on input audio and video composite data;

a frame extracting module 1020, configured to extract frames from the audio data to be analyzed and the video data to be analyzed, and obtain at least one video frame to be detected with a timestamp and at least one audio frame to be detected with a timestamp;

the image comparison module 1030 is configured to extract a sample video frame having the same timestamp as the video frame to be detected from the sample data, and compare the sample video frame with the video frame to be detected to obtain an image similarity; the sample data is data obtained by standard decoding of the audio and video composite data;

the audio comparison module 1040 is configured to extract a sample audio frame having the same timestamp as the audio frame to be detected from the sample data, and compare the sample audio frame with the audio frame to be detected to obtain audio similarity;

and the evaluation module 1050 is configured to determine that the coding and decoding quality of the product to be tested is qualified when the image similarity reaches the image similarity threshold and the audio similarity reaches the first audio similarity threshold.

As shown in fig. 11, in one embodiment, the codec quality testing apparatus 1000 further includes:

the image detection module 1060 is configured to perform image quality problem detection on the video frame to be detected when the image similarity corresponding to the video frame to be detected does not reach the image similarity threshold.

In one embodiment, the image detection module 1060 is configured to perform pixel point detection on the video frame to be detected when the image similarity corresponding to the video frame to be detected does not reach the image similarity threshold, and determine that the video frame to be detected is a black screen frame if the percentage of black pixel points in the video frame to be detected is higher than the pixel point proportion threshold.
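The black screen check performed by the image detection module can be sketched as follows. This is a sketch under stated assumptions: frames are assumed to be grayscale pixel grids, a pixel is treated as black when its value does not exceed a hypothetical `black_level`, and the default values of `black_level` and `ratio_threshold` are illustrative rather than values taken from the application.

```python
from typing import List

Frame = List[List[int]]  # assumed: grayscale image as rows of pixel values 0..255


def black_pixel_ratio(frame: Frame, black_level: int = 16) -> float:
    """Fraction of pixels whose value is at or below the assumed black level."""
    total = sum(len(row) for row in frame)
    black = sum(1 for row in frame for pixel in row if pixel <= black_level)
    return black / total if total else 0.0


def is_black_screen_frame(
    frame: Frame, ratio_threshold: float = 0.95, black_level: int = 16
) -> bool:
    """Black screen frame if the black pixel ratio is higher than the ratio threshold."""
    return black_pixel_ratio(frame, black_level) > ratio_threshold
```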

As shown in FIG. 12, in one embodiment, the image detection module 1060 includes:

an image dividing unit 1061, configured to divide a video frame to be detected into a plurality of image sub-regions, and divide a sample video frame having the same timestamp as the video frame to be detected into sample image sub-regions having the same number as the image sub-regions;

an image area comparison unit 1062, configured to compare similarity between each image sub-area and a corresponding sample image sub-area, to obtain an area similarity;

the screen-splash frame identification unit 1063 is configured to determine that the video frame to be detected is a screen-splash frame when the number of the image sub-areas with the area similarity lower than the image similarity threshold is lower than the total number of the image sub-areas, and determine that the image sub-area with the area similarity lower than the image similarity threshold is a screen-splash area.

As shown in fig. 13, in one embodiment, the codec quality testing apparatus 1000 further includes:

the audio detection module 1070 is configured to perform audio quality problem detection on the audio frame to be detected if the audio similarity corresponding to the audio frame to be detected does not reach the first audio similarity threshold.

In one embodiment, the audio detection module 1070 is configured to determine that the audio frame to be detected is a noise-doped audio frame when the audio similarity reaches the second audio similarity threshold; wherein the second audio similarity threshold is lower than the first audio similarity threshold.

As shown in fig. 14, in one embodiment, the audio detection module 1070 includes:

the volume comparison unit 1071 is configured to compare the volume value of the audio frame to be detected with the volume threshold value when the audio similarity of the audio frame to be detected does not reach the second audio similarity threshold value; wherein the second audio similarity threshold is lower than the first audio similarity threshold;

the pure noise audio frame detection unit 1072 is configured to determine that the audio frame to be detected is a pure noise audio frame when the volume value of the audio frame to be detected reaches the volume threshold;

the silent audio frame detection unit 1073 is configured to determine that the audio frame to be detected is a silent audio frame when the volume value of the audio frame to be detected does not reach the volume threshold.

As shown in fig. 15, in one embodiment, the audio detection module 1070 includes:

the adjacent frame comparison unit 1074 is configured to compare the adjacent frame set of the sample audio frame with the audio frame to be detected when the audio similarity does not reach the second audio similarity threshold; wherein the second audio similarity threshold is lower than the first audio similarity threshold;

the audio offset detection unit 1075 is configured to determine that audio offset occurs in the audio data to be analyzed when the similarity between any one frame in the adjacent frame set and the audio frame to be analyzed reaches a first audio similarity threshold.

For the specific limitations of the codec quality testing apparatus, reference may be made to the above limitations of the codec quality testing method, which are not repeated here. All or part of the modules in the codec quality testing apparatus may be implemented by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to the modules.

In one embodiment, a computer device is provided, and the computer device may be a terminal, and the internal structure diagram thereof may be as shown in fig. 16. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless communication can be realized through WIFI, an operator network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement a codec quality testing method. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.

Those skilled in the art will appreciate that the architecture shown in fig. 16 is merely a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied; a particular computer device may include more or fewer components than those shown, combine certain components, or have a different arrangement of components.

In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program:

extracting frames of audio data to be analyzed and video data to be analyzed, and acquiring at least one video frame to be tested with a timestamp and at least one audio frame to be tested with the timestamp;

extracting a sample video frame with the same timestamp as the video frame to be detected from the sample data, and comparing the sample video frame with the video frame to be detected to obtain image similarity; the sample data is data obtained by standard decoding of the audio and video composite data;

extracting a sample audio frame with the same time stamp as the audio frame to be detected from the sample data, and comparing the sample audio frame with the audio frame to be detected to obtain audio similarity;

and if the image similarity reaches the image similarity threshold and the audio similarity reaches the first audio similarity threshold, determining that the encoding and decoding quality of the product to be tested is qualified.

In one embodiment, the processor, when executing the computer program, further performs the steps of:

and if the image similarity does not reach the image similarity threshold, detecting the image quality problem of the video frame to be detected.

and performing pixel point detection on the video frame to be detected, and if the proportion of black pixel points in the video frame to be detected is higher than a pixel point proportion threshold, determining that the video frame to be detected is a black screen frame.

dividing a video frame to be detected into a plurality of image sub-areas, and dividing a sample video frame with the same time stamp as the video frame to be detected into sample image sub-areas with the same number as the image sub-areas;

comparing the similarity of each image subregion with the corresponding sample image subregion to obtain the region similarity;

if the number of the image sub-regions with the region similarity lower than the image similarity threshold is lower than the total number of the image sub-regions, determining that the video frame to be detected is a screen-splash frame, and determining that the image sub-regions with the region similarity lower than the image similarity threshold are screen-splash regions.

and if the audio similarity does not reach the first audio similarity threshold, performing audio quality problem detection on the audio frame to be detected.

if the audio similarity reaches a second audio similarity threshold, determining the audio frame to be detected as a noise-doped audio frame; wherein the second audio similarity threshold is lower than the first audio similarity threshold.

if the audio similarity of the audio frame to be detected does not reach the second audio similarity threshold value, comparing the volume value of the audio frame to be detected with the volume threshold value; wherein the second audio similarity threshold is lower than the first audio similarity threshold;

if the volume value of the audio frame to be detected reaches the volume threshold value, determining that the audio frame to be detected is a pure noise audio frame;

and if the volume value of the audio frame to be detected does not reach the volume threshold value, determining that the audio frame to be detected is a mute audio frame.

if the audio similarity does not reach the second audio similarity threshold value, comparing the adjacent frame set of the sample audio frame with the audio frame to be detected; wherein the second audio similarity threshold is lower than the first audio similarity threshold;

and if the similarity between any frame in the adjacent frame set and the audio frame to be analyzed reaches a first audio similarity threshold value, determining that audio offset occurs in the audio data to be analyzed.

In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which when executed by a processor performs the steps of:

In one embodiment, the computer program when executed by the processor further performs the steps of:

It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments can be implemented by a computer program instructing relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, database or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others.

The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to fall within the scope of this specification.

The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the patent. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the scope of protection of this patent shall be subject to the appended claims.

Claims

1. A coding and decoding quality testing method is characterized by comprising the following steps:

and if the image similarity reaches an image similarity threshold and the audio similarity reaches a first audio similarity threshold, determining that the encoding and decoding quality of the product to be tested is qualified.

2. The codec quality testing method of claim 1,

if the test request is a decoding quality test request, the audio data to be analyzed and the video data to be analyzed are output data after the audio and video composite data are directly decoded by the product to be tested;

and if the test request is a coding quality test request, the audio data to be analyzed and the video data to be analyzed are output data after the audio and video composite data are sequentially decoded, coded and decoded by the product to be tested.

3. The codec quality testing method of claim 1, wherein the method further comprises:

and if the image similarity does not reach the image similarity threshold, performing image quality problem detection on the video frame to be detected.

4. The codec quality testing method according to claim 3, wherein the step of detecting the image quality problem of the video frame to be tested comprises:

5. The codec quality testing method according to claim 3, wherein the step of detecting the image quality problem of the video frame to be tested comprises:

dividing the video frame to be detected into a plurality of image sub-areas, and dividing a sample video frame with the same time stamp as the video frame to be detected into sample image sub-areas with the same number as the image sub-areas;

6. The codec quality testing method of claim 1, wherein the method further comprises:

7. The codec quality testing method of claim 6, wherein the step of performing audio quality problem detection on the audio frame to be tested comprises:

if the audio similarity reaches the second audio similarity threshold, determining that the audio frame to be detected is a noise-doped audio frame; wherein the second audio similarity threshold is lower than the first audio similarity threshold.

8. The codec quality testing method of claim 6, wherein the step of performing audio quality problem detection on the audio frame to be tested further comprises:

if the audio similarity of the audio frame to be detected does not reach a second audio similarity threshold value, comparing the volume value of the audio frame to be detected with a volume threshold value; wherein the second audio similarity threshold is lower than the first audio similarity threshold;

and if the volume value of the audio frame to be tested does not reach the volume threshold value, determining that the audio frame to be tested is a mute audio frame.

9. The codec quality testing method of claim 6, wherein the step of performing audio quality problem detection on the audio frame to be tested further comprises:

if the audio similarity does not reach a second audio similarity threshold value, comparing the adjacent frame set of the sample audio frame with the audio frame to be detected; wherein the second audio similarity threshold is lower than the first audio similarity threshold;

and if the similarity between any frame in the adjacent frame set and the audio frame to be analyzed reaches the first audio similarity threshold value, determining that audio offset occurs in the audio data to be analyzed.

10. An apparatus for testing encoding and decoding quality, the apparatus comprising:

11. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 9 when executing the computer program.

12. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 9.