CN110830793A

CN110830793A - Video transmission quality time domain detection method based on deep learning frequency scale identification

Info

Publication number: CN110830793A
Application number: CN201911104622.8A
Authority: CN
Inventors: 刘桂雄; 蒋晨杰
Original assignee: South China University of Technology SCUT
Current assignee: South China University of Technology SCUT
Priority date: 2019-11-13
Filing date: 2019-11-13
Publication date: 2020-02-21
Anticipated expiration: 2039-11-13
Also published as: CN110830793B

Abstract

The invention discloses a video transmission quality time domain detection method based on deep learning frequency scale identification, comprising the following steps: making a video for detecting video transmission time domain indexes, and marking the sequence number and the check number of each video frame at a specific position of the video as a video label, called the frequency scale for short; training an SSD (Single Shot MultiBox Detector) target detection network, and using video frames as its input to detect each target and target frame in the frequency scale; extracting the sequence number and the check number from the detected targets and target frames, where the sequence number is used to locate the video frame and the check number is used to verify whether the frequency scale identification is wrong; and, in one detection, simultaneously extracting video frames from the sending end and the receiving end of the video transmission, inputting them separately into the SSD target detection network, extracting their respective frequency scales, judging whether picture freezing exists, and calculating the picture freezing time and the picture delay.

Description

Video transmission quality time domain detection method based on deep learning frequency scale identification

Technical Field

The invention relates to the field of target detection, in particular to a video transmission quality time domain detection method based on deep learning frequency scale identification.

Background

During video transmission, factors such as network conditions, channel quality and caching can cause picture freezing and picture delay at the receiving end. Picture freezing degrades the viewing experience, and in scenarios such as real-time video calls picture delay must be avoided as far as possible, so time domain detection of picture freezing and picture delay in video transmission is very important. Most existing video transmission quality detection methods evaluate transmission quality on the basis of image quality, while research on time domain detection focuses on relating packet loss and frame loss to image distortion and on judging picture freezing from the time domain image context. The former cannot fully reflect the picture freezing and picture delay performance of video transmission in the time domain; the latter makes it difficult to calculate the picture freezing time and the picture delay time. A time domain detection method that can evaluate video transmission quality efficiently, accurately and intelligently is therefore of practical significance.

Disclosure of Invention

In order to solve the above technical problems, an object of the present invention is to provide a video transmission quality time domain detection method based on deep learning frequency scale identification.

The purpose of the invention is realized by the following technical scheme:

A video transmission quality time domain detection method based on deep learning frequency scale identification comprises the following steps:

A, making a video for detecting video transmission time domain indexes, and marking the sequence number N_s and the check number N_c of each video frame at a specific position of the video as a video label, called the frequency scale for short;

B, training the SSD target detection network, and using video frames as its input to detect each target and its target frame in the frequency scale, indexed by j = 1, 2, 3, …, n, where n is the total number of detected targets;

C, extracting N_s and N_c from the detected targets and target frames, where N_s is used to locate the video frame and N_c is used to verify whether the frequency scale identification is wrong;

D, in one detection, simultaneously extracting video frames from the video transmission sending end and the video transmission receiving end, inputting them separately into the SSD target detection network, extracting their respective frequency scales, judging whether the picture is frozen, and calculating the picture freezing time T_f and the picture delay T_d.

Compared with the prior art, the invention has the beneficial effects that:

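The method provided by the invention evaluates video transmission quality in the time domain efficiently, accurately and intelligently: it detects picture freezing and calculates both the picture freezing time and the picture delay.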

Drawings

Fig. 1 is a flowchart of a video transmission quality time domain detection method based on deep learning frequency scale identification.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the following embodiments and accompanying drawings.

As shown in Fig. 1, a video transmission quality time domain detection method based on deep learning frequency scale identification includes the following steps:

Step 10, making a video for detecting video transmission time domain indexes, and marking the sequence number N_s and the check number N_c of each video frame at a specific position of the video as a video label, called the frequency scale for short;

Step 20, training the SSD target detection network, and using video frames as its input to detect each target and its target frame in the frequency scale, indexed by j = 1, 2, 3, …, n, where n is the total number of detected targets;

Step 30, extracting N_s and N_c from the detected targets and target frames, where N_s is used to locate the video frame and N_c is used to verify whether the frequency scale identification is wrong;

Step 40, in one detection, simultaneously extracting video frames from the video transmission sending end and the video transmission receiving end, inputting them separately into the SSD target detection network, extracting their respective frequency scales, judging whether the picture is frozen, and calculating the picture freezing time T_f and the picture delay T_d.

The step 10 specifically includes: the characters in the frequency scale are arranged horizontally from left to right, and the frequency scale comprises the sequence number N_s and the check number N_c, wherein: the sequence number of the i-th frame takes the character "S" as its start identifier, has the value N_{s,i} = i, and takes the character "C" as its end identifier; the check number of the i-th frame takes the end identifier "C" of the sequence number as its start identifier, has the value N_{c,i} equal to the sum of the digits of i plus the number of digits of i, and takes the character "E" as its end identifier; the sequence number and the check number are arranged consecutively in the same row, with the sequence number before the check number.
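The label layout described above can be illustrated with a minimal Python sketch. It assumes the check number is the digit sum of i plus the digit count of i, and that the label text is the plain concatenation "S", N_s, "C", N_c, "E". The function names are hypothetical and do not come from the patent.

```python
def check_number(i: int) -> int:
    """Digit sum of i plus its number of digits (assumed reading of N_c,i)."""
    digits = [int(ch) for ch in str(i)]
    return sum(digits) + len(digits)


def frequency_scale_text(i: int) -> str:
    """Text to render on frame i: 'S' + sequence number + 'C' + check number + 'E'."""
    if i < 0:
        raise ValueError("frame index must be non-negative")
    return f"S{i}C{check_number(i)}E"


# Frame 123: digits 1 + 2 + 3 = 6, plus 3 digits, gives check number 9.
print(frequency_scale_text(123))
```

Under this reading, frame 123 carries the text S123C9E and frame 1000 carries S1000C5E.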

The step 20 specifically includes: the data set of the SSD target detection network consists of a portion of the video frame images, and each element of the frequency scale in an image, together with its region, is annotated when the data set is made. The element categories are the characters "S", "C", "E", "0", "1", "2", "3", "4", "5", "6", "7", "8" and "9"; including the background, the network has 14 target detection categories in total. Because video transmission may introduce distortions such as noise, color shift, blur and contrast change, data augmentation is performed during training by randomly adding distortion interference of different types and degrees to the images. During both training and detection, a video frame is input into the SSD target detection network to obtain candidate results; non-maximum suppression is applied to the candidates to eliminate repeated detections of the same target; and the remaining results are sorted in ascending order of the horizontal coordinate of the target frame center, giving the target prediction results and the target frame prediction results for the video frame.
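The post-processing described above (non-maximum suppression followed by sorting on the horizontal center coordinate) can be sketched in plain Python. The box layout (x_min, y_min, x_max, y_max), the IoU threshold of 0.5, and the choice of class-agnostic suppression are assumptions made for illustration; the patent does not specify them, and the function names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max), assumed layout
Detection = Tuple[str, float, Box]        # (class label, confidence, target frame)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms_and_sort(cands: List[Detection], iou_thresh: float = 0.5) -> List[Detection]:
    """Greedy class-agnostic NMS, then sort by horizontal center of the target frame."""
    kept: List[Detection] = []
    for det in sorted(cands, key=lambda d: d[1], reverse=True):
        # Keep a candidate only if it does not overlap a higher-scoring kept one.
        if all(iou(det[2], k[2]) < iou_thresh for k in kept):
            kept.append(det)
    return sorted(kept, key=lambda d: (d[2][0] + d[2][2]) / 2.0)


cands = [
    ("1", 0.90, (10, 0, 20, 10)),
    ("7", 0.60, (11, 0, 21, 10)),   # overlaps the "1" box, lower confidence
    ("S", 0.95, (0, 0, 10, 10)),
    ("C", 0.80, (20, 0, 30, 10)),
]
print([d[0] for d in nms_and_sort(cands)])
```

Suppressing across classes reflects the fact that only one character can occupy a given position in the label; a per-class variant would be an equally valid reading.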

The step 30 specifically includes: the center coordinates of each target frame are: [formula not reproduced]

Targets of the categories "S", "C" and "E" are screened out. If there is exactly one target of each of the categories "S", "C" and "E", let the s-th target be the character "S", the c-th target be the character "C", and the e-th target be the character "E". If s < c, the targets corresponding to the digits of N_s are the j-th targets satisfying s < j < c, arranged in ascending order of j.

If c < e, the targets corresponding to the digits of N_c are the j-th targets satisfying c < j < e, arranged in ascending order of j.
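The extraction of the two numbers from the sorted detections can be sketched as follows. The sketch assumes the input is the list of class labels already sorted by horizontal center, and that the digits of N_s lie between "S" and "C" while the digits of N_c lie between "C" and "E". The function name and the choice to return None on failure are hypothetical.

```python
from typing import List, Optional, Tuple

DIGITS = set("0123456789")


def extract_numbers(labels: List[str]) -> Optional[Tuple[int, int]]:
    """Return (N_s, N_c) from left-to-right class labels, or None if malformed."""
    # Exactly one target of each marker category is required.
    if any(labels.count(marker) != 1 for marker in ("S", "C", "E")):
        return None
    s, c, e = labels.index("S"), labels.index("C"), labels.index("E")
    if not (s < c < e):
        return None
    seq_digits = labels[s + 1:c]   # targets with s < j < c
    chk_digits = labels[c + 1:e]   # targets with c < j < e
    if not seq_digits or not chk_digits:
        return None
    if not all(d in DIGITS for d in seq_digits + chk_digits):
        return None
    return int("".join(seq_digits)), int("".join(chk_digits))


print(extract_numbers(list("S123C9E")))
```

Detections outside the "S" … "E" span are ignored here; whether the patent treats them as an error is not stated in this text.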

The step 30 further includes: suppose the digits of N_s are n₁, n₂, …, n_k, where k is the number of digits of N_s. To pass verification, the frequency scale identification must satisfy the check number definition of step 10:

N_c = n₁ + n₂ + … + n_k + k

The frequency scale identification fails verification if any one of the following conditions is met: [formula not reproduced]

If the frequency scale identification passes verification, the subsequent detection continues; if it fails, detection on this video frame is abandoned.
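A minimal sketch of the verification, assuming the pass condition is the step 10 definition (N_c equals the digit sum of N_s plus the digit count of N_s). The function name is hypothetical, and the additional failure conditions listed by the patent are not modeled.

```python
def passes_verification(n_s: int, n_c: int) -> bool:
    """True when the recognized check number matches the recognized sequence number."""
    digits = [int(ch) for ch in str(n_s)]
    return n_c == sum(digits) + len(digits)


# Correct reading of frame 123 (check number 9) passes.
print(passes_verification(123, 9))
# Misreading the last digit as 8 changes the expected check number to 14, so it fails.
print(passes_verification(128, 9))
```

A check of this kind catches single-digit misreadings, but not every error: two digit errors that cancel in the sum, or a swap of two digits, would still pass.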

The step 40 specifically includes: assuming that the picture freezing time threshold is θ and the detection frame rate is f₁, picture freezing occurs when the following condition is met: [formula not reproduced]

The picture freezing time T_f is: [formula not reproduced]
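The freeze condition and the expression for T_f are not legible in this text, so the following Python sketch shows only one plausible reading, not the patent's formula. It assumes the receiving-end sequence number is sampled once per detection at rate f₁, that a run of m identical consecutive samples spans (m − 1)/f₁ seconds, and that a freeze is reported when that span reaches the threshold θ. All names are hypothetical.

```python
from itertools import groupby
from typing import List, Tuple


def find_freezes(seq_numbers: List[int], f1: float, theta: float) -> List[Tuple[int, float]]:
    """Return (frozen sequence number, duration in seconds) for each detected freeze.

    Assumption: seq_numbers holds one verified receiving-end sequence number
    per detection, taken at a constant detection frame rate f1.
    """
    if f1 <= 0 or theta <= 0:
        raise ValueError("f1 and theta must be positive")
    freezes = []
    for value, run in groupby(seq_numbers):
        m = sum(1 for _ in run)      # length of the run of identical samples
        duration = (m - 1) / f1      # time between first and last sample of the run
        if duration >= theta:
            freezes.append((value, duration))
    return freezes


samples = [1, 2, 3, 3, 3, 3, 3, 4, 5]
print(find_freezes(samples, f1=10.0, theta=0.3))
```

With a detection rate of 10 per second, the five consecutive readings of frame 3 span 0.4 s, which exceeds the 0.3 s threshold in this example.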

The step 40 further includes: video frames of the video transmission sending end and the video transmission receiving end are extracted simultaneously and input separately into the SSD target detection network to identify the sending-end video frame sequence number and the receiving-end video frame sequence number. Assuming the video frame rate is f₂, the picture delay T_d is: [formula not reproduced]
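The expression for T_d is not legible in this text. A natural reading, shown here only as an assumption, is that the delay equals the difference between the sending-end and receiving-end sequence numbers observed at the same instant, divided by the video frame rate f₂. The function and parameter names are hypothetical.

```python
def picture_delay(send_seq: int, recv_seq: int, f2: float) -> float:
    """Delay in seconds, assuming T_d = (sending-end N_s - receiving-end N_s) / f2."""
    if f2 <= 0:
        raise ValueError("video frame rate must be positive")
    return (send_seq - recv_seq) / f2


# Sender shows frame 150 while the receiver shows frame 144, at 30 frames per second.
print(picture_delay(150, 144, 30.0))
```

Under this assumption the time resolution of the measurement is one frame period, 1/f₂.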

Although embodiments of the present invention have been described above, the descriptions serve only to aid understanding of the invention and are not intended to limit it. Those skilled in the art will understand that various changes in form and detail may be made without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A video transmission quality time domain detection method based on deep learning frequency scale identification, characterized by comprising the following steps:

A, making a video for detecting video transmission time domain indexes, and marking the sequence number N_s and the check number N_c of each video frame at a specific position of the video as a video label;

B, training the SSD target detection network, and using video frames as its input to detect each target and its target frame in the frequency scale, where n is the total number of detected targets;

C, extracting N_s and N_c from the detected targets and target frames, where N_s is used to locate the video frame and N_c is used to verify whether the frequency scale identification is wrong;

D, in one detection, simultaneously extracting video frames from the video transmission sending end and the video transmission receiving end, inputting them separately into the SSD target detection network, extracting their respective frequency scales, judging whether the picture is frozen, and calculating the picture freezing time T_f and the picture delay T_d.

2. The video transmission quality time domain detection method based on deep learning frequency scale identification as claimed in claim 1, wherein in step A, the characters in the frequency scale are arranged horizontally from left to right, and the frequency scale comprises the sequence number N_s and the check number N_c, wherein: the sequence number of the i-th frame takes the character "S" as its start identifier, has the value N_{s,i} = i, and takes the character "C" as its end identifier; the check number of the i-th frame takes the end identifier "C" of the sequence number as its start identifier, has the value N_{c,i} equal to the sum of the digits of i plus the number of digits of i, and takes the character "E" as its end identifier; the sequence number and the check number are arranged consecutively in the same row, with the sequence number before the check number.

3. The video transmission quality time domain detection method based on deep learning frequency scale identification as claimed in claim 1, wherein in step B, the data set of the SSD target detection network consists of a portion of the video frame images, and each element of the frequency scale in an image, together with its region, is annotated when the data set is made; the element categories are the characters "S", "C", "E", "0", "1", "2", "3", "4", "5", "6", "7", "8" and "9", and, including the background, the network has 14 target detection categories in total; during both training and detection, a video frame is input into the SSD target detection network to obtain candidate results, non-maximum suppression is applied to the candidates to eliminate repeated detections of the same target, and the remaining results are sorted in ascending order of the horizontal coordinate of the target frame center, giving the target prediction results and the target frame prediction results for the video frame.

4. The video transmission quality time domain detection method based on deep learning frequency scale identification as claimed in claim 1, wherein in step C, the center coordinates of each target frame are: [formula not reproduced]

targets of the categories "S", "C" and "E" are screened out, and if there is exactly one target of each of the categories "S", "C" and "E", the s-th target is the character "S", the c-th target is the character "C", and the e-th target is the character "E"; if s < c, the targets corresponding to the digits of N_s are the j-th targets satisfying s < j < c, arranged in ascending order of j;

if c < e, the targets corresponding to the digits of N_c are the j-th targets satisfying c < j < e, arranged in ascending order of j.

5. The video transmission quality time domain detection method based on deep learning frequency scale identification as claimed in claim 1, wherein in step C, it is assumed that the digits of N_s are n₁, n₂, …, n_k, where k is the number of digits of N_s; [remainder of claim not reproduced]

6. The video transmission quality time domain detection method based on deep learning frequency scale identification as claimed in claim 1, wherein in step D, assuming that the picture freezing time threshold is θ and the detection frame rate is f₁, picture freezing occurs when the following condition is met: [formula not reproduced]

the picture freezing time T_f is: [formula not reproduced]

7. The video transmission quality time domain detection method based on deep learning frequency scale identification as claimed in claim 1, wherein in step D, video frames of the video transmission sending end and the video transmission receiving end are extracted simultaneously and input separately into the SSD target detection network to identify the sending-end video frame sequence number and the receiving-end video frame sequence number; assuming the video frame rate is f₂, the picture delay T_d is: [formula not reproduced]