CN110830793A - Video transmission quality time domain detection method based on deep learning frequency scale identification - Google Patents

Info

Publication number
CN110830793A
CN110830793A CN201911104622.8A CN201911104622A CN110830793A CN 110830793 A CN110830793 A CN 110830793A CN 201911104622 A CN201911104622 A CN 201911104622A CN 110830793 A CN110830793 A CN 110830793A
Authority
CN
China
Prior art keywords
video, target, video transmission, detection, frame
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911104622.8A
Other languages
Chinese (zh)
Other versions
CN110830793B (en)
Inventor
刘桂雄
蒋晨杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-11-13
Filing date
2019-11-13
Publication date
2020-02-21
Application filed by South China University of Technology SCUT
Priority to CN201911104622.8A
Publication of CN110830793A
Application granted
Publication of CN110830793B
Active (current legal status)
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N17/00 Diagnosis, testing or measuring for television systems or their details
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N17/00 Diagnosis, testing or measuring for television systems or their details
    • H04N17/004 Diagnosis, testing or measuring for television systems or their details for digital television systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30168 Image quality inspection

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a video transmission quality time domain detection method based on deep learning frequency scale identification, which comprises the following steps: making a video for detecting video transmission time domain indexes, and marking the sequence number and the check number of each video frame at a specific position of the video as a video label, called the frequency scale for short; training an SSD target detection network, and using video frames as the input of the SSD target detection network to detect each target and target frame in the frequency scale; extracting the sequence number and the check number from the detected targets and target frames, wherein the sequence number is used to locate the video frame and the check number is used to check whether the frequency scale identification is wrong; and, in one detection, simultaneously extracting video frames from the video transmission sending end and the video transmission receiving end, inputting them separately into the SSD target detection network, extracting their respective frequency scales, determining whether picture freezing exists, and calculating the picture freezing time and the picture delay.

Description

Video transmission quality time domain detection method based on deep learning frequency scale identification
Technical Field
The invention relates to the field of target detection, in particular to a video transmission quality time domain detection method based on deep learning frequency scale identification.
Background
During video transmission, factors such as network conditions, channel quality and buffering can cause picture freezing and picture delay at the receiving end. Picture freezing degrades the viewing experience, and in scenarios such as real-time video calls picture delay must be avoided as far as possible, so time domain detection of picture freezing and picture delay in video transmission is very important. Most existing video transmission quality detection methods evaluate transmission quality on the basis of image quality, while research on time domain detection of video transmission quality focuses on the relationship between packet loss, frame loss and image distortion, and on judging picture freezing from the temporal context of images. The former cannot fully reflect the picture freezing and picture delay behavior of video transmission in the time domain; the latter has difficulty calculating the picture freezing time and the picture delay time. A time domain detection method that can evaluate video transmission quality efficiently, accurately and intelligently is therefore of practical significance.
Disclosure of Invention
In order to solve the above technical problems, an object of the present invention is to provide a video transmission quality time domain detection method based on deep learning frequency scale identification.
The purpose of the invention is realized by the following technical scheme:
A video transmission quality time domain detection method based on deep learning frequency scale identification comprises the following steps:
A. Make a video for detecting video transmission time domain indexes, and mark the sequence number N_s and the check number N_c of each video frame at a specific position of the video as a video label, called the frequency scale for short;
B. Train an SSD target detection network, and use the video frame as the input of the SSD target detection network to detect each target in the frequency scale and its target frame, indexed by j = 1, 2, 3, ..., n, where n is the total number of detected targets;
C. From the detected targets and target frames, extract the recognized sequence number and the recognized check number; the sequence number is used to locate the video frame, and the check number is used to verify whether the frequency scale identification is wrong;
D. In one detection, simultaneously extract video frames from the video transmission sending end and the video transmission receiving end, input them separately into the SSD target detection network, extract their respective frequency scales, determine whether picture freezing exists, calculate the picture freezing time T_f, and calculate the picture delay T_d.
Compared with the prior art, the invention has the beneficial effects that:
The method provided by the invention can evaluate video transmission quality efficiently, accurately and intelligently.
Drawings
Fig. 1 is a flowchart of a video transmission quality time domain detection method based on deep learning frequency scale identification.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the following embodiments and accompanying drawings.
As shown in fig. 1, a time domain detection method for video transmission quality based on deep learning frequency scale identification includes the following steps:
Step 10: make a video for detecting video transmission time domain indexes, and mark the sequence number N_s and the check number N_c of each video frame at a specific position of the video as a video label, called the frequency scale for short;
Step 20: train an SSD target detection network, and use the video frame as the input of the SSD target detection network to detect each target in the frequency scale and its target frame, indexed by j = 1, 2, 3, ..., n, where n is the total number of detected targets;
Step 30: from the detected targets and target frames, extract the recognized sequence number and the recognized check number; the sequence number is used to locate the video frame, and the check number is used to verify whether the frequency scale identification is wrong;
Step 40: in one detection, simultaneously extract video frames from the video transmission sending end and the video transmission receiving end, input them separately into the SSD target detection network, extract their respective frequency scales, determine whether picture freezing exists, calculate the picture freezing time T_f, and calculate the picture delay T_d.
Step 10 specifically includes: the characters in the frequency scale are arranged horizontally from left to right, and the frequency scale contains the sequence number N_s and the check number N_c, wherein: the sequence number of the i-th frame takes the character "S" as its start identifier, N_{s,i} = i, and takes the character "C" as its end identifier; the check number of the i-th frame takes the end identifier "C" of the sequence number as its start identifier, N_{c,i} is the sum of the digits of i plus the number of digits of i, and takes the character "E" as its end identifier; the sequence number and the check number are arranged consecutively on the same line, with the sequence number before the check number.
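As an illustration of the layout described in step 10, the following minimal Python sketch builds the text of a frequency scale for frame i. The function names are hypothetical, and the exact rendering (font, position, spacing) is not specified here; only the character order "S", sequence number, "C", check number, "E" is taken from the description.

```python
def check_number(i: int) -> int:
    """Check number of frame i: sum of the decimal digits of i plus the digit count."""
    digits = [int(ch) for ch in str(i)]
    return sum(digits) + len(digits)


def frequency_scale_text(i: int) -> str:
    """Text of the frequency scale for frame i: 'S' + N_s + 'C' + N_c + 'E'."""
    if i < 0:
        raise ValueError("frame index must be non-negative")
    return f"S{i}C{check_number(i)}E"


# Frame 123: digit sum 1 + 2 + 3 = 6, digit count 3, so the check number is 9.
print(frequency_scale_text(123))  # S123C9E
```

Because the check number depends on both the digit sum and the digit count, a misread, missed or spurious digit in the sequence number usually changes the expected check number, although not every such error is guaranteed to be caught.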
Step 20 specifically includes: the data set of the SSD target detection network consists of a portion of the video frame images, and each element of the frequency scale in an image and its region are annotated when the data set is created. The element categories are the characters "S", "C", "E", "0", "1", "2", "3", "4", "5", "6", "7", "8" and "9"; including the background, the network has 14 target detection categories in total. Because video transmission may introduce distortions such as noise, color shift, blur and contrast change, data augmentation is performed during training by randomly adding distortion interference of different types and degrees to the images. During training and detection, video frames are input into the SSD target detection network to obtain candidate results; non-maximum suppression is applied to the candidate results to eliminate repeated detections of the same target, and the results are sorted by the horizontal coordinate of the target frame center in ascending order to obtain the target prediction results and the target frame prediction results corresponding to the video frame.
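The post-processing in step 20 (non-maximum suppression followed by left-to-right ordering) can be sketched as below. This is an illustrative sketch, not the patented implementation: the tuple layout, the class-agnostic suppression and the IoU threshold `iou_thr` are assumptions made for the example.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max)
Detection = Tuple[str, float, Box]        # (class label, confidence, target frame)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms_and_sort(candidates: List[Detection], iou_thr: float = 0.5) -> List[Detection]:
    """Greedy NMS (highest confidence first), then sort by box-center x ascending."""
    kept: List[Detection] = []
    for det in sorted(candidates, key=lambda d: d[1], reverse=True):
        # Keep a candidate only if it does not overlap a kept box too much.
        if all(iou(det[2], k[2]) <= iou_thr for k in kept):
            kept.append(det)
    kept.sort(key=lambda d: (d[2][0] + d[2][2]) / 2.0)
    return kept


cands = [
    ("C", 0.95, (24.0, 0.0, 34.0, 10.0)),
    ("5", 0.80, (12.0, 0.0, 22.0, 10.0)),
    ("5", 0.60, (13.0, 0.0, 23.0, 10.0)),  # duplicate detection of the same digit
    ("S", 0.90, (0.0, 0.0, 10.0, 10.0)),
]
print([d[0] for d in nms_and_sort(cands)])  # ['S', '5', 'C']
```

Suppressing across classes is a design choice for this sketch: each character position in the frequency scale should hold one character, so two overlapping boxes are treated as competing readings of the same target.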
Step 30 specifically includes: the center coordinates of each predicted target frame are:
[formula image]
Screen out the targets whose categories are "S", "C" and "E". If there is exactly one target of each of the categories "S", "C" and "E", denote the s-th target as the character "S", the c-th target as the character "C" and the e-th target as the character "E". If s < c, the targets corresponding to the digits of the recognized sequence number satisfy:
[formula image]
and are arranged in ascending order of j.
If c < e, the targets corresponding to the digits of the recognized check number satisfy:
[formula image]
and are arranged in ascending order of j.
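The extraction in step 30 can be illustrated with the following sketch. The selection formulas are images in the source, so the sketch assumes the natural reading of the text: the digits of the sequence number are the targets lying between "S" and "C", and the digits of the check number are the targets lying between "C" and "E", in left-to-right order. The function names and the rule of rejecting a frame unless s < c < e all hold are assumptions of this example.

```python
from typing import List, Optional, Tuple

DIGITS = set("0123456789")


def box_center(box: Tuple[float, float, float, float]) -> Tuple[float, float]:
    """Center of a target frame given as (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def split_frequency_scale(classes: List[str]) -> Optional[Tuple[str, str]]:
    """Split left-to-right class labels into (sequence digits, check digits).

    Returns None when the markers 'S', 'C', 'E' do not each occur exactly once,
    are out of order, or enclose anything other than at least one digit.
    """
    if any(classes.count(marker) != 1 for marker in "SCE"):
        return None
    s, c, e = (classes.index(marker) for marker in "SCE")
    if not (s < c < e):
        return None
    seq, chk = classes[s + 1:c], classes[c + 1:e]
    if not seq or not chk or not all(ch in DIGITS for ch in seq + chk):
        return None
    return "".join(seq), "".join(chk)


print(split_frequency_scale(list("S123C9E")))  # ('123', '9')
print(split_frequency_scale(list("S12C")))     # None (no 'E' marker)
```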
Step 30 further includes: suppose the digits of the recognized sequence number are n_1, n_2, ..., n_k, where k is the number of digits of the recognized sequence number. The frequency scale identification passes verification when the following condition is met:
[formula image]
The frequency scale identification fails verification when any one of the following conditions is met:
[formula image]
If the frequency scale identification passes verification, the subsequent detection continues; if it fails verification, detection of this video frame is abandoned.
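The verification can be sketched as follows. The pass and fail conditions are images in the source; this sketch assumes the pass condition mirrors the definition of the check number in step 10, that is, the recognized check number must equal n_1 + n_2 + ... + n_k + k. The function name and string-based interface are hypothetical.

```python
def frequency_scale_is_valid(seq_digits: str, check_digits: str) -> bool:
    """Return True when the recognized check number matches the recognized sequence number.

    seq_digits and check_digits are the recognized digit strings, e.g. "123" and "9".
    """
    allowed = set("0123456789")
    if not seq_digits or not check_digits:
        return False
    if not all(ch in allowed for ch in seq_digits + check_digits):
        return False
    # Expected check number: digit sum plus digit count of the sequence number.
    expected = sum(int(ch) for ch in seq_digits) + len(seq_digits)
    return int(check_digits) == expected


print(frequency_scale_is_valid("123", "9"))  # True: 1 + 2 + 3 + 3 = 9
print(frequency_scale_is_valid("128", "9"))  # False: 1 + 2 + 8 + 3 = 14
```

A frame that fails this check would simply be skipped, in line with the rule that detection of the video frame is abandoned when verification fails.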
Step 40 specifically includes: let the picture freezing time threshold be θ and the detection frame rate be f_1; picture freezing occurs when the following condition is met:
[formula image]
The picture freezing time T_f is:
[formula image]
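The freeze condition and the expression for T_f are images in the source and are not reproduced in the text. The sketch below therefore shows only one plausible reading, stated as an assumption: the receiving end is sampled at the detection frame rate f_1, a run of m consecutive samples with the same recognized sequence number is taken to last m / f_1 seconds, and a freeze is reported when that duration reaches the threshold θ. The function name and return format are hypothetical.

```python
from itertools import groupby
from typing import List, Tuple


def find_freezes(seq_numbers: List[int], f1: float, theta: float) -> List[Tuple[int, float]]:
    """Return (sequence number, assumed freeze duration in seconds) for each freeze.

    seq_numbers: verified sequence numbers read at the receiving end, one per
    detection sample, sampled at f1 samples per second.
    """
    if f1 <= 0:
        raise ValueError("detection frame rate must be positive")
    freezes = []
    for value, run in groupby(seq_numbers):
        m = sum(1 for _ in run)      # length of the run of identical readings
        duration = m / f1            # assumed duration of the run
        if m > 1 and duration >= theta:
            freezes.append((value, duration))
    return freezes


# Sequence number 2 is read four times in a row at 10 samples per second.
print(find_freezes([1, 2, 2, 2, 2, 3], f1=10.0, theta=0.3))  # [(2, 0.4)]
```

If the detection frame rate is higher than the video frame rate, short runs of identical readings occur even without freezing, so under this reading θ would need to exceed one video frame period.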
Step 40 further includes: simultaneously extract video frames from the video transmission sending end and the video transmission receiving end, input them separately into the SSD target detection network, and recognize the sending-end video frame sequence number and the receiving-end video frame sequence number. Let the video frame rate be f_2; the picture delay T_d is then:
[formula image]
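The expression for T_d is an image in the source. A natural reading, given here as an assumption rather than as the patented formula, is that the delay equals the difference between the sequence numbers seen at the same instant at the sending end and the receiving end, divided by the video frame rate f_2. The function and parameter names are hypothetical.

```python
def picture_delay(n_send: int, n_recv: int, f2: float) -> float:
    """Assumed picture delay in seconds: (sender frame number - receiver frame number) / f2."""
    if f2 <= 0:
        raise ValueError("video frame rate must be positive")
    return (n_send - n_recv) / f2


# The sender shows frame 130 while the receiver shows frame 100 of a 25 fps video.
print(picture_delay(130, 100, 25.0))  # 1.2
```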
Although the embodiments of the present invention have been described above, the above descriptions are only for the convenience of understanding the present invention, and are not intended to limit the present invention. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (7)

1. A video transmission quality time domain detection method based on deep learning frequency scale identification, characterized by comprising the following steps:
A. making a video for detecting video transmission time domain indexes, and marking the sequence number N_s and the check number N_c of each video frame at a specific position of the video as a video label;
B. training an SSD target detection network, and using the video frame as the input of the SSD target detection network to detect each target in the frequency scale and its target frame, n being the total number of detected targets;
C. from the detected targets and target frames, extracting the recognized sequence number and the recognized check number, the sequence number being used to locate the video frame and the check number being used to verify whether the frequency scale identification is wrong;
D. in one detection, simultaneously extracting video frames from the video transmission sending end and the video transmission receiving end, inputting them separately into the SSD target detection network, extracting their respective frequency scales, determining whether picture freezing exists, calculating the picture freezing time T_f, and calculating the picture delay T_d.
2. The video transmission quality time domain detection method based on deep learning frequency scale identification according to claim 1, wherein in step A, the characters in the frequency scale are arranged horizontally from left to right, and the frequency scale contains the sequence number N_s and the check number N_c, wherein: the sequence number of the i-th frame takes the character "S" as its start identifier, N_{s,i} = i, and takes the character "C" as its end identifier; the check number of the i-th frame takes the end identifier "C" of the sequence number as its start identifier, N_{c,i} is the sum of the digits of i plus the number of digits of i, and takes the character "E" as its end identifier; the sequence number and the check number are arranged consecutively on the same line, with the sequence number before the check number.
3. The video transmission quality time domain detection method based on deep learning frequency scale identification according to claim 1, wherein in step B, the data set of the SSD target detection network consists of a portion of the video frame images, and each element of the frequency scale in an image and its region are annotated when the data set is created; the element categories are the characters "S", "C", "E", "0", "1", "2", "3", "4", "5", "6", "7", "8" and "9", and, including the background, the network has 14 target detection categories in total; during training and detection, video frames are input into the SSD target detection network to obtain candidate results, non-maximum suppression is applied to the candidate results to eliminate repeated detections of the same target, and the results are sorted by the horizontal coordinate of the target frame center in ascending order to obtain the target prediction results and the target frame prediction results corresponding to the video frame.
4. The video transmission quality time domain detection method based on deep learning frequency scale identification according to claim 1, wherein in step C, the center coordinates of each predicted target frame are:
[formula image]
the targets whose categories are "S", "C" and "E" are screened out; if there is exactly one target of each of the categories "S", "C" and "E", the s-th target is denoted as the character "S", the c-th target as the character "C" and the e-th target as the character "E"; if s < c, the targets corresponding to the digits of the recognized sequence number satisfy:
[formula image]
and are arranged in ascending order of j;
if c < e, the targets corresponding to the digits of the recognized check number satisfy:
[formula image]
and are arranged in ascending order of j.
5. The video transmission quality time domain detection method based on deep learning frequency scale identification according to claim 1, wherein in step C, the digits of the recognized sequence number are n_1, n_2, ..., n_k, where k is the number of digits of the recognized sequence number, and the frequency scale identification passes verification when the following condition is met:
[formula image]
the frequency scale identification fails verification when any one of the following conditions is met:
[formula image]
if the frequency scale identification passes verification, the subsequent detection continues; if it fails verification, detection of this video frame is abandoned.
6. The video transmission quality time domain detection method based on deep learning frequency scale identification according to claim 1, wherein in step D, the picture freezing time threshold is θ and the detection frame rate is f_1, and picture freezing occurs when the following condition is met:
[formula image]
the picture freezing time T_f is:
[formula image]
7. The video transmission quality time domain detection method based on deep learning frequency scale identification according to claim 1, wherein in step D, video frames of the video transmission sending end and the video transmission receiving end are extracted simultaneously and input separately into the SSD target detection network to recognize the sending-end video frame sequence number and the receiving-end video frame sequence number; the video frame rate is f_2, and the picture delay T_d is then:
[formula image]
CN201911104622.8A 2019-11-13 2019-11-13 Video transmission quality time domain detection method based on deep learning frequency scale identification Active CN110830793B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911104622.8A CN110830793B (en) 2019-11-13 2019-11-13 Video transmission quality time domain detection method based on deep learning frequency scale identification

Publications (2)

Publication Number Publication Date
CN110830793A (en) 2020-02-21
CN110830793B (en) 2021-09-03

Family

ID=69554480

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911104622.8A Active CN110830793B (en) 2019-11-13 2019-11-13 Video transmission quality time domain detection method based on deep learning frequency scale identification

Country Status (1)

Country Link
CN (1) CN110830793B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107909014A (en) * 2017-10-31 2018-04-13 天津大学 A kind of video understanding method based on deep learning
CN108460122A (en) * 2018-02-23 2018-08-28 武汉斗鱼网络科技有限公司 Video searching method, storage medium, equipment based on deep learning and system
CN108933935A (en) * 2017-05-22 2018-12-04 中兴通讯股份有限公司 Detection method, device, storage medium and the computer equipment of video communication system
US20190297276A1 (en) * 2018-03-20 2019-09-26 EndoVigilant, LLC Endoscopy Video Feature Enhancement Platform
CN110414574A (en) * 2019-07-10 2019-11-05 厦门美图之家科技有限公司 A kind of object detection method calculates equipment and storage medium

Also Published As

Publication number Publication date
CN110830793B (en) 2021-09-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant