CN112153370B - Video action quality evaluation method and system based on group sensitivity contrast regression - Google Patents


Info

Publication number: CN112153370B
Application number: CN202010857886.7A
Authority: CN (China)
Prior art keywords: score, video, regression, example video, difference
Legal status: Active (granted; the listed status is an assumption, not a legal conclusion)
Priority/filing date: 2020-08-24
Other languages: Chinese (zh)
Other versions: CN112153370A (en)
Inventors: 鲁继文 (Jiwen Lu), 周杰 (Jie Zhou), 饶永铭 (Yongming Rao), 于旭敏 (Xumin Yu)
Current and original assignee: Tsinghua University
Application filed by Tsinghua University
Priority claimed from CN202010857886.7A
Publication of application: CN112153370A
Application granted; publication of grant: CN112153370B

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N17/00Diagnosis, testing or measuring for television systems or their details

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a video action quality evaluation method and system based on group-sensitive contrastive regression. The method comprises the following steps: selecting a corresponding example video and its score for the current video; extracting spatio-temporal features of the current video and the example video with a deep learning model and constructing a merged feature; and constructing a group-sensitive regression tree network that regresses the merged feature to a final difference score, which is combined with the example video score to obtain the current video score. By modeling the difference between the target action and the example action, the method obtains the final target action score and improves the action quality evaluation accuracy of the model.

Description

Video action quality evaluation method and system based on group sensitivity contrast regression
Technical Field
The invention relates to the technical field of computer vision and deep learning, in particular to a video action quality evaluation method and system based on group-sensitive contrastive regression.
Background
Action Quality Assessment (AQA) in videos, which aims to evaluate how well a particular action is performed, has received increasing attention in recent years because it plays a vital role in many real-world applications, including sports and healthcare. Unlike conventional action analysis tasks such as action detection and recognition, AQA is more challenging because it requires predicting fine-grained scores from videos containing the same category of action. Given that the differences among videos are tied to the differences in their action scores, the key to solving this problem is to find the differences between videos and predict scores from those differences.
Most recent methods are based on regression algorithms that predict a score directly from a single video. Despite some promising results, AQA still faces two challenges. First, since score labels are typically annotated by human judges (e.g., the score of a diving attempt is calculated by aggregating the marks of several judges), the subjectivity of these assessments makes scores difficult to predict accurately. Second, the differences between videos in AQA tasks are very small, as athletes typically perform the same actions in similar environments.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, an object of the present invention is to provide a video action quality evaluation method based on group-sensitive contrastive regression, which improves the action quality evaluation accuracy of the model.
Another object of the present invention is to provide a video action quality evaluation system based on group-sensitive contrastive regression.
In order to achieve the above objects, an embodiment of the present invention provides a video action quality evaluation method based on group-sensitive contrastive regression, including the following steps: step S1, selecting a corresponding example video and an example video score according to the current video; step S2, performing spatio-temporal feature extraction on the current video and the example video respectively by using a deep learning model, and constructing a merged feature; step S3, constructing a group-sensitive regression tree network, regressing the merged feature to obtain a final difference score, and combining the final difference score with the example video score to obtain the current video score.
The video action quality evaluation method based on group-sensitive contrastive regression provided by the embodiment of the invention proposes a contrastive regression learning method that models the action quality evaluation problem as regressing the score difference between the current video and an example video, thereby improving the action quality evaluation accuracy of the model. Meanwhile, a group-sensitive regression tree structure is constructed that converts the traditional score regression into two simpler sub-problems, coarse-to-fine classification and regression within small intervals, which improves the interpretability and evaluation capability of the regressor.
In addition, the video action quality evaluation method based on group-sensitive contrastive regression according to the above embodiment of the present invention may further have the following additional technical features:
Further, in an embodiment of the present invention, the step S2 includes: encoding the spatio-temporal information of the current video and the example video respectively through the deep learning model, concatenating the two encodings in the feature dimension, and appending the example video score to form the merged feature.
Further, in an embodiment of the present invention, each leaf node in the group-sensitive regression tree network represents a preset difference score interval, and the samples in each interval are balanced.
Further, in an embodiment of the present invention, in the group-sensitive regression tree network, a group sensitivity analysis is performed on each leaf node to obtain a classification probability and a relative position within the group.
In order to achieve the above objects, another embodiment of the present invention provides a video action quality evaluation system based on group-sensitive contrastive regression, including: a selection module for selecting a corresponding example video and an example video score according to a current video; an extraction module for performing spatio-temporal feature extraction on the current video and the example video respectively by using a deep learning model and constructing a merged feature; and a regression and score combination module for constructing a group-sensitive regression tree network, performing regression on the merged feature to obtain a final difference score, and combining the final difference score with the example video score to obtain the current video score.
According to the video action quality evaluation system based on group-sensitive contrastive regression of the embodiment of the invention, a contrastive regression learning method is provided that models the action quality evaluation problem as regressing the score difference between the current video and an example video, improving the action quality evaluation accuracy of the model. Meanwhile, a group-sensitive regression tree structure is constructed that converts the traditional score regression into two simpler sub-problems, coarse-to-fine classification and regression within small intervals, which improves the interpretability and evaluation capability of the regressor.
In addition, the video action quality evaluation system based on group-sensitive contrastive regression according to the above embodiment of the present invention may further have the following additional technical features:
Further, in an embodiment of the present invention, the extraction module is specifically configured to: encode the spatio-temporal information of the current video and the example video respectively through the deep learning model, concatenate the two encodings in the feature dimension, and append the example video score to form the merged feature.
Further, in an embodiment of the present invention, each leaf node in the group-sensitive regression tree network represents a preset difference score interval, and the samples in each interval are balanced.
Further, in an embodiment of the present invention, in the group-sensitive regression tree network, a group sensitivity analysis is performed on each leaf node to obtain a classification probability and a relative position within the group.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flowchart of a video action quality evaluation method based on group-sensitive contrastive regression according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating the detailed operation of a video action quality evaluation method based on group-sensitive contrastive regression according to an embodiment of the present invention;
FIG. 3 is a diagram of a group-sensitive regression tree structure, according to one embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a video action quality evaluation system based on group-sensitive contrastive regression according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
Rather than directly learning to predict an unknown score, the present invention re-models the AQA problem as regressing a difference score with reference to another video sharing the same attributes (such as the same sport category or the same difficulty level). By introducing exemplars into score prediction, the regressor can reference the known scores given by human judges and is encouraged to predict the current video score from the subtle differences between the current video and the exemplar.
The following describes the video action quality evaluation method and system based on group-sensitive contrastive regression according to embodiments of the present invention with reference to the drawings, beginning with the method.
Fig. 1 is a flowchart of a video action quality evaluation method based on group-sensitive contrastive regression according to an embodiment of the present invention.
As shown in fig. 1, the video action quality evaluation method based on group-sensitive contrastive regression includes the following steps:
in step S1, the corresponding example video and example video score are selected according to the current video.
Specifically, a current input video is acquired, a corresponding example video and a score of the example video are selected for the current input video, and preparation is made for later calculation.
In step S2, spatio-temporal feature extraction is performed on the current video and the example video using the deep learning model, respectively, and merged features are constructed.
Further, in one embodiment of the present invention, step S2 includes:
The deep learning model encodes the spatio-temporal information of the current video and the example video respectively; the two encodings are concatenated in the feature dimension, and the example video score is appended to form the merged feature.
Specifically, as shown in fig. 2, to model the contrastive difference information between the current video and the example video, the embodiment of the present invention feeds the two video segments into a pre-trained deep learning model (e.g., I3D) for spatio-temporal encoding, extracting the spatio-temporal features f1 of the current input video and the spatio-temporal features f2 of the example video (the two videos share the backbone weights during extraction). The two feature vectors are then concatenated in the feature dimension, the example video score is appended during concatenation, and the merged feature is finally obtained.
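As a minimal sketch of the merging step (the feature width `FEAT_DIM` and the toy values below are hypothetical, and the backbone is assumed to have already produced fixed-length feature vectors):

```python
FEAT_DIM = 4  # hypothetical feature width; real I3D features are much wider

def build_merged_feature(feat_current, feat_exemplar, exemplar_score):
    """Concatenate the two spatio-temporal feature vectors along the
    feature dimension and append the exemplar's known score."""
    assert len(feat_current) == len(feat_exemplar)
    return list(feat_current) + list(feat_exemplar) + [exemplar_score]

# Toy example: two 4-dim feature vectors plus an exemplar score of 86.4.
f1 = [0.1, 0.2, 0.3, 0.4]   # current-video features
f2 = [0.5, 0.6, 0.7, 0.8]   # exemplar-video features (shared backbone)
merged = build_merged_feature(f1, f2, 86.4)
print(len(merged))  # 2 * FEAT_DIM + 1 = 9
```

In the actual network the concatenation would be done on batched tensors, but the layout of the merged feature is the same: both encodings side by side, with the exemplar score as one extra channel.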
It should be noted that, during training, for each current video, the present invention randomly selects one video from the eligible example videos for contrastive regression. During testing, N eligible example videos are randomly selected, contrastive regression is performed against each one, and the N evaluation results are averaged to obtain the final predicted score.
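The test-time multi-exemplar voting can be sketched as follows (the `regressor` here is a hypothetical stand-in for the trained network; only the averaging logic is shown):

```python
import random

def predict_score(regressor, current_video, exemplar_pool, n_exemplars):
    """Average N contrastive predictions: each exemplar contributes its
    known score plus the regressed difference to the current video."""
    chosen = random.sample(exemplar_pool, n_exemplars)
    preds = [score_e + regressor(current_video, video_e)
             for video_e, score_e in chosen]
    return sum(preds) / len(preds)

# Toy stand-in: a "regressor" that always predicts a difference of +2.0.
toy_regressor = lambda cur, ex: 2.0
pool = [("video_a", 80.0), ("video_b", 84.0), ("video_c", 88.0)]
print(predict_score(toy_regressor, "video_x", pool, n_exemplars=3))  # 86.0
```

Averaging over several exemplars reduces the variance introduced by any single exemplar's score, at the cost of N forward passes per test video.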
In step S3, a group-sensitive regression tree network is constructed, the merged features are regressed to obtain a final difference score, and the final difference score is combined with the example video score to obtain a current video score.
Further, in one embodiment of the present invention, each leaf node in the group-sensitive regression tree network represents a preset difference score interval, and the samples in each interval are balanced.
Further, in an embodiment of the present invention, a group sensitivity analysis is performed on each leaf node in the group-sensitive regression tree network to obtain a classification probability and a relative position within the group.
Specifically, as shown in fig. 3, to match the contrastive formulation and improve the interpretability of the deep learning model, the embodiment of the present invention designs a regression tree network in the form of a binary tree, namely the group-sensitive regression tree network, and inputs the merged feature from step S2 into it for regression to obtain the difference score.
Further, as shown in fig. 2, the full range of possible difference scores is first partitioned so that each leaf node of the regression tree represents a specific difference score interval, with the samples in each interval kept balanced. At each node of the regression tree, the difference score is compared once with the node's threshold, yielding a binary classification, i.e., "greater" or "less". The split probabilities along the layers are multiplied to obtain the probability of each leaf node. Taking the leaf node with the maximum probability constrains the difference score from the whole range to a specific subinterval. Finally, score regression within the subinterval yields the final difference score.
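The routing logic above can be sketched as follows (a simplified version in which one split probability is shared by all nodes at the same depth; in the actual network every internal node has its own learned split, and the intervals and positions below are illustrative values):

```python
def leaf_probabilities(split_probs):
    """Given the probability of taking the 'greater' branch at each depth,
    return the probability of each of the 2**depth leaves, obtained by
    multiplying the branch probabilities along every root-to-leaf path."""
    probs = [1.0]
    for p_greater in split_probs:
        probs = [q * branch for q in probs
                 for branch in (1.0 - p_greater, p_greater)]
    return probs

def predict_difference(split_probs, intervals, within_interval_pos):
    """Pick the most probable leaf, then map the regressed position in
    [0, 1] into that leaf's difference-score interval."""
    probs = leaf_probabilities(split_probs)
    k = max(range(len(probs)), key=probs.__getitem__)
    lo, hi = intervals[k]
    return lo + within_interval_pos[k] * (hi - lo)

# Depth-2 tree: 4 leaves covering difference scores in [-10, 10).
intervals = [(-10, -5), (-5, 0), (0, 5), (5, 10)]
probs = leaf_probabilities([0.9, 0.2])   # ~[0.08, 0.02, 0.72, 0.18]
# Most probable leaf is interval (0, 5); position 0.4 maps to 0 + 0.4*5.
print(predict_difference([0.9, 0.2], intervals, [0.5, 0.5, 0.4, 0.5]))  # 2.0
```

This shows the two sub-problems explicitly: the leaf-probability product performs the coarse-to-fine classification, and the within-interval position performs the fine regression inside the selected subinterval.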
Finally, the score difference regressed from the outputs of the leaf nodes of the regression tree is combined with the score of the example video to obtain an accurate score for the current video.
The video action quality evaluation method based on group-sensitive contrastive regression provided by the embodiment of the invention builds on the contrastive learning strategy from the metric learning literature to propose a contrastive regression learning method: the action quality evaluation problem is modeled as regressing the score difference between the current video and an example video, which improves the action quality evaluation accuracy of the model. Meanwhile, a group-sensitive regression tree structure is constructed that converts the traditional score regression into two simpler sub-problems, coarse-to-fine classification and regression within small intervals, improving the interpretability and evaluation capability of the regressor.
Next, a video action quality evaluation system based on group-sensitive contrastive regression according to an embodiment of the present invention will be described with reference to the drawings.
Fig. 4 is a schematic structural diagram of a video action quality evaluation system based on group-sensitive contrastive regression according to an embodiment of the present invention.
As shown in fig. 4, the system 10 includes: a selection module 100, an extraction module 200 and a regression and score combination module 300.
Wherein the selection module 100 is configured to select a corresponding example video and an example video score according to the current video. The extraction module 200 is configured to perform spatio-temporal feature extraction on the current video and the example video respectively by using a deep learning model, and construct a merged feature. The regression and score combination module 300 is configured to construct a group-sensitive regression tree network, perform regression on the merged feature to obtain a final difference score, and combine the final difference score with the example video score to obtain the current video score.
Further, in an embodiment of the present invention, the extraction module is specifically configured to: encode the spatio-temporal information of the current video and the example video respectively through the deep learning model, concatenate the two encodings in the feature dimension, and append the example video score to form the merged feature.
Further, in one embodiment of the present invention, each leaf node in the group-sensitive regression tree network represents a preset difference score interval, and the samples in each interval are balanced.
Further, in an embodiment of the present invention, a group sensitivity analysis is performed on each leaf node in the group-sensitive regression tree network to obtain a classification probability and a relative position within the group.
According to the video action quality evaluation system based on group-sensitive contrastive regression of the embodiment of the invention, a contrastive regression learning method is proposed based on the contrastive learning strategy from the metric learning literature: the action quality evaluation problem is modeled as regressing the score difference between the current video and an example video, which improves the action quality evaluation accuracy of the model. Meanwhile, a group-sensitive regression tree structure is constructed that converts the traditional score regression into two simpler sub-problems, coarse-to-fine classification and regression within small intervals, improving the interpretability and evaluation capability of the regressor.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (6)

1. A video action quality evaluation method based on group-sensitive contrastive regression, characterized by comprising the following steps:
step S1, selecting a corresponding example video and an example video score according to the current video;
step S2, performing spatio-temporal feature extraction on the current video and the example video respectively by using a deep learning model, and constructing a merged feature, which includes: encoding the spatio-temporal information of the current video and the example video respectively through the deep learning model, concatenating the two encodings in the feature dimension, and appending the example video score to form the merged feature; and
step S3, constructing a group-sensitive regression tree network and regressing the merged feature to obtain a difference score, wherein the full range of possible difference scores is first partitioned so that each leaf node of the regression tree represents a specific difference score interval and the samples in each interval are balanced; at each node of the regression tree, the difference score is compared once with the node's threshold to yield a binary classification; the split probabilities of the layers are multiplied to obtain the final probability of each leaf node; the leaf node with the maximum probability is taken, constraining the difference score from the whole range to a specific subinterval; score regression within the subinterval is performed to obtain the final difference score; and the final difference score is combined with the example video score to obtain a current video score.
2. The method of claim 1, wherein each leaf node in the group-sensitive regression tree network represents a preset difference score interval, and the samples in each interval are balanced.
3. The method of claim 2, wherein the group-sensitive regression tree network performs a group sensitivity analysis on each leaf node to obtain a classification probability and a relative position within the group.
4. A video action quality evaluation system based on group-sensitive contrastive regression, characterized by comprising:
a selection module for selecting a corresponding example video and an example video score according to a current video;
an extraction module for performing spatio-temporal feature extraction on the current video and the example video respectively by using a deep learning model and constructing a merged feature, wherein the extraction module encodes the spatio-temporal information of the current video and the example video respectively through the deep learning model, concatenates the two encodings in the feature dimension, and appends the example video score to form the merged feature; and
a regression and score combination module for constructing a group-sensitive regression tree network and regressing the merged feature to obtain a difference score, wherein the full range of possible difference scores is first partitioned so that each leaf node of the regression tree represents a specific difference score interval and the samples in each interval are balanced; at each node of the regression tree, the difference score is compared once with the node's threshold to yield a binary classification; the split probabilities of the layers are multiplied to obtain the final probability of each leaf node; the leaf node with the maximum probability is taken, constraining the difference score from the whole range to a specific subinterval; score regression within the subinterval is performed to obtain the final difference score; and the final difference score is combined with the example video score to obtain the current video score.
5. The system of claim 4, wherein each leaf node in the group-sensitive regression tree network represents a preset difference score interval, and the samples in each interval are balanced.
6. The system of claim 5, wherein the group-sensitive regression tree network performs a group sensitivity analysis on each leaf node to obtain a classification probability and a relative position within the group.
CN202010857886.7A 2020-08-24 2020-08-24 Video action quality evaluation method and system based on group sensitivity contrast regression Active CN112153370B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010857886.7A CN112153370B (en) 2020-08-24 2020-08-24 Video action quality evaluation method and system based on group sensitivity contrast regression

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010857886.7A CN112153370B (en) 2020-08-24 2020-08-24 Video action quality evaluation method and system based on group sensitivity contrast regression

Publications (2)

Publication Number Publication Date
CN112153370A CN112153370A (en) 2020-12-29
CN112153370B true CN112153370B (en) 2021-12-24

Family

ID=73888289

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010857886.7A Active CN112153370B (en) 2020-08-24 2020-08-24 Video action quality evaluation method and system based on group sensitivity contrast regression

Country Status (1)

Country Link
CN (1) CN112153370B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107318014A (en) * 2017-07-25 2017-11-03 西安电子科技大学 The video quality evaluation method of view-based access control model marking area and space-time characterisation

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070283400A1 (en) * 2006-05-31 2007-12-06 Minkyu Lee Method and apparatus for performing real-time on-line video quality monitoring for digital cable and IPTV services
US9277208B2 (en) * 2013-11-12 2016-03-01 Oovoo, Llc System and method for estimating quality of video with frame freezing artifacts
CN108235001B (en) * 2018-01-29 2020-07-10 上海海洋大学 Deep sea video quality objective evaluation method based on space-time characteristics
CN108989802B (en) * 2018-08-14 2020-05-19 华中科技大学 HEVC video stream quality estimation method and system by utilizing inter-frame relation

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107318014A (en) * 2017-07-25 2017-11-03 西安电子科技大学 The video quality evaluation method of view-based access control model marking area and space-time characterisation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Uncertainty-aware Score Distribution Learning for Action Quality Assessment; Yansong Tang; IEEE; 2020-08-05; pp. 1-10 *

Also Published As

Publication number Publication date
CN112153370A (en) 2020-12-29


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant