CN112613486A - Professional stereoscopic video comfort classification method based on multilayer attention and BiGRU - Google Patents
Professional stereoscopic video comfort classification method based on multilayer attention and BiGRU
- Publication number
- CN112613486A (application number CN202110016985.7A)
- Authority
- CN
- China
- Prior art keywords
- video
- frame
- layer
- attention
- stereoscopic video
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention relates to a professional stereoscopic video comfort classification method based on multilayer attention and BiGRU. The method comprises the following steps: 1. performing scene segmentation on the training video set and the video set to be predicted and obtaining disparity maps through preprocessing; 2. performing frame-level processing to obtain preliminary frame-level features; 3. performing frame-level attention processing to obtain final frame-level features; 4. performing shot-level processing to obtain preliminary shot-level features; 5. performing shot-level attention processing to obtain final shot-level features; 6. performing two-stream fusion, i.e., fusing the outputs of the previous step with channel attention to obtain the final hidden state; 7. feeding the final hidden state through a classification network to output classification probabilities and classify the professional stereoscopic video as suitable for children to watch or only suitable for adults to watch; 8. inputting the left views of the stereoscopic videos in the set to be tested and the corresponding disparity maps into the trained model for classification. The method can effectively distinguish whether a professional stereoscopic video is suitable for children to watch.
Description
Technical Field
The invention relates to the field of image and video processing and computer vision, in particular to a professional stereoscopic video comfort classification method based on multilayer attention and BiGRU.
Background
Stereoscopic video, also called 3D video, differs from 2D video most importantly in its depth information, so the scene presented in the video is no longer confined to the screen plane. The rapid development of stereoscopic technology gives viewers a better viewing experience but also brings problems: watching uncomfortable stereoscopic video for a long time can cause dizziness, dry eyes, nausea and other adverse reactions, which dampen viewers' enthusiasm for watching and may even harm their physical health. How to evaluate the visual comfort of stereoscopic content has therefore become a widespread concern. One main factor affecting the visual comfort of stereoscopic video is parallax, including excessive horizontal parallax, vertical parallax, and rapidly changing parallax; the other main factor is the video content, including salient objects in the video, the way the video is presented, and the motion of objects.
Although current comfort evaluation methods have achieved good results, they generally do not take children's interpupillary distance into account. Children's eyes are closer together than adults', their binocular fusion mechanism is not yet mature, and the parallax imaged on their retinas differs from that of adults, so their stereoscopic perception also differs. A stereoscopic video that is comfortable for adults may not be suitable for children to watch. For children who already have eye conditions, visually uncomfortable stereoscopic films can cause headaches, eye strain, and an inability to see the images clearly.
Disclosure of Invention
The invention aims to provide a professional stereoscopic video comfort classification method based on multilayer attention and BiGRU, which addresses the fact that current stereoscopic video comfort evaluation algorithms do not consider children as part of the audience, and which can effectively distinguish whether a professional stereoscopic video is suitable for children to watch.
In order to achieve the purpose, the technical scheme of the invention is as follows: a professional stereoscopic video comfort classification method based on multilayer attention and BiGRU comprises the following steps:
step S1, carrying out scene segmentation on the training video set and the video set to be predicted and obtaining a disparity map through preprocessing;
step S2, frame-level processing: taking the left views of the stereoscopic videos in the training video set and the corresponding disparity maps as two-stream input for frame-level processing, and using a temporal relation network to perceive the temporal relations between frames within each shot at multiple time scales;
step S3, frame-level attention processing: weighting and summing the inter-frame temporal relations within each shot to obtain the final frame-level features;
step S4, shot-level processing: using a bidirectional gated recurrent unit (BiGRU), a recurrent neural network, to perceive the frame-level features of several consecutive shots and output a set of hidden states;
step S5, shot-level attention processing: weighting and summing the set of hidden states output in step S4 to obtain the final shot-level features;
step S6, two-stream fusion: fusing the shot-level features output in step S5 with a channel attention network to obtain the final hidden state;
step S7, outputting classification probabilities through a classification network according to the final hidden state and classifying the professional stereoscopic video as suitable for children to watch or only suitable for adults to watch; the networks constructed in steps S2 to S7 constitute the professional stereoscopic video visual comfort classification model; training this model, learning its optimal parameters by minimizing a loss function during training, and saving the trained model;
and step S8, inputting the left views of the videos in the set to be tested and the corresponding disparity maps into the trained model for classification prediction.
In an embodiment of the present invention, the step S1 specifically includes the following steps:
step S11, using a multimedia video processing tool to split the video into individual frame images;
step S12, dividing the stereoscopic video into non-overlapping video segments with a shot segmentation algorithm, each segment being called a shot;
and step S13, splitting each frame into a left view and a right view, and computing the horizontal displacement of corresponding pixels between the left and right views with the SIFT Flow algorithm to serve as the disparity map.
In an embodiment of the present invention, the step S2 specifically includes the following steps:
step S21, sparsely sampling the frames within a shot and randomly selecting 8 frames while keeping their temporal order;
step S22, randomly extracting a ordered frames from the 8 sampled frames, where a ranges from 2 to 8, and using a pre-trained temporal relation network to perceive the temporal relation among the a frames; given a video V, the temporal relation between two frames, T_2(V), is expressed as:
T_2(V) = Σ_{i<j} g_θ^(2)(f_i, f_j)
where f_i and f_j denote the features of the i-th and j-th frames of the video extracted with a base feature-extraction network such as AlexNet, VGG, GoogLeNet, ResNet or BN-Inception, g_θ^(2) is a two-layer multilayer perceptron with 256 units per layer, and θ denotes its parameters; similarly, the temporal relations among 3 to 8 frames, T_3(V), T_4(V), T_5(V), T_6(V), T_7(V) and T_8(V), are expressed as:
T_a(V) = Σ_{i<j<…<p} g_θ^(a)(f_i, f_j, …, f_p),  a = 3, …, 8
where f_i, f_j, f_k, f_l, f_m, f_n, f_o and f_p denote the features of the i-th, j-th, k-th, l-th, m-th, n-th, o-th and p-th frames of the video extracted with a base feature-extraction network such as AlexNet, VGG, GoogLeNet, ResNet or BN-Inception, g_θ^(a) is a two-layer multilayer perceptron that models the temporal relation among a frames, with 256 units per layer, and θ denotes its parameters;
step S23, concatenating the inter-frame temporal relations at all time scales within the shot to obtain the frame-level feature T_all(V):
T_all(V) = [T_2(V), T_3(V), T_4(V), T_5(V), T_6(V), T_7(V), T_8(V)].
In an embodiment of the present invention, the step S3 specifically includes the following steps:
step S31, for the temporal relation feature T_a(V) output by each temporal relation network, first compute the hidden vector u_a:
u_a = tanh(W_f T_a(V) + b_f)
where W_f and b_f are the parameters of a single-layer perceptron;
step S32, to measure the importance of the temporal relation at each time scale, apply a normalization operation to u_a:
α_a = exp(u_a^T u_f) / Σ_a exp(u_a^T u_f)
where u_f is a context vector representing the importance of the temporal relation at the corresponding time scale; it is randomly initialized during training and obtained through learning;
step S33, the final temporal feature x, i.e. the frame-level feature, is computed as:
x = Σ_a α_a T_a(V).
In an embodiment of the present invention, the step S4 specifically includes the following steps:
step S41, using step S33, obtain and concatenate the frame-level features of each of the s consecutive shots; each shot yields one frame-level feature x, and the frame-level feature of the t-th shot (t = 1, 2, …, s) is denoted x_t; these frame-level features serve as the input of the bidirectional gated recurrent unit; at time t (t = 1, 2, …, s) the input of a gated recurrent unit is the hidden state h_{t-1} of the previous moment together with the frame-level feature x_t of the t-th shot, and it outputs the hidden state h_t of the current moment; the gated recurrent unit contains 2 gates: a reset gate r_t and an update gate z_t; the former is used to compute the candidate hidden state h̃_t and controls how much information of the previous hidden state h_{t-1} is kept; the latter controls how much information of the candidate hidden state h̃_t is added, yielding the output hidden state h_t; r_t, z_t, h̃_t and h_t are computed as follows:
z_t = σ(W_z x_t + U_z h_{t-1})
r_t = σ(W_r x_t + U_r h_{t-1})
h̃_t = tanh(W x_t + U (r_t ⊙ h_{t-1}))
h_t = (1 − z_t) ⊙ h_{t-1} + z_t ⊙ h̃_t
where σ is the logistic sigmoid function, ⊙ denotes element-wise multiplication, tanh is the activation function, and W_z, U_z, W_r, U_r, W and U are weight matrices learned during training;
step S42, since the bidirectional gated recurrent unit consists of 2 unidirectional gated recurrent units running in opposite directions, the final output h_t is jointly determined by the hidden states of the two units; at each moment the input is fed to both units simultaneously, the output is determined by the two units together, and their outputs are concatenated as the output of the bidirectional gated recurrent unit, yielding the set of hidden states it outputs; when the input is the video-frame sequence, the output of the bidirectional gated recurrent unit is the hidden-state set h_f; when the input is the disparity-map sequence, the output is the hidden-state set h_d; h_f and h_d are computed as:
h_f = [h_1^f, h_2^f, …, h_s^f],  h_d = [h_1^d, h_2^d, …, h_s^d]
where h_t^f denotes the hidden state output at time t (t = 1, 2, …, s) for the video-frame sequence and h_t^d denotes the hidden state output at time t for the disparity-map sequence.
In an embodiment of the present invention, the step S5 specifically includes the following steps:
step S51, for the video-frame sequence, for the hidden state h_t^f output by the gated recurrent unit at each moment, the model first computes the hidden vector u_t:
u_t = tanh(W_s h_t^f + b_s)
where W_s and b_s are the parameters of a single-layer perceptron;
step S52, to measure the importance of each shot, apply a normalization operation to u_t:
α_t = exp(u_t^T u_s) / Σ_t exp(u_t^T u_s)
where u_s is a context vector representing the importance of the corresponding shot; it is randomly initialized during training and obtained through learning;
step S53, the hidden state h_f of the video-frame sequence is computed as:
h_f = Σ_t α_t h_t^f
step S54, similarly, the hidden state h_d of the disparity-map sequence is obtained by the same process; h_f and h_d are concatenated to obtain h_a:
h_a = [h_f, h_d]
This completes the final shot-level features.
In an embodiment of the present invention, the step S6 specifically includes the following steps:
step S61, channel attention is used to compute the weight of each hidden state in h_a, denoted F_scale(·,·), which is calculated as:
F_scale(·,·) = σ(W_2 δ(W_1 h_a))
where δ is the ReLU function, σ is the sigmoid function, and W_1 and W_2 are the parameter matrices of two single-layer perceptrons, obtained through training;
step S62, the finally obtained importance of each channel is applied to h_a: the weighted final hidden state h̃_a is the element-wise product of F_scale(·,·) and h_a:
h̃_a = F_scale(·,·) ⊙ h_a
The weighted final hidden state h̃_a then passes through the classification network to obtain the final classification probability.
In an embodiment of the present invention, the step S7 specifically includes the following steps:
step S71, to prevent the network from overfitting, the weighted final hidden state h̃_a is fed into the first layer of the classification network, a dropout (random inactivation) layer;
step S72, the output after dropout is fed into the second layer of the classification network, a fully connected layer, whose output is converted by a softmax (normalized exponential) function into classification probabilities in the range (0, 1), judging the professional stereoscopic video as suitable for children to watch or only suitable for adults to watch;
step S73, according to the cross-entropy loss function, the parameter gradients of the professional stereoscopic video visual comfort classification model are computed by back-propagation, and the parameters are updated with an adaptive gradient descent method;
the cross-entropy loss function L is defined as follows:
L = −(1/n) Σ_i [ y_i log p_i + (1 − y_i) log(1 − p_i) ]
where n denotes the number of samples in each batch, y_i is the label of sample i (a positive sample, suitable for children to watch, has y_i = 1; a negative sample, suitable for adults to watch only, has y_i = 0), and p_i is the probability that the model predicts sample i to be a positive sample;
and step S74, training proceeds in batches until the value of L computed in step S73 converges to a threshold or the number of iterations reaches a threshold; network training is then complete, the optimal parameters of the professional stereoscopic video visual comfort classification model have been learned, and the model parameters are saved.
In an embodiment of the present invention, the step S8 specifically includes the following steps:
step S81, preprocessing a video set to be tested by using the step S1 to obtain a disparity map;
step S82, performing frame level processing on the left view of the stereoscopic video and the corresponding disparity map in the video set to be tested by using the step S2;
step S83, using the trained model parameters saved in step S7, processing and predicting the test video set through steps S3 to S7; each group of s consecutive shots is taken as one sample, and when the probability that the model predicts the sample to be a positive sample is greater than 0.5, the sample is classified as positive; otherwise it is classified as negative.
Compared with the prior art, the invention has the following beneficial effects. First, addressing the fact that current stereoscopic video comfort evaluation algorithms do not consider children as part of the audience, the invention provides a professional stereoscopic video visual comfort classification method based on multilayer attention and a recurrent neural network that can distinguish whether a professional stereoscopic video is suitable for children to watch. Second, considering that the main factors causing visual discomfort include video content and parallax, the method adopts a two-stream structure to separately learn the features of the stereoscopic video frames and of the disparity-map sequence, together with their temporal relations, and thus evaluates the stereoscopic visual comfort of the video more comprehensively. Finally, because visual discomfort usually occurs in particular video segments and stream branches, which increases the difficulty of classification, the method adopts frame-level attention, shot-level attention and channel attention so that the model pays more attention to the segments and branches that cause visual discomfort, improving classification accuracy.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a diagram of an overall structure of a professional stereoscopic video visual comfort classification model according to an embodiment of the present invention;
FIG. 3 is a diagram of the temporal relation network model architecture used in frame-level processing in an embodiment of the present invention.
Detailed Description
The technical scheme of the invention is specifically explained below with reference to the accompanying drawings.
As shown in fig. 1 and fig. 2, the present embodiment provides a professional stereoscopic video comfort classification method based on multi-layer attention and BiGRU, including the following steps:
step S1, carrying out scene segmentation on the training video set and the video set to be predicted and obtaining a disparity map through preprocessing; the method specifically comprises the following steps:
Step S11: split the video into individual frame images using a multimedia video processing tool;
Step S12: divide the stereoscopic video into non-overlapping video segments with a shot segmentation algorithm, each segment being called a shot;
Step S13: split each frame into a left view and a right view, and compute the horizontal displacement of corresponding pixels between the left and right views with the SIFT Flow algorithm to serve as the disparity map.
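For illustration only, the preprocessing of step S1 can be sketched roughly as follows (a minimal sketch, not the patent's exact pipeline: ffmpeg is assumed for frame extraction, the stereo frames are assumed to be stored side by side, and OpenCV's StereoSGBM matcher stands in for the SIFT Flow correspondence named above; shot segmentation is left to any shot-boundary detector).

```python
# Illustrative sketch of step S1 (not the patent's exact pipeline).
# Assumptions: frames are stored side by side (left | right), ffmpeg is available,
# and OpenCV's StereoSGBM matcher stands in for the SIFT Flow correspondence.
import subprocess
from pathlib import Path

import cv2


def extract_frames(video_path: str, out_dir: str, fps: int = 24) -> None:
    """Split the video into individual frame images with ffmpeg."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["ffmpeg", "-i", video_path, "-vf", f"fps={fps}", f"{out_dir}/%06d.png"],
        check=True,
    )


def disparity_map(frame_bgr):
    """Horizontal displacement between the left and right views of one stereo frame."""
    h, w, _ = frame_bgr.shape
    left, right = frame_bgr[:, : w // 2], frame_bgr[:, w // 2:]  # side-by-side layout
    gray_l = cv2.cvtColor(left, cv2.COLOR_BGR2GRAY)
    gray_r = cv2.cvtColor(right, cv2.COLOR_BGR2GRAY)
    matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=5)
    return matcher.compute(gray_l, gray_r).astype("float32") / 16.0  # pixels of shift
```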
Step S2, frame-level processing: the left views of the stereoscopic videos in the training video set and the corresponding disparity maps are taken as two-stream input for frame-level processing, and a temporal relation network is used to perceive the temporal relations between frames within each shot at multiple time scales; the method specifically comprises the following steps:
step S21, sparsely sample the frames within a shot and randomly select 8 frames while keeping their temporal order;
step S22, randomly extract a ordered frames from the 8 sampled frames, where a ranges from 2 to 8, and use a pre-trained temporal relation network to perceive the temporal relation among the a frames; given a video V, the temporal relation between two frames, T_2(V), is expressed as:
T_2(V) = Σ_{i<j} g_θ^(2)(f_i, f_j)
where f_i and f_j denote the features of the i-th and j-th frames of the video extracted with a base feature-extraction network such as AlexNet, VGG, GoogLeNet, ResNet or BN-Inception, g_θ^(2) is a two-layer multilayer perceptron with 256 units per layer, and θ denotes its parameters; similarly, the temporal relations among 3 to 8 frames, T_3(V), T_4(V), T_5(V), T_6(V), T_7(V) and T_8(V), are expressed as:
T_a(V) = Σ_{i<j<…<p} g_θ^(a)(f_i, f_j, …, f_p),  a = 3, …, 8
where f_i, f_j, f_k, f_l, f_m, f_n, f_o and f_p denote the features of the i-th, j-th, k-th, l-th, m-th, n-th, o-th and p-th frames of the video extracted with a base feature-extraction network such as AlexNet, VGG, GoogLeNet, ResNet or BN-Inception, g_θ^(a) is a two-layer multilayer perceptron that models the temporal relation among a frames, with 256 units per layer, and θ denotes its parameters;
step S23, concatenate the inter-frame temporal relations at all time scales within the shot to obtain the frame-level feature T_all(V):
T_all(V) = [T_2(V), T_3(V), T_4(V), T_5(V), T_6(V), T_7(V), T_8(V)].
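For illustration only, a minimal PyTorch sketch of the multi-scale temporal relations of steps S21–S23 might look as follows; the per-frame features are assumed to have been extracted already by a backbone (e.g. ResNet), and the feature dimension is a placeholder of the sketch, not a value fixed by the patent.

```python
# Illustrative sketch of steps S21-S23 (multi-scale temporal relations).
# Assumptions: per-frame features were already extracted by a backbone (e.g. ResNet),
# and `feat_dim` is a placeholder dimension, not a value fixed by the patent.
import itertools

import torch
import torch.nn as nn


class TemporalRelation(nn.Module):
    def __init__(self, feat_dim: int = 2048, hidden: int = 256, num_frames: int = 8):
        super().__init__()
        self.num_frames = num_frames
        # one two-layer MLP g_theta^(a) per time scale a = 2..num_frames, 256 units per layer
        self.mlps = nn.ModuleDict({
            str(a): nn.Sequential(
                nn.Linear(a * feat_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
            )
            for a in range(2, num_frames + 1)
        })

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, num_frames, feat_dim), kept in temporal order
        relations = []
        for a in range(2, self.num_frames + 1):
            scale_sum = 0.0
            # ordered a-tuples of frame indices (a sparse random subset could be used instead)
            for idx in itertools.combinations(range(self.num_frames), a):
                tup = frames[:, list(idx), :].flatten(1)       # (batch, a * feat_dim)
                scale_sum = scale_sum + self.mlps[str(a)](tup)
            relations.append(scale_sum)                        # T_a(V)
        return torch.cat(relations, dim=1)                     # T_all(V)
```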
Step S3, frame-level attention processing: the inter-frame temporal relations within each shot are weighted and summed to obtain the final frame-level features; the method specifically comprises the following steps:
step S31, for the temporal relation feature T_a(V) output by each temporal relation network, first compute the hidden vector u_a:
u_a = tanh(W_f T_a(V) + b_f)
where W_f and b_f are the parameters of a single-layer perceptron;
step S32, to measure the importance of the temporal relation at each time scale, apply a normalization operation to u_a:
α_a = exp(u_a^T u_f) / Σ_a exp(u_a^T u_f)
where u_f is a context vector representing the importance of the temporal relation at the corresponding time scale; it is randomly initialized during training and obtained through learning;
step S33, the final temporal feature x, i.e. the frame-level feature, is computed as:
x = Σ_a α_a T_a(V).
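For illustration only, a minimal sketch of the frame-level attention of steps S31–S33 might look as follows; it assumes the seven T_a(V) vectors are kept stacked rather than concatenated, so the attention can weight and sum the time scales as described above.

```python
# Illustrative sketch of steps S31-S33. Assumption: the seven T_a(V) vectors are kept
# stacked as (batch, 7, dim) rather than concatenated into T_all(V).
import torch
import torch.nn as nn


class FrameLevelAttention(nn.Module):
    def __init__(self, dim: int = 256):
        super().__init__()
        self.proj = nn.Linear(dim, dim)                  # W_f, b_f (single-layer perceptron)
        self.context = nn.Parameter(torch.randn(dim))    # u_f, randomly initialised and learned

    def forward(self, t_scales: torch.Tensor) -> torch.Tensor:
        # t_scales: (batch, num_scales, dim)
        u = torch.tanh(self.proj(t_scales))               # u_a
        alpha = torch.softmax(u @ self.context, dim=1)    # importance of each time scale
        return (alpha.unsqueeze(-1) * t_scales).sum(1)    # x = sum_a alpha_a * T_a(V)
```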
Step S4, shot-level processing: a bidirectional gated recurrent unit (BiGRU), a recurrent neural network, is used to perceive the frame-level features of several consecutive shots and output a set of hidden states; the method specifically comprises the following steps:
step S41, using step S33, obtain and concatenate the frame-level features of each of the s consecutive shots; each shot yields one frame-level feature x, and the frame-level feature of the t-th shot (t = 1, 2, …, s) is denoted x_t; these frame-level features serve as the input of the bidirectional gated recurrent unit; at time t (t = 1, 2, …, s) the input of a gated recurrent unit is the hidden state h_{t-1} of the previous moment together with the frame-level feature x_t of the t-th shot, and it outputs the hidden state h_t of the current moment; the gated recurrent unit contains 2 gates: a reset gate r_t and an update gate z_t; the former is used to compute the candidate hidden state h̃_t and controls how much information of the previous hidden state h_{t-1} is kept; the latter controls how much information of the candidate hidden state h̃_t is added, yielding the output hidden state h_t; r_t, z_t, h̃_t and h_t are computed as follows:
z_t = σ(W_z x_t + U_z h_{t-1})
r_t = σ(W_r x_t + U_r h_{t-1})
h̃_t = tanh(W x_t + U (r_t ⊙ h_{t-1}))
h_t = (1 − z_t) ⊙ h_{t-1} + z_t ⊙ h̃_t
where σ is the logistic sigmoid function, ⊙ denotes element-wise multiplication, tanh is the activation function, and W_z, U_z, W_r, U_r, W and U are weight matrices learned during training;
step S42, since the bidirectional gated recurrent unit consists of 2 unidirectional gated recurrent units running in opposite directions, the final output h_t is jointly determined by the hidden states of the two units; at each moment the input is fed to both units simultaneously, the output is determined by the two units together, and their outputs are concatenated as the output of the bidirectional gated recurrent unit, yielding the set of hidden states it outputs; when the input is the video-frame sequence, the output of the bidirectional gated recurrent unit is the hidden-state set h_f; when the input is the disparity-map sequence, the output is the hidden-state set h_d; h_f and h_d are computed as:
h_f = [h_1^f, h_2^f, …, h_s^f],  h_d = [h_1^d, h_2^d, …, h_s^d]
where h_t^f denotes the hidden state output at time t (t = 1, 2, …, s) for the video-frame sequence and h_t^d denotes the hidden state output at time t for the disparity-map sequence.
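For illustration only, the shot-level BiGRU of steps S41–S42 can be sketched with PyTorch's built-in bidirectional GRU; the input dimension is assumed to match the frame-level feature size produced by the previous stage.

```python
# Illustrative sketch of steps S41-S42: a bidirectional GRU over the s shot-level inputs.
# Assumption: `dim` matches the frame-level feature size produced by the previous stage.
import torch
import torch.nn as nn


class ShotBiGRU(nn.Module):
    def __init__(self, dim: int = 256, hidden: int = 128):
        super().__init__()
        # bidirectional=True runs two opposite-direction GRUs and concatenates their outputs
        self.bigru = nn.GRU(input_size=dim, hidden_size=hidden,
                            batch_first=True, bidirectional=True)

    def forward(self, shot_feats: torch.Tensor) -> torch.Tensor:
        # shot_feats: (batch, s, dim) -- one frame-level feature x_t per shot
        states, _ = self.bigru(shot_feats)   # (batch, s, 2*hidden): hidden-state set h_f or h_d
        return states
```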
Step S5, shot-level attention processing: the set of hidden states output in step S4 is weighted and summed to obtain the final shot-level features; the method specifically comprises the following steps:
step S51, for the video-frame sequence, for the hidden state h_t^f output by the gated recurrent unit at each moment, the model first computes the hidden vector u_t:
u_t = tanh(W_s h_t^f + b_s)
where W_s and b_s are the parameters of a single-layer perceptron;
step S52, to measure the importance of each shot, apply a normalization operation to u_t:
α_t = exp(u_t^T u_s) / Σ_t exp(u_t^T u_s)
where u_s is a context vector representing the importance of the corresponding shot; it is randomly initialized during training and obtained through learning;
step S53, the hidden state h_f of the video-frame sequence is computed as:
h_f = Σ_t α_t h_t^f
step S54, similarly, the hidden state h_d of the disparity-map sequence is obtained by the same process; h_f and h_d are concatenated to obtain h_a:
h_a = [h_f, h_d]
This completes the final shot-level features.
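For illustration only, the shot-level attention and the concatenation of the two streams (steps S51–S54) might be sketched as follows; dimensions are placeholders of the sketch.

```python
# Illustrative sketch of steps S51-S54: attention over the BiGRU hidden states of each
# stream, followed by concatenation into h_a.
import torch
import torch.nn as nn


class ShotLevelAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim)                  # W_s, b_s
        self.context = nn.Parameter(torch.randn(dim))    # u_s, learned context vector

    def forward(self, states: torch.Tensor) -> torch.Tensor:
        # states: (batch, s, dim) hidden-state set from the BiGRU
        u = torch.tanh(self.proj(states))
        alpha = torch.softmax(u @ self.context, dim=1)   # importance of each shot
        return (alpha.unsqueeze(-1) * states).sum(1)     # attended hidden state


def fuse_streams(states_frames, states_disp, attn_f, attn_d):
    """h_a = [h_f, h_d]: concatenate the attended states of the two streams."""
    return torch.cat([attn_f(states_frames), attn_d(states_disp)], dim=1)
```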
Step S6, two-stream fusion: the shot-level features output in step S5 are fused with a channel attention network to obtain the final hidden state; the method specifically comprises the following steps:
step S61, channel attention is used to compute the weight of each hidden state in h_a, denoted F_scale(·,·), which is calculated as:
F_scale(·,·) = σ(W_2 δ(W_1 h_a))
where δ is the ReLU function, σ is the sigmoid function, and W_1 and W_2 are the parameter matrices of two single-layer perceptrons, obtained through training;
step S62, the finally obtained importance of each channel is applied to h_a: the weighted final hidden state h̃_a is the element-wise product of F_scale(·,·) and h_a:
h̃_a = F_scale(·,·) ⊙ h_a
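For illustration only, the channel attention of steps S61–S62 can be sketched in the style of a squeeze-and-excitation block; the bottleneck reduction ratio is an assumption of the sketch, since the patent only specifies two single-layer perceptrons with ReLU and sigmoid.

```python
# Illustrative sketch of steps S61-S62: squeeze-and-excitation style channel attention
# over h_a. The bottleneck reduction ratio is an assumption of the sketch.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.w1 = nn.Linear(channels, channels // reduction)   # W_1
        self.w2 = nn.Linear(channels // reduction, channels)   # W_2

    def forward(self, h_a: torch.Tensor) -> torch.Tensor:
        # h_a: (batch, channels)
        scale = torch.sigmoid(self.w2(torch.relu(self.w1(h_a))))  # F_scale(.,.)
        return scale * h_a                                        # weighted final hidden state
```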
Step S7, according to the final hidden state, classification probabilities are output through a classification network and the professional stereoscopic video is classified as suitable for children to watch or only suitable for adults to watch; the networks constructed in steps S2 to S7 constitute the professional stereoscopic video visual comfort classification model; the model is trained, its optimal parameters are learned by minimizing a loss function during training, and the trained model is saved; the method specifically comprises the following steps:
step S71, to prevent the network from overfitting, the weighted final hidden state h̃_a is fed into the first layer of the classification network, a dropout (random inactivation) layer;
step S72, the output after dropout is fed into the second layer of the classification network, a fully connected layer, whose output is converted by a softmax (normalized exponential) function into classification probabilities in the range (0, 1), judging the professional stereoscopic video as suitable for children to watch or only suitable for adults to watch;
step S73, according to the cross-entropy loss function, the parameter gradients of the professional stereoscopic video visual comfort classification model are computed by back-propagation, and the parameters are updated with an adaptive gradient descent method;
the cross-entropy loss function L is defined as follows:
L = −(1/n) Σ_i [ y_i log p_i + (1 − y_i) log(1 − p_i) ]
where n denotes the number of samples in each batch, y_i is the label of sample i (a positive sample, suitable for children to watch, has y_i = 1; a negative sample, suitable for adults to watch only, has y_i = 0), and p_i is the probability that the model predicts sample i to be a positive sample;
and step S74, training proceeds in batches until the value of L computed in step S73 converges to a threshold or the number of iterations reaches a threshold; network training is then complete, the optimal parameters of the professional stereoscopic video visual comfort classification model have been learned, and the model parameters are saved.
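For illustration only, the classification head and one training pass of steps S71–S74 might be sketched as follows; `model` is assumed to wrap steps S2–S6 and return the weighted final hidden state, and Adam is assumed as the adaptive gradient-descent optimiser since the patent does not name one.

```python
# Illustrative sketch of steps S71-S74. Assumptions: `model` wraps steps S2-S6 and
# returns the weighted final hidden state; class 1 = suitable for children, class 0 = adults only.
import torch
import torch.nn as nn


class ComfortClassifier(nn.Module):
    def __init__(self, in_dim: int, num_classes: int = 2, p_drop: float = 0.5):
        super().__init__()
        # first layer: dropout (random inactivation); second layer: fully connected
        self.head = nn.Sequential(nn.Dropout(p_drop), nn.Linear(in_dim, num_classes))

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        return self.head(hidden)   # logits; softmax turns them into probabilities in (0, 1)


def train_epoch(model, head, loader, optimizer):
    criterion = nn.CrossEntropyLoss()   # cross-entropy loss over the two classes
    model.train(); head.train()
    for frames, disparity, labels in loader:
        optimizer.zero_grad()
        logits = head(model(frames, disparity))
        loss = criterion(logits, labels)
        loss.backward()                 # back-propagation of the parameter gradients
        optimizer.step()                # adaptive gradient-descent parameter update
```

A typical optimiser choice for this sketch would be torch.optim.Adam over the parameters of both modules.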
Step S8, the left views of the videos in the set to be tested and the corresponding disparity maps are input into the trained model for classification prediction; the method specifically comprises the following steps:
step S81, preprocess the video set to be tested using step S1 to obtain the disparity maps;
step S82, perform frame-level processing on the left views of the stereoscopic videos in the test video set and the corresponding disparity maps using step S2;
step S83, using the trained model parameters saved in step S7, process and predict the test video set through steps S3 to S7; each group of s consecutive shots is taken as one sample, and when the probability that the model predicts the sample to be a positive sample is greater than 0.5, the sample is classified as positive; otherwise it is classified as negative.
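For illustration only, the prediction of step S8 might be sketched as follows; the checkpoint path and its layout are hypothetical, and class index 1 is assumed to be the positive class.

```python
# Illustrative sketch of step S8. Assumptions: the checkpoint path and its
# {"model", "head"} layout are hypothetical; class index 1 is the positive class
# (suitable for children).
import torch


@torch.no_grad()
def predict(model, head, frames, disparity, checkpoint="comfort_model.pt"):
    state = torch.load(checkpoint, map_location="cpu")
    model.load_state_dict(state["model"]); head.load_state_dict(state["head"])
    model.eval(); head.eval()
    probs = torch.softmax(head(model(frames, disparity)), dim=1)
    p_positive = probs[:, 1]                       # probability of a positive sample
    return (p_positive > 0.5).long(), p_positive   # 1 = suitable for children, 0 = adults only
```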
Preferably, in the present embodiment, the professional stereoscopic video visual comfort classification model is composed of the networks constructed in steps S2 to S7.
Preferably, in this embodiment, the video frames and disparity maps of several consecutive shots of a professional stereoscopic video are taken as input; a temporal relation network and a bidirectional gated recurrent unit are used to perceive and evaluate the short- and long-range temporal relations of the video at the frame level and the shot level respectively; multilayer attention is used to integrate the information of the video segments and stream branches that cause visual discomfort; and finally the professional stereoscopic video is judged to be suitable for children to watch or only suitable for adults to watch.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing is directed to preferred embodiments of the present invention; other and further embodiments of the invention may be devised without departing from its basic scope, which is determined by the claims that follow. Any simple modification, equivalent change or variation of the above embodiments made in accordance with the technical essence of the present invention falls within the protection scope of the technical solution of the present invention.
Claims (9)
1. A professional stereoscopic video comfort classification method based on multilayer attention and BiGRU is characterized by comprising the following steps:
step S1, carrying out scene segmentation on the training video set and the video set to be predicted and obtaining a disparity map through preprocessing;
step S2, frame-level processing: taking the left views of the stereoscopic videos in the training video set and the corresponding disparity maps as two-stream input for frame-level processing, and using a temporal relation network to perceive the temporal relations between frames within each shot at multiple time scales;
step S3, frame-level attention processing: weighting and summing the inter-frame temporal relations within each shot to obtain the final frame-level features;
step S4, shot-level processing: using a bidirectional gated recurrent unit (BiGRU), a recurrent neural network, to perceive the frame-level features of several consecutive shots and output a set of hidden states;
step S5, shot-level attention processing: weighting and summing the set of hidden states output in step S4 to obtain the final shot-level features;
step S6, two-stream fusion: fusing the shot-level features output in step S5 with a channel attention network to obtain the final hidden state;
step S7, outputting classification probabilities through a classification network according to the final hidden state and classifying the professional stereoscopic video as suitable for children to watch or only suitable for adults to watch; the networks constructed in steps S2 to S7 constitute the professional stereoscopic video visual comfort classification model; training this model, learning its optimal parameters by minimizing a loss function during training, and saving the trained model;
and step S8, inputting the left views of the videos in the set to be tested and the corresponding disparity maps into the trained model for classification prediction.
2. The method for classifying the comfort of professional stereoscopic video based on multi-layer attention and BiGRU as claimed in claim 1, wherein the step S1 specifically comprises the following steps:
step S11, using a multimedia video processing tool to split the video into individual frame images;
step S12, dividing the stereoscopic video into non-overlapping video segments with a shot segmentation algorithm, each segment being called a shot;
and step S13, splitting each frame into a left view and a right view, and computing the horizontal displacement of corresponding pixels between the left and right views with the SIFT Flow algorithm to serve as the disparity map.
3. The method for classifying the comfort of the professional stereoscopic video based on multi-layer attention and BiGRU according to claim 2, wherein the step S2 specifically comprises the following steps:
step S21, sparsely sampling the frames within a shot and randomly selecting 8 frames while keeping their temporal order;
step S22, randomly extracting a ordered frames from the 8 sampled frames, where a ranges from 2 to 8, and using a pre-trained temporal relation network to perceive the temporal relation among the a frames; given a video V, the temporal relation between two frames, T_2(V), is expressed as:
T_2(V) = Σ_{i<j} g_θ^(2)(f_i, f_j)
where f_i and f_j denote the features of the i-th and j-th frames of the video extracted with a base feature-extraction network such as AlexNet, VGG, GoogLeNet, ResNet or BN-Inception, g_θ^(2) is a two-layer multilayer perceptron with 256 units per layer, and θ denotes its parameters; similarly, the temporal relations among 3 to 8 frames, T_3(V), T_4(V), T_5(V), T_6(V), T_7(V) and T_8(V), are expressed as:
T_a(V) = Σ_{i<j<…<p} g_θ^(a)(f_i, f_j, …, f_p),  a = 3, …, 8
where f_i, f_j, f_k, f_l, f_m, f_n, f_o and f_p denote the features of the i-th, j-th, k-th, l-th, m-th, n-th, o-th and p-th frames of the video extracted with a base feature-extraction network such as AlexNet, VGG, GoogLeNet, ResNet or BN-Inception, g_θ^(a) is a two-layer multilayer perceptron that models the temporal relation among a frames, with 256 units per layer, and θ denotes its parameters;
step S23, concatenating the inter-frame temporal relations at all time scales within the shot to obtain the frame-level feature T_all(V):
T_all(V) = [T_2(V), T_3(V), T_4(V), T_5(V), T_6(V), T_7(V), T_8(V)].
4. The method for classifying the comfort of professional stereoscopic video based on multi-layer attention and BiGRU according to claim 3, wherein the step S3 specifically comprises the following steps:
step S31, for the temporal relation feature T_a(V) output by each temporal relation network, first computing the hidden vector u_a:
u_a = tanh(W_f T_a(V) + b_f)
where W_f and b_f are the parameters of a single-layer perceptron;
step S32, to measure the importance of the temporal relation at each time scale, applying a normalization operation to u_a:
α_a = exp(u_a^T u_f) / Σ_a exp(u_a^T u_f)
where u_f is a context vector representing the importance of the temporal relation at the corresponding time scale; it is randomly initialized during training and obtained through learning;
step S33, the final temporal feature x, i.e. the frame-level feature, is computed as:
x = Σ_a α_a T_a(V).
5. The method for classifying the comfort of professional stereoscopic video based on multi-layer attention and BiGRU according to claim 4, wherein the step S4 specifically comprises the following steps:
step S41, using step S33, obtaining and concatenating the frame-level features of each of the s consecutive shots; each shot yields one frame-level feature x, and the frame-level feature of the t-th shot (t = 1, 2, …, s) is denoted x_t; these frame-level features serve as the input of the bidirectional gated recurrent unit; at time t (t = 1, 2, …, s) the input of a gated recurrent unit is the hidden state h_{t-1} of the previous moment together with the frame-level feature x_t of the t-th shot, and it outputs the hidden state h_t of the current moment; the gated recurrent unit contains 2 gates: a reset gate r_t and an update gate z_t; the former is used to compute the candidate hidden state h̃_t and controls how much information of the previous hidden state h_{t-1} is kept; the latter controls how much information of the candidate hidden state h̃_t is added, yielding the output hidden state h_t; r_t, z_t, h̃_t and h_t are computed as follows:
z_t = σ(W_z x_t + U_z h_{t-1})
r_t = σ(W_r x_t + U_r h_{t-1})
h̃_t = tanh(W x_t + U (r_t ⊙ h_{t-1}))
h_t = (1 − z_t) ⊙ h_{t-1} + z_t ⊙ h̃_t
where σ is the logistic sigmoid function, ⊙ denotes element-wise multiplication, tanh is the activation function, and W_z, U_z, W_r, U_r, W and U are weight matrices learned during training;
step S42, since the bidirectional gated recurrent unit consists of 2 unidirectional gated recurrent units running in opposite directions, the final output h_t is jointly determined by the hidden states of the two units; at each moment the input is fed to both units simultaneously, the output is determined by the two units together, and their outputs are concatenated as the output of the bidirectional gated recurrent unit, yielding the set of hidden states it outputs; when the input is the video-frame sequence, the output of the bidirectional gated recurrent unit is the hidden-state set h_f; when the input is the disparity-map sequence, the output is the hidden-state set h_d; h_f and h_d are computed as:
h_f = [h_1^f, h_2^f, …, h_s^f],  h_d = [h_1^d, h_2^d, …, h_s^d]
where h_t^f and h_t^d denote the hidden states output at time t (t = 1, 2, …, s) for the video-frame sequence and the disparity-map sequence respectively.
6. The method for classifying the comfort of professional stereoscopic video based on multi-layer attention and BiGRU according to claim 5, wherein the step S5 specifically comprises the following steps:
step S51, for the video-frame sequence, for the hidden state h_t^f output by the gated recurrent unit at each moment, the model first computes the hidden vector u_t:
u_t = tanh(W_s h_t^f + b_s)
where W_s and b_s are the parameters of a single-layer perceptron;
step S52, to measure the importance of each shot, applying a normalization operation to u_t:
α_t = exp(u_t^T u_s) / Σ_t exp(u_t^T u_s)
where u_s is a context vector representing the importance of the corresponding shot; it is randomly initialized during training and obtained through learning;
step S53, the hidden state h_f of the video-frame sequence is computed as:
h_f = Σ_t α_t h_t^f
step S54, similarly, the hidden state h_d of the disparity-map sequence is obtained by the same process; h_f and h_d are concatenated to obtain h_a:
h_a = [h_f, h_d]
This completes the final shot-level features.
7. The method for classifying the comfort of professional stereoscopic video based on multi-layer attention and BiGRU according to claim 6, wherein the step S6 specifically comprises the following steps:
step S61, channel attention is used to compute the weight of each hidden state in h_a, denoted F_scale(·,·), which is calculated as:
F_scale(·,·) = σ(W_2 δ(W_1 h_a))
where δ is the ReLU function, σ is the sigmoid function, and W_1 and W_2 are the parameter matrices of two single-layer perceptrons, obtained through training;
step S62, the finally obtained importance of each channel is applied to h_a: the weighted final hidden state h̃_a is the element-wise product of F_scale(·,·) and h_a:
h̃_a = F_scale(·,·) ⊙ h_a
8. The method for classifying the comfort of professional stereoscopic video based on multi-layer attention and BiGRU as claimed in claim 7, wherein the step S7 specifically comprises the following steps:
step S71, to prevent the network from overfitting, the weighted final hidden state h̃_a is fed into the first layer of the classification network, a dropout (random inactivation) layer;
step S72, the output after dropout is fed into the second layer of the classification network, a fully connected layer, whose output is converted by a softmax (normalized exponential) function into classification probabilities in the range (0, 1), judging the professional stereoscopic video as suitable for children to watch or only suitable for adults to watch;
step S73, according to the cross-entropy loss function, the parameter gradients of the professional stereoscopic video visual comfort classification model are computed by back-propagation, and the parameters are updated with an adaptive gradient descent method;
the cross-entropy loss function L is defined as follows:
L = −(1/n) Σ_i [ y_i log p_i + (1 − y_i) log(1 − p_i) ]
where n denotes the number of samples in each batch, y_i is the label of sample i (a positive sample, suitable for children to watch, has y_i = 1; a negative sample, suitable for adults to watch only, has y_i = 0), and p_i is the probability that the model predicts sample i to be a positive sample;
and step S74, training proceeds in batches until the value of L computed in step S73 converges to a threshold or the number of iterations reaches a threshold; network training is then complete, the optimal parameters of the professional stereoscopic video visual comfort classification model have been learned, and the model parameters are saved.
9. The method for classifying the comfort of the professional stereoscopic video based on multi-layer attention and BiGRU according to claim 8, wherein the step S8 specifically comprises the following steps:
step S81, preprocessing a video set to be tested by using the step S1 to obtain a disparity map;
step S82, performing frame level processing on the left view of the stereoscopic video and the corresponding disparity map in the video set to be tested by using the step S2;
step S83, using the trained model parameters saved in step S7, processing and predicting the test video set through steps S3 to S7; each group of s consecutive shots is taken as one sample, and when the probability that the model predicts the sample to be a positive sample is greater than 0.5, the sample is classified as positive; otherwise it is classified as negative.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110016985.7A CN112613486B (en) | 2021-01-07 | 2021-01-07 | Professional stereoscopic video comfort level classification method based on multilayer attention and BiGRU |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110016985.7A CN112613486B (en) | 2021-01-07 | 2021-01-07 | Professional stereoscopic video comfort level classification method based on multilayer attention and BiGRU |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112613486A | 2021-04-06 |
CN112613486B CN112613486B (en) | 2023-08-08 |
Family
ID=75253406
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110016985.7A Active CN112613486B (en) | 2021-01-07 | 2021-01-07 | Professional stereoscopic video comfort level classification method based on multilayer attention and BiGRU |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112613486B (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109101896A (en) * | 2018-07-19 | 2018-12-28 | 电子科技大学 | A kind of video behavior recognition methods based on temporal-spatial fusion feature and attention mechanism |
CN109508642A (en) * | 2018-10-17 | 2019-03-22 | 杭州电子科技大学 | Ship monitor video key frame extracting method based on two-way GRU and attention mechanism |
WO2020221278A1 (en) * | 2019-04-29 | 2020-11-05 | 北京金山云网络技术有限公司 | Video classification method and model training method and apparatus thereof, and electronic device |
CN111860691A (en) * | 2020-07-31 | 2020-10-30 | 福州大学 | Professional stereoscopic video visual comfort degree classification method based on attention and recurrent neural network |
Non-Patent Citations (4)
Title |
---|
XUANZHEN FENG ET AL.: "Sentiment Classification of Reviews Based on BiGRU Neural Network and Fine-grained Attention", Journal of Physics: Conference Series *
LI ZHAOGUANG: "Research on Sports Video Classification Based on Deep Learning and Transfer Learning", Electronic Measurement Technology *
SANG HAIFENG; ZHAO ZIYU; HE DAKUO: "Video Action Recognition Network Design Based on Recurrent Region Attention and Video Frame Attention", Acta Electronica Sinica, no. 06 *
WEI LESONG ET AL.: "No-Reference Quality Assessment of Screen Content Images Based on Edge and Structure", Journal of Beijing University of Aeronautics and Astronautics *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113807318A (en) * | 2021-10-11 | 2021-12-17 | 南京信息工程大学 | Action identification method based on double-current convolutional neural network and bidirectional GRU |
CN113807318B (en) * | 2021-10-11 | 2023-10-31 | 南京信息工程大学 | Action recognition method based on double-flow convolutional neural network and bidirectional GRU |
CN116935292A (en) * | 2023-09-15 | 2023-10-24 | 山东建筑大学 | Short video scene classification method and system based on self-attention model |
CN116935292B (en) * | 2023-09-15 | 2023-12-08 | 山东建筑大学 | Short video scene classification method and system based on self-attention model |
Also Published As
Publication number | Publication date |
---|---|
CN112613486B (en) | 2023-08-08 |
Legal Events
Date | Code | Title | Description |
---|---|---|---
 | PB01 | Publication | 
 | SE01 | Entry into force of request for substantive examination | 
 | GR01 | Patent grant | 