CN112766147B - Error action positioning method based on deep learning - Google Patents

Error action positioning method based on deep learning Download PDF

Info

Publication number
CN112766147B
CN112766147B CN202110058400.8A CN202110058400A CN112766147B CN 112766147 B CN112766147 B CN 112766147B CN 202110058400 A CN202110058400 A CN 202110058400A CN 112766147 B CN112766147 B CN 112766147B
Authority
CN
China
Prior art keywords
action
layer
recognition model
network
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110058400.8A
Other languages
Chinese (zh)
Other versions
CN112766147A (en
Inventor
李冬生
李子奇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University of Technology filed Critical Dalian University of Technology
Priority to CN202110058400.8A priority Critical patent/CN112766147B/en
Publication of CN112766147A publication Critical patent/CN112766147A/en
Application granted granted Critical
Publication of CN112766147B publication Critical patent/CN112766147B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Multimedia (AREA)
  • Biophysics (AREA)
  • Human Computer Interaction (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Social Psychology (AREA)
  • Psychiatry (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the field of computer vision, and relates to a method for positioning an error action based on deep learning. The method specifically comprises the steps of training an Action recognition model, building an Action-CAM network, inserting the Action-CAM network into the Action recognition model, and testing the error Action. The invention utilizes the deep learning technology to carry out error action positioning, and human error action correction in the traditional method needs manpower. The invention is a method for positioning the error Action without additionally training the error Action, and the error Action is positioned by using the Action recognition model with the trained correct Action and the Action-CAM network combination provided by the invention, thereby saving a large amount of training cost.

Description

Error action positioning method based on deep learning
Technical Field
The invention belongs to the field of computer vision, and relates to a method for positioning an error action based on deep learning.
Background
Video software is increasingly popular due to the popularization of video equipment. Since videos are large in size and various in variety, many techniques have been developed to automatically determine human actions in videos. However, the current methods for judging human actions are all focused on judging the correct actions of human bodies, and how to detect wrong actions and locate wrong positions is not involved.
The positioning of the wrong actions is a very meaningful work, taking a gymnastics as an example, in the training at ordinary times, if the wrong action part of the gymnastics athlete can be automatically detected through a video, the correction of the gymnastics athlete can be helped, and the sports result is improved. In the building engineering, if constructors use irregular actions for construction and accumulate for a long time, occupational diseases of bodies can be caused.
Therefore, a fast and accurate method for locating a malfunction is required.
Disclosure of Invention
In order to solve the above problems, the present invention provides a new method for positioning an error action based on deep learning.
The technical scheme of the invention is as follows:
a method for positioning error actions based on deep learning comprises the following specific steps:
step one, training action recognition model
Taking correct actions as a data set, wherein the data set comprises a plurality of correct and standard actions, and training an action recognition model by using the data set;
the motion recognition model is input as a motion video, the classification network is a 3DCNN (3D convolutional neural network) -based motion recognition network, comprises a convolutional layer, a pooling layer, an activation layer and the like, and is output as a video marked with motion types.
Step two, building an Action-CAM network
In the action recognition model in the step one, let c be the type of the error action,
Figure BDA0002901530740000021
is the (i, j) th point pixel value on the n-th layer characteristic diagram,
Figure BDA0002901530740000022
obtaining the hot spot diagram of the nth layer for the weight corresponding to the characteristic diagram of the kth layer of the nth layer in the action recognition model
Figure BDA0002901530740000023
Comprises the following steps:
Figure BDA0002901530740000024
wherein
Figure BDA0002901530740000025
Is weighted by pixel
Figure BDA0002901530740000026
And pixel gradient are weighted multiplied by the activation value of nrellu to:
Figure BDA0002901530740000027
where NReLU expression is equation (3), the function image is shown in fig. 2:
NReLU=f(x)=min(x,0) (3)
because:
Figure BDA0002901530740000028
wherein Y is c The confidence score representing the final target action is determined by calculating Y c About
Figure BDA0002901530740000029
Partial derivative, obtaining
Figure BDA00029015307400000210
Expression:
Figure BDA00029015307400000211
wherein (a, b) is a pixel point different from (i, j) in the same characteristic diagram, and when the action recognition model is activated by Softmax, the confidence score Y of the final target action c Comprises the following steps:
Figure BDA0002901530740000031
wherein S c Score, S, representing Softmax input layer target action k Indicating the score for the Softmax input layer class k action.
Due to the chain rule:
Figure BDA0002901530740000032
and Y c About
Figure BDA0002901530740000033
Higher-order partial derivatives:
first derivative:
Figure BDA0002901530740000034
second derivative:
Figure BDA0002901530740000035
third derivative:
Figure BDA0002901530740000036
at this time, it is obtained
Figure BDA0002901530740000037
Hot spot map for each layer
Figure BDA0002901530740000038
And accumulating after unifying the scales to obtain a final hotspot graph.
Step three, inserting the Action-CAM network into the Action recognition model in the step one
And in the Action recognition model in the step I, the Action-CAM network is inserted in a manner that the Action-CAM network is added behind each pooling layer in the network of the Action recognition model to obtain an error Action positioning model, and the final hotspot graph obtained by the Action-CAM network in the step II is merged with each frame of the input Action video.
Step four, testing the error action
And (4) shooting the wrong action as a video, inputting the video into the wrong action positioning model obtained in the step three, wherein the video output by the model comprises the original video and a hot spot area, and the hot spot area is the part of the wrong action.
The invention has the beneficial effects that:
(1) The invention utilizes the deep learning technology to carry out error action positioning, and human error action correction in the traditional method needs manpower.
(2) The invention is a method for positioning the error Action without additionally training the error Action, and the error Action is positioned by using the Action recognition model with the trained correct Action and the Action-CAM network combination provided by the invention, thereby saving a large amount of training cost.
Drawings
FIG. 1 is a flow chart of the present invention;
fig. 2 is a functional diagram of the activation function nrellu referred to in the present invention.
Detailed Description
The following description will illustrate embodiments of the present invention with reference to the drawings and technical solutions.
The invention relates to a method for positioning a malfunction based on deep learning, which has a flow shown in figure 1 and comprises the following specific steps:
step one, training a C3D action recognition model
In this embodiment, taking a C3D motion recognition model as an example, 3 correct motions are photographed by a mobile phone, which are squat, wave and walk, respectively, a video of the three correct motions is made into a data set, a video format is 320 × 240 pixels, the three motions are marked in a way that the three motions are marked, the marking is named as a motion name of a video folder, and then a C3D network is trained, where the C3D training parameter is epichnumber =10 learnigore =10 -4
Step two, building an Action-CAM network
And (4) establishing an Action-CAM network according to the formulas (1) to (10). And inserting an Action-CAM network into a C3D Action recognition model, wherein the Action recognition model comprises 5 network layers, and inserting an Action-CAM behind a pooling layer in each network layer. At this time, 5 hot spot maps were obtained, with sizes of 4 × 4,7 × 7, 14 × 14, 28 × 28, and 56 × 56, respectively. Unifying the sizes of the 5 hot spot graphs into 320 multiplied by 240 pixels by a linear interpolation method, accumulating the hot spot graphs with unified sizes, and combining the accumulated hot spot graphs with corresponding frames in the video. At this point, a fault location model is obtained.
Step three, testing
And shooting a wrong action video, wherein when the wrong action is performed during squatting starting, the arm is opened wrongly. Inputting the error action video into the error action positioning model, outputting the error action positioning video, and displaying a finally obtained hot spot area as an arm, wherein the hot spot area is a part of the error action.

Claims (1)

1. A method for positioning a fault action based on deep learning is characterized by comprising the following specific steps:
step one, training a motion recognition model
Taking correct actions as a data set, wherein the data set comprises a plurality of correct and standard actions, and training an action recognition model by using the data set;
the motion recognition model is input as a motion video, the classification network is a 3 DCNN-based motion recognition network which comprises a convolution layer, a pooling layer, an activation layer and the like, and the motion recognition model is output as a video marked with motion types;
step two, establishing an Action-CAM network
In the action recognition model in the step one, let c be the type of the error action,
Figure FDA0002901530730000011
is the (i, j) th point pixel value on the n-th layer characteristic diagram,
Figure FDA0002901530730000012
obtaining the hot spot diagram of the n layer for the weight corresponding to the k layer characteristic diagram of the n layer network in the action recognition model
Figure FDA0002901530730000013
Comprises the following steps:
Figure FDA0002901530730000014
wherein
Figure FDA0002901530730000015
Is weighted by pixel
Figure FDA0002901530730000016
And pixel gradient are weighted multiplied by the activation value of nrellu to obtain:
Figure FDA0002901530730000017
where NReLU expression is equation (3), the function image is shown in fig. 2:
NReLU=f(x)=min(x,0) (3)
because:
Figure FDA0002901530730000018
wherein Y is c The confidence score representing the final target action is determined by calculating Y c About
Figure FDA0002901530730000019
Partial derivative, obtaining
Figure FDA00029015307300000110
Expression:
Figure FDA0002901530730000021
wherein (a, b) is a different pixel point in the same characteristic diagram than (i, j), and is used for action recognitionWhen the model is activated by Softmax, the confidence score Y of the final target action c Comprises the following steps:
Figure FDA0002901530730000022
wherein S c Score, S, representing Softmax input layer target action k A score representing a class k action of the Softmax input layer;
due to the chain rule:
Figure FDA0002901530730000023
and Y c About
Figure FDA0002901530730000024
Higher order partial derivatives:
first derivative:
Figure FDA0002901530730000025
second derivative:
Figure FDA0002901530730000026
third derivative:
Figure FDA0002901530730000027
at this time, it is obtained
Figure FDA0002901530730000028
Hot spot map for each layer
Figure FDA0002901530730000029
After the scales are unified, accumulating to obtain a final hotspot graph;
step three, inserting the Action-CAM network into the Action recognition model in the step one
In the Action-CAM network insertion step I in the step II, the insertion mode is that an Action-CAM network is added behind each pooling layer in the network of the Action recognition model to obtain an error Action positioning model, and a final hot point diagram obtained by the Action-CAM network in the step II is merged with each frame of the input Action video;
step four, testing the error action
And (4) shooting the wrong action as a video, and inputting the video into the wrong action positioning model obtained in the third step, wherein the video output by the model comprises the original video and a hot spot area, and the hot spot area is the part of the wrong action.
CN202110058400.8A 2021-01-16 2021-01-16 Error action positioning method based on deep learning Active CN112766147B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110058400.8A CN112766147B (en) 2021-01-16 2021-01-16 Error action positioning method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110058400.8A CN112766147B (en) 2021-01-16 2021-01-16 Error action positioning method based on deep learning

Publications (2)

Publication Number Publication Date
CN112766147A CN112766147A (en) 2021-05-07
CN112766147B true CN112766147B (en) 2022-10-14

Family

ID=75702169

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110058400.8A Active CN112766147B (en) 2021-01-16 2021-01-16 Error action positioning method based on deep learning

Country Status (1)

Country Link
CN (1) CN112766147B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114663661A (en) * 2022-04-13 2022-06-24 中国科学院空间应用工程与技术中心 Space life science experimental object semantic segmentation method and device and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107909060A (en) * 2017-12-05 2018-04-13 前海健匠智能科技(深圳)有限公司 Gymnasium body-building action identification method and device based on deep learning
CN109766934A (en) * 2018-12-26 2019-05-17 北京航空航天大学 A kind of images steganalysis method based on depth Gabor network

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107909060A (en) * 2017-12-05 2018-04-13 前海健匠智能科技(深圳)有限公司 Gymnasium body-building action identification method and device based on deep learning
CN109766934A (en) * 2018-12-26 2019-05-17 北京航空航天大学 A kind of images steganalysis method based on depth Gabor network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
基于深度学习的人体行为识别网络设计;叶青等;《中国科技信息》;20200515(第10期);全文 *

Also Published As

Publication number Publication date
CN112766147A (en) 2021-05-07

Similar Documents

Publication Publication Date Title
JP7097641B2 (en) Loop detection method based on convolution perception hash algorithm
US11417095B2 (en) Image recognition method and apparatus, electronic device, and readable storage medium using an update on body extraction parameter and alignment parameter
CN107316035A (en) Object identifying method and device based on deep learning neutral net
CN107680053A (en) A kind of fuzzy core Optimized Iterative initial value method of estimation based on deep learning classification
WO2021051526A1 (en) Multi-view 3d human pose estimation method and related apparatus
CN110223382B (en) Single-frame image free viewpoint three-dimensional model reconstruction method based on deep learning
CN112053308B (en) Image deblurring method and device, computer equipment and storage medium
CN107247952B (en) Deep supervision-based visual saliency detection method for cyclic convolution neural network
WO2021169049A1 (en) Method for glass detection in real scene
CN108108443A (en) Character marking method of street view video, terminal equipment and storage medium
CN110197183A (en) A kind of method, apparatus and computer equipment of Image Blind denoising
CN112766147B (en) Error action positioning method based on deep learning
CN111507184B (en) Human body posture detection method based on parallel cavity convolution and body structure constraint
CN107948586A (en) Trans-regional moving target detecting method and device based on video-splicing
CN116030498A (en) Virtual garment running and showing oriented three-dimensional human body posture estimation method
CN113378812A (en) Digital dial plate identification method based on Mask R-CNN and CRNN
CN114155556B (en) Human body posture estimation method and system based on stacked hourglass network added with channel shuffling module
CN113516697B (en) Image registration method, device, electronic equipment and computer readable storage medium
CN115797808A (en) Unmanned aerial vehicle inspection defect image identification method, system, device and medium
CN107729885B (en) Face enhancement method based on multiple residual error learning
CN110222704A (en) A kind of Weakly supervised object detection method and device
CN108810319A (en) Image processing apparatus and image processing method
CN111160262A (en) Portrait segmentation method fusing human body key point detection
CN105956606A (en) Method for re-identifying pedestrians on the basis of asymmetric transformation
CN113012072A (en) Image motion deblurring method based on attention network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant