CN112766147B - Error action positioning method based on deep learning


Info

Publication number
CN112766147B
CN112766147B
Authority
CN
China
Prior art keywords
action
layer
recognition model
network
video
Prior art date
Legal status
Active
Application number
CN202110058400.8A
Other languages
Chinese (zh)
Other versions
CN112766147A (en)
Inventor
李冬生
李子奇
Current Assignee
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date
Filing date
Publication date
Application filed by Dalian University of Technology filed Critical Dalian University of Technology
Priority to CN202110058400.8A
Publication of CN112766147A
Application granted
Publication of CN112766147B

Classifications

    • G06V40/20: Recognition of biometric, human-related or animal-related patterns in image or video data; movements or behaviour, e.g. gesture recognition
    • G06F18/241: Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N3/04: Neural networks; architecture, e.g. interconnection topology
    • G06N3/08: Neural networks; learning methods
    • G06V20/41: Scenes and scene-specific elements in video content; higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items


Abstract

The invention belongs to the field of computer vision and relates to a method for locating error actions based on deep learning. The method comprises the steps of training an action recognition model, building an Action-CAM network, inserting the Action-CAM network into the action recognition model, and testing on error actions. The invention locates error actions with deep learning, whereas correcting erroneous human actions with traditional methods requires manual effort. The invention needs no additional training on error actions: an error action is located by combining an action recognition model trained only on correct actions with the proposed Action-CAM network, saving a large amount of training cost.

Description

Error action positioning method based on deep learning
Technical Field
The invention belongs to the field of computer vision and relates to a method for locating error actions based on deep learning.
Background
With the popularization of video equipment, video software has become increasingly widespread. Because videos are large and highly varied, many techniques have been developed to automatically recognize human actions in video. However, current methods all focus on recognizing correct human actions; how to detect wrong actions and locate where they occur has not been addressed.
Locating wrong actions is highly worthwhile. Taking gymnastics as an example, if the erroneous part of an athlete's movement could be detected automatically from video during routine training, the athlete could be helped to correct it and improve competitive results. In construction engineering, workers who perform tasks with non-standard movements over long periods can develop occupational diseases.
Therefore, a fast and accurate method for locating error actions is required.
Disclosure of Invention
To solve the above problems, the present invention provides a new method for locating error actions based on deep learning.
The technical scheme of the invention is as follows:
a method for positioning error actions based on deep learning comprises the following specific steps:
step one, training action recognition model
Correct actions are taken as a data set comprising a plurality of correct, standard actions, and the action recognition model is trained on this data set;
the input of the action recognition model is an action video; the classification network is a 3D-CNN (3D convolutional neural network) based action recognition network comprising convolutional layers, pooling layers, activation layers, and the like; the output is the video labeled with the action class.
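To make the classification network concrete, the following is a minimal sketch of such a 3D-CNN action recognition network; PyTorch, the layer counts, and the channel sizes are illustrative assumptions, since the invention does not fix a particular architecture (the embodiment below uses C3D).

```python
# A minimal 3D-CNN action recognition network (an illustrative sketch, not the
# patented C3D configuration): convolutional, activation, and pooling layers
# followed by a classifier producing the per-class scores S^k fed to Softmax.
import torch
import torch.nn as nn

class ActionRecognitionNet(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 64, kernel_size=3, padding=1),   # convolutional layer
            nn.ReLU(inplace=True),                        # activation layer
            nn.MaxPool3d(kernel_size=(1, 2, 2)),          # pooling layer
            nn.Conv3d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=2),
        )
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (batch, channels=3, frames, height, width)
        x = self.features(clip)
        x = x.mean(dim=(2, 3, 4))      # global average pooling
        return self.classifier(x)      # scores S^k, one per action class
```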
Step two, building an Action-CAM network
In the action recognition model of step one, let $c$ be the class of the error action, let $A^{k}_{n,ij}$ be the pixel value at point $(i,j)$ of the $k$-th feature map in the $n$-th layer, and let $w^{c,k}_{n}$ be the weight corresponding to the $k$-th feature map of the $n$-th layer in the action recognition model. The hot spot map $L^{c}_{n}$ of the $n$-th layer is obtained as:

$$L^{c}_{n} = \sum_{k} w^{c,k}_{n} A^{k}_{n} \tag{1}$$

where the weight $w^{c,k}_{n}$ is obtained by multiplying the pixel-wise weights $\alpha^{c,k}_{n,ij}$ with the NReLU activation values of the pixel gradients:

$$w^{c,k}_{n} = \sum_{i}\sum_{j} \alpha^{c,k}_{n,ij}\,\mathrm{NReLU}\!\left(\frac{\partial Y^{c}}{\partial A^{k}_{n,ij}}\right) \tag{2}$$

where the NReLU expression is equation (3) and its graph is shown in FIG. 2:

$$\mathrm{NReLU} = f(x) = \min(x,\,0) \tag{3}$$

Because:

$$Y^{c} = \sum_{k} w^{c,k}_{n} \sum_{i}\sum_{j} A^{k}_{n,ij} \tag{4}$$

where $Y^{c}$ denotes the confidence score of the final target action, the expression for $\alpha^{c,k}_{n,ij}$ is obtained by taking the partial derivatives of $Y^{c}$ with respect to $A^{k}_{n,ij}$:

$$\alpha^{c,k}_{n,ij} = \frac{\dfrac{\partial^{2} Y^{c}}{\left(\partial A^{k}_{n,ij}\right)^{2}}}{2\,\dfrac{\partial^{2} Y^{c}}{\left(\partial A^{k}_{n,ij}\right)^{2}} + \sum_{a}\sum_{b} A^{k}_{n,ab}\,\dfrac{\partial^{3} Y^{c}}{\left(\partial A^{k}_{n,ij}\right)^{3}}} \tag{5}$$

where $(a,b)$ indexes the pixel points of the same feature map other than $(i,j)$. When the action recognition model is activated by Softmax, the confidence score $Y^{c}$ of the final target action is:

$$Y^{c} = \frac{e^{S^{c}}}{\sum_{k} e^{S^{k}}} \tag{6}$$

where $S^{c}$ denotes the score of the target action at the Softmax input layer and $S^{k}$ denotes the score of the class-$k$ action at the Softmax input layer.

By the chain rule:

$$\frac{\partial Y^{c}}{\partial A^{k}_{n,ij}} = \frac{\partial Y^{c}}{\partial S^{c}} \cdot \frac{\partial S^{c}}{\partial A^{k}_{n,ij}} \tag{7}$$

the higher-order partial derivatives of $Y^{c}$ with respect to $A^{k}_{n,ij}$ follow, writing $Y^{c} = e^{S^{c}}$ for the exponential term:

first derivative:

$$\frac{\partial Y^{c}}{\partial A^{k}_{n,ij}} = e^{S^{c}}\,\frac{\partial S^{c}}{\partial A^{k}_{n,ij}} \tag{8}$$

second derivative:

$$\frac{\partial^{2} Y^{c}}{\left(\partial A^{k}_{n,ij}\right)^{2}} = e^{S^{c}}\left(\frac{\partial S^{c}}{\partial A^{k}_{n,ij}}\right)^{2} \tag{9}$$

third derivative:

$$\frac{\partial^{3} Y^{c}}{\left(\partial A^{k}_{n,ij}\right)^{3}} = e^{S^{c}}\left(\frac{\partial S^{c}}{\partial A^{k}_{n,ij}}\right)^{3} \tag{10}$$

At this point $\alpha^{c,k}_{n,ij}$ is obtained. The hot spot maps $L^{c}_{n}$ of all layers are accumulated after their scales are unified, giving the final hot spot map.
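For readers, a minimal sketch of the Action-CAM computation of equations (1) to (10) for one layer follows, assuming the PyTorch network above; treating $Y^{c}$ as $e^{S^{c}}$ per equations (8) to (10) and adding a small epsilon to the denominator of equation (5) are implementation assumptions, not requirements of the invention.

```python
# Sketch of the Action-CAM weighting for one layer. `activations` holds the
# feature maps A^k_n (shape (1, C, ...spatial)) and `score` the scalar
# pre-softmax score S^c; both must belong to the same autograd graph.
import torch

def nrelu(x: torch.Tensor) -> torch.Tensor:
    """NReLU of equation (3): min(x, 0), keeping only negative responses."""
    return torch.minimum(x, torch.zeros_like(x))

def action_cam(activations: torch.Tensor, score: torch.Tensor) -> torch.Tensor:
    grads = torch.autograd.grad(score, activations, retain_graph=True)[0]
    exp_s = torch.exp(score)
    d1 = exp_s * grads        # first derivative, equation (8)
    d2 = exp_s * grads ** 2   # second derivative, equation (9)
    d3 = exp_s * grads ** 3   # third derivative, equation (10)
    spatial = tuple(range(2, activations.dim()))
    # Pixel-wise weights alpha of equation (5); epsilon avoids division by zero.
    denom = 2.0 * d2 + activations.sum(dim=spatial, keepdim=True) * d3
    alpha = d2 / (denom + 1e-8)
    # Channel weights w of equation (2), then the layer hot spot map of (1).
    w = (alpha * nrelu(d1)).sum(dim=spatial)              # shape (1, C)
    w = w.reshape(w.shape + (1,) * len(spatial))          # broadcast shape
    return (w * activations).sum(dim=1)[0]                # hot spot map L^c_n
```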
Step three, inserting the Action-CAM network into the action recognition model of step one
The Action-CAM network of step two is inserted into the action recognition model of step one by adding an Action-CAM network after each pooling layer of the recognition network, which yields the error action positioning model; the final hot spot map obtained by the Action-CAM network in step two is merged with each frame of the input action video.
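One plausible realization of this insertion, under the assumption that the model is the PyTorch module sketched above, registers a forward hook behind every pooling layer so that each layer's feature maps are available to action_cam(); the hook mechanism is an implementation assumption, not part of the claimed method.

```python
# Sketch: collect the output of every pooling layer so an Action-CAM map can
# be computed behind each of them, as step three prescribes.
import torch.nn as nn

def attach_action_cam(model: nn.Module, store: list) -> None:
    def hook(module, inputs, output):
        store.append(output)   # pooled feature maps A^k_n of this layer
    for layer in model.modules():
        if isinstance(layer, nn.MaxPool3d):   # one Action-CAM per pooling layer
            layer.register_forward_hook(hook)
```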
Step four, testing the error action
The wrong action is recorded as a video and input to the error action positioning model obtained in step three. The video output by the model contains the original video together with a hot spot region, and the hot spot region marks the part of the wrong action.
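An end-to-end sketch of this test under the same assumptions follows; `clip` is a stand-in tensor for the recorded wrong-action video, and the other names come from the illustrative snippets above.

```python
# Sketch of step four: run the suspect video through the model and compute a
# hot spot map behind every pooling layer.
import torch

store = []
model = ActionRecognitionNet(num_classes=3)
attach_action_cam(model, store)

clip = torch.randn(1, 3, 16, 240, 320)   # stand-in for the test video tensor
scores = model(clip)
c = scores.argmax(dim=1).item()           # recognized action class
heatmaps = [action_cam(a, scores[0, c]) for a in store]
# `heatmaps` holds one per-layer map L^c_n; unifying their scales and
# overlaying the accumulated map on the frames is sketched in the embodiment.
```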
The invention has the following beneficial effects:
(1) The invention locates error actions with deep learning, whereas correcting erroneous human actions with traditional methods requires manual effort.
(2) The invention locates error actions without any additional training on error actions: an error action is located by combining an action recognition model trained on correct actions with the proposed Action-CAM network, saving a large amount of training cost.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a graph of the activation function NReLU used in the present invention.
Detailed Description
The following describes embodiments of the present invention with reference to the drawings and the technical solution.
The invention is a method for locating error actions based on deep learning; its flow is shown in FIG. 1, and the specific steps are as follows:
step one, training a C3D action recognition model
In this embodiment, a C3D action recognition model is taken as an example. Three correct actions are filmed with a mobile phone: squat, wave, and walk. Videos of the three correct actions are made into a data set with a frame size of 320×240 pixels; each video is labeled by naming its folder after the action. The C3D network is then trained with an epoch number of 10 and a learning rate of 10⁻⁴.
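A minimal sketch of this training run follows; the embodiment fixes only the three classes, the 320×240 input size, the epoch number of 10, and the 10⁻⁴ learning rate, so the optimizer, the loss, and the hypothetical `train_loader` are assumptions, and the illustrative ActionRecognitionNet stands in for C3D.

```python
# Sketch of the embodiment's training loop. `train_loader` is a hypothetical
# DataLoader over the squat/wave/walk clips, labeled by folder name.
import torch
import torch.nn as nn

model = ActionRecognitionNet(num_classes=3)                 # squat, wave, walk
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)   # learning rate 1e-4
criterion = nn.CrossEntropyLoss()

for epoch in range(10):                                     # epoch number = 10
    for clips, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(clips), labels)
        loss.backward()
        optimizer.step()
```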
Step two, building an Action-CAM network
An Action-CAM network is built according to formulas (1) to (10) and inserted into the C3D action recognition model. The recognition model contains 5 network layers, and an Action-CAM is inserted after the pooling layer of each. This yields 5 hot spot maps of sizes 4×4, 7×7, 14×14, 28×28, and 56×56. The 5 hot spot maps are resized to a uniform 320×240 pixels by linear interpolation and accumulated, and the accumulated hot spot map is combined with the corresponding frames of the video. At this point the error action positioning model is obtained.
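The scale unification and accumulation can be sketched as follows; bilinear interpolation as the "linear interpolation", temporal averaging of the 3D maps, and the normalization and blending weights are assumptions beyond what the embodiment states.

```python
# Sketch: unify the per-layer hot spot maps to the 320x240 video resolution,
# accumulate them, and blend the result onto a video frame.
import torch
import torch.nn.functional as F

def fuse_heatmaps(heatmaps: list, size=(240, 320)) -> torch.Tensor:
    fused = torch.zeros(size)
    for h in heatmaps:
        h2d = h.mean(dim=0) if h.dim() == 3 else h   # collapse the time axis
        h2d = F.interpolate(h2d[None, None], size=size,
                            mode="bilinear", align_corners=False)[0, 0]
        fused = fused + h2d.detach()                 # accumulate layer maps
    return fused / fused.abs().max().clamp(min=1e-8)   # normalize for display

def overlay(frame: torch.Tensor, heatmap: torch.Tensor) -> torch.Tensor:
    # frame: (3, 240, 320) with values in [0, 1]; blend the hot spot region in.
    return 0.6 * frame + 0.4 * heatmap.unsqueeze(0)
```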
Step three, testing
A wrong-action video is shot in which, at the start of a squat, the arms are wrongly spread open. The wrong-action video is input to the error action positioning model, which outputs the error-location video; the resulting hot spot region lies on the arms, i.e., the part where the action is wrong.

Claims (1)

1. A method for positioning error actions based on deep learning, characterized by comprising the following specific steps:
step one, training a motion recognition model
correct actions are taken as a data set comprising a plurality of correct, standard actions, and the action recognition model is trained on this data set;
the input of the action recognition model is an action video; the classification network is a 3D-CNN-based action recognition network comprising convolutional layers, pooling layers, activation layers, and the like; the output is the video labeled with the action class;
step two, establishing an Action-CAM network
in the action recognition model of step one, let $c$ be the class of the error action, let $A^{k}_{n,ij}$ be the pixel value at point $(i,j)$ of the $k$-th feature map in the $n$-th layer, and let $w^{c,k}_{n}$ be the weight corresponding to the $k$-th feature map of the $n$-th layer network in the action recognition model; the hot spot map $L^{c}_{n}$ of the $n$-th layer is obtained as:

$$L^{c}_{n} = \sum_{k} w^{c,k}_{n} A^{k}_{n} \tag{1}$$

where the weight $w^{c,k}_{n}$ is obtained by multiplying the pixel-wise weights $\alpha^{c,k}_{n,ij}$ with the NReLU activation values of the pixel gradients:

$$w^{c,k}_{n} = \sum_{i}\sum_{j} \alpha^{c,k}_{n,ij}\,\mathrm{NReLU}\!\left(\frac{\partial Y^{c}}{\partial A^{k}_{n,ij}}\right) \tag{2}$$

where the NReLU expression is equation (3), the function image being shown in FIG. 2:

$$\mathrm{NReLU} = f(x) = \min(x,\,0) \tag{3}$$

because:

$$Y^{c} = \sum_{k} w^{c,k}_{n} \sum_{i}\sum_{j} A^{k}_{n,ij} \tag{4}$$

where $Y^{c}$ denotes the confidence score of the final target action, the expression for $\alpha^{c,k}_{n,ij}$ is obtained by taking the partial derivatives of $Y^{c}$ with respect to $A^{k}_{n,ij}$:

$$\alpha^{c,k}_{n,ij} = \frac{\dfrac{\partial^{2} Y^{c}}{\left(\partial A^{k}_{n,ij}\right)^{2}}}{2\,\dfrac{\partial^{2} Y^{c}}{\left(\partial A^{k}_{n,ij}\right)^{2}} + \sum_{a}\sum_{b} A^{k}_{n,ab}\,\dfrac{\partial^{3} Y^{c}}{\left(\partial A^{k}_{n,ij}\right)^{3}}} \tag{5}$$

where $(a,b)$ indexes the pixel points of the same feature map other than $(i,j)$; when the action recognition model is activated by Softmax, the confidence score $Y^{c}$ of the final target action is:

$$Y^{c} = \frac{e^{S^{c}}}{\sum_{k} e^{S^{k}}} \tag{6}$$

where $S^{c}$ denotes the score of the target action at the Softmax input layer and $S^{k}$ denotes the score of the class-$k$ action at the Softmax input layer;

by the chain rule:

$$\frac{\partial Y^{c}}{\partial A^{k}_{n,ij}} = \frac{\partial Y^{c}}{\partial S^{c}} \cdot \frac{\partial S^{c}}{\partial A^{k}_{n,ij}} \tag{7}$$

the higher-order partial derivatives of $Y^{c}$ with respect to $A^{k}_{n,ij}$ follow, writing $Y^{c} = e^{S^{c}}$ for the exponential term:

first derivative:

$$\frac{\partial Y^{c}}{\partial A^{k}_{n,ij}} = e^{S^{c}}\,\frac{\partial S^{c}}{\partial A^{k}_{n,ij}} \tag{8}$$

second derivative:

$$\frac{\partial^{2} Y^{c}}{\left(\partial A^{k}_{n,ij}\right)^{2}} = e^{S^{c}}\left(\frac{\partial S^{c}}{\partial A^{k}_{n,ij}}\right)^{2} \tag{9}$$

third derivative:

$$\frac{\partial^{3} Y^{c}}{\left(\partial A^{k}_{n,ij}\right)^{3}} = e^{S^{c}}\left(\frac{\partial S^{c}}{\partial A^{k}_{n,ij}}\right)^{3} \tag{10}$$

at this point $\alpha^{c,k}_{n,ij}$ is obtained, and the hot spot maps $L^{c}_{n}$ of all layers are accumulated after their scales are unified, giving the final hot spot map;
step three, inserting the Action-CAM network into the Action recognition model in the step one
In the Action-CAM network insertion step I in the step II, the insertion mode is that an Action-CAM network is added behind each pooling layer in the network of the Action recognition model to obtain an error Action positioning model, and a final hot point diagram obtained by the Action-CAM network in the step II is merged with each frame of the input Action video;
step four, testing the error action
the wrong action is shot as a video and input to the error action positioning model obtained in step three; the video output by the model contains the original video together with a hot spot region, and the hot spot region is the part of the wrong action.
CN202110058400.8A 2021-01-16 2021-01-16 Error action positioning method based on deep learning Active CN112766147B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110058400.8A CN112766147B (en) 2021-01-16 2021-01-16 Error action positioning method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110058400.8A CN112766147B (en) 2021-01-16 2021-01-16 Error action positioning method based on deep learning

Publications (2)

Publication Number Publication Date
CN112766147A (en) 2021-05-07
CN112766147B (en) 2022-10-14

Family

ID=75702169

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110058400.8A Active CN112766147B (en) 2021-01-16 2021-01-16 Error action positioning method based on deep learning

Country Status (1)

Country Link
CN (1) CN112766147B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114663661A (en) * 2022-04-13 2022-06-24 中国科学院空间应用工程与技术中心 Space life science experimental object semantic segmentation method and device and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107909060A (en) * 2017-12-05 2018-04-13 前海健匠智能科技(深圳)有限公司 Gymnasium body-building action identification method and device based on deep learning
CN109766934A (en) * 2018-12-26 2019-05-17 北京航空航天大学 A kind of images steganalysis method based on depth Gabor network

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107909060A (en) * 2017-12-05 2018-04-13 前海健匠智能科技(深圳)有限公司 Gymnasium body-building action identification method and device based on deep learning
CN109766934A (en) * 2018-12-26 2019-05-17 北京航空航天大学 A kind of images steganalysis method based on depth Gabor network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Design of a human action recognition network based on deep learning; Ye Qing et al.; China Science and Technology Information; 2020-05-15 (No. 10); full text *

Also Published As

Publication number Publication date
CN112766147A (en) 2021-05-07


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant