CN111753795A - Action recognition method and device, electronic equipment and storage medium - Google Patents

Action recognition method and device, electronic equipment and storage medium

Info

Publication number
CN111753795A
Authority
CN
China
Prior art keywords
target
images
image group
action
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010623952.4A
Other languages
Chinese (zh)
Inventor
刘思阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing IQIYI Science and Technology Co Ltd
Original Assignee
Beijing IQIYI Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing IQIYI Science and Technology Co Ltd filed Critical Beijing IQIYI Science and Technology Co Ltd
Priority to CN202010623952.4A
Publication of CN111753795A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/23 Recognition of whole body movements, e.g. for sport training
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the invention provide an action recognition method and apparatus, an electronic device, and a storage medium. The method comprises: acquiring multiple frames of continuous infrared images, each infrared image being an image captured by a single infrared camera that contains designated parts of a target object, the target object being fitted with a plurality of light-capturing balls, each of which corresponds to one designated part of the target object; determining, from the multiple frames of continuous infrared images, a target image group containing multiple frames of target images; and inputting the multiple frames of target images in the target image group into a pre-trained motion recognition model to obtain the action type of the target object corresponding to the target image group. The method simplifies the processing required for action recognition of a target object and lowers the requirements on the usage scenario, while still achieving high action recognition accuracy in such less demanding scenarios.

Description

Action recognition method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of computer vision technologies, and in particular, to a method and an apparatus for motion recognition, an electronic device, and a storage medium.
Background
At present, there are many techniques for recognizing human body actions. For example, the action of a target object in a video frame can be recognized purely through image processing, or through optical motion capture.
At present, the method of recognizing the motion of a target object by using a light capture technology is mainly applied to movie and television production. The identification process needs to be completed in a professional studio, as shown in fig. 1:
a plurality of infrared cameras 103 are installed at various positions in the studio, and the actor wears a special light-capturing suit 101 on which a plurality of highly reflective light-capturing balls 102 are arranged. During shooting, the infrared cameras 103 emit infrared light and receive the infrared light reflected by the light-capturing balls 102, capturing infrared video images from different directions. After the infrared video images from different directions are obtained, the spatial positions of the light-capturing balls 102 are calculated through image processing techniques such as image fusion, and the actor's movements are then identified. However, this method is not only costly; it also requires processing multiple video streams, the algorithm is complex, and it places high demands on the usage scenario.
Recognizing the action of a target object by image processing alone mainly operates on visible-light images, for example RGB (red, green, blue) images. However, visible-light images are strongly affected by the environment, so their quality is unstable, which easily degrades action recognition accuracy. For example, when action recognition is performed on a target object contained in a strongly exposed visible-light image, the low image quality lowers the accuracy of the recognition result.
Disclosure of Invention
An object of embodiments of the present invention is to provide a motion recognition method, a motion recognition apparatus, an electronic device, and a storage medium, so as to improve the accuracy of motion recognition while simplifying the process of motion recognition.
In order to achieve the above object, an embodiment of the present invention provides a motion recognition method, including:
acquiring a plurality of continuous infrared images; the infrared image is an image which is shot by an infrared camera and contains a designated part of a target object, and the target object is provided with a plurality of light capturing balls, wherein each light capturing ball corresponds to one designated part of the target object;
determining a target image group containing multiple frames of target images from multiple frames of continuous infrared images;
inputting multi-frame target images in the target image group into a pre-trained motion recognition model to obtain the motion type of the target object corresponding to the target image group; wherein the motion recognition model is obtained by training based on a training sample set, and the training sample set comprises: a plurality of sample image groups and, for each sample image group, the motion type of the sample object corresponding to that sample image group; each sample image group comprises multiple frames of sample images, and the sample images in a sample image group are images containing designated parts of a sample object.
Further, the determining a target image group including a plurality of frames of target images from a plurality of frames of continuous infrared images includes:
selecting a frame of infrared image from a plurality of continuous infrared images at preset frame number intervals as a target image to obtain a target image group consisting of a plurality of target images.
Further, the pre-trained motion recognition model includes: the system comprises a feature extraction network layer, a difference feature calculation layer, a feature splicing layer, an action classification network layer and an output layer;
the step of inputting the multi-frame target images in the target image group into a pre-trained motion recognition model to obtain the motion types of the target objects corresponding to the target image group includes:
inputting multi-frame target images in the target image group into a feature extraction network layer of a pre-trained action recognition model;
the feature extraction network layer is used for respectively extracting the light capture features of the multi-frame target image to obtain a plurality of light capture feature information;
the difference characteristic calculation layer calculates the difference value of the light capture characteristics of two adjacent target images in the target image group according to the light capture characteristic information to obtain a plurality of difference characteristic information;
the characteristic splicing layer splices the light capture characteristic information and the difference characteristic information to obtain splicing characteristic information;
the action classification network layer determines the probability that the action corresponding to the splicing characteristic information belongs to each preset action type;
and the output layer outputs the action type with the maximum probability as the action type of the target object corresponding to the target image group.
Further, the feature extraction network layer is:
a visual geometry group network VGG, or a residual neural network ResNet, or a lightweight deep neural network MobileNet.
Further, the action classification network layer in the action recognition model comprises: a preset number of fully-connected layers; wherein the input feature dimension of the first fully-connected layer of the action classification network layer is s × (2N-1), and the output feature dimension of the last fully-connected layer of the action classification network layer is 1 × n; N represents the number of target images, n represents the number of action types, and s represents the dimension of the light-capture feature information;
the output layer in the motion recognition model comprises: softmax layer.
Further, the motion recognition model is obtained by training based on a training sample set by adopting the following steps:
collecting the training samples, inputting multi-frame sample images of the sample image group into a neural network model to be trained, and obtaining the action types of sample objects corresponding to the sample image group as output results;
adjusting parameters of the current neural network model to be trained based on the output result to obtain a new neural network model to be trained, completing one iteration, returning to the step of collecting the training samples, and inputting multi-frame sample images of the sample image group into the neural network model to be trained;
and when the iteration times reach the preset iteration times or the loss function value of the current neural network model to be trained is smaller than the preset loss function threshold value, ending the training, and determining the current neural network model to be trained as the motion recognition model.
Further, the feature extraction network layer of the neural network model to be trained is a predetermined image feature extraction network layer;
the adjusting the parameters of the current neural network model to be trained based on the output result comprises:
and adjusting the parameters of the action classification network layer of the current neural network model to be trained based on the output result.
Further, the action type of the target object includes: kicking, lifting hands, running, walking, pushing, pulling, jumping, and nonsense movements.
In order to achieve the above object, an embodiment of the present invention further provides a motion recognition apparatus, including:
the infrared image acquisition module is used for acquiring a plurality of continuous infrared images; the infrared image is an image which is shot by an infrared camera and contains a designated part of a target object, and the target object is provided with a plurality of light capturing balls, wherein each light capturing ball corresponds to one designated part of the target object;
the image group determining module is used for determining a target image group containing a plurality of frames of target images from a plurality of frames of continuous infrared images;
the action recognition module is used for inputting the multi-frame target images in the target image group into a pre-trained action recognition model to obtain the action type of the target object corresponding to the target image group; wherein the action recognition model is obtained by training based on a training sample set, and the training sample set comprises: a plurality of sample image groups and, for each sample image group, the action type of the sample object corresponding to that sample image group; each sample image group comprises multiple frames of sample images, and the sample images in a sample image group are images containing designated parts of a sample object.
Further, the image group determining module is specifically configured to select one infrared image frame at every preset frame number from multiple continuous infrared images as a target image, and obtain a target image group composed of multiple target images.
Further, the pre-trained motion recognition model includes: the system comprises a feature extraction network layer, a difference feature calculation layer, a feature splicing layer, an action classification network layer and an output layer;
the action recognition module is specifically used for inputting the multi-frame target images in the target image group into a feature extraction network layer of a pre-trained action recognition model; the feature extraction network layer is used for respectively extracting the light capture features of the multi-frame target image to obtain a plurality of light capture feature information; the difference characteristic calculation layer calculates the difference value of the light capture characteristics of two adjacent target images in the target image group according to the light capture characteristic information to obtain a plurality of difference characteristic information; the characteristic splicing layer splices the light capture characteristic information and the difference characteristic information to obtain splicing characteristic information; the action classification network layer determines the probability that the action corresponding to the splicing characteristic information belongs to each preset action type; and the output layer outputs the action type with the maximum probability as the action type of the target object corresponding to the target image group.
Further, the feature extraction network layer is:
a visual geometry group network VGG, or a residual neural network ResNet, or a lightweight deep neural network MobileNet.
Further, the action classification network layer in the action recognition model comprises: a preset number of fully-connected layers; wherein the input feature dimension of the first fully-connected layer of the action classification network layer is s × (2N-1), and the output feature dimension of the last fully-connected layer of the action classification network layer is 1 × n; N represents the number of target images, n represents the number of action types, and s represents the dimension of the light-capture feature information;
the output layer in the motion recognition model comprises: softmax layer.
Further, the apparatus further includes: a model training module;
the model training module is used for training based on a training sample set to obtain the action recognition model by adopting the following steps:
collecting the training samples, inputting multi-frame sample images of the sample image group into a neural network model to be trained, and obtaining the action types of sample objects corresponding to the sample image group as output results;
adjusting parameters of the current neural network model to be trained based on the output result to obtain a new neural network model to be trained, completing one iteration, returning to the step of collecting the training samples, and inputting multi-frame sample images of the sample image group into the neural network model to be trained;
and when the iteration times reach the preset iteration times or the loss function value of the current neural network model to be trained is smaller than the preset loss function threshold value, ending the training, and determining the current neural network model to be trained as the motion recognition model.
Further, the feature extraction network layer of the neural network model to be trained is a predetermined image feature extraction network layer;
and the model training module adjusts the parameters of the action classification network layer of the current neural network model to be trained based on the output result.
Further, the action type of the target object includes: kicking, lifting hands, running, walking, pushing, pulling, jumping, and nonsense movements.
In order to achieve the above object, an embodiment of the present invention provides an electronic device, which includes a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory complete mutual communication through the communication bus;
a memory for storing a computer program;
and the processor is used for realizing any one of the steps of the action recognition method when executing the program stored in the memory.
In order to achieve the above object, an embodiment of the present invention provides a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements any of the above steps of the motion recognition method.
In order to achieve the above object, an embodiment of the present invention further provides a computer program product containing instructions, which when run on a computer, causes the computer to perform any of the steps of the motion recognition method described above.
The embodiment of the invention has the following beneficial effects:
by adopting the method provided by the embodiment of the invention, a plurality of light-capturing balls are deployed on the target object, only one infrared camera is needed to capture multiple frames of continuous infrared images of the target object, a target image group containing multiple frames of target images is obtained, and the action of the target object in these target images is then recognized by a pre-trained action recognition model, thereby determining the action type of the target object. Compared with existing action recognition methods, the method provided by the embodiment of the invention therefore simplifies the processing required for action recognition and lowers the requirements on the usage scenario: the target object does not need to wear a special light-capturing suit, and no demanding environment is needed; it suffices to attach a plurality of light-capturing balls to the target object, acquire images of the target object with a single infrared camera, and then process the acquired images to recognize the action of the target object. At the same time, because the method uses light-capture technology, it can still achieve high action recognition accuracy in such less demanding scenarios.
Of course, not all of the advantages described above need to be achieved at the same time in the practice of any one product or method of the invention.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
FIG. 1 is a diagram illustrating a professional optical capture data acquisition in the prior art;
FIG. 2 is a flowchart of a method for recognizing actions according to an embodiment of the present invention;
FIG. 3 is another flow chart of a method for recognizing actions according to an embodiment of the present invention;
fig. 4a is a schematic diagram of a target object with an optical capture ball deployed in the motion recognition method according to the embodiment of the present invention;
fig. 4b is a schematic diagram of a target object with light-trapping balls deployed and an infrared image collected for the target object with light-trapping balls deployed according to an embodiment of the present invention;
FIG. 5a is a schematic structural diagram of a motion recognition model according to an embodiment of the present invention;
FIG. 5b is a diagram of a target image processed by the motion recognition model according to the embodiment of the present invention;
fig. 5c is a schematic structural diagram of a motion classification network layer in the motion recognition model according to the embodiment of the present invention;
FIG. 6 is a flowchart of training a motion recognition model according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of an action recognition device according to an embodiment of the present invention;
fig. 8 is another schematic structural diagram of the motion recognition device according to the embodiment of the present invention;
fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention.
Because the existing action recognition method has complex algorithm and is difficult to be applied to other fields, in order to simplify the action recognition processing and expand the application scenario of action recognition, the embodiment of the invention provides an action recognition method, as shown in fig. 2, comprising:
step 201, acquiring multiple continuous infrared images; the infrared image is an image including a designated portion of a target object photographed by one infrared camera, and the target object is disposed with a plurality of light-capturing balls, wherein each light-capturing ball corresponds to one designated portion of the target object. Wherein, the target object may be: humans and animals, and the like. The light-trapping spheres deployed for the target object may be light-trapping spheres.
Step 202, determining a target image group containing multi-frame target images from multi-frame continuous infrared images.
Step 203, inputting multi-frame target images in the target image group into a pre-trained motion recognition model to obtain the motion type of the target object corresponding to the target image group; wherein the motion recognition model is obtained by training based on a training sample set, and the training sample set contains: a plurality of sample image groups and, for each sample image group, the motion type of the sample object corresponding to that sample image group; each sample image group comprises multiple frames of sample images, and the sample images in a sample image group are images containing designated parts of a sample object.
By adopting the method provided by the embodiment of the invention, a plurality of light-capturing balls are deployed on the target object, only one infrared camera is needed to capture multiple frames of continuous infrared images of the target object, a target image group containing multiple frames of target images is obtained, and the action of the target object in these target images is then recognized by a pre-trained action recognition model, thereby determining the action type of the target object. Compared with existing action recognition methods, the method provided by the embodiment of the invention therefore simplifies the processing required for action recognition and lowers the requirements on the usage scenario: the target object does not need to wear a special light-capturing suit, and no demanding environment is needed; it suffices to attach a plurality of light-capturing balls to the target object, acquire images of the target object with a single infrared camera, and then process the acquired images to recognize the action of the target object. At the same time, because the method uses light-capture technology, it can still achieve high action recognition accuracy in such less demanding scenarios.
The following describes in detail the motion recognition method and apparatus provided in the embodiments of the present invention with specific embodiments.
In an embodiment of the present application, as shown in fig. 3, another flow of the motion recognition method includes the following steps:
step 301, acquiring multiple continuous infrared images.
In this step, a plurality of light-capturing balls may be disposed at a designated portion of the target object, each light-capturing ball corresponding to a designated portion of the target object. The light-catching ball may be a light-reflecting light-catching ball, and the target object may be a person, an animal, or the like. If the target object is a person, the designated portion of the target object may include: wrist, elbow, ankle, knee, foot, shoulder, etc. A plurality of light-catching balls may be attached to a plurality of designated portions of the target object. Referring specifically to fig. 4a, designated parts of the target object 401, such as the wrist, elbow, ankle, knee, foot, and shoulder, are all deployed with a light-trapping ball 402. Specifically, referring to fig. 4a, the light-catching ball 402 may be sequentially attached to each designated portion of the target object 401 in the order of the arrow with respect to the target object 401. For example, one light-trapping ball 402 may be pasted on the pelvis portion of the target object 401, and one light-trapping ball 402 may be pasted on each of the "spine 1", "spine 2", "spine 3", "neck", "head", "left clavicle", "left shoulder", "left elbow", "left wrist", "left hand", "right clavicle", "right shoulder", "right elbow", "right wrist", "right hand", and the like in the upward direction of the arrow in fig. 4a, with the pelvis portion as the starting portion; with the pelvis region as the starting region, a light-catching ball 402 is stuck to each of the regions "left hip", "left knee", "left ankle", "left foot", "right hip", "right knee", "right ankle", and "right foot" in the downward direction in the arrow direction in fig. 4 a. Finally, 24 light-trapping balls 402 may be stuck at the above-mentioned 24 body parts of the target object 401.
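The 24 designated parts listed above can be represented as a simple lookup table. The sketch below (Python) is illustrative only; the part names and their pasting order are taken from the description of fig. 4a, and the constant name is hypothetical.

```python
# Illustrative only: the 24 designated body parts listed above, in the
# pasting order described for fig. 4a (pelvis first, then up the spine
# and arms, then down the legs). One light-capturing ball per part.
MARKER_PARTS = [
    "pelvis",
    "spine1", "spine2", "spine3", "neck", "head",
    "left_clavicle", "left_shoulder", "left_elbow", "left_wrist", "left_hand",
    "right_clavicle", "right_shoulder", "right_elbow", "right_wrist", "right_hand",
    "left_hip", "left_knee", "left_ankle", "left_foot",
    "right_hip", "right_knee", "right_ankle", "right_foot",
]
assert len(MARKER_PARTS) == 24
```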
An infrared camera, such as a Kinect DK camera, may be used to photograph a target object on which a plurality of light-capturing balls are deployed, obtaining multiple consecutive frames of images containing the designated parts of the target object. As shown in fig. 4b, the target object 403 has light-capturing balls 402 disposed on the left wrist, the right wrist, the left elbow, and the right elbow, and an infrared image 410 can be acquired for the target object 403 with the light-capturing balls deployed by using an infrared camera.
Step 302, selecting a frame of infrared image from the multiple continuous frames of infrared images at intervals of a preset number of frames as a target image, and obtaining a target image group consisting of multiple frames of target images.
The preset frame number can be specifically set according to practical application, for example, the preset frame number can be set to 2, 3 or 4, and the like.
For example, if 10 consecutive infrared images are acquired and the preset number of frames is 2, then starting from the 1st infrared image, one infrared image is selected as a target image every 2 frames; the 1st, 3rd, 5th, 7th and 9th frames of the 10 consecutive infrared images are thus determined as target images, and the 5 determined frames of target images can be taken as a target image group.
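As an illustration of this sampling rule, a minimal Python sketch reproducing the example above (the function name and frame labels are hypothetical):

```python
# Select one frame every `preset_interval` frames, starting from the first frame.
from typing import List, Sequence

def sample_target_group(frames: Sequence, preset_interval: int = 2) -> List:
    """Return the target image group sampled from consecutive infrared frames."""
    return list(frames[::preset_interval])

frames = [f"ir_frame_{i}" for i in range(1, 11)]   # 10 consecutive infrared frames
print(sample_target_group(frames, 2))
# ['ir_frame_1', 'ir_frame_3', 'ir_frame_5', 'ir_frame_7', 'ir_frame_9']
```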
In the embodiment of the invention, all the multiple continuous infrared images acquired by the infrared camera can be used as target images.
And step 303, inputting a plurality of frames of target images in the target image group into a feature extraction network layer of a pre-trained motion recognition model.
In the embodiment of the present invention, referring to fig. 5a, the pre-trained motion recognition model includes: the system comprises a feature extraction network layer, a difference feature calculation layer, a feature splicing layer, an action classification network layer and an output layer.
Wherein, the feature extraction network layer may be: VGG (visual geometry group network), or ResNet (residual neural network), or MobileNet (lightweight deep neural network).
And step 304, the feature extraction network layer respectively extracts the light capture features of the multiple frames of target images to obtain multiple pieces of light capture feature information.
In the embodiment of the invention, after a plurality of frames of target images in the target image group are input into the feature extraction network layer of the pre-trained motion recognition model, the feature extraction network layer can be used for extracting the features of each frame of target image to be used as the light capture feature information. And, the extracted light capture characteristic information is a multi-dimensional vector.
For example, if the target image group includes 5 target images, which are respectively the (t-2)-th, (t-1)-th, t-th, (t+1)-th and (t+2)-th frames of the multiple continuous infrared images collected by the infrared camera, the w × h × 1 pixel matrix of each target image can be input into the feature extraction network layer, and the light-capture features of the 5 target images are respectively I_{t-2}, I_{t-1}, I_t, I_{t+1} and I_{t+2}. The extracted light-capture features are all vectors of a preset dimension s, with s ≠ 0. Here w is the number of horizontal pixels of the target image's pixel matrix, and h is the number of vertical pixels.
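A minimal sketch of this per-frame feature extraction step, using a small convolutional network as a stand-in for the VGG / ResNet / MobileNet backbone named above (the layer sizes and s = 256 are assumptions, not values fixed by the embodiment):

```python
# Each w x h x 1 infrared target image is mapped to an s-dimensional
# light-capture feature vector I_t.
import torch
import torch.nn as nn

class LightCaptureBackbone(nn.Module):
    def __init__(self, s: int = 256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),          # -> (batch, 64, 1, 1)
        )
        self.fc = nn.Linear(64, s)            # -> s-dimensional light-capture feature

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, h, w) single-channel infrared frames
        return self.fc(self.conv(x).flatten(1))

backbone = LightCaptureBackbone(s=256)
frame = torch.randn(1, 1, 224, 224)           # one w x h x 1 target image
feature = backbone(frame)                     # I_t, shape (1, 256)
```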
And 305, the difference characteristic calculation layer calculates the difference value of the light capture characteristics of two adjacent target images in the target image group according to the light capture characteristic information to obtain a plurality of difference characteristic information.
In this step, if the number of the extracted light capture features is N, N-1 difference feature information can be calculated.
For example, referring to fig. 5b, if the target image group includes 5 target images, which are respectively the (t-2)-th, (t-1)-th, t-th, (t+1)-th and (t+2)-th frames of the multiple continuous infrared images collected by the infrared camera, and the light-capture feature information of the 5 frames of target images extracted by the feature extraction network layer is I_{t-2}, I_{t-1}, I_t, I_{t+1} and I_{t+2}, the difference feature calculation layer may calculate the difference feature information from the extracted light-capture feature information as:

M_t = I_t - I_{t-1}
M_{t-1} = I_{t-1} - I_{t-2}
M_{t+1} = I_{t+1} - I_t
M_{t+2} = I_{t+2} - I_{t+1}

According to the 5 pieces of light-capture feature information, 4 pieces of difference feature information can thus be calculated: M_t, M_{t-1}, M_{t+1} and M_{t+2}. Each piece of difference feature information is also an s-dimensional vector.
And step 306, the characteristic splicing layer splices the plurality of light capture characteristic information and the plurality of difference characteristic information to obtain splicing characteristic information.
In this step, the plurality of pieces of light capture characteristic information and the plurality of pieces of difference characteristic information may be spliced into one piece of splicing characteristic information. If there are N pieces of light capture characteristic information and N-1 pieces of difference characteristic information, and the light capture characteristic information and the difference characteristic information are both vectors of s dimension, then s × (2N-1) dimension splicing characteristic information can be obtained.
For example, referring to fig. 5b, if there are 5 pieces of light-capture feature information, I_{t-2}, I_{t-1}, I_t, I_{t+1} and I_{t+2}, and 4 pieces of difference feature information, M_t, M_{t-1}, M_{t+1} and M_{t+2}, the 5 pieces of light-capture feature information and the 4 pieces of difference feature information can be concatenated to obtain splicing feature information of dimension s × 9.
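The difference-feature and splicing steps can be sketched as follows (PyTorch; it is assumed the per-frame features I_{t-2}…I_{t+2} are s-dimensional tensors, with N = 5 frames giving 4 difference features and an s × 9 spliced feature):

```python
import torch

def splice_features(frame_feats: list) -> torch.Tensor:
    # frame_feats: list of N tensors of shape (s,), in temporal order
    diffs = [frame_feats[i] - frame_feats[i - 1] for i in range(1, len(frame_feats))]
    return torch.cat(frame_feats + diffs)      # shape (s * (2N - 1),)

s = 256
feats = [torch.randn(s) for _ in range(5)]     # I_{t-2}, I_{t-1}, I_t, I_{t+1}, I_{t+2}
spliced = splice_features(feats)
print(spliced.shape)                           # torch.Size([2304])  == 256 * 9
```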
And 307, the action classification network layer determines the probability that the action corresponding to the splicing characteristic information belongs to each preset action type.
In the embodiment of the present invention, the action classification network layer in the action recognition model may include a preset number of full connection layers. On the premise of ensuring that the output dimension of the last full connection layer of the action classification network layer is 1 × n, the preset number may be specifically determined according to the actual application situation, for example, the action classification network layer may set 6 full connection layers or 10 full connection layers, and the like. n represents the number of action types of the target object.
For example, referring to fig. 5c, if the action classification network layer in the action recognition model includes 6 fully connected layers:
first fully-connected layer: the input feature dimension is s × (2N-1), the number of neurons is (2N-1)s, and the output feature dimension is 1 × (2N-1)s; N represents the number of target images, which also equals the number of pieces of light-capture feature information;
second fully-connected layer: the input feature dimension is 1 × (2N-1)s, the number of neurons is 2(2N-1)s, and the output feature dimension is 1 × 2(2N-1)s;
third fully-connected layer: the input feature dimension is 1 × 2(2N-1)s, the number of neurons is 2(2N-1)s, and the output feature dimension is 1 × 2n, where n represents the number of action types;
fourth fully-connected layer: the input feature dimension is 1 × 2n, the number of neurons is 2n, and the output feature dimension is 1 × 2n;
fifth fully-connected layer: the input feature dimension is 1 × 2n, the number of neurons is 2n, and the output feature dimension is 1 × n;
sixth fully-connected layer: the input feature dimension is 1 × n, the number of neurons is n, and the output feature dimension is 1 × n.
For example, referring to fig. 5b, the feature output by the sixth fully-connected layer has dimension 1 × n and represents the probability value corresponding to each of the n preset action types of the target object: [p_1, p_2, …, p_{n-1}, p_n], where p_1, p_2, …, p_{n-1} and p_n are respectively the probability values corresponding to the n preset action types, and p_1 + p_2 + … + p_{n-1} + p_n = 1.
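A sketch of the six-layer action classification network described above, with the input/output dimensions taken from the list; the ReLU activations between layers and the helper name are assumptions, since the embodiment only fixes the feature dimensions:

```python
import torch.nn as nn

def make_action_classifier(s: int, N: int, n: int) -> nn.Sequential:
    d = (2 * N - 1) * s                     # spliced feature dimension, e.g. 9s for N = 5
    return nn.Sequential(
        nn.Linear(d, d), nn.ReLU(),         # 1st: s(2N-1) -> (2N-1)s
        nn.Linear(d, 2 * d), nn.ReLU(),     # 2nd: -> 2(2N-1)s
        nn.Linear(2 * d, 2 * n), nn.ReLU(), # 3rd: -> 2n
        nn.Linear(2 * n, 2 * n), nn.ReLU(), # 4th: -> 2n
        nn.Linear(2 * n, n), nn.ReLU(),     # 5th: -> n
        nn.Linear(n, n),                    # 6th: -> n (one score per action type)
    )

classifier = make_action_classifier(s=256, N=5, n=8)
```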
And step 308, outputting the action type with the maximum probability as the action type of the target object corresponding to the target image group by the output layer.
The output layer in the action recognition model comprises a softmax layer whose input feature dimension is 1 × n and whose output feature dimension is 1 × n. The input of the softmax layer is the output of the last fully-connected layer of the action classification network layer in the action recognition model, that is, the probability values corresponding to the n preset action types of the target object; for example, referring to fig. 5b, the output feature of the sixth fully-connected layer is [p_1, p_2, …, p_{n-1}, p_n].
Based on the probability value corresponding to each preset action type in the n preset action types of the input target object, the softmax layer may generate a classification vector that retains the maximum probability value and sets other probability values to 0.
For example, if the feature input to the softmax layer is [p_1, p_2, …, p_{n-1}, p_n], where p_1, p_2, …, p_{n-1} and p_n are respectively the probability values corresponding to the n preset action types of the target object, and p_1 is the largest among p_1, p_2, …, p_{n-1} and p_n, the softmax layer may generate a classification vector [1, 0, …, 0] in which the retained maximum probability value is set to 1 and the other probability values are set to 0.
Based on the classification vector output by the softmax layer, the preset action type corresponding to the classification vector can be used as the action type of the target object corresponding to the target image group.
The preset action type corresponding to the classification vector is the preset action type corresponding to the maximum probability value. For example, the action type corresponding to the classification vector [1, 0, …, 0] is the preset action type corresponding to the maximum probability value p_1.
Referring to fig. 5b, after the probability values corresponding to the n preset action types of the target object, [p_1, p_2, …, p_{n-1}, p_n], are obtained, the output layer may output the preset action type with the highest probability value, that is, the preset action type corresponding to max{p_1, p_2, …, p_n}, as the action type of the identified target object.
In the embodiment of the present invention, the preset action type of the target object includes: kicking, lifting hands, running, walking, pushing, pulling, jumping, and nonsense movements, among others.
For example, if the action types of the target object include 8 types: kicking, lifting hands, running, walking, pushing, pulling, jumping and meaningless movements, then n is 8. If the output feature of the softmax layer is [0.1, 0.1, 0.01, 0.03, 0.04, 0.05, 0.6, 0.07], the probability corresponding to the kicking action is 0.1, the probability corresponding to the hand-lifting action is 0.1, the probability corresponding to the running action is 0.01, the probability corresponding to the walking action is 0.03, the probability corresponding to the pushing action is 0.04, the probability corresponding to the pulling action is 0.05, the probability corresponding to the jumping action is 0.6, and the probability corresponding to the meaningless movement is 0.07. The output layer can then output the action type with the highest probability, the jumping action, as the action type of the target object corresponding to the target image group.
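A minimal sketch of this output step (the tensor values simply restate the example above; the argmax over the softmax output selects the reported action type, and the list/variable names are illustrative):

```python
import torch

ACTIONS = ["kicking", "lifting hands", "running", "walking",
           "pushing", "pulling", "jumping", "meaningless movement"]

# Probability values output for the n = 8 preset action types in the example above
probs = torch.tensor([0.10, 0.10, 0.01, 0.03, 0.04, 0.05, 0.60, 0.07])
predicted = ACTIONS[int(probs.argmax())]
print(predicted)   # "jumping" -- the action type with the maximum probability
```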
By adopting the method provided by the embodiment of the invention, a plurality of optical capture balls are deployed on the target object, only one infrared camera is needed to shoot a plurality of continuous infrared images aiming at the target object, a target image group containing a plurality of target images is obtained, then the optical capture characteristic information is respectively extracted from the plurality of target images in the target image group through a pre-trained action recognition model, the difference characteristic information is obtained through calculation according to the extracted optical capture characteristic information, and then the optical capture characteristic information and the difference characteristic information are spliced to obtain the spliced characteristic information. And then determining the probability that the action corresponding to the splicing characteristic information belongs to each preset action type, and determining the action type with the maximum probability as the action type of the target object. Compared with the existing action recognition method, the method provided by the embodiment of the invention has the advantages that the action is recognized by extracting the light capture characteristic information of continuous multi-frame target images and combining the difference characteristic information, so that on one hand, the action recognition processing of the target object is simplified, the requirement on the use scene of the action recognition is reduced, and on the other hand, the action recognition accuracy of the target object is improved. In addition, a professional studio is not needed, a target object does not need to wear specific light-catching clothes, only a plurality of light-catching balls are needed to be pasted on the target object, the action recognition of the target object can be realized through one infrared camera, and the use scene of the action recognition is expanded.
In the embodiment of the present invention, referring to fig. 6, a process for training a motion recognition model includes:
step 601, collecting training samples, inputting multi-frame sample images of a sample image group into a neural network model to be trained, and obtaining action types of sample objects corresponding to the sample image group as output results.
The training sample set comprises a plurality of sample image groups. The multi-frame sample images of each sample image group are infrared images collected by one infrared camera, and every two adjacent sample images in a sample image group are separated by a preset number of frames.
The neural network model to be trained comprises: the system comprises a feature extraction network layer, a difference feature calculation layer, a feature splicing layer, an action classification network layer and an output layer. Wherein the feature extraction network layer may use a predetermined image feature extraction network layer, such as a VGG network, or a ResNet network, or a MobileNet network.
In the step, after the sample images are input into the neural network model to be trained, the feature extraction network layer can extract the light capture feature information of each sample image; the difference characteristic calculation layer calculates the difference value of the light capture characteristic information of two adjacent frames of sample images in the same sample image group according to the light capture characteristic information of the extracted sample images to obtain difference characteristic information; the characteristic splicing layer splices the extracted light capture characteristic information and the difference characteristic information of the sample image to obtain splicing characteristic information of the sample image group; the action classification network layer is used for determining the probability that the action corresponding to the splicing characteristic information of the sample image group belongs to each preset action type; and the output layer outputs the action type with the highest probability as the action type of the sample object corresponding to the sample image group as an output result.
And step 602, adjusting parameters of the current neural network model to be trained based on the output result to obtain a new neural network model to be trained, completing one iteration, and returning to the step of inputting the multi-frame sample images of the sample image group in the training sample set into the neural network model to be trained.
In the embodiment of the invention, the parameters of the action classification network layer of the current neural network model to be trained are adjusted based on the output result.
Step 603, when the iteration number reaches a preset iteration number, or if the loss function value of the current neural network model to be trained is smaller than a preset loss function threshold value, ending the training, and determining the current neural network model to be trained as the motion recognition model.
The preset iteration times and the preset loss function threshold value can be set according to the actual training condition, wherein the preset iteration times meet the following setting requirements: after the iteration of the preset times, the current neural network model to be trained is converged; the preset loss function threshold is set to satisfy the following conditions: and if the loss function value of the current neural network model to be trained is smaller than the preset loss function threshold value, the current neural network model to be trained is converged.
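A hedged sketch of this training procedure (PyTorch): the predetermined feature extraction layer is kept fixed and only the action classification layers are updated, with training stopped at a preset iteration count or loss threshold. The `backbone` / `classifier` attribute names, the optimizer, learning rate, and loss function are assumptions not fixed by the embodiment:

```python
import itertools
import torch
import torch.nn as nn

def train_action_model(model, data_loader, max_iters=10000, loss_threshold=1e-3):
    # model.backbone: predetermined feature extraction network layer (kept fixed)
    # model.classifier: action classification network layer (parameters adjusted)
    for p in model.backbone.parameters():
        p.requires_grad = False

    optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-4)
    criterion = nn.CrossEntropyLoss()

    # itertools.cycle keeps yielding sample image groups until a stop condition is met
    for iters, (sample_groups, action_labels) in enumerate(itertools.cycle(data_loader), start=1):
        logits = model(sample_groups)            # forward pass: sample image group -> action scores
        loss = criterion(logits, action_labels)  # compare with the labelled action types
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()                         # adjust the classification-layer parameters
        if iters >= max_iters or loss.item() < loss_threshold:
            break                                # preset iteration count or loss threshold reached
    return model
```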
Based on the same inventive concept, according to the motion recognition method provided in the above embodiment of the present invention, correspondingly, another embodiment of the present invention further provides a motion recognition apparatus, a schematic structural diagram of which is shown in fig. 7, specifically including:
an infrared image acquisition module 701, configured to acquire multiple frames of continuous infrared images; the infrared image is an image including a designated portion of a target object photographed by one infrared camera, and the target object is disposed with a plurality of light-capturing balls, wherein each light-capturing ball corresponds to one designated portion of the target object.
An image group determining module 702, configured to determine a target image group including multiple frames of target images from multiple frames of consecutive infrared images.
The action recognition module 703 is configured to input multiple frames of target images in the target image group into a pre-trained action recognition model to obtain the action type of the target object corresponding to the target image group; wherein the action recognition model is obtained by training based on a training sample set, and the training sample set contains: a plurality of sample image groups and, for each sample image group, the action type of the sample object corresponding to that sample image group; each sample image group comprises multiple frames of sample images, and the sample images in a sample image group are images containing designated parts of a sample object.
Therefore, by adopting the motion recognition device provided by the embodiment of the invention, a plurality of light-capturing balls are deployed on the target object, only one infrared camera is required to capture multiple frames of continuous infrared images of the target object, a target image group containing multiple frames of target images is obtained, and the action of the target object in these target images is then recognized by a pre-trained motion recognition model, thereby determining the action type of the target object. Compared with existing action recognition methods, this simplifies the processing required for action recognition and lowers the requirements on the usage scenario: the target object does not need to wear a special light-capturing suit, and no demanding environment is needed; it suffices to attach a plurality of light-capturing balls to the target object, acquire images of the target object with a single infrared camera, and then process the acquired images to recognize the action of the target object. At the same time, because the device uses light-capture technology, it can still achieve high action recognition accuracy in such less demanding scenarios.
Further, the image group determining module 702 is specifically configured to select one frame of infrared image from multiple frames of continuous infrared images at preset frame intervals, and use the selected frame of infrared image as a target image to obtain a target image group consisting of multiple frames of target images.
Further, the pre-trained motion recognition model comprises: the system comprises a feature extraction network layer, a difference feature calculation layer, a feature splicing layer, an action classification network layer and an output layer;
the action recognition module 703 is specifically configured to input a plurality of frames of target images in the target image group into a feature extraction network layer of a pre-trained action recognition model; respectively extracting the light capture characteristics of multiple frames of target images to obtain a plurality of light capture characteristic information, wherein the light capture characteristic information is an s-dimensional vector; the difference characteristic calculation layer is used for calculating the difference value of the light capture characteristics of two adjacent target images in the target image group according to the light capture characteristic information to obtain a plurality of difference characteristic information; the characteristic splicing layer splices the multiple pieces of light capture characteristic information and the multiple pieces of difference characteristic information to obtain spliced characteristic information; the action classification network layer determines the probability that the action corresponding to the splicing characteristic information belongs to each preset action type; and the output layer outputs the action type with the maximum probability as the action type of the target object corresponding to the target image group.
Further, the feature extraction network layer is as follows: VGG, or ResNet, or MobileNet.
Further, the action classification network layer in the action recognition model comprises: a preset number of fully-connected layers; wherein the input feature dimension of the first fully-connected layer of the action classification network layer is s × (2N-1), and the output feature dimension of the last fully-connected layer of the action classification network layer is 1 × n; N represents the number of target images;
the output layer in the action recognition model comprises: softmax layer.
Further, the action classification network layer in the action recognition model comprises: first to sixth fully-connected layers;
first fully-connected layer: the input feature dimension is s × (2N-1), the number of neurons is (2N-1)s, and the output feature dimension is 1 × (2N-1)s;
second fully-connected layer: the input feature dimension is 1 × (2N-1)s, the number of neurons is 2(2N-1)s, and the output feature dimension is 1 × 2(2N-1)s;
third fully-connected layer: the input feature dimension is 1 × 2(2N-1)s, the number of neurons is 2(2N-1)s, and the output feature dimension is 1 × 2n, where n represents the number of action types;
fourth fully-connected layer: the input feature dimension is 1 × 2n, the number of neurons is 2n, and the output feature dimension is 1 × 2n;
fifth fully-connected layer: the input feature dimension is 1 × 2n, the number of neurons is 2n, and the output feature dimension is 1 × n;
sixth fully-connected layer: the input feature dimension is 1 × n, the number of neurons is n, and the output feature dimension is 1 × n;
the output layer in the motion recognition model comprises: a softmax layer;
in the softmax layer, the input feature dimension is 1 × n, and the output feature dimension is 1 × n.
Further, referring to fig. 8, the motion recognition apparatus further includes: a model training module 801;
a model training module 801, configured to obtain an action recognition model based on training sample set training by using the following steps:
collecting training samples, inputting multi-frame sample images of a sample image group into a neural network model to be trained, and obtaining action types of sample objects corresponding to the sample image group as output results;
adjusting parameters of the current neural network model to be trained based on the output result to obtain a new neural network model to be trained, completing one iteration, and returning to the step of collecting the training samples and inputting multi-frame sample images of the sample image group into the neural network model to be trained;
and when the iteration times reach the preset iteration times or the loss function value of the current neural network model to be trained is smaller than the preset loss function threshold value, ending the training, and determining the current neural network model to be trained as the motion recognition model.
Further, a feature extraction network layer of the neural network model to be trained is a predetermined image feature extraction network layer;
the model training module 801 adjusts parameters of the action classification network layer of the current neural network model to be trained based on the output result.
Further, the action type of the target object includes: kicking, lifting hands, running, walking, pushing, pulling, jumping, and nonsense movements.
Therefore, by adopting the device provided by the embodiment of the invention, a plurality of optical capture balls are deployed on the target object, only one infrared camera is required to shoot a plurality of continuous infrared images aiming at the target object, a target image group containing a plurality of target images is obtained, then the optical capture characteristic information is respectively extracted from the plurality of target images in the target image group through a pre-trained action recognition model, the difference characteristic information is obtained through calculation according to the extracted optical capture characteristic information, and then the optical capture characteristic information and the difference characteristic information are spliced to obtain the spliced characteristic information. And then determining the probability that the action corresponding to the splicing characteristic information belongs to each preset action type, and determining the action type with the maximum probability as the action type of the target object. Compared with the existing action recognition method, the method provided by the embodiment of the invention has the advantages that the action is recognized by extracting the light capture characteristic information of continuous multi-frame target images and combining the difference characteristic information, so that on one hand, the action recognition processing of the target object is simplified, the requirement on the use scene of the action recognition is reduced, and on the other hand, the action recognition accuracy of the target object is improved. In addition, a professional studio is not needed, a target object does not need to wear specific light-catching clothes, only a plurality of light-catching balls are needed to be pasted on the target object, the action recognition of the target object can be realized through one infrared camera, and the use scene of the action recognition is expanded.
An embodiment of the present invention further provides an electronic device, as shown in fig. 9, including a processor 901, a communication interface 902, a memory 903, and a communication bus 904, where the processor 901, the communication interface 902, and the memory 903 communicate with one another through the communication bus 904,
a memory 903 for storing computer programs;
the processor 901 is configured to implement the following steps when executing the program stored in the memory 903:
acquiring a plurality of continuous infrared images; the infrared image is an image which is shot by an infrared camera and contains a designated part of a target object, and the target object is provided with a plurality of light capturing balls, wherein each light capturing ball corresponds to one designated part of the target object;
determining a target image group containing multiple frames of target images from multiple frames of continuous infrared images;
inputting multi-frame target images in the target image group into a pre-trained motion recognition model to obtain the motion type of the target object corresponding to the target image group; wherein the motion recognition model is obtained by training based on a training sample set, and the training sample set comprises: a plurality of sample image groups and the motion type of the sample object corresponding to each sample image group, wherein each sample image group comprises multiple frames of sample images, and the sample images in each sample image group are images containing the designated parts of the sample object.
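As a small illustration of the image-group selection step above, following the frame-interval sampling described elsewhere in this application, one possible sketch is given below; the interval value is an assumption, not a prescribed parameter.

```python
def select_target_image_group(infrared_images, frame_interval=4):
    """Select one frame every `frame_interval` frames from the consecutive
    infrared images to form the target image group (interval is illustrative)."""
    return infrared_images[::frame_interval]

# e.g. 32 consecutive infrared images with an interval of 4 yield 8 target images
target_image_group = select_target_image_group(list(range(32)), frame_interval=4)
assert len(target_image_group) == 8
```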
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one magnetic disk memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In yet another embodiment of the present invention, a computer-readable storage medium is further provided, in which a computer program is stored, and the computer program realizes the steps of any one of the above-mentioned action recognition methods when being executed by a processor.
In yet another embodiment, a computer program product containing instructions is provided, which, when run on a computer, causes the computer to perform any of the action recognition methods described above.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via a wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) connection. The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the apparatus, the electronic device and the storage medium, since they are substantially similar to the method embodiments, the description is relatively simple, and the relevant points can be referred to the partial description of the method embodiments.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A motion recognition method, comprising:
acquiring a plurality of continuous infrared images; the infrared image is an image which is shot by an infrared camera and contains a designated part of a target object, and the target object is provided with a plurality of light capturing balls, wherein each light capturing ball corresponds to one designated part of the target object;
determining a target image group containing multiple frames of target images from multiple frames of continuous infrared images;
inputting multi-frame target images in the target image group into a pre-trained motion recognition model to obtain the motion type of the target object corresponding to the target image group; wherein the motion recognition model is obtained by training based on a training sample set, and the training sample set comprises: a plurality of sample image groups and the motion type of the sample object corresponding to each sample image group, wherein each sample image group comprises multiple frames of sample images, and the sample images in each sample image group are images containing the designated parts of the sample object.
2. The method according to claim 1, wherein the determining a target image group containing a plurality of frames of target images from a plurality of frames of continuous infrared images comprises:
selecting a frame of infrared image from a plurality of continuous infrared images at preset frame number intervals as a target image to obtain a target image group consisting of a plurality of target images.
3. The method of claim 1, wherein the pre-trained motion recognition model comprises: the system comprises a feature extraction network layer, a difference feature calculation layer, a feature splicing layer, an action classification network layer and an output layer;
the step of inputting the multi-frame target images in the target image group into a pre-trained motion recognition model to obtain the motion types of the target objects corresponding to the target image group includes:
inputting multi-frame target images in the target image group into a feature extraction network layer of a pre-trained action recognition model;
the feature extraction network layer is used for respectively extracting the light capture features of the multi-frame target image to obtain a plurality of light capture feature information;
the difference feature calculation layer calculates, according to the light capture feature information, the difference between the light capture features of every two adjacent target images in the target image group to obtain a plurality of pieces of difference feature information;
the feature splicing layer splices the light capture feature information and the difference feature information to obtain spliced feature information;
the action classification network layer determines the probability that the action corresponding to the spliced feature information belongs to each preset action type;
and the output layer outputs the action type with the maximum probability as the action type of the target object corresponding to the target image group.
4. The method of claim 3, wherein the action classification network layer in the action recognition model comprises: a preset number of fully-connected layers; wherein the input feature dimension of the first fully-connected layer of the action classification network layer is s × (2N-1); the output feature dimension of the last fully-connected layer of the action classification network layer is 1 × n; N denotes the number of target images, n denotes the number of preset action types, and s denotes the dimension of the light capture feature information;
the output layer in the motion recognition model comprises: softmax layer.
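Purely as a numerical illustration of the dimensions recited in this claim (the values below are hypothetical and not part of the claim):

```python
# Hypothetical sizes: N target images, n preset action types, s-dimensional light capture features
N, n, s = 5, 8, 256
first_fc_input_dim = s * (2 * N - 1)   # 256 * 9 = 2304
last_fc_output_dim = 1 * n             # 8, converted to per-type probabilities by the softmax layer
```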
5. The method of claim 3, wherein the motion recognition model is trained based on a training sample set using the steps of:
collecting the training samples, inputting multi-frame sample images of the sample image group into a neural network model to be trained, and obtaining the action types of sample objects corresponding to the sample image group as output results;
adjusting parameters of the current neural network model to be trained based on the output result to obtain a new neural network model to be trained, completing one iteration, returning to the step of collecting the training samples, and inputting multi-frame sample images of the sample image group into the neural network model to be trained;
and when the iteration times reach the preset iteration times or the loss function value of the current neural network model to be trained is smaller than the preset loss function threshold value, ending the training, and determining the current neural network model to be trained as the motion recognition model.
6. The method of claim 5, wherein the feature extraction network layer of the neural network model to be trained is a predetermined image feature extraction network layer;
the adjusting the parameters of the current neural network model to be trained based on the output result comprises:
and adjusting the parameters of the action classification network layer of the current neural network model to be trained based on the output result.
7. An action recognition device, comprising:
the infrared image acquisition module is used for acquiring a plurality of continuous infrared images; the infrared image is an image which is shot by an infrared camera and contains a designated part of a target object, and the target object is provided with a plurality of light capturing balls, wherein each light capturing ball corresponds to one designated part of the target object;
the image group determining module is used for determining a target image group containing a plurality of frames of target images from a plurality of frames of continuous infrared images;
the action recognition module is used for inputting the multi-frame target images in the target image group into a pre-trained action recognition model to obtain the action type of the target object corresponding to the target image group; wherein the action recognition model is obtained by training based on a training sample set, and the training sample set comprises: a plurality of sample image groups and the action type of the sample object corresponding to each sample image group, wherein each sample image group comprises multiple frames of sample images, and the sample images in each sample image group are images containing the designated parts of the sample object.
8. The apparatus of claim 7, wherein the image group determining module is specifically configured to select one infrared image from multiple consecutive infrared images every preset number of frames as the target image, so as to obtain a target image group consisting of multiple target images.
9. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any of claims 1-6 when executing a program stored in the memory.
10. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of claims 1 to 6.
CN202010623952.4A 2020-06-30 2020-06-30 Action recognition method and device, electronic equipment and storage medium Pending CN111753795A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010623952.4A CN111753795A (en) 2020-06-30 2020-06-30 Action recognition method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010623952.4A CN111753795A (en) 2020-06-30 2020-06-30 Action recognition method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111753795A true CN111753795A (en) 2020-10-09

Family

ID=72680354

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010623952.4A Pending CN111753795A (en) 2020-06-30 2020-06-30 Action recognition method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111753795A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112926427A (en) * 2021-02-18 2021-06-08 浙江智慧视频安防创新中心有限公司 Target user dressing attribute identification method and device
WO2024002238A1 (en) * 2022-06-30 2024-01-04 影石创新科技股份有限公司 Jump recognition method and apparatus, and electronic device and storage medium
CN116912947A (en) * 2023-08-25 2023-10-20 东莞市触美电子科技有限公司 Intelligent screen, screen control method, device, equipment and storage medium thereof
CN116912947B (en) * 2023-08-25 2024-03-12 东莞市触美电子科技有限公司 Intelligent screen, screen control method, device, equipment and storage medium thereof

Similar Documents

Publication Publication Date Title
CN109145784B (en) Method and apparatus for processing video
CN111753795A (en) Action recognition method and device, electronic equipment and storage medium
CN109583340B (en) Video target detection method based on deep learning
Tran et al. Two-stream flow-guided convolutional attention networks for action recognition
Yang et al. Single image haze removal via region detection network
Xu et al. Two-stream region convolutional 3D network for temporal activity detection
CN111767866B (en) Human body model creation method and device, electronic equipment and storage medium
CN113850248B (en) Motion attitude evaluation method and device, edge calculation server and storage medium
CN109960962B (en) Image recognition method and device, electronic equipment and readable storage medium
CN110427900B (en) Method, device and equipment for intelligently guiding fitness
CN110942006A (en) Motion gesture recognition method, motion gesture recognition apparatus, terminal device, and medium
US20190311186A1 (en) Face recognition method
CN113065645A (en) Twin attention network, image processing method and device
Tsagkatakis et al. Goal!! event detection in sports video
CN112200057A (en) Face living body detection method and device, electronic equipment and storage medium
CN112084952B (en) Video point location tracking method based on self-supervision training
CN111738202A (en) Key point identification method and device, electronic equipment and storage medium
CA3061908C (en) Ball trajectory tracking
CN112070181A (en) Image stream-based cooperative detection method and device and storage medium
CN111753796A (en) Method and device for identifying key points in image, electronic equipment and storage medium
KR102203109B1 (en) Method and apparatus of processing image based on artificial neural network
CN112560618A (en) Behavior classification method based on skeleton and video feature fusion
US20220273984A1 (en) Method and device for recommending golf-related contents, and non-transitory computer-readable recording medium
Bibi et al. Human interaction anticipation by combining deep features and transformed optical flow components
JP7253967B2 (en) Object matching device, object matching system, object matching method, and computer program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination