WO2022166344A1 - Action counting method, apparatus, device and storage medium - Google Patents

Action counting method, apparatus, device and storage medium

Info

Publication number
WO2022166344A1
WO2022166344A1 (PCT/CN2021/134033)
Authority
WO
WIPO (PCT)
Prior art keywords
action
video frame
gaussian
video
sequence
Prior art date
Application number
PCT/CN2021/134033
Other languages
English (en)
French (fr)
Inventor
葛成伟
关涛
童俊文
Original Assignee
ZTE Corporation (中兴通讯股份有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corporation (中兴通讯股份有限公司)
Publication of WO2022166344A1

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F18/00 Pattern recognition
            • G06F18/20 Analysing
              • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
                • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
              • G06F18/24 Classification techniques
                • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
                  • G06F18/2415 Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N3/00 Computing arrangements based on biological models
            • G06N3/02 Neural networks
              • G06N3/04 Architecture, e.g. interconnection topology
                • G06N3/045 Combinations of networks
        • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
          • G06V10/00 Arrangements for image or video recognition or understanding
            • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
              • G06V10/764 Using classification, e.g. of video objects
              • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
                • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
              • G06V10/82 Using neural networks
          • G06V20/00 Scenes; Scene-specific elements
            • G06V20/40 Scenes; Scene-specific elements in video content
          • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
            • G06V40/20 Movements or behaviour, e.g. gesture recognition

Definitions

  • the embodiments of the present application relate to the technical field of video recognition, and in particular, to an action counting method, apparatus, device, and storage medium.
  • Video action counting refers to counting the occurrences of a given repetitive action in a video.
  • Video action counting has important applications in industrial production, agricultural production and daily life. For example, in industrial process production, the degree of completion of certain process steps is directly related to the quality of the final product: too few or too many process actions directly lead to poor, or even unusable, products. In sports, the number of movements needs to be counted when athletes are assessed in rope skipping, sit-ups, pull-ups and other events.
  • In the related art, action counting methods for video either simply exploit the periodicity of the action, or simply perform action classification and recognition on single frames and derive the count from the classification results.
  • embodiments of the present application provide an action counting method, apparatus, device, and storage medium.
  • an action counting method, which includes: using a video frame action recognition model obtained by pre-training to identify the video to be counted and obtain a Gaussian regression output sequence; performing Gaussian modeling according to the Gaussian regression output sequence to obtain a Gaussian model; and counting actions according to the number of Gaussian distributions in the Gaussian model.
  • The embodiment of the present application also provides an action counting device, including: a network model inference module, configured to use a video frame action recognition model obtained by pre-training to identify the video to be counted and obtain a Gaussian regression output sequence; a Gaussian modeling processing module, configured to perform Gaussian modeling according to the Gaussian regression output sequence to obtain a Gaussian model; and an action counting module, configured to count actions according to the number of Gaussian distributions in the Gaussian model.
  • a network model inference module, configured to use a video frame action recognition model obtained by pre-training to identify the video to be counted and obtain a Gaussian regression output sequence;
  • a Gaussian modeling processing module, configured to perform Gaussian modeling according to the Gaussian regression output sequence to obtain a Gaussian model;
  • an action counting module, configured to count actions according to the number of Gaussian distributions in the Gaussian model.
  • An embodiment of the present application further provides an action counting device, comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to execute the action counting method described above.
  • Embodiments of the present application further provide a computer-readable storage medium storing a computer program.
  • the computer program implements the above-described action counting method when executed by a processor.
  • FIG. 1 is a flowchart of an action counting method provided by a first embodiment of the present application
  • FIG. 2 is a schematic diagram of the network structure of a video frame action recognition model involved in the action counting method provided by the first embodiment of the present application;
  • FIG. 3 is a schematic diagram of a Gaussian model obtained by Gaussian modeling in the action counting method provided by the first embodiment of the present application;
  • FIG. 4 is a flowchart of an action counting method provided by a second embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of an action counting device provided by a third embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of an action counting device provided by a fourth embodiment of the present application.
  • the first embodiment of the present application relates to an action counting method.
  • a video frame action recognition model obtained by pre-training is used to identify the video to be counted, and a Gaussian regression output sequence is obtained; then Gaussian modeling is performed according to the Gaussian regression output sequence to obtain a Gaussian model; finally, actions are counted according to the number of Gaussian distributions in the Gaussian model. Because this method counts once for each complete action interval, compared with methods that simply use the periodicity of the action or single-frame images to count actions, it is more robust and counts more accurately.
  • the action counting method provided in this embodiment can be applied to any terminal device capable of executing the method.
  • the terminal device may be a client device, such as a personal computer, tablet computer or smart phone, or a server device, such as a server, which is not limited in this embodiment.
  • this embodiment is described by taking as an example counting the actions performed by an operator in a certain process step of industrial process production using the action counting method.
  • Step 101 Identify the video to be counted by using a video frame action recognition model obtained by pre-training, and obtain a Gaussian regression output sequence.
  • this embodiment uses a Gaussian distribution to represent a complete action interval from the perspective of probability and statistics; by counting each complete action interval, the action counting of the video to be counted can be realized.
  • the network model structure on which the video frame action recognition model trained in this embodiment is based needs to include at least a Gaussian regression output branch.
  • the acquired action video samples are videos in which the actions are known, together with the start frame moment, end frame moment, key frame moment and timing scale factor of each action; the action intervals in the action video samples are annotated with a Gaussian distribution.
  • the action start frame moment in the action video sample is marked as t_s;
  • the action end frame moment is marked as t_e;
  • the key frame moment is marked as t_m;
  • the timing scale factor is marked as s.
  • this embodiment makes the Gaussian distribution value of the action satisfy the following formula (1):
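Formula (1) itself is not reproduced in this text. A common choice consistent with the annotations above, used here purely as an assumption (the function name and the exact form of the spread are illustrative, not taken from the patent), is a Gaussian that peaks at 1.0 on the key frame t_m and whose spread is tied to the action length through the timing scale factor s:

```python
import math

def gaussian_label(t, t_s, t_e, t_m, s):
    """Gaussian distribution value for video frame t of one annotated action.

    t_s / t_e: action start / end frame moments, t_m: key frame moment,
    s: timing scale factor. Assumed form (the patent's formula (1) is not
    reproduced here): the label peaks at 1.0 on the key frame and decays
    toward 0 at the action boundaries; frames outside the action are 0.
    """
    sigma = s * (t_e - t_s)  # spread proportional to the action's duration
    if not (t_s <= t <= t_e):
        return 0.0
    return math.exp(-((t - t_m) ** 2) / (2.0 * sigma ** 2))
```

Labeling a 64-frame clip containing one action from frame 10 to 40 with key frame 25 would then be `[gaussian_label(t, 10, 40, 25, 0.2) for t in range(64)]`.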
  • After completing the labeling of the action video samples through the above steps (1) and (2), the model training described in step (3) can begin.
  • continuous video frames with a preset length are selected from the marked action video samples to obtain a continuous video frame sequence.
  • the video frames read from the marked action video samples for the first time are the 0th frame to the 31st frame, i.e., 32 consecutive video frames.
  • the continuous video frame sequence is input into the Gaussian regression output branch in the network model structure.
  • each group of continuous video frame sequences read may be sequentially input into the Gaussian regression output branch in the network model structure, and the Gaussian regression output branch analyzes and processes each video frame in each group of continuous video frame sequences.
  • a cache queue can also be preset in the network model structure, and each group of input video frame sequences can be added to the cache queue in sequence, so that after the Gaussian regression output branch finishes analyzing and processing one group of continuous video frame sequences, the next group is taken out of the cache queue in order for analysis and processing, avoiding thread blocking that would affect the processing speed.
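The cache-queue idea can be sketched as follows. This is a minimal single-threaded illustration; `buffered_inference` and the `process` callback are hypothetical names standing in for the model's per-sequence inference call, not API from the patent:

```python
from collections import deque

def buffered_inference(frame_sequences, process):
    """Queue incoming groups of consecutive video frames so the Gaussian
    regression branch can take the next group as soon as it finishes the
    current one. `process` stands in for per-sequence model inference."""
    queue = deque(frame_sequences)  # the preset cache queue
    outputs = []
    while queue:
        # take the next group out of the cache queue in order
        outputs.append(process(queue.popleft()))
    return outputs
```

In a real pipeline the producer (video reader) and consumer (model) would run on separate threads around the same queue; the sketch only shows the ordering contract.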
  • the starting position of the continuous video frame sequence is randomly selected, and the Gaussian regression output branch is iteratively trained by the stochastic gradient descent method with momentum until the preset convergence condition is satisfied, yielding the video frame action recognition model.
  • the preset convergence condition may be set according to actual business requirements. For example, for a business scenario requiring a high degree of convergence, the condition may be a relatively high number of training iterations; otherwise, it may be a lower number of training iterations.
  • for example, the number of training iterations is set to 200,000; that is, after the Gaussian regression output branch has been iteratively trained 200,000 times using the stochastic gradient descent method with momentum, the current network model structure can be used as the video frame action recognition model.
  • when identifying the video to be counted, consecutive video frames of the preset length are read from it and input into the video frame action recognition model trained by the above method. After processing by the model, the output is the Gaussian distribution value of each video frame; by concatenating these Gaussian distribution values in sequence, the Gaussian regression output sequence described in this embodiment is obtained.
  • the network model structure used for training the video frame action recognition model may also include an action classification output branch.
  • the action classification output branch is mainly used to determine whether a video frame belongs to an action.
  • whether the video frame belongs to an action can be determined by judging the confidence level of each video frame; for example, when the confidence level is higher than a certain threshold, the video frame is determined to belong to an action.
  • the action classification output branch can also be used to determine the specific action type of the video frame, such as running, jumping or walking, which will not be listed one by one here and is not limited in this embodiment.
  • the pre-designed network model structure is iteratively trained using the marked action video samples until the network model structure satisfies the preset convergence condition, and the video frame action recognition model is obtained, specifically:
  • continuous video frames with a preset length are selected from the marked action video samples to obtain a continuous video frame sequence.
  • the continuous video frame sequence is input into the Gaussian regression output branch in the network model structure.
  • the binary label is 1 if the video frame belongs to an action, and 0 otherwise; thus, by checking whether the binary label is 0 or 1, it can be determined whether the video frame belongs to an action.
  • the starting position of the continuous video frame sequence is randomly selected, and the Gaussian regression output branch and the action classification output branch are iteratively trained by the stochastic gradient descent method with momentum until the preset convergence condition is met, and the video frame action recognition model is obtained.
  • the network model structure adopted by the video frame action recognition model may also include a 3D convolutional trunk.
  • the 3D convolution trunk is used for time series feature extraction.
  • after the action video samples are labeled with a Gaussian distribution according to the sample labeling methods given in the above steps (1) and (2), continuous video frames of a preset length are first selected from the labeled action video samples to obtain a continuous video frame sequence; then the continuous video frame sequence is input into the 3D convolution trunk in FIG. 2, and the extracted timing features are input into the Gaussian regression output branch and the action classification output branch in FIG. 2 respectively; finally, the starting position of the continuous video frame sequence is randomly selected, and the Gaussian regression output branch and the action classification output branch are iteratively trained with the stochastic gradient descent method with momentum until a preset convergence condition is satisfied, and the video frame action recognition model is obtained.
  • That is, the continuous video frame sequence actually input into the Gaussian regression output branch and the action classification output branch is the sequence processed by the 3D convolution trunk. Extracting timing features in this way greatly reduces the complexity of training the video frame action recognition model, as well as the complexity of the features extracted when the trained model is later used for recognition, thereby reducing the final computational complexity.
  • this embodiment selects the 3D convolution version of the 18-layer residual network, i.e., ResNet18-3D, as the 3D convolution trunk for timing feature extraction.
  • both the action classification output branch and the Gaussian regression output branch include a fully connected layer, and a loss function is applied after the fully connected layer.
  • for the action classification output branch, the loss function used is the softmax cross-entropy loss function;
  • for the Gaussian regression output branch, in order to facilitate subsequent processing of the Gaussian distribution values it outputs, this embodiment stipulates that its output range is between 0.0 and 1.0, so the loss function used is the sigmoid cross-entropy loss function.
  • the label of the action classification output branch is obtained by converting the Gaussian distribution label in FIG. 3 into a binary label. Since the range of the Gaussian distribution value output by the Gaussian regression output branch in this embodiment is limited between 0.0 and 1.0, it can be set that when the value of the Gaussian label is greater than 0, indicating an action, the action classification output branch outputs 1, and otherwise it outputs 0.
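The conversion rule can be written directly; `binary_label` is an illustrative helper name, not a name from the patent:

```python
def binary_label(g):
    """Convert a Gaussian distribution label into the binary label used to
    train the action classification output branch: a Gaussian value greater
    than 0 indicates an action frame (label 1), otherwise label 0."""
    return 1 if g > 0.0 else 0
```

Applied over a whole clip, a sequence of Gaussian labels such as `[0.0, 0.2, 1.0, 0.0]` becomes the binary sequence `[0, 1, 1, 0]`.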
  • the video frame action recognition model trained in this way achieves low complexity, high robustness, good convergence and accurate computation.
  • Step 102 Perform Gaussian modeling according to the Gaussian regression output sequence to obtain a Gaussian model.
  • Gaussian fitting, that is, Gaussian modeling.
  • Gaussian modeling is performed according to the Gaussian regression output sequence, and the result as shown in FIG. 3 is obtained.
  • the piecewise Gaussian fitting is performed on the Gaussian regression output sequence, specifically:
  • If y_max < val, end the piecewise Gaussian fitting and return the parameters k and Θ; otherwise go to step d);
  • ( ⁇ , ⁇ ) are the parameters to be estimated, and the logarithm of both sides can be obtained:
  • Parameter estimates can be obtained using the least squares method
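The patent's equations for this step are not reproduced in the text above. A standard log-linearization least-squares estimate of (μ, σ) for one Gaussian segment, consistent with the description (take logarithms of both sides, then solve by least squares), could look like the following sketch; all names are illustrative:

```python
import math

def fit_gaussian(ts, ys):
    """Estimate (mu, sigma) of one Gaussian segment by log-linearization.

    Taking logs of y = A * exp(-(t - mu)^2 / (2 * sigma^2)) gives a
    quadratic in t:
        ln y = a*t^2 + b*t + c,  with sigma^2 = -1/(2a), mu = -b/(2a).
    The quadratic is fit by ordinary least squares via the 3x3 normal
    equations; points with y <= 0 are skipped (log undefined).
    """
    pts = [(t, math.log(y)) for t, y in zip(ts, ys) if y > 0]
    # Build the augmented 3x4 normal-equation system over basis (t^2, t, 1).
    S = [[0.0] * 4 for _ in range(3)]
    for t, ly in pts:
        phi = (t * t, t, 1.0)
        for i in range(3):
            for j in range(3):
                S[i][j] += phi[i] * phi[j]
            S[i][3] += phi[i] * ly
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(S[r][col]))
        S[col], S[piv] = S[piv], S[col]
        for r in range(3):
            if r != col:
                f = S[r][col] / S[col][col]
                for j in range(col, 4):
                    S[r][j] -= f * S[col][j]
    a, b, _c = (S[i][3] / S[i][i] for i in range(3))
    sigma = math.sqrt(-1.0 / (2.0 * a))
    mu = -b / (2.0 * a)
    return mu, sigma
```

On noiseless samples of a Gaussian the fit recovers μ and σ exactly (up to floating-point error); on real Gaussian-regression outputs it gives the least-squares estimate per segment.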
  • Gaussian modeling is performed based on the action video sample shown in FIG. 3, and the obtained Gaussian model is the Gaussian fitting result in FIG. 3.
  • Step 103 Count actions according to the number of Gaussian distributions in the Gaussian model.
  • a Gaussian distribution, that is, rising from the lowest point to the highest point and returning to the lowest point, can represent a complete action interval, and a complete action interval corresponds to a single action. Therefore, in this embodiment, counting actions according to the number of Gaussian distributions in the Gaussian model shown in FIG. 3 essentially means counting each complete Gaussian distribution in the Gaussian model; the number of Gaussian distributions obtained by this statistic is then taken as the number of actions included in the video to be counted, thereby realizing action counting for the video to be counted.
  • from Fig. 3 it can be determined by statistics that the Gaussian model includes 4 complete Gaussian distributions, so the final number of actions is 4.
  • the action counting method uses a Gaussian distribution to represent a complete action interval from the perspective of probability and statistics.
  • After the video frame action recognition model that outputs the Gaussian regression value of each video frame is trained, when action counting is performed on the video to be counted, the model is used to identify the video and a Gaussian regression output sequence representing the entire video is obtained. Gaussian modeling is then performed on this sequence to obtain a Gaussian model that records the Gaussian distribution corresponding to each action interval in the video, and finally the number of Gaussian distributions in the Gaussian model is counted and taken as the number of actions included in the video to be counted, thereby realizing the action counting. Because this method counts once for each complete action interval, compared with methods that simply use the periodicity of actions or single-frame images to count actions, it is more robust and counts more accurately.
  • the action counting method provided by this embodiment is based on the Gaussian distribution; while outputting the number of actions, any complete action described by the Gaussian distribution N(μ_i, σ_i), 1 ≤ i ≤ k, has a start time point of μ_i - 3σ_i and an end time point of μ_i + 3σ_i.
  • the action counting method provided in this embodiment can not only accurately predict the number of actions according to the number of fitted Gaussians, but can also give the start and end time points of each action according to the Gaussian distribution, which has important guiding significance for temporal action localization.
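Assuming the piecewise fitting yields a list of fitted (μ_i, σ_i) pairs, the count and the per-action time spans described above follow directly (`action_summary` is an illustrative name, not from the patent):

```python
def action_summary(gaussians):
    """Each fitted Gaussian N(mu_i, sigma_i) is one complete action: the
    action count is the number of Gaussians, and each action spans
    [mu_i - 3*sigma_i, mu_i + 3*sigma_i] on the frame/time axis."""
    count = len(gaussians)
    spans = [(mu - 3.0 * sigma, mu + 3.0 * sigma) for mu, sigma in gaussians]
    return count, spans
```

For two fitted Gaussians N(10, 2) and N(50, 3) this reports 2 actions located at frames [4, 16] and [41, 59].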
  • the second embodiment of the present application relates to an action counting method.
  • the second embodiment makes further improvements on the basis of the first embodiment.
  • the main improvement is: based on a video frame action recognition model that can determine whether a video frame belongs to an action, the video to be counted is identified according to a preset overlapping strategy, in order to ensure the accuracy of the recognition results and thus the accuracy of the final action counting results.
  • the action counting method involved in the second embodiment includes the following steps:
  • Step 401 according to a preset overlapping strategy, the video frame action recognition model obtained by pre-training is used to identify the video to be counted, and a Gaussian regression output sequence is obtained.
  • the overlapping strategy in this embodiment specifies that the N-1th continuous video frame sequence of length T contains the same L video frames as the Nth continuous video frame sequence of length T, And L and T satisfy the following relationship: 0 ⁇ L ⁇ T.
  • since the action classification output branch in the trained video frame action recognition model is trained according to the binary labels, and the Gaussian regression output branch is trained according to the Gaussian distribution labels, the following holds during recognition:
  • the action classification output branch in the video frame action recognition model outputs the corresponding binary value, 0 or 1, for each video frame, while the Gaussian distribution value output by the Gaussian regression output branch is distributed in [0, 1]. When a video frame belongs to an action, the result output by the action classification output branch is 1; otherwise, that is, when the video frame does not belong to an action, the output result is 0.
  • the Gaussian distribution value of each repeated video frame is determined in this way, which effectively ensures the accuracy of the Gaussian distribution values and further improves the accuracy of the action counting performed with the Gaussian model constructed from these values.
  • for overlapping video frames, the action classification result with the higher confidence is selected as the target action classification result of the video frame identified by the video frame action recognition model.
  • the continuous video frame sequence read for the first time is the video frame from the 0th frame to the 31st frame
  • the continuous video frame sequence read for the second time is the video frame from the 16th frame to the 47th frame.
  • the continuous video frame sequence read for the third time is the video frames from the 32nd frame to the 63rd frame, and so on; that is, the continuous video frame sequence read for the Nth time contains the same L video frames as the (N-1)th continuous video frame sequence.
  • In this way, the Gaussian regression output sequence used for the final Gaussian modeling can be obtained.
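The overlapping read pattern and the higher-confidence merge described above can be sketched as follows; the function names and the per-window prediction format are assumptions for illustration, not the patent's API:

```python
def overlapping_windows(num_frames, T=32, L=16):
    """Yield (start, end) of consecutive sequences of length T in which each
    window shares its first L frames with the previous window (0 < L < T),
    e.g. frames 0-31, 16-47, 32-63, ... for T=32, L=16."""
    stride = T - L
    start = 0
    while start + T <= num_frames:
        yield (start, start + T)
        start += stride

def merge_overlap(per_window):
    """per_window: list of (window_start, predictions), where predictions is
    a list of (gaussian_value, classification_confidence) per frame. For
    frames covered by two windows, keep the prediction whose action
    classification confidence is higher, then return the Gaussian regression
    output sequence in frame order."""
    merged = {}
    for start, preds in per_window:
        for off, (g, conf) in enumerate(preds):
            t = start + off
            if t not in merged or conf > merged[t][1]:
                merged[t] = (g, conf)
    return [merged[t][0] for t in sorted(merged)]
```

With 64 frames, T=32 and L=16, the windows are (0, 32), (16, 48) and (32, 64); every frame from 16 onward receives two predictions, and the higher-confidence one wins.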
  • Step 402 Perform Gaussian modeling according to the Gaussian regression output sequence to obtain a Gaussian model.
  • Step 403 Count actions according to the number of Gaussian distributions in the Gaussian model.
  • step 402 and step 403 in this embodiment are substantially the same as step 102 and step 103 in the first embodiment, which will not be repeated here.
  • In the action counting method, when the video frame action recognition model obtained by pre-training is used to identify the video to be counted and obtain the Gaussian regression output sequence, an overlapping strategy based on overlapping single-frame prediction is used to select continuous video frame sequences from the video to be counted and input them into the video frame action recognition model for recognition. Finally, for the overlapping video frames, the recognition result with the higher confidence in the action classification output is selected as the prediction result of the video frame, which can not only correct recognition errors but also reduce misrecognition as much as possible, thereby further ensuring the accuracy of the final action counting result of the video to be counted.
  • In addition, the action counting method provided in this embodiment does not place any limitation on the length of the video to be counted; dense prediction of all video frames in the video to be counted can be completed, and the entire counting process is convenient, simple and easy to implement, so that it can be better adapted to various practical application scenarios.
  • the third embodiment of the present application relates to an action counting device, as shown in FIG. 5 , including: a network model inference module 501 , a Gaussian modeling processing module 502 and an action counting module 503 .
  • the network model inference module 501 is configured to use the video frame action recognition model obtained by pre-training to identify the video to be counted and obtain a Gaussian regression output sequence;
  • the Gaussian modeling processing module 502 is configured to perform Gaussian modeling according to the Gaussian regression output sequence to obtain a Gaussian model;
  • the action counting module 503 is configured to count actions according to the number of Gaussian distributions in the Gaussian model.
  • the action counting device further includes: a network model training module.
  • the network model training module is used to perform iterative training on the pre-designed network model structure using the labeled action video samples to obtain a video frame action recognition model.
  • the video frame action recognition model obtained by training with the network model training module may be used only to realize the recognition of Gaussian distribution values.
  • correspondingly, the designed network model structure may include only the Gaussian regression output branch.
  • the network model training module is specifically used to train and obtain the video frame action recognition model according to the following process:
  • the pre-designed network model structure is iteratively trained until a preset convergence condition is satisfied, and the video frame action recognition model is obtained.
  • the operation of performing Gaussian distribution annotation on the action intervals in the action video samples is specifically:
  • the network model training module is specifically used to:
  • the starting position of the continuous video frame sequence is randomly selected, and the Gaussian regression output branch is iteratively trained by the stochastic gradient descent method with momentum until a preset convergence condition is satisfied, and the video frame action recognition model is obtained.
  • the pre-designed network model structure may also include an action classification output branch.
  • the action classification output branch is used to determine whether the video frame belongs to an action.
  • the network model training module is specifically used to train and obtain the video frame action recognition model according to the following process:
  • the starting position of the continuous video frame sequence is randomly selected, and the Gaussian regression output branch and the action classification output branch are iteratively trained by the stochastic gradient descent method with momentum until the preset convergence condition is met, and the video frame action recognition model is obtained.
  • the pre-designed network model structure may also include a 3D convolution trunk.
  • the network model training module is specifically used to train and obtain the video frame action recognition model according to the following process:
  • the continuous video frame sequence is input into the 3D convolution trunk in the network model structure, the 3D convolution trunk performs timing feature extraction, and the extracted timing features output by the 3D convolution trunk are respectively input into the Gaussian regression output branch and the action classification output branch in the network model structure;
  • the starting position of the continuous video frame sequence is randomly selected, and the Gaussian regression output branch and the action classification output branch are iteratively trained by the stochastic gradient descent method with momentum until the preset convergence condition is met, and the video frame action recognition model is obtained.
  • the operation in which the network model inference module 501 uses the video frame action recognition model obtained by pre-training to identify the video to be counted and obtain the Gaussian regression output sequence is specifically: according to a preset overlapping strategy, the video frame action recognition model obtained by pre-training is used to identify the video to be counted, and the Gaussian regression output sequence is obtained.
  • the overlapping strategy specifies that the N-1th continuous video frame sequence of length T contains the same L video frames as the Nth continuous video frame sequence of length T, 0 ⁇ L ⁇ T.
  • the network model inference module 501 uses the video frame action recognition model obtained by pre-training to identify the video to be counted according to the preset overlapping strategy, and obtains the operation of the Gaussian regression output sequence, specifically:
  • the action classification result with higher confidence is selected as the target action classification result of the video frame identified by the video frame recognition model
  • the target Gaussian distribution values are sequentially arranged to obtain the Gauss regression output sequence.
  • this yields a Gaussian regression output sequence that accurately reflects the actual situation.
  • when the Gaussian modeling processing module 502 performs Gaussian modeling according to the Gaussian regression output sequence to obtain the Gaussian model, it specifically:
  • performs Gaussian modeling according to the Gaussian regression output sequence, obtaining the Gaussian model.
  • when the action counting module 503 counts actions according to the number of Gaussian distributions in the Gaussian model, it specifically:
  • takes the number of Gaussian distributions as the number of actions included in the video to be counted.
  • the action counting apparatus provided by the embodiments of this application has at least the following advantages:
  • this application uses a Gaussian distribution to characterize a complete action interval, so the number of Gaussian distributions represents the number of actions, and an efficient piecewise Gaussian fitting algorithm is then used to perform Gaussian fitting and obtain the number of Gaussians; this makes the application more accurate and more robust in scene action counting;
  • besides outputting the number of actions, this application can also give the start time and end time of each action according to the 3σ criterion of the Gaussian distribution, which is of important guiding significance for temporal action localization;
  • this embodiment is an apparatus embodiment corresponding to the first or second embodiment, and can be implemented in cooperation with the first or second embodiment.
  • the related technical details mentioned in the first or second embodiment remain valid in this embodiment and, to reduce repetition, are not repeated here.
  • correspondingly, the related technical details mentioned in this embodiment can also be applied in the first or second embodiment.
  • a logical unit may be one physical unit, a part of a physical unit, or a combination of multiple physical units.
  • in order to highlight the innovative part of this application, this embodiment does not introduce units that are not closely related to solving the technical problem raised by this application, but this does not mean that no other units exist in this embodiment.
  • the fourth embodiment of this application relates to an action counting device, as shown in FIG. 6, comprising: at least one processor 601; and a memory 602 communicatively connected to the at least one processor 601, wherein the memory 602 stores instructions executable by the at least one processor 601, and the instructions are executed by the at least one processor 601 so that the at least one processor 601 can perform the action counting method described in the above method embodiments.
  • the memory 602 and the processor 601 are connected by a bus; the bus may include any number of interconnected buses and bridges, connecting one or more processors 601 and the various circuits of the memory 602 together.
  • the bus may also connect various other circuits, such as peripherals, voltage regulators, and power management circuits, which are well known in the art and therefore are not described further herein.
  • a bus interface provides the interface between the bus and a transceiver.
  • the transceiver may be a single element or multiple elements, such as multiple receivers and transmitters, providing a means for communicating with various other devices over a transmission medium.
  • data processed by the processor 601 is transmitted over the wireless medium through an antenna; the antenna also receives data and transfers it to the processor 601.
  • the processor 601 is responsible for managing the bus and general processing, and may also provide various functions including timing, peripheral interfacing, voltage regulation, power management, and other control functions.
  • the memory 602 may be used to store data used by the processor 601 when performing operations.
  • a fifth embodiment of this application relates to a computer-readable storage medium storing a computer program.
  • when the computer program is executed by a processor, the action counting method described in the above method embodiments is implemented.
  • the aforementioned storage medium includes media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
  • the action counting method, apparatus, device, and storage medium proposed in this application proceed from the perspective of probability and statistics, using a Gaussian distribution to characterize a complete action interval; based on this property, a video frame action recognition model that can recognize the Gaussian regression value of each frame in the video to be counted is pre-trained.
  • when counting actions in the video to be counted, the video frame action recognition model recognizes the video, producing a Gaussian regression output sequence that characterizes the entire video to be counted.
  • Gaussian modeling according to the Gaussian regression output sequence then yields a Gaussian model recording the Gaussian distribution corresponding to each action interval in the video to be counted.
  • finally, the number of Gaussian distributions in the model is counted.
  • this number is taken as the number of actions included in the video to be counted, realizing action counting. Because this method counts once for each complete action interval, the video frame action recognition model is more robust and counts more accurately than methods that rely purely on action periodicity or single-frame images.
  • besides outputting the number of actions, the action counting method, apparatus, device, and storage medium proposed in this application can also give the start and end time points of each action according to the Gaussian distribution, which is of important guiding significance for temporal action localization.


Abstract

An action counting method, apparatus, device, and storage medium. The action counting method comprises: recognizing a video to be counted using a pre-trained video frame action recognition model to obtain a Gaussian regression output sequence (101); performing Gaussian modeling according to the Gaussian regression output sequence to obtain a Gaussian model (102); and counting actions according to the number of Gaussian distributions in the Gaussian model (103).

Description

Action Counting Method, Apparatus, Device, and Storage Medium
Cross-Reference to Related Applications
This application is based on, and claims priority to, Chinese patent application No. 202110144646.7 filed on February 2, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
Embodiments of this application relate to the technical field of video recognition, and in particular to an action counting method, apparatus, device, and storage medium.
Background
Video action counting refers to counting the number of occurrences of a certain repetitive action in a given video. It has important applications in industrial production, agricultural production, and daily life. For example, in industrial process manufacturing, the completion of certain process steps directly determines final product quality: too few or too many repetitions of a process action directly lead to substandard or even unusable products. In sports, events such as rope skipping, sit-ups, and pull-ups all require counting the number of action repetitions.
However, in some cases, action counting methods for video either rely purely on the periodicity of the action, or purely perform action classification on single frames and count from the classification results.
Although both approaches can count actions, in practice the periodicity, frequency, and completeness of the same repetitive action vary considerably, so features extracted at a fixed period may be incomplete, and counting schemes based solely on periodicity lack robustness. Classifying single frames is also problematic: because a key pose does not fully define a complete action type, single-frame classification schemes are prone to misjudgment and inaccurate counts.
Summary
In view of this, embodiments of this application provide an action counting method, apparatus, device, and storage medium.
To solve the above technical problem, an embodiment of this application provides an action counting method, comprising: recognizing a video to be counted using a pre-trained video frame action recognition model to obtain a Gaussian regression output sequence; performing Gaussian modeling according to the Gaussian regression output sequence to obtain a Gaussian model; and counting actions according to the number of Gaussian distributions in the Gaussian model.
An embodiment of this application further provides an action counting apparatus, comprising: a network model inference module configured to recognize a video to be counted using a pre-trained video frame action recognition model to obtain a Gaussian regression output sequence; a Gaussian modeling processing module configured to perform Gaussian modeling according to the Gaussian regression output sequence to obtain a Gaussian model; and an action counting module configured to count actions according to the number of Gaussian distributions in the Gaussian model.
An embodiment of this application further provides an action counting device, comprising: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the action counting method described above.
An embodiment of this application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the action counting method described above.
Brief Description of the Drawings
One or more embodiments are illustrated by the figures in the corresponding drawings; these illustrations do not constitute a limitation of the embodiments.
FIG. 1 is a flowchart of the action counting method provided by the first embodiment of this application;
FIG. 2 is a schematic diagram of the network structure of the video frame action recognition model involved in the action counting method provided by the first embodiment of this application;
FIG. 3 is a schematic diagram of the Gaussian model obtained by Gaussian modeling in the action counting method provided by the first embodiment of this application;
FIG. 4 is a flowchart of the action counting method provided by the second embodiment of this application;
FIG. 5 is a schematic structural diagram of the action counting apparatus provided by the third embodiment of this application;
FIG. 6 is a schematic structural diagram of the action counting device provided by the fourth embodiment of this application.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the embodiments of this application are described in detail below with reference to the drawings. However, those of ordinary skill in the art will appreciate that many technical details are given in the embodiments to help the reader better understand this application; the technical solutions claimed in this application can be implemented even without these technical details and with various changes and modifications based on the following embodiments. The division into the following embodiments is for convenience of description only and places no limitation on the specific implementation of this application; the embodiments may be combined with and refer to one another where not contradictory.
The first embodiment of this application relates to an action counting method. The method first recognizes the video to be counted using a pre-trained video frame action recognition model to obtain a Gaussian regression output sequence; then performs Gaussian modeling according to the Gaussian regression output sequence to obtain a Gaussian model; and finally counts actions according to the number of Gaussian distributions in the Gaussian model. Because this approach counts once for each complete action interval, the video frame action recognition model is more robust and counts more accurately than methods that rely purely on action periodicity or single-frame images.
Implementation details of the action counting method of this embodiment are described below; the following details are provided merely to aid understanding and are not necessary for implementing this solution.
The action counting method provided by this embodiment is applied to any terminal device capable of executing it; the terminal device may be a client device such as a personal computer, tablet, or smartphone, or a server-side device such as a server, which this embodiment does not limit.
In addition, for ease of description, this embodiment is explained with the action counting method applied to an industrial production process, counting the actions an operator performs in a certain process step.
The specific flow of this embodiment is shown in FIG. 1 and includes the following steps:
Step 101: recognize the video to be counted using the pre-trained video frame action recognition model to obtain a Gaussian regression output sequence.
Specifically, in practical applications, to ensure that action counting proceeds smoothly, the video frame action recognition model needs to be obtained by training in advance.
To make the action counts produced by the trained video frame action recognition model more accurate, this embodiment proceeds from the perspective of probability and statistics and uses a Gaussian distribution to characterize a complete action interval; by counting each complete action interval, action counting of the video to be counted can then be realized.
Therefore, the network model structure on which the video frame action recognition model trained in this embodiment is based needs to include at least a Gaussian regression output branch.
Training based on this network model structure proceeds as follows:
(1) Obtain action video samples.
Understandably, in actual training, the obtained action video samples are videos with a known number of actions, and a known start frame time, end frame time, key frame time, and temporal scale factor for each action.
(2) Annotate the action intervals in the action video samples with Gaussian distributions.
Specifically, to ensure that the 3σ region of the Gaussian distribution lies entirely within the action interval, this embodiment annotates the action intervals in the action video samples with Gaussian distributions according to the 3σ (Pauta) criterion of the Gaussian distribution.
The annotation of the action intervals in the action video samples is specifically as follows:
Mark the action start frame time in the action video sample as t_s, the action end frame time as t_e, the key frame time as t_m, and the temporal scale factor as s.
Based on the above annotation information, this embodiment makes the Gaussian distribution value of the action satisfy the following formula (1):
Figure PCTCN2021134033-appb-000001
where μ = s·t_m, and
Figure PCTCN2021134033-appb-000002
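As a minimal sketch, the Gaussian labeling above could look like the following. Note that the patent's exact formulas are in image placeholders, so the standard form exp(−(t−μ)²/(2σ²)) with μ = s·t_m is assumed, and σ is assumed to be chosen so that the 3σ region spans the scaled interval [s·t_s, s·t_e], i.e. σ = s·(t_e − t_s)/6; these are illustrative assumptions, not the patent's definitive formulas.

```python
import math

def gaussian_label(t, t_s, t_e, t_m, s):
    """Gaussian label for frame index t of an action annotated with
    start frame t_s, end frame t_e, key frame t_m, and scale factor s.

    Assumed form: mu = s * t_m, sigma = s * (t_e - t_s) / 6 so the
    3-sigma region lies within the (scaled) action interval.
    """
    mu = s * t_m
    sigma = s * (t_e - t_s) / 6.0
    return math.exp(-((t - mu) ** 2) / (2.0 * sigma ** 2))
```

Under these assumptions the label peaks at 1.0 on the key frame and decays toward the interval boundaries.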
Thus, after the action video samples have been annotated through steps (1) and (2), the model training of step (3) can begin.
(3) Iteratively train the pre-designed network model structure with the annotated action video samples until a preset convergence condition is met, obtaining the video frame action recognition model.
Understandably, an action video sample usually consists of many frames. To train recognition on every frame of the sample while avoiding feeding the entire sample into the network model structure at once, which would slow training, this embodiment iteratively trains the pre-designed network model structure, i.e., the structure including the Gaussian regression output branch, with the annotated samples through the following procedure:
First, select consecutive video frames of a preset length from the annotated action video samples to obtain a continuous video frame sequence.
For ease of understanding, an example follows:
For instance, for an action video sample of 100 frames, suppose 32 consecutive frames are read each time as the continuous video frame sequence input to the Gaussian regression output branch of the network model structure.
Then the frames first read from the annotated action video sample are frames 0 through 31, i.e., 32 consecutive frames.
Next, input the continuous video frame sequence into the Gaussian regression output branch of the network model structure.
Specifically, in practice each group of read consecutive video frames can be input in order into the Gaussian regression output branch of the network model structure, and the branch analyzes every frame in each group.
Further, in practice a buffer queue can be preset in the network model structure, and each input group of consecutive video frames is added to the queue in order, so that after the Gaussian regression output branch finishes analyzing one group it takes the next group from the queue in order, avoiding thread blocking that would slow processing.
Finally, randomly select the starting position of the continuous video frame sequence, and iteratively train the Gaussian regression output branch by stochastic gradient descent with momentum until the preset convergence condition is met, obtaining the video frame action recognition model.
Specifically, the preset convergence condition can be set according to actual business needs; for example, for business scenarios requiring high convergence, the condition can be a higher number of training iterations.
Correspondingly, for scenarios with relatively low convergence requirements, the condition can be a lower number of iterations.
In this embodiment, to balance convergence quality and training speed, the number of training iterations is set to 200,000; that is, after the Gaussian regression output branch has been iteratively trained 200,000 times by stochastic gradient descent with momentum, the current network model structure can serve as the video frame action recognition model.
It should be understood that the above examples are given only for a better understanding of the technical solution of this embodiment and are not the sole limitation on it.
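The momentum stochastic gradient descent update used in the training procedure above can be sketched as follows; this is the classic textbook formulation, not code from the patent.

```python
def sgd_momentum_step(weights, grads, velocity, lr=0.01, momentum=0.9):
    """One parameter update of SGD with momentum:
    v <- momentum * v - lr * grad, then w <- w + v."""
    new_v = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_w = [w + v for w, v in zip(weights, new_v)]
    return new_w, new_v
```

For example, repeatedly applying the update to minimize f(w) = w² (gradient 2w) drives w toward 0.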
Thus, in practical application, consecutive video frames read from the video to be counted (likewise at the preset length) are input into the video frame action recognition model trained as above; after processing by the model, the output is the Gaussian distribution value of every frame, and combining these values in order yields the Gaussian regression output sequence described in this embodiment.
In addition, in practice, to reduce the training difficulty of the Gaussian regression output branch as much as possible while accelerating network convergence, the network model structure used to train the video frame action recognition model may further include an action classification output branch.
Specifically, the action classification output branch is mainly used to determine whether a video frame belongs to an action.
In practice, whether a frame belongs to an action can be determined from the confidence of each frame, e.g., by deciding that the frame belongs to an action when its confidence exceeds a certain threshold.
In addition, in practice, besides determining whether a frame belongs to an action, the action classification output branch can also determine the specific action type of the frame, such as running, jumping, or walking, which are not enumerated here one by one; this embodiment places no limitation on this.
Correspondingly, when the network model structure includes the action classification output branch and the Gaussian regression output branch, iteratively training the pre-designed network model structure with the annotated action video samples until the network model structure meets the preset convergence condition, to obtain the video frame action recognition model, specifically comprises:
First, select consecutive video frames of a preset length from the annotated action video samples to obtain a continuous video frame sequence.
Then, input the continuous video frame sequence into the Gaussian regression output branch of the network model structure.
Next, convert the Gaussian distribution label of each frame in the continuous video frame sequence into a binary label, and input the converted sequence into the action classification output branch of the network model structure.
That is, before training the action classification output branch, the Gaussian label of each frame in the action video sample, i.e., the label annotated with the Gaussian distribution, must be converted into a binary label.
Specifically, it can be stipulated that a Gaussian label greater than 0 indicates an action and the output is 1, otherwise the output is 0; thus whether a frame is an action can be determined by checking whether the binary label is 0 or 1.
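The label conversion just described can be sketched as a one-line mapping (a minimal illustration, assuming the stipulation that any strictly positive Gaussian label marks an action frame):

```python
def to_binary_labels(gaussian_labels):
    """Map per-frame Gaussian labels to binary action labels:
    a strictly positive Gaussian label marks an action frame (1),
    anything else a non-action frame (0)."""
    return [1 if g > 0 else 0 for g in gaussian_labels]
```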
Finally, randomly select the starting position of the continuous video frame sequence, and iteratively train the Gaussian regression output branch and the action classification output branch by stochastic gradient descent with momentum until the preset convergence condition is met, obtaining the video frame action recognition model.
That is, for a network model structure including both the action classification output branch and the Gaussian regression output branch, the consecutive video frames are input to the two branches respectively; this way, training of the Gaussian regression output branch can refer to the action classification output branch without fully depending on its output, avoiding the poor convergence and robustness that an abnormal classification output would otherwise cause in the Gaussian regression output branch.
In addition, in practice, to address the weak extraction ability and high computational complexity of temporal feature extraction schemes that use spatial RGB images together with temporal optical-flow sequences for deep feature extraction, the network model structure used to train the video frame action recognition model may further include a 3D convolution trunk.
Specifically, the 3D convolution trunk is used for temporal feature extraction.
For a better understanding of the process of training the video frame action recognition model in this embodiment based on a network model structure including the 3D convolution trunk, the action classification output branch, and the Gaussian regression output branch, a description follows with reference to FIG. 2:
Specifically, in practice, after the action video samples have been annotated with Gaussian distributions per the sample annotation of steps (1) and (2) above, first select consecutive frames of a preset length from the annotated samples to obtain a continuous video frame sequence; next, input the sequence into the 3D convolution trunk of FIG. 2, which performs temporal feature extraction, and use the extracted temporal features as the inputs to be fed respectively into the Gaussian regression output branch and the action classification output branch of FIG. 2; finally, randomly select the starting position of the sequence and iteratively train the Gaussian regression output branch and the action classification output branch by stochastic gradient descent with momentum until the preset convergence condition is met, obtaining the video frame action recognition model.
Understandably, since training of the action classification output branch is based on binary labels, the continuous video frame sequence output by the 3D convolution trunk likewise requires the aforementioned conversion of Gaussian labels into binary labels when input into the action classification output branch.
From the above description, the continuous video frame sequences finally input to the Gaussian regression output branch and the action classification output branch have been processed by the 3D convolution trunk. Thanks to the strong extraction ability and low computational complexity of 3D convolution, the complexity of the features extracted during training of the video frame action recognition model, and during later recognition with the trained model, is greatly reduced, thereby reducing the final computational complexity.
Understandably, in practice, how many layers of residual network the 3D convolution trunk contains can be decided according to actual business needs; for instance, when high convergence and robustness are required and training time requirements are loose, a residual network with relatively many layers can be chosen, and conversely one with relatively few layers.
Considering both points, this embodiment chooses the 3D convolution version of the 18-layer residual network, i.e., ResNet18-3D, as the 3D convolution trunk for temporal feature extraction.
In addition, understandably, in practice both the action classification output branch and the Gaussian regression output branch include a fully connected layer, after which a loss function is applied for the corresponding processing.
Specifically, the action classification output branch uses the softmax cross-entropy loss function; for the Gaussian regression output branch, to facilitate subsequent processing of the output Gaussian distribution values, this embodiment restricts the branch's output range to between 0.0 and 1.0, so the Gaussian regression output branch uses the sigmoid cross-entropy loss function.
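The two losses named above can be sketched in plain Python for a single sample; this is the standard definition of each loss, shown for illustration rather than taken from the patent.

```python
import math

def sigmoid_ce(logit, target):
    """Sigmoid cross-entropy for the Gaussian regression head:
    the sigmoid keeps the output in (0, 1), matching labels in [0, 1]."""
    p = 1.0 / (1.0 + math.exp(-logit))
    eps = 1e-12  # guard against log(0)
    return -(target * math.log(p + eps) + (1 - target) * math.log(1 - p + eps))

def softmax_ce(logits, target_index):
    """Softmax cross-entropy for the action classification head."""
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    return -math.log(exps[target_index] / sum(exps))
```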
It should also be noted that since the action classification output branch serves to convert the Gaussian distribution label ("о" in FIG. 3) into the binary label ("——" in FIG. 3), and the Gaussian distribution values output by the Gaussian regression output branch in this embodiment are restricted to between 0.0 and 1.0, it can be set that a Gaussian label greater than 0 indicates an action and the action classification output branch outputs 1, and otherwise outputs 0.
It should be understood that the above gives the specific trunk and branches included in the pre-designed network model structure for three kinds of business needs; in practical application, a person skilled in the art can choose an appropriate network model structure as needed to train the video frame action recognition model, so that the model achieves low complexity, high robustness, convergence, and accurate computation.
Step 102: perform Gaussian modeling according to the Gaussian regression output sequence to obtain a Gaussian model.
Specifically, in practice, fitting the Gaussian regression output sequence with a global least-squares method easily falls into a local optimum, causing the Gaussian fit to fail. Therefore, when performing Gaussian fitting, i.e., Gaussian modeling, this embodiment uses a heuristic method based on piecewise Gaussian fitting to model the Gaussian regression output sequence, obtaining the Gaussian model shown in FIG. 3.
The heuristic method based on piecewise Gaussian fitting, which performs Gaussian modeling according to the Gaussian regression output sequence, proceeds as follows:
a) Given the Gaussian regression output sequence y(t), set the minimum Gaussian value threshold ε_val = 0.1, the minimum number of Gaussian sample points ε_num = 4, the initial number of Gaussians k = 0, and the Gaussian parameter set
Figure PCTCN2021134033-appb-000003
b) Compute the maximum value y_max of the sequence y(t) and record the index of the maximum as t_max;
c) If y_max < ε_val, end the piecewise Gaussian fitting and return the parameters k and Ω; otherwise go to step d);
d) Starting from t_max, search leftward for the first value of y(t) smaller than ε_val, and record its index as t_l;
e) Starting from t_max, search rightward for the first value of y(t) smaller than ε_val, and record its index as t_r;
f) If |t_r − t_l| < ε_num, set y(t_l:t_r) to 0 and return to step b); otherwise perform a single-Gaussian fit on y(t_l:t_r), set the number of Gaussians k = k + 1 with Gaussian parameters (μ_k, σ_k), update the Gaussian parameter set Ω = Ω ∪ (μ_k, σ_k), set y(t_l:t_r) to 0, and return to step b).
The single-Gaussian fit mentioned above proceeds as follows:
A single Gaussian model can be expressed as the following formula (2):
Figure PCTCN2021134033-appb-000004
where (μ, σ) are the parameters to be estimated; taking the logarithm of both sides gives:
Figure PCTCN2021134033-appb-000005
where
Figure PCTCN2021134033-appb-000006
The parameter estimates can then be obtained by the least-squares method:
Figure PCTCN2021134033-appb-000007
Thus the operation of performing Gaussian modeling according to the Gaussian regression output sequence to obtain the Gaussian model is realized; for example, Gaussian modeling based on the action video sample shown in FIG. 3 yields the Gaussian fitting result in FIG. 3 ("...." in FIG. 3).
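The piecewise fitting steps a)–f), together with the log-domain least-squares single-Gaussian fit, can be sketched in pure Python as follows. The exact formulas of the patent are in image placeholders, so the standard Gaussian form exp(−(t−μ)²/(2σ²)) is assumed; fitting log y(t) = a·t² + b·t + c then recovers μ = −b/(2a) and σ = √(−1/(2a)).

```python
import math

def fit_single_gaussian(ts, ys):
    """Single-Gaussian fit by least squares in the log domain:
    log y = a*t^2 + b*t + c, then mu = -b/(2a), sigma = sqrt(-1/(2a))."""
    pts = [(t, math.log(v)) for t, v in zip(ts, ys) if v > 0]
    n = float(len(pts))
    s1 = sum(t for t, _ in pts); s2 = sum(t ** 2 for t, _ in pts)
    s3 = sum(t ** 3 for t, _ in pts); s4 = sum(t ** 4 for t, _ in pts)
    z0 = sum(z for _, z in pts); z1 = sum(t * z for t, z in pts)
    z2 = sum(t * t * z for t, z in pts)

    def det3(m):  # determinant of a 3x3 matrix
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

    # solve the 3x3 normal equations by Cramer's rule
    d = det3([[s4, s3, s2], [s3, s2, s1], [s2, s1, n]])
    a = det3([[z2, s3, s2], [z1, s2, s1], [z0, s1, n]]) / d
    b = det3([[s4, z2, s2], [s3, z1, s1], [s2, z0, n]]) / d
    return -b / (2.0 * a), math.sqrt(-1.0 / (2.0 * a))

def piecewise_gaussian_fit(y, eps_val=0.1, eps_num=4):
    """Heuristic piecewise Gaussian fitting (steps a-f).
    Returns (k, omega): the number of Gaussians and their (mu, sigma)."""
    y = list(y)
    omega = []
    while True:
        t_max = max(range(len(y)), key=lambda i: y[i])   # step b
        if y[t_max] < eps_val:                           # step c
            return len(omega), omega
        t_l = t_max                                      # step d
        while t_l > 0 and y[t_l] >= eps_val:
            t_l -= 1
        t_r = t_max                                      # step e
        while t_r < len(y) - 1 and y[t_r] >= eps_val:
            t_r += 1
        seg = range(t_l, t_r + 1)                        # step f
        if abs(t_r - t_l) >= eps_num:
            omega.append(fit_single_gaussian(list(seg), [y[i] for i in seg]))
        for i in seg:
            y[i] = 0.0
```

On a synthetic sequence built from two well-separated Gaussians, the procedure recovers both components and their locations.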
Step 103: count actions according to the number of Gaussian distributions in the Gaussian model.
Specifically, from the perspective of probability and statistics, one Gaussian distribution, rising from the lowest point to the highest and returning to the lowest, characterizes one complete action interval, and one complete action interval corresponds to one action. Therefore, when counting actions according to the number of Gaussian distributions in the Gaussian model shown in FIG. 3, this embodiment essentially estimates every complete Gaussian distribution in the model ("...." in FIG. 3) and takes the counted number of Gaussian distributions as the number of actions included in the video to be counted, realizing action counting for the video.
Still taking FIG. 3 as an example, counting determines that the Gaussian model in FIG. 3 includes 4 complete Gaussian distributions, so the final number of actions is 4.
From the above description it is easy to see that the action counting method of this embodiment proceeds from probability and statistics, using a Gaussian distribution to characterize a complete action interval. Based on this property, a video frame action recognition model that can recognize the Gaussian regression value of every frame in the video to be counted is pre-trained; when counting actions, the model recognizes the video and yields a Gaussian regression output sequence characterizing the entire video; Gaussian modeling on this sequence yields a Gaussian model recording the Gaussian distribution corresponding to each action interval; finally, counting the Gaussian distributions in the model and taking their number as the number of actions included in the video realizes action counting. Because each complete action interval is counted once, the video frame action recognition model is more robust and counts more accurately than methods relying purely on action periodicity or single-frame images.
Moreover, with the Gaussian-distribution-based counting of the action counting method of this embodiment, while the number of actions is output, any complete action described by the Gaussian distribution N(μ_i, σ_i), 1 ≤ i ≤ k, has start time point μ_i − 3σ_i and end time point μ_i + 3σ_i.
That is, besides accurately predicting the number of actions from the number of fitted Gaussians, the action counting method of this embodiment can also give the start and end time points of each action according to the Gaussian distribution, which is of important guiding significance for temporal action localization.
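The boundary computation above is a direct application of the 3σ criterion and can be written as:

```python
def action_boundaries(mu, sigma):
    """Start and end time points of an action from its fitted
    Gaussian N(mu, sigma), per the 3-sigma criterion."""
    return mu - 3.0 * sigma, mu + 3.0 * sigma
```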
The second embodiment of this application relates to an action counting method. The second embodiment further improves on the first embodiment; the main improvement is that, based on a video frame action recognition model capable of determining whether a frame is an action, the video to be counted is recognized according to a preset overlapping strategy, to guarantee the accuracy of the recognition results and thus the accuracy of the final counting results.
As shown in FIG. 4, the action counting method of the second embodiment includes the following steps:
Step 401: according to a preset overlapping strategy, recognize the video to be counted using the pre-trained video frame action recognition model to obtain a Gaussian regression output sequence.
Specifically, the overlapping strategy in this embodiment stipulates that the (N−1)-th continuous video frame sequence of length T contains the same L video frames as the N-th continuous video frame sequence of length T, where L and T satisfy 0 < L < T.
Recognizing the video to be counted according to the above overlapping strategy with the pre-trained video frame action recognition model, to obtain the Gaussian regression output sequence, proceeds as follows:
(1) Select consecutive video frames of a fixed length T from the video to be counted, obtaining N continuous video frame sequences of length T.
(2) Input the N sequences of length T into the video frame action recognition model in turn, obtaining, for each frame in each sequence of length T, an action classification output result of 1 or 0 and a Gaussian distribution value in [0, 1].
Understandably, since the action classification output branch of the trained video frame action recognition model is trained on binary labels and the Gaussian regression output branch on Gaussian distribution labels, after the consecutive frames to be counted are input into the trained model, the action classification output branch outputs for each frame a binary number, either 0 or 1, while the Gaussian distribution values output by the Gaussian regression output branch lie in [0, 1].
(3) For each video frame, judge whether the corresponding action classification output result is 1.
(4) If it is 1, determine that the frame is an action, and obtain the confidence of the frame's action classification output result.
Understandably, when training the video frame action recognition model it is stipulated that for frames corresponding to actions, the Gaussian distribution values output by the Gaussian regression output branch lie in [0, 1] and the action classification output branch outputs 1; conversely, when a frame is not an action, the action classification output branch outputs 0.
Therefore, determining the Gaussian distribution values of repeated frames based on this relationship, together with the confidence of the action classification output, effectively guarantees the accuracy of the Gaussian distribution values, and thus increases the accuracy of the action counts produced from the Gaussian model built on them.
(5) Compare the confidences of the action classification output results of the same frames in the (N−1)-th and the N-th continuous video frame sequences of length T.
(6) According to the comparison result, select the action classification result with the higher confidence as the target action classification result of the frame recognized by the video frame recognition model.
(7) Take the Gaussian distribution value corresponding to the target action classification result as the target Gaussian distribution value of the frame recognized by the video frame recognition model.
(8) Arrange the target Gaussian distribution values in order of the times at which the frames appear in the video to be counted, obtaining the Gaussian regression output sequence.
For a better understanding of steps (1) through (8) above, an example follows:
Suppose the video to be counted includes 100 frames, the fixed length T = 32 (32 consecutive frames are read from the video each time), and the overlap length L = 16 (16 frames are shared by two adjacent groups of consecutive frames).
Under these settings, the first sequence read consists of frames 0 through 31, the second of frames 16 through 47, the third of frames 32 through 63, and so on; that is, the N-th sequence read contains the same L frames as the (N−1)-th sequence read.
For the L frames shared by two adjacently read sequences, select the action classification output result and Gaussian distribution value of the run whose action classification confidence is higher as the target action classification output result and target Gaussian distribution value of those L frames.
Finally, arranging the target Gaussian distribution value of every frame of the video to be counted in order yields the Gaussian regression output sequence from which the Gaussian model is finally built.
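The overlapping-window merge of steps (5) through (8) can be sketched as follows; the per-frame tuple layout (class, confidence, Gaussian value) is a hypothetical data layout chosen for illustration.

```python
def merge_overlapping_windows(windows, T=32, L=16):
    """Merge per-frame predictions from overlapping windows.

    `windows` is a list of windows; each window is a list of T tuples
    (cls, conf, gauss). Consecutive windows overlap by L frames; for a
    frame predicted twice, keep the prediction whose classification
    confidence is higher, then return the Gaussian values in time order.
    """
    stride = T - L
    merged = {}
    for w_idx, win in enumerate(windows):
        for j, (cls, conf, gauss) in enumerate(win):
            t = w_idx * stride + j  # absolute frame index in the video
            if t not in merged or conf > merged[t][1]:
                merged[t] = (cls, conf, gauss)
    return [merged[t][2] for t in sorted(merged)]
```

With T = 4 and L = 2, frames 2 and 3 are predicted by both of the first two windows, and the higher-confidence prediction wins.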
Step 402: perform Gaussian modeling according to the Gaussian regression output sequence to obtain a Gaussian model.
Step 403: count actions according to the number of Gaussian distributions in the Gaussian model.
It is easy to see that steps 402 and 403 of this embodiment are substantially the same as steps 102 and 103 of the first embodiment and are not repeated here.
Thus, in the action counting method of this embodiment, when the pre-trained video frame action recognition model recognizes the video to be counted to obtain the Gaussian regression output sequence, input frames are selected from the video according to an overlapping single-frame prediction strategy, and the recognition result with the higher action classification confidence among the overlapping frames is finally taken as the frame's prediction result; this not only corrects recognition errors but also reduces misrecognition as much as possible, further guaranteeing the accuracy of the final action counts of the video to be counted.
Moreover, the action counting method of this embodiment places no limitation on the length of the video to be counted; by cyclically feeding a fixed number of frames into the video frame action recognition model, dense prediction of all frames of the video can be completed. The whole counting process is simple, convenient, and easy to implement, and can thus adapt better to various practical application scenarios.
In addition, it should be understood that the division of the above methods into steps is only for clarity of description; in implementation they may be merged into one step, or a step may be split into multiple steps; as long as the same logical relationship is included, all such variants are within the scope of protection of this patent; adding insignificant modifications to the algorithm or flow, or introducing insignificant designs, without changing the core design of the algorithm and flow, is also within the scope of protection of this patent.
The third embodiment of this application relates to an action counting apparatus, as shown in FIG. 5, comprising: a network model inference module 501, a Gaussian modeling processing module 502, and an action counting module 503.
The network model inference module 501 is configured to recognize the video to be counted using the pre-trained video frame action recognition model to obtain a Gaussian regression output sequence; the Gaussian modeling processing module 502 is configured to perform Gaussian modeling according to the Gaussian regression output sequence to obtain a Gaussian model; and the action counting module 503 is configured to count actions according to the number of Gaussian distributions in the Gaussian model.
In another example, the action counting apparatus further includes a network model training module.
Specifically, the network model training module is used to perform iterative training with annotated action video samples according to the pre-designed network model structure, to obtain the video frame action recognition model.
In another example, the video frame action recognition model obtained through training by the network model training module may be used only to realize recognition of Gaussian distribution values.
Accordingly, in that case the pre-designed network model structure may include only the Gaussian regression output branch.
Correspondingly, the network model training module is specifically used to train the video frame action recognition model according to the following procedure:
obtain action video samples;
annotate the action intervals in the action video samples with Gaussian distributions;
iteratively train the pre-designed network model structure with the annotated action video samples until the preset convergence condition is met, obtaining the video frame action recognition model.
In another example, annotating the action intervals in the action video samples with Gaussian distributions is specifically:
marking the action start frame time in the action video sample as t_s, the action end frame time as t_e, the key frame time as t_m, and the temporal scale factor as s;
making the Gaussian distribution value of the action satisfy the following formula:
Figure PCTCN2021134033-appb-000008
where μ = s·t_m, and
Figure PCTCN2021134033-appb-000009
In another example, the network model training module is specifically used to:
select consecutive video frames of a preset length from the annotated action video samples to obtain a continuous video frame sequence;
input the continuous video frame sequence into the Gaussian regression output branch of the network model structure;
randomly select the starting position of the continuous video frame sequence, and iteratively train the Gaussian regression output branch by stochastic gradient descent with momentum until the preset convergence condition is met, obtaining the video frame action recognition model.
In another example, to improve the convergence of the video frame action recognition model while reducing the training difficulty of the Gaussian regression output branch during training, the pre-designed network model structure may further include an action classification output branch.
Specifically, the action classification output branch is used to judge whether a video frame belongs to an action.
Correspondingly, the network model training module is specifically used to train the video frame action recognition model according to the following procedure:
select consecutive video frames of a preset length from the annotated action video samples to obtain a continuous video frame sequence;
input the continuous video frame sequence into the Gaussian regression output branch of the network model structure;
convert the Gaussian distribution label of each video frame in the continuous video frame sequence into a binary label, and input the converted sequence into the action classification output branch of the network model structure;
randomly select the starting position of the continuous video frame sequence, and iteratively train the Gaussian regression output branch and the action classification output branch by stochastic gradient descent with momentum until the preset convergence condition is met, obtaining the video frame action recognition model.
In another example, to further reduce the complexity of training the video frame action recognition model, the pre-designed network model structure may further include a 3D convolution trunk.
Correspondingly, the network model training module is specifically used to train the video frame action recognition model according to the following procedure:
select consecutive video frames of a preset length from the annotated action video samples to obtain a continuous video frame sequence;
input the continuous video frame sequence into the 3D convolution trunk of the network model structure, which performs temporal feature extraction; the extracted temporal features serve as the continuous video frame sequences to be respectively input into the Gaussian regression output branch and the action classification output branch of the network model structure;
input the continuous video frame sequences output by the 3D convolution trunk respectively into the Gaussian regression output branch and the action classification output branch of the network model structure;
randomly select the starting position of the continuous video frame sequence, and iteratively train the Gaussian regression output branch and the action classification output branch by stochastic gradient descent with momentum until the preset convergence condition is met, obtaining the video frame action recognition model.
In another example, to guarantee the accuracy of the final action counts, when the network model inference module 501 recognizes the video to be counted using the pre-trained video frame action recognition model to obtain the Gaussian regression output sequence, it specifically:
recognizes the video to be counted according to a preset overlapping strategy, using the pre-trained video frame action recognition model, obtaining the Gaussian regression output sequence.
It should be noted that in this embodiment the overlapping strategy stipulates that the (N−1)-th continuous video frame sequence of length T contains the same L video frames as the N-th continuous video frame sequence of length T, 0 < L < T.
In another example, the operation by which the network model inference module 501 recognizes the video to be counted according to the preset overlapping strategy using the pre-trained video frame action recognition model, obtaining the Gaussian regression output sequence, is specifically:
select consecutive video frames of a fixed length T from the video to be counted, obtaining N continuous video frame sequences of length T;
input the N continuous video frame sequences of length T into the video frame action recognition model in turn, obtaining, for each video frame of each sequence of length T, an action classification output result of 1 or 0 and a Gaussian distribution value in [0, 1];
for each video frame, judge whether the corresponding action classification output result is 1;
if it is 1, determine that the video frame is an action, and obtain the confidence of the action classification output result of the video frame;
compare the confidences of the action classification output results of the same video frames in the (N−1)-th and the N-th continuous video frame sequences of length T;
according to the comparison result, select the action classification result with the higher confidence as the target action classification result of the video frame recognized by the video frame recognition model;
take the Gaussian distribution value corresponding to the target action classification result as the target Gaussian distribution value of the video frame recognized by the video frame recognition model;
arrange the target Gaussian distribution values in order of the times at which the video frames appear in the video to be counted, obtaining the Gaussian regression output sequence.
That is, two adjacent continuous video frame sequences of the same length share some identical frames; this overlapping prediction of the given frames determines the actual Gaussian distribution values of the overlapping frames, yielding a Gaussian regression output sequence that accurately reflects the actual situation.
In another example, to avoid the problem that Gaussian modeling using only a single global Gaussian fit easily falls into a local optimum, causing the Gaussian fit to fail, when the Gaussian modeling processing module 502 performs Gaussian modeling according to the Gaussian regression output sequence to obtain the Gaussian model, it specifically:
performs Gaussian modeling according to the Gaussian regression output sequence based on a heuristic method of piecewise Gaussian fitting, obtaining the Gaussian model.
In another example, when the action counting module 503 counts actions according to the number of Gaussian distributions in the Gaussian model, it specifically:
counts every complete Gaussian distribution in the Gaussian model to obtain the number of Gaussian distributions;
takes the number of Gaussian distributions as the number of actions included in the video to be counted.
Based on this, the action counting apparatus provided by the embodiments of this application has at least the following advantages:
(1) In the network model structure designed in this application, with its 3D convolution trunk, action classification output branch, and Gaussian regression output branch, temporal features are extracted by the 3D convolution trunk, which not only has strong extraction ability but also greatly reduces computational complexity, while the introduction of the action classification output branch can greatly reduce the training difficulty of the Gaussian regression output branch and accelerate network convergence;
(2) This application proceeds from the perspective of probability and statistics, using a Gaussian distribution to characterize a complete action interval, with the number of Gaussian distributions representing the number of actions, and then uses an efficient piecewise Gaussian fitting algorithm to perform Gaussian fitting and obtain the number of Gaussians; this application is therefore more accurate and more robust in realizing scene action counting applications;
(3) Thanks to the novel Gaussian-distribution action representation, this application can, while outputting the number of actions, also give the start and end time points of each action according to the 3σ criterion of the Gaussian distribution, which is of important guiding significance for temporal action localization;
(4) During network model inference, on the one hand, the overlapping single-frame prediction strategy can correct recognition errors and reduce misrecognition; on the other hand, by cyclically feeding a fixed number of frames into the network model, dense prediction of all video frames is completed, so this application places no limitation on the length of the video.
In addition, it is easy to see that this embodiment is an apparatus embodiment corresponding to the first or second embodiment and can be implemented in cooperation with the first or second embodiment. The related technical details mentioned in the first or second embodiment remain valid in this embodiment and, to reduce repetition, are not repeated here. Correspondingly, the related technical details mentioned in this embodiment can also be applied in the first or second embodiment.
It is worth mentioning that the modules involved in this embodiment are all logical modules; in practical application, a logical unit may be one physical unit, a part of a physical unit, or a combination of multiple physical units. Moreover, in order to highlight the innovative part of this application, this embodiment does not introduce units that are not closely related to solving the technical problem raised by this application, but this does not mean that no other units exist in this embodiment.
The fourth embodiment of this application relates to an action counting device, as shown in FIG. 6, comprising: at least one processor 601; and a memory 602 communicatively connected to the at least one processor 601, wherein the memory 602 stores instructions executable by the at least one processor 601, and the instructions are executed by the at least one processor 601 so that the at least one processor 601 can perform the action counting method described in the above method embodiments.
The memory 602 and the processor 601 are connected by a bus; the bus may include any number of interconnected buses and bridges, connecting one or more processors 601 and the various circuits of the memory 602 together. The bus may also connect various other circuits, such as peripherals, voltage regulators, and power management circuits, which are well known in the art and therefore are not described further herein. A bus interface provides the interface between the bus and a transceiver. The transceiver may be one element or multiple elements, such as multiple receivers and transmitters, providing a means for communicating with various other devices over a transmission medium. Data processed by the processor 601 is transmitted over the wireless medium through an antenna; the antenna also receives data and transfers it to the processor 601.
The processor 601 is responsible for managing the bus and general processing, and may also provide various functions including timing, peripheral interfacing, voltage regulation, power management, and other control functions. The memory 602 may be used to store data used by the processor 601 when performing operations.
The fifth embodiment of this application relates to a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, the action counting method described in the above method embodiments is implemented.
That is, those skilled in the art will appreciate that all or part of the steps of the methods of the above embodiments can be completed by instructing relevant hardware through a program stored in a storage medium, the program including a number of instructions to cause a device (which may be a microcontroller, a chip, etc.) or a processor to perform all or part of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The action counting method, apparatus, device, and storage medium proposed in this application proceed from the perspective of probability and statistics, using a Gaussian distribution to characterize a complete action interval. Based on this property, a video frame action recognition model that can recognize the Gaussian regression value of every frame in the video to be counted is pre-trained; when counting actions in the video to be counted, the model recognizes the video and yields a Gaussian regression output sequence characterizing the entire video; Gaussian modeling according to this sequence then yields a Gaussian model recording the Gaussian distribution corresponding to each action interval in the video; finally, counting the Gaussian distributions in the model and taking their number as the number of actions included in the video realizes action counting for the video. Because each complete action interval is counted once, the video frame action recognition model is more robust and counts more accurately than methods relying purely on action periodicity or single-frame images.
Moreover, the Gaussian-distribution-based action counting of the method, apparatus, device, and storage medium proposed in this application can, while outputting the number of actions, also give the start and end time points of each action according to the Gaussian distribution, which is of important guiding significance for temporal action localization.
Those of ordinary skill in the art will appreciate that the above embodiments are specific embodiments for realizing this application, and in practical application various changes in form and detail may be made to them without departing from the spirit and scope of this application.

Claims (13)

  1. An action counting method, comprising:
    recognizing a video to be counted using a pre-trained video frame action recognition model to obtain a Gaussian regression output sequence;
    performing Gaussian modeling according to the Gaussian regression output sequence to obtain a Gaussian model;
    counting actions according to the number of Gaussian distributions in the Gaussian model.
  2. The action counting method of claim 1, wherein training the video frame action recognition model comprises:
    obtaining action video samples;
    annotating action intervals in the action video samples with Gaussian distributions;
    iteratively training a pre-designed network model structure with the annotated action video samples until a preset convergence condition is met, obtaining the video frame action recognition model;
    wherein the network model structure comprises a Gaussian regression output branch.
  3. The action counting method of claim 2, wherein annotating the action intervals in the action video samples with Gaussian distributions comprises:
    marking an action start frame time in the action video sample as t_s, an action end frame time as t_e, a key frame time as t_m, and a temporal scale factor as s;
    making the Gaussian distribution value of the action satisfy the following formula:
    Figure PCTCN2021134033-appb-100001
    where μ = s·t_m, and
    Figure PCTCN2021134033-appb-100002
  4. The action counting method of claim 2, wherein iteratively training the pre-designed network model structure with the annotated action video samples until the preset convergence condition is met, obtaining the video frame action recognition model, comprises:
    selecting consecutive video frames of a preset length from the annotated action video samples to obtain a continuous video frame sequence;
    inputting the continuous video frame sequence into the Gaussian regression output branch of the network model structure;
    randomly selecting a starting position of the continuous video frame sequence, and iteratively training the Gaussian regression output branch by stochastic gradient descent with momentum until the preset convergence condition is met, obtaining the video frame action recognition model.
  5. The action counting method of any one of claims 2 to 4, wherein the network model structure further comprises an action classification output branch;
    iteratively training the pre-designed network model structure with the annotated action video samples until the network model structure meets the preset convergence condition, obtaining the video frame action recognition model, comprises:
    selecting consecutive video frames of a preset length from the annotated action video samples to obtain a continuous video frame sequence;
    inputting the continuous video frame sequence into the Gaussian regression output branch of the network model structure;
    converting the Gaussian distribution label of each video frame in the continuous video frame sequence into a binary label, and inputting the converted continuous video frame sequence into the action classification output branch of the network model structure;
    randomly selecting a starting position of the continuous video frame sequence, and iteratively training the Gaussian regression output branch and the action classification output branch by stochastic gradient descent with momentum until the preset convergence condition is met, obtaining the video frame action recognition model.
  6. The action counting method of claim 5, wherein the network model structure further comprises a 3D convolution trunk;
    before inputting the continuous video frame sequence into the Gaussian regression output branch of the network model structure, converting the Gaussian distribution label of each video frame in the continuous video frame sequence into a binary label, and inputting the converted continuous video frame sequence into the action classification output branch of the network model structure, the method further comprises:
    inputting the continuous video frame sequence into the 3D convolution trunk of the network model structure, the 3D convolution trunk performing temporal feature extraction, with the extracted temporal features serving as the continuous video frame sequences to be respectively input into the Gaussian regression output branch and the action classification output branch of the network model structure.
  7. The action counting method of claim 6, wherein recognizing the video to be counted using the pre-trained video frame action recognition model to obtain the Gaussian regression output sequence comprises:
    recognizing the video to be counted, according to a preset overlapping strategy, using the pre-trained video frame action recognition model, obtaining the Gaussian regression output sequence;
    wherein the overlapping strategy stipulates that the (N-1)-th continuous video frame sequence of length T contains the same L video frames as the N-th continuous video frame sequence of length T, 0 < L < T.
  8. The action counting method of claim 7, wherein recognizing the video to be counted according to the preset overlapping strategy, using the pre-trained video frame action recognition model, obtaining the Gaussian regression output sequence, comprises:
    selecting consecutive video frames of a fixed length T from the video to be counted, obtaining N continuous video frame sequences of length T;
    inputting the N continuous video frame sequences of length T into the video frame action recognition model in turn, obtaining, for each video frame of each sequence of length T, an action classification output result of 1 or 0 and a Gaussian distribution value taking values in [0, 1];
    for each video frame, judging whether the corresponding action classification output result is 1;
    if it is 1, determining that the video frame is an action, and obtaining a confidence of the action classification output result of the video frame;
    comparing the confidences of the action classification output results of the same video frames in the (N-1)-th and the N-th continuous video frame sequences of length T;
    according to the comparison result, selecting the action classification result with the higher confidence as a target action classification result of the video frame recognized by the video frame recognition model;
    taking the Gaussian distribution value corresponding to the target action classification result as a target Gaussian distribution value of the video frame recognized by the video frame recognition model;
    arranging the target Gaussian distribution values in order of the times at which the video frames appear in the video to be counted, obtaining the Gaussian regression output sequence.
  9. The action counting method of claim 6, wherein performing Gaussian modeling according to the Gaussian regression output sequence to obtain the Gaussian model comprises:
    performing Gaussian modeling according to the Gaussian regression output sequence based on a heuristic method of piecewise Gaussian fitting, obtaining the Gaussian model.
  10. The action counting method of claim 6, wherein counting actions according to the number of Gaussian distributions in the Gaussian model comprises:
    counting every complete Gaussian distribution in the Gaussian model to obtain the number of Gaussian distributions;
    taking the number of Gaussian distributions as the number of actions included in the video to be counted.
  11. An action counting apparatus, comprising:
    a network model inference module configured to recognize a video to be counted using a pre-trained video frame action recognition model to obtain a Gaussian regression output sequence;
    a Gaussian modeling processing module configured to perform Gaussian modeling according to the Gaussian regression output sequence to obtain a Gaussian model;
    an action counting module configured to count actions according to the number of Gaussian distributions in the Gaussian model.
  12. An action counting device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the action counting method of any one of claims 1 to 10.
  13. A computer-readable storage medium storing a computer program, wherein when the computer program is executed by a processor, the action counting method of any one of claims 1 to 10 is implemented.
PCT/CN2021/134033 2021-02-02 2021-11-29 Action counting method, apparatus, device, and storage medium WO2022166344A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110144646.7 2021-02-02
CN202110144646.7A CN114842546A (zh) 2021-02-02 2021-02-02 Action counting method, apparatus, device, and storage medium

Publications (1)

Publication Number Publication Date
WO2022166344A1 true WO2022166344A1 (zh) 2022-08-11

Family

ID=82562500

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/134033 WO2022166344A1 (zh) 2021-02-02 2021-11-29 动作计数方法、装置、设备及存储介质

Country Status (2)

Country Link
CN (1) CN114842546A (zh)
WO (1) WO2022166344A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115512341A (zh) * 2022-09-15 2022-12-23 粤丰科盈智能投资(广东)有限公司 Target detection method and apparatus based on Gaussian distribution fitting, and computer medium
CN116306766A (zh) * 2023-03-23 2023-06-23 北京奥康达体育产业股份有限公司 Intelligent horizontal-bar pull-up assessment and training system based on skeleton recognition technology

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115661919B (zh) * 2022-09-26 2023-08-29 珠海视熙科技有限公司 Repetitive action cycle counting method and apparatus, fitness device, and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105740945A (zh) * 2016-02-04 2016-07-06 中山大学 Crowd counting method based on video analysis
CN110705408A (zh) * 2019-09-23 2020-01-17 东南大学 Indoor people counting method and system based on Gaussian-mixture people-distribution learning

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105740945A (zh) * 2016-02-04 2016-07-06 中山大学 Crowd counting method based on video analysis
CN110705408A (zh) * 2019-09-23 2020-01-17 东南大学 Indoor people counting method and system based on Gaussian-mixture people-distribution learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WU YANCHUN: "Online Human Action Analysis Based on Deep Learning", CHINESE MASTER'S THESES FULL-TEXT DATABASE, no. 1, 1 June 2019 (2019-06-01), pages 1 - 69, XP055957280, ISSN: 1674-0246, DOI: 10.27166/d.cnki.gsdcc.2019.000119 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115512341A (zh) * 2022-09-15 2022-12-23 粤丰科盈智能投资(广东)有限公司 Target detection method and apparatus based on Gaussian distribution fitting, and computer medium
CN115512341B (zh) * 2022-09-15 2023-10-27 粤丰科盈智能投资(广东)有限公司 Target detection method and apparatus based on Gaussian distribution fitting, and computer medium
CN116306766A (zh) * 2023-03-23 2023-06-23 北京奥康达体育产业股份有限公司 Intelligent horizontal-bar pull-up assessment and training system based on skeleton recognition technology
CN116306766B (zh) * 2023-03-23 2023-09-22 北京奥康达体育产业股份有限公司 Intelligent horizontal-bar pull-up assessment and training system based on skeleton recognition technology

Also Published As

Publication number Publication date
CN114842546A (zh) 2022-08-02

Similar Documents

Publication Publication Date Title
WO2022166344A1 (zh) Action counting method, apparatus, device, and storage medium
CN109344884B (zh) Media information classification method, and method and apparatus for training an image classification model
JP6741357B2 (ja) マルチ関連ラベルを生成する方法及びシステム
WO2019100724A1 (zh) Method and apparatus for training a multi-label classification model
US9400918B2 Compact face representation
CN112115352B (zh) Session recommendation method and system based on user interest
WO2022104202A1 (en) A temporal bottleneck attention architecture for video action recognition
CN114519469A (zh) Construction method of a multivariate long-sequence time-series forecasting model based on the Transformer framework
CN112257855B (zh) Neural network training method and apparatus, electronic device, and storage medium
CN110222592B (zh) Construction method of a temporal action detection network model based on complementary temporal action proposal generation
CN110781818B (zh) Video classification method, model training method, apparatus, and device
CN114663798B (zh) Single-step video content recognition method based on reinforcement learning
CN113780584A (zh) Label prediction method, device, storage medium, and program product
CN113822264A (zh) Text recognition method and apparatus, computer device, and storage medium
US20230229897A1 Distances between distributions for the belonging-to-the-distribution measurement of the image
CN111310918B (zh) Data processing method and apparatus, computer device, and storage medium
CN116612417A (zh) Method and apparatus for lane line detection in special scenes using video temporal information
Chatterjee et al. A hierarchical variational neural uncertainty model for stochastic video prediction
CN111144462A (zh) Method and apparatus for unknown individual identification of radar signals
CN113221628A (zh) Video violence recognition method, system, and medium based on interactive learning of human skeleton point clouds
US20230297823A1 Method and system for training a neural network for improving adversarial robustness
CN115269998A (zh) Information recommendation method and apparatus, electronic device, and storage medium
CN115761576A (zh) Video action recognition method, apparatus, and storage medium
CN113569867A (zh) Image processing method and apparatus, computer device, and storage medium
CN113822291A (zh) Image processing method, apparatus, device, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21924339

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 13.12.2023)